“There are three kinds of lies: lies, damned lies, and statistics” – popularized by Mark Twain, who attributed it to Benjamin Disraeli
Lying with Statistics
The ‘statistical fallacy’ is an entire class of fallacies that involve presenting statistical data in a biased way, or interpreting statistics without questioning the methodology behind them.
- Claim A is made.
- Statistic S is manipulated to support claim A.
“Looking at that pie chart, there is a very small percentage of people who declare themselves atheist. Therefore, atheism is not that popular”.
- First, atheism is better described as a lack of belief, so people who simply hold no belief may never declare themselves ‘atheist’.
- Second, many non-believers are not even familiar with the term ‘atheist’, and often consider themselves Christian, Jewish, or of some other religion, based on their culture and family tradition. Self-reported figures don’t account for this.
The original classic: Huff, D. “How to Lie with Statistics”. New York: W. W. Norton & Company.
Types of Statistical fallacies:
- Biased Samples: Conclusions reached from samples that are too small, biased, or both. “A report based on sampling must use a representative sample, [and] one from which every source of bias has been removed”.
[Ed. We think unrepresentative samples are a different category]
- Unrepresentative samples: “With smaller samples you have larger variance. With 10 coin flips you might get 8 heads, but you’re much less likely to get 80 heads in 100 coin flips”.
- Biased Averages:
There are three kinds of average:
- The mean: add up all the values and divide by the number of values
- The mode: the most common value
- The median: the middle value when the sample is sorted
“These can be very different numbers, and reporters and others will pick the one that best supports their argument.
In normal distributions, the three will be near each other, but in irregular distributions (e.g. annual household income) you’ll get vastly different numbers for each”.
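To see how far the three averages can drift apart on skewed data, here is a minimal sketch using Python's standard `statistics` module. The income figures are hypothetical, chosen only to illustrate a distribution with one very large value; they are not real data.

```python
from statistics import mean, median, mode

# Hypothetical annual household incomes (illustrative values, not real data).
# One very high income skews the distribution to the right.
incomes = [20_000, 30_000, 30_000, 40_000, 50_000, 60_000, 1_000_000]

averages = {
    "mean": mean(incomes),      # sum of values divided by their count
    "median": median(incomes),  # middle value of the sorted sample
    "mode": mode(incomes),      # most common value
}

for name, value in averages.items():
    print(f"{name:>6}: {value:,.0f}")
```

With these made-up numbers the mean is several times larger than the median or the mode, so someone who wants to show "high average income" would quote the mean, while someone who wants to show the opposite would quote the mode.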
- Discarded Data: “Companies will keep running experiments until they get the results they want, discarding the experiments that ‘failed to produce significant findings’”.
- Graph Manipulation: The same data can be presented in different ways, designed to mislead:
- Omitting the baseline
- Manipulating (amplifying) the Y-axis
- Cherry-picking data
- Using the wrong graph, e.g. a pie chart when the categories overlap
- Going against conventions, e.g. using a darker shade for low density
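The ‘Discarded Data’ tactic above works because repeating an experiment often enough will eventually produce a "significant" result by chance alone. The sketch below is a rough illustration under stated assumptions: the experiments are independent, there is no real effect, and significance is judged at a 5% level. The function name and the 5% threshold are illustrative choices, not taken from the source.

```python
def chance_of_false_positive(num_experiments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of several independent experiments on a
    nonexistent effect comes out 'significant' at level alpha."""
    # Each experiment avoids a false positive with probability (1 - alpha);
    # all of them avoid it with probability (1 - alpha) ** num_experiments.
    return 1 - (1 - alpha) ** num_experiments


for n in (1, 5, 20):
    print(f"{n:>2} experiments: {chance_of_false_positive(n):.0%} chance of a 'finding'")
```

Under these assumptions, running 20 experiments and reporting only the one that "worked" yields a spurious finding roughly 64% of the time.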
7 Ways To Lie With Statistics And Get Away With It
(Categories already described above are omitted)
- Results falling within the margin of error (aka: ‘Unrepresentative samples’): For example, “E-books Preferred Over Paper By Men More Than By Women” sounds remarkable until you find out that the actual poll found that 52% of men preferred e-books versus 49% of women, and the survey’s margin of error was +/-5%. A 3-point gap is well inside that margin, so the poll cannot show a real difference.
- Post-hoc fallacy: Incorrectly asserting a direct causal link between two findings that are merely correlated. “This is often more difficult to catch than the other tactics. For example, if a study finds that vegetarians have a higher average income than meat-eaters, it would be absurd to conclude that you can raise your income by abstaining from meat. But that is exactly what some ‘researchers’ do”.
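The e-book poll above can be checked with a few lines of arithmetic. This is a rough sketch, not a formal significance test: it only asks whether the two reported percentages, each widened by the survey's stated +/-5% error, overlap. The function name is illustrative.

```python
def intervals_overlap(a: float, b: float, margin: float) -> bool:
    """True if [a - margin, a + margin] and [b - margin, b + margin] overlap."""
    return abs(a - b) <= 2 * margin


# Figures quoted in the e-book example: 52% of men, 49% of women, +/-5% error.
men, women, margin = 52.0, 49.0, 5.0

print(f"men:   {men - margin:.0f}% to {men + margin:.0f}%")
print(f"women: {women - margin:.0f}% to {women + margin:.0f}%")
print("ranges overlap:", intervals_overlap(men, women, margin))
```

The two ranges (47% to 57% and 44% to 54%) overlap heavily, which is why the headline is not supported by the poll.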
More from ‘How to Lie with Statistics’
- The Semiattached Figure: “If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing”. For example, “more accidents occur in clear weather”, so “clear weather is more dangerous than foggy weather”. The figure ignores that there is far more clear weather than foggy weather to have accidents in.
- Correlation vs. Causation: The claim that “if B follows A, then A has caused B”. For example, “since smoking and low grades go together, smoking causes low grades”. It could just as well be the other way around, or a third factor could be behind both.