How to Lie with Statistics is a classic. A ‘sort of primer in ways to use statistics to deceive’, says the author Darrell Huff. Why teach the world how to lie with statistics? Because, he says, ‘crooks already know these tricks; honest men must learn them in self-defence’.
The bulk of the book is worked examples of classic statistical sleights of hand: graphs with missing sections on the axes, different kinds of averages, post-hoc observation of correlations. What I want to do here is just review the last chapter, ‘How to talk back to statistics’, which gives some rules of thumb on how to ‘look a phoney statistic in the eye and face it down’ whilst recognising ‘sound and usable data in the wilderness of fraud’. Huff gives the reader five simple questions with which to arm themselves, which I summarise and then provide some commentary on at the end.
Huff’s five questions with which to arm yourself against statistics:
1. Who says so?
Can we suspect deliberate or unconscious bias in the originator of the statistic? Huff recommends looking for an “O.K. Name” – e.g. a university – as some slim promise of reliability. Second to this, he recommends being careful to distinguish the originator of the ‘data’ from the originator of the conclusion or interpretation.
2. How Does He Know?
Is the sample biased? Representative? Large enough?
3. What’s Missing?
Statistics given without a measure of reliability are ‘not to be taken very seriously’. What are the relevant base rates or appropriate comparison figures? Do averages disguise important variations? (On this last point, see the short sketch after this list.)
4. Did somebody change the subject?
E.g. more reported cases are not the same as more cases; what people say they do (or will do) is not the same as what they actually do (or will do); association (correlation) is not causation.
5. Does it make sense?
Is the figure spuriously accurate? Convert percentages to real numbers and real numbers to percentages, and compare both with your intuitions.
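As a quick illustration of two of these checks, here is a minimal sketch using made-up numbers (none of these figures come from Huff’s book): the same average can hide very different spreads, and a dramatic percentage can correspond to a tiny absolute change.

```python
# Minimal sketch of two of Huff's checks, using made-up illustrative numbers.
from statistics import mean, stdev

# Question 3: do averages disguise important variations?
# Two hypothetical salary lists with the same mean but very different spreads.
salaries_a = [29_000, 30_000, 31_000, 30_500, 29_500]
salaries_b = [15_000, 15_500, 16_000, 16_500, 87_000]  # one outlier drags the mean up

print(mean(salaries_a), round(stdev(salaries_a)))  # mean 30000, spread ~791
print(mean(salaries_b), round(stdev(salaries_b)))  # mean 30000, spread ~31869

# Question 5: convert percentages to real numbers (and back).
# "Cases doubled!" sounds dramatic until you see the absolute numbers.
cases_before, cases_after = 2, 4
pct_increase = 100 * (cases_after - cases_before) / cases_before
print(f"{pct_increase:.0f}% increase, but only {cases_after - cases_before} extra cases")
```

Both lists report the same ‘average salary’, and the 100% rise amounts to two extra cases.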
Commentary by Tom
Who says so?
Two things interest me about the recommendation to base judgements of credibility, even if just in part, on authority. Firstly, by doing this Huff is conceding that we are simply not able to make a thorough independent evaluation of the facts ourselves. This contradicts the idea that science is Best because, if we doubt something, we can check it out for ourselves. The pragmatic response to this is obviously ‘well, you won’t check out everything, but you could check out any individual thing if you wanted’. Is this much consolation in a world where everyone, including the authorities, is assaulted by too much information to check out personally? Scientific authority then becomes a matter of which social structures, using which truth-heuristics, you trust, rather than a matter of direct proof (“it says so in the Bible!” versus “it says so in a respectable academic paper!”?).
The second thing that interests me is that the advice to rely on authorities becomes problematic for those who either don’t know who the authorities are, or who distrust the usual authorities. What proportion of the population knows that the basic unit of scientific authority is the peer-reviewed journal paper? You can see that if you don’t know this, you immediately lose a vital heuristic for evaluating the credibility of research you are told about. In a similar vein, even experts in one domain may be ignorant of the authorities in another domain, leading to similar problems with judging credibility. If you know about but simply don’t trust the established authorities, you are similarly lost at sea when trying to evaluate incoming evidence (a reason, I’ll bet, for the mixed quality of information available from, variously, conspiracy theorists and alternative medicine practitioners).
How Does He Know?
This is, in my opinion, perhaps the most important question you can ask. Often all that is required to dispel the superficially convincing fog that accompanies some statistic or factoid is to ask: how did they find out? What would actually be involved in gathering that information? Could it possibly be correct? For example, ‘if you die in your dream, you really die’. How do they know?! Dead people aren’t exactly available for comment.
What’s Missing?
Knowing what is missing is the hardest trick, in my opinion. It’s a mark both of expertise and of genuine intelligence to be able to pick up on what isn’t being said, to notice when the interpretation of what you’re being told could be fundamentally altered by something you aren’t being told (because, of course, this involves imagining a bunch of counter-factuals). Outside the realm of statistics, the idea of frame analysis speaks to this question of how what isn’t talked about is made invisible.
Did somebody change the subject?
Does it make sense?
Both are good checks to carry out when challenged by a statistic. It is unfortunate that statistics seem to have an inherent air of authority – a kind of wow factor – and these questions are good tools with which to start dismantling it. I think this wow factor exists because statistics seem to imply rigorous, unbiased, comprehensive investigation, even though they may in fact arise from nothing of the sort. In the same way that evolution will produce imitators which have the colouration of being poisonous without actually bothering to produce the poison, and most social situations will attract free-riders who want to get the benefits without paying the costs, so there is an evolution of rhetorical strategies towards things which carry the trappings of credible information without going through the processes which actually make information from those sources credible. So we get statistics because everybody knows science uses statistics; we get figures quoted to the second decimal place when the margin of error is a hundred times larger than that level of accuracy; and we get nonsensical arguments supported by citations, even though the studies or works cited are utterly without credibility, because having citations is an established form of credible argument which is easy to reproduce for any argument, whatever its actual level of credibility.
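To make the spurious-precision point concrete, here is a small hypothetical sketch (the poll figure and sample size are invented for illustration). The standard error of a proportion, sqrt(p(1-p)/n), gives a rough sense of how many of those quoted decimal places are actually meaningful.

```python
# Hypothetical illustration of spurious precision: a poll result quoted as
# "34.27%" when sampling error alone is a couple of percentage points wide.
from math import sqrt

p, n = 0.3427, 1000                    # reported proportion and sample size (invented)
se = sqrt(p * (1 - p) / n)             # standard error of a proportion
margin = 1.96 * se                     # approximate 95% margin of error

print(f"Reported figure: {p:.2%}")             # 34.27%
print(f"95% margin of error: ±{margin:.1%}")   # roughly ±2.9%
# At this sample size the second decimal place of the percentage is noise.
```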
Reference: Darrell Huff (1954) How to Lie with Statistics