statistical self-defence

How to Lie with Statistics is a classic. It is, says its author Darrell Huff, a ‘sort of primer in ways to use statistics to deceive’. Why teach the world how to lie with statistics? Because, he says, ‘crooks already know these tricks; honest men must learn them in self-defence’.

The bulk of the book is worked examples of classic statistical sleights of hand: graphs with missing sections on the axes, different kinds of averages, post hoc observation of correlations. What I want to do here is review just the last chapter, ‘How to talk back to statistics’, which gives some rules of thumb for how to ‘look a phoney statistic in the eye and face it down’ whilst recognising ‘sound and usable data in the wilderness of fraud’. Huff gives the reader five simple questions with which to arm themselves. I summarise them below and then add some commentary at the end.

Huff’s five questions with which to arm yourself against statistics:

1. Who says so?

Can we suspect deliberate or unconscious bias in the originator of the statistic? Huff recommends looking for an “O.K. name” (a university, for example) as some slim promise of reliability. Second, he recommends being careful to distinguish the originator of the ‘data’ from the originator of the conclusion or interpretation.

2. How Does He Know?

Is the sample biased? Is it representative? Is it large enough?

3. What’s Missing?

Statistics given without a measure of reliability are ‘not to be taken very seriously’. What is the relevant base rate or appropriate comparison figure? Do averages disguise important variations?

4. Did somebody change the subject?

For example: more reported cases are not the same as more cases; what people say they do (or will do) is not the same as what they actually do (or will do); and association (correlation) is not causation.

5. Does it make sense?

Is the figure spuriously accurate? Convert percentages to real numbers, and real numbers to percentages, and compare both with your intuitions.
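To make the conversion advice concrete, here is a small Python sketch with made-up figures. The rates, the population and the function names are hypothetical illustrations, not anything from Huff. A headline ‘risk up 50%!’ can describe a rise from 2 to 3 cases per 10,000 people, and a headline ‘1,000 extra cases!’ can be a tiny fraction of a large population.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (the headline-friendly figure)."""
    return (after - before) / before * 100.0


def extra_cases(before_rate: float, after_rate: float, population: int) -> float:
    """Absolute number of additional cases implied by two rates in a population."""
    return (after_rate - before_rate) * population


def as_percentage(count: float, population: int) -> float:
    """Express a raw count as a percentage of the population it comes from."""
    return count / population * 100.0


# Hypothetical headline: "risk up 50%!" (rates invented for illustration)
before_rate = 2 / 10_000
after_rate = 3 / 10_000
print(f"relative change: {relative_change_pct(before_rate, after_rate):.0f}%")
print(f"extra cases per 10,000 people: {extra_cases(before_rate, after_rate, 10_000):.1f}")

# Hypothetical headline: "1,000 extra cases!" in a population of 60 million
print(f"as a share of the population: {as_percentage(1_000, 60_000_000):.4f}%")
```

Run as written, this reports a 50% relative change, which turns out to be 1.0 extra case per 10,000 people, and shows the 1,000 extra cases as about 0.0017% of the population. Neither form is the ‘true’ one; the point is that seeing both lets your intuition check the headline.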

Commentary by Tom

Who says so?

Two things interest me about the recommendation to base judgements of credibility, even if only in part, on authority. Firstly, by doing this Huff is conceding that we are simply not able to make a thorough independent evaluation of the facts ourselves. This contradicts the idea that science is Best because if we doubt something we can check it for ourselves. The obvious pragmatic response is ‘well, you won’t check everything, but you could check any individual thing if you wanted’. Is this much consolation in a world where everyone, including the authorities, is assaulted by too much information to check personally? Scientific authority then becomes a matter of which social structures, using which truth-heuristics, you trust, rather than a matter of direct proof (“it says so in the Bible!” versus “it says so in a respectable academic paper!”?).

The second thing that interests me is that the advice to rely on authorities becomes problematic for those who either don’t know who the authorities are, or who distrust the usual authorities. What proportion of the population knows that the basic unit of scientific authority is the peer-reviewed journal paper? If you don’t know this, you immediately lose a vital heuristic for evaluating the credibility of research you are told about. In a similar vein, even experts in one domain may be ignorant of the authorities in another, leading to similar problems with judging credibility. If you know about the established authorities but simply don’t trust them, you are similarly lost at sea when trying to evaluate incoming evidence (a reason, I’ll bet, for the mixed quality of information available from, variously, conspiracy theorists and alternative medicine practitioners).

How Does He Know?

This is perhaps the most important question you can ask, in my opinion. Often all that is required to dispel the superficially convincing fog that accompanies some statistic or factoid is to ask: how did they find out? What would actually be involved in gathering that information? Could it possibly be correct? Take, for example, ‘if you die in your dream, you really die’. How do they know?! Dead people aren’t exactly available for comment.

What’s Missing?

Knowing what is missing is the hardest trick, in my opinion. It is a mark of both expertise and genuine intelligence to pick up on what isn’t being said, and to notice when the interpretation of what you’re being told could be fundamentally altered by something you aren’t being told (because, of course, this involves imagining a whole set of counterfactuals). Outside the realm of statistics, frame analysis speaks to the same idea: whatever isn’t talked about is made invisible.
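Huff’s third question singles out one omission in particular: a measure of reliability. As a rough illustration (a sketch, not something taken from the book), the Python below computes the approximate 95% margin of error for a survey proportion using the standard normal approximation, z·√(p(1−p)/n) with z = 1.96. The 60% figure and the sample sizes are invented, and the approximation is crude for very small samples.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion (normal approximation).

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical survey: "60% agree", reported at three different sample sizes.
for n in (20, 200, 2000):
    moe = margin_of_error(0.6, n)
    print(f"n={n:5d}: 60% ± {moe * 100:.1f} percentage points")
```

The same headline ‘60%’ comes out at roughly ±21.5, ±6.8 and ±2.1 percentage points respectively, which is exactly the kind of information that changes how seriously the figure should be taken when it is left out.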

Did somebody change the subject?

Does it make sense?

Both are good checks to carry out when challenged by a statistic. It is unfortunate that statistics seem to have an inherent air of authority (a kind of wow factor), and these questions are good tools with which to start dismantling it. I think the wow factor arises because statistics seem to imply rigorous, unbiased, comprehensive investigation, even though they may in fact arise from nothing of the sort.

Evolution produces mimics that wear the warning colouration of poisonous species without going to the trouble of producing the poison, and most social situations attract free-riders who want the benefits without paying the costs. In the same way, rhetorical strategies evolve to include things that carry the trappings of credible information without going through the processes that actually make information from credible sources credible. So we get statistics because everybody knows science uses statistics. We get figures quoted to the second decimal place when the margin of error is a hundred times larger than that level of precision. And we get nonsensical arguments supported by citations, even though the works cited are utterly without credibility, because citing sources is an established form of credible argument that is easy to reproduce for any argument, whatever its credibility.
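The point about decimal places can be made almost mechanical. A minimal sketch, assuming you know (or can estimate) the margin of error: round the estimate at the decimal place of the margin’s leading digit, so the reported precision never exceeds what the data supports. The function name `honest_round` and the 43.27% figure are hypothetical.

```python
import math


def honest_round(estimate: float, margin: float) -> str:
    """Format an estimate to no more decimal places than its margin of error supports.

    Both numbers are rounded at the decimal place of the margin's leading digit.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    # e.g. margin 1.2 -> exponent 0 (units); margin 0.3 -> exponent -1 (tenths)
    exponent = math.floor(math.log10(margin))
    ndigits = -exponent  # round() also accepts negative ndigits (tens, hundreds, ...)
    decimals = max(0, ndigits)
    est = round(estimate, ndigits)
    moe = round(margin, ndigits)
    return f"{est:.{decimals}f} ± {moe:.{decimals}f}"


# Hypothetical: a figure quoted as 43.27% whose margin of error is about 1.2 points
print(honest_round(43.27, 1.2))  # 43 ± 1
print(honest_round(43.27, 0.3))  # 43.3 ± 0.3
```

Note that Python’s built-in `round()` rounds exact halves to the nearest even digit, so a margin of exactly 25 is reported as 20 rather than 30; a more cautious version might round the margin up instead.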

Reference: Darrell Huff (1954) How to Lie with Statistics

If you want accurate knowledge, and you don’t trust the experts, then you have a lot of work ahead of you. I don’t think there’s any way to get around this.

For example, I personally distrust certain FDA studies on pesticide safety. (In some cases, I think the wrong questions were asked. In other cases, I think the researchers had conflicts of interest.) And unfortunately, I have neither the time nor the expertise to redo these studies.

So there’s no real way for me to tell whether certain pesticides are safe. I can only rely on a subjective analysis of the regulatory process, and decide whether I trust the FDA.

Peer-reviewed journals are actually a pretty good solution to this problem. Assuming that you generally trust researchers in a given field, you can assume that peer-reviewed work is pretty solid most of the time.

But what if there’s a big schism within a field, or if you don’t trust the average peer reviewer? Well, then you’ve got a lot of difficult reading ahead of you. And if you want to get plausible answers, you will need to be rigorous and intellectually honest.

3 replies on “statistical self-defence”

[…] Tom at idiolect gives his commentary on five rules of thumb which tell you how to talk back to statistics; via Mind Hacks. […]

It’s worth keeping in mind that 90% of statistics are made up on the spot…

I love that line. Anyway, adding to Eric’s remarks about peer review, I suppose the reason it (usually) works is that while you might not trust any given researcher, you accept a consensus from a group of researchers who are not closely affiliated with each other. I suppose it’s connected to the reason that we have twelve-person juries instead of the (indisputably more efficient) method of just bringing one citizen in to watch a trial and draw a conclusion. And in turn, why the Law Lords and the US Supreme Court are both groups, rather than just one nearly omnipotent judge. Though the current Supreme Court is a very good example of how easily the ‘who says so’ test breaks down, since the Supreme Court, though supposedly as august an authority as you can get in the States, is being turned into Bush’s personal bulldog-pen.
