
Bootstrap update

Update: This post used an incorrect implementation of the bootstrap, so the conclusions don’t hold. See this correction

Mike suggested that I alter the variance of the underlying distributions. This makes total sense, since it matches what we are usually trying to do in psychological research – detect a small difference in a lot of noise. So I made the underlying distributions look a lot like reaction time distributions, with a 30ms difference between them. The code is


Where m is the sample size, and d is either 0 or 30. For a very large sample, the distributions look like this:

After a discussion with Jim I looked at the hit rate and false alarm rate separately. For the simple comparison of means, the false alarm rate stays around 0.5 (as you’d predict: with no true difference, either sample mean is equally likely to be the larger). For the other tests it drops to about 0.05. The simple comparison of means is so sensitive to a true difference, however, that its dprime can still be superior to that of the other tests. To me this suggests that dprime is not a good summary statistic, rather than that we should do testing simply by comparing the sample means.
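To see how a 0.5 false alarm rate can still come out ahead on dprime, here is a minimal sketch in Python (the post's own code is MATLAB). It uses the standard definition, the z-transformed hit rate minus the z-transformed false alarm rate. The two pairs of rates are hypothetical numbers picked to illustrate the point, not results from the simulation, and `dprime` is just an illustrative helper name.

```python
from statistics import NormalDist

def dprime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false alarm rate), where z is the inverse standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates, for illustration only (not taken from the simulation)
lenient = dprime(0.99, 0.50)  # nearly always detects a difference, but false alarms half the time
strict = dprime(0.60, 0.05)   # conventional 0.05 false alarm rate, moderate hit rate

print(round(lenient, 2), round(strict, 2))
```

With these made-up rates the lenient test gets the larger dprime, even though nobody would want a test that cries wolf half the time.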

So I reran the procedure I described before, but with higher variance on the underlying samples.

The results are very similar. The bootstrap using the mean as the test statistic is worse than the t-test. The bootstrap using the median is clearly superior. This surprises me. I had been told that the bootstrap was superior for nonparametric distributions. In this case it seems as if using the mean as a test statistic eliminates the potential superiority of bootstrapping.

This is still a work in progress, so I will investigate further and may have to update this conclusion as the story evolves.


Debt: The first 5,000 years

David Graeber traces a line from Roman property law, through Cartesian dualism and Hobbes’ state of nature, to the foundational myth of the free market:

At this point we can finally see what’s really at stake in our peculiar habit of defining ourselves simultaneously as master and slave, reduplicating the most brutal aspects of the ancient household in our very concept of ourselves, as masters of our freedoms, or as owners of our very selves. It is the only way that we can imagine ourselves as completely isolated beings. There is a direct line from the new Roman conception of liberty – not as the ability to form mutual relationships with others, but as the kind of absolute power of “use and abuse” over the conquered chattel who make up the bulk of a wealthy Roman man’s household – to the strange fantasies of liberal philosophers like Hobbes, Locke, and Smith, about the origins of human society in some collection of thirty- or forty-year-old males who seem to have sprung from the earth fully formed, then have to decide whether to kill each other or begin to swap beaver pelts.

David Graeber (2011) ‘Debt: The First 5000 years’, p209-210.

Graeber uses an anthropologist’s view of history to argue that markets are brought into existence by the state, and particularly by an expansionist military state which wishes to force all social actors to be intermediaries in the war machine. By obliging everyone to accept state currency a state-coinage-slavery complex is created. This dynamic drives the creation of slaves, who are, by definition, people ripped from all social context. The collision of market economies with social economies (which are about interaction as much as obtaining goods) creates a moral dilemma which we can trace written in the texts of all the ancient religions (you’ll have to read the book for details). The dominant modes of human relation in historical time have been three: exchange, hierarchy and communism (not in the Marxist sense). The dominion of the exchange mode, and its perversion into being primarily market exchange, reduces the primacy of the other modes in the models of liberal/market thinkers, and so our conception of our selves (individually and politically) is contaminated by contradictory notions of debt and ownership (again, you’ll have to read the book). Ultimately this finds expression in a vision of ourselves as separate from our own bodies, and in the foundational myth of economics in which markets come into being de novo among an asocial but equal-status collection of isolates who can begin to trade to satisfy their wants.

It’s an extremely rich book, which is also very disorganised in its arguments. I’m still digesting what I’ve read so this is a poor summary. Most importantly for me, and separate from the specifics of the argument, the anthropological and historical material does the job of expanding our conception of what we and our society could be.

Pro-tip: on the final pages (p384-387) Graeber offers his own summary of the thesis of the book.


Testing bootstrapping

Update: This post used an incorrect implementation of the bootstrap, so the conclusions don’t hold. See this correction

This surprised me. I decided to try out bootstrapping as a method of testing if two sets of numbers are drawn from different distributions. I did this by generating sets of numbers of size m from two ex-gaussian distributions which are identical except for a fixed difference, d.


All code is matlab. Sorry about that.
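For anyone without MATLAB, the sampling step can be roughly sketched in Python. An ex-gaussian value is a normal draw plus an independent exponential draw. The function name `exgauss_sample` and the parameter values (mu, sigma, tau) are assumptions made for illustration, not the values used in this post.

```python
import random

def exgauss_sample(m, mu, sigma, tau, shift=0.0, rng=random):
    """Draw m values from Normal(mu, sigma) + Exponential(mean tau), plus a fixed shift."""
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) + shift
            for _ in range(m)]

rng = random.Random(1)  # seeded so the example is repeatable

# mu, sigma and tau are illustrative assumptions, not the post's parameters
m, d = 50, 0.5
s1 = exgauss_sample(m, mu=0.0, sigma=1.0, tau=1.0, rng=rng)
s2 = exgauss_sample(m, mu=0.0, sigma=1.0, tau=1.0, shift=d, rng=rng)
```

The two samples come from the same distribution apart from the shift d, which is the setup described above.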

Then, for each pair of samples, I apply a series of different tests of whether the distributions are different:
1. Standard t-test (0.05 significance level)
2. Is mean(s1) < mean(s2)?
3. Bootstrapping using the mean as the test statistic (0.05 significance level)
4. Bootstrapping using the median as the test statistic (0.05 significance level)

I used Ione Fine’s pages on bootstrapping as a guide. The bootstrapping code is:

function H=bootstrap(s1,s2,samples,alpha,method)

for i=1:samples
    if method==1


H = CI(1)>0 | CI(2)<0;
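For comparison, a generic percentile bootstrap of this shape can be sketched in Python. This is not the implementation used in this post (which, per the update at the top, was incorrect); it is one common textbook variant. It resamples each group with replacement, builds a confidence interval for the difference in the chosen statistic, and reports a difference when the interval excludes zero. The name `bootstrap_differs` and its defaults are hypothetical.

```python
import random
from statistics import mean, median

def bootstrap_differs(s1, s2, samples=2000, alpha=0.05, stat=mean, rng=random):
    """True if the percentile-bootstrap CI for stat(s1) - stat(s2) excludes zero."""
    diffs = []
    for _ in range(samples):
        b1 = rng.choices(s1, k=len(s1))  # resample each group with replacement
        b2 = rng.choices(s2, k=len(s2))
        diffs.append(stat(b1) - stat(b2))
    diffs.sort()
    k = int(round(alpha / 2 * samples))  # number of values trimmed from each tail
    ci_low, ci_high = diffs[k], diffs[samples - 1 - k]
    return ci_low > 0 or ci_high < 0

rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(100)]
b = [x + 5 for x in a]  # deliberately huge shift, so the answer is unambiguous

print(bootstrap_differs(a, b, rng=rng))               # mean as the test statistic
print(bootstrap_differs(a, b, stat=median, rng=rng))  # median as the test statistic
```

Passing `stat=median` swaps the test statistic without touching the resampling, which is the only thing that differs between tests 3 and 4.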

I do that 5000 times for each difference, d, and each sample size, m. Then I take the average answer from each test (where 1 is 'conclude the distributions are different' and 0 is 'don't conclude the distributions are different'). For the case where d > 0 this gives you a hit rate: the likelihood that the test will tell you there is a difference when there is a difference. For d = 0.5 you get a difference that most of the tests can detect the majority of the time as long as the sample is more than 50. For the case where d = 0, you can calculate the false alarm rate for each test (at each sample size).
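The repeat-and-average step can be sketched like this in Python, using the simple comparison of means as the test because it needs no extra machinery. The sketch assumes that test is "is mean(s1) < mean(s2)", uses illustrative ex-gaussian parameters rather than the post's, and runs 2000 repetitions instead of 5000 to keep it quick. The helper names are hypothetical.

```python
import random
from statistics import mean

def draw(m, shift, rng):
    # Ex-gaussian sample; the parameters (0, 1, 1) are assumptions for illustration
    return [rng.gauss(0, 1) + rng.expovariate(1.0) + shift for _ in range(m)]

def positive_rate(test, d, m, reps, rng):
    """Fraction of simulated experiments in which test(s1, s2) reports a difference."""
    positives = sum(test(draw(m, 0.0, rng), draw(m, d, rng)) for _ in range(reps))
    return positives / reps

def mean_is_smaller(s1, s2):
    # Assumed form of the simple comparison of means
    return mean(s1) < mean(s2)

rng = random.Random(42)
false_alarm_rate = positive_rate(mean_is_smaller, d=0.0, m=50, reps=2000, rng=rng)
hit_rate = positive_rate(mean_is_smaller, d=0.5, m=50, reps=2000, rng=rng)

print(false_alarm_rate, hit_rate)
```

Any function that takes two samples and returns true or false can be passed as `test`, so the same loop would serve for the t-test and the bootstrap versions.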

From these you can calculate d-prime as a standard index of sensitivity and plot the result. Sttest, Smean, Sbootstrap and Sbootstrap2 are matrices which hold the likelihood of the four tests giving a positive answer for each sample size (columns) for two differences, 0 and 0.5 (the rows):

hold on
xlabel('Sample size')
ylabel('sensitivity - d prime')

Here is the result:

What surprised me was:

  • The t-test is more sensitive than the bootstrap, if the mean is used as the test statistic
  • How much more sensitive the bootstrap is than the other tests if the median is used as the test statistic
  • How well the simple mean does. I suspect there's some nuance I'm missing here, such as an unacceptably high false positive rate for smaller differences

Update 28/11/12
-Fixed an inconsequential bug in the dprime calculation
-Closer inspection shows that the simple mean case gives a ~50% false alarm rate, but the high sensitivity offsets this. Suggests dprime isn't a wise summary statistic?


Tweets New paper: “Memory enhances the mere exposure effe…

New paper: “Memory enhances the mere exposure effect”… in the marketing literature, this is contra received wisdom


Links for autumn 2012


Tweets Kent Berridge of “wanting vs liking” fame comments…

Kent Berridge of “wanting vs liking” fame comments on Ainslie’s hyperbolic discounting… (2008)


Tweets Research in Progress – still got it…

Research in Progress – still got it


Tweets Why is it so hard to give good directions? http://…

Why is it so hard to give good directions?… my latest @BBC_Future column now up on


Tweets Against distributed representations: “On the biolo…

Against distributed representations: “On the biological plausibility of grandmother cells” Bowers, 2009


Tweets Ninja Standing Desks: And you…

Ninja Standing Desks: And you can pay in Bitcoins. Made in the Bay Area, of course


Tweets Belgian electro rock band Goose perform their song…

Belgian electro rock band Goose perform their song “British Mode” on board a giant rotating Ames window…


Tweets Do You Know What I’m Thinking?…

Do You Know What I’m Thinking?… featuring all your favourite U of Manchester psychologists, from @Psy_File


Tweets Tetris skills after 10,000 hours practice: https:/…

Tetris skills after 10,000 hours practice:… my article on the psychology of tetris…


Tweets Time resolution of clocks: Effects on reaction tim…

Time resolution of clocks: Effects on reaction time measurement—Good news for bad clocks…