{"id":5800,"date":"2012-11-26T07:49:25","date_gmt":"2012-11-26T06:49:25","guid":{"rendered":"http:\/\/idiolect.org.uk\/notes\/?p=5800"},"modified":"2012-12-06T22:58:53","modified_gmt":"2012-12-06T21:58:53","slug":"testing-bootstrapping","status":"publish","type":"post","link":"https:\/\/idiolect.org.uk\/notes\/2012\/11\/26\/testing-bootstrapping\/","title":{"rendered":"Testing bootstrapping"},"content":{"rendered":"<p><strong>Update: This post used an incorrect implementation of the bootstrap, so the conclusions don&#8217;t hold. See <a href=\"http:\/\/idiolect.org.uk\/notes\/?p=5832\">this correction<\/a>.<\/strong><\/p>\n<p>This surprised me. I decided to try out bootstrapping as a method of testing whether two sets of numbers are drawn from different distributions. I did this by generating samples of size m from two skewed, ex-Gaussian-like distributions (strictly, exp(randn) is log-normal rather than exponential) which are identical except for a fixed shift, d:<\/p>\n<pre>\r\n    s1=randn(1,m)+exp(randn(1,m));\r\n    s2=randn(1,m)+exp(randn(1,m))+d;\r\n<\/pre>\n<p>All code is MATLAB. Sorry about that.<\/p>\n<p>Then, for each pair of samples, I apply a series of tests of whether the distributions are different:<br \/>\n1. Standard t-test (0.05 significance level)<br \/>\n2. Is mean(s1) &lt; mean(s2)?<br \/>\n3. Bootstrapping using the mean as the test statistic (0.05 significance level)<br \/>\n4. Bootstrapping using the median as the test statistic (0.05 significance level)<\/p>\n<p>I used <a href=\"http:\/\/courses.washington.edu\/matlab1\/Bootstrap_examples.html#2\">Ione Fine&#8217;s pages on bootstrapping as a guide<\/a>. 
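<\/p>\n<p>As a rough illustration of the first two tests in the list above, here is a minimal sketch that draws one pair of samples in the same way as above and applies both tests. It is not the code used for this post: the function name simple_tests is hypothetical, and it assumes the Statistics Toolbox function ttest2, which by default runs a two-sample t-test at the 0.05 level and returns 1 if it rejects equal means.<\/p>\n<pre>\r\nfunction [H_ttest,H_mean]=simple_tests(m,d)\r\n% Hypothetical helper, not from the original post: save as simple_tests.m\r\n% Draw one pair of samples of size m, the second shifted by d\r\ns1=randn(1,m)+exp(randn(1,m));\r\ns2=randn(1,m)+exp(randn(1,m))+d;\r\n\r\n% Test 1: two-sample t-test (assumes the Statistics Toolbox)\r\nH_ttest=ttest2(s1,s2);\r\n\r\n% Test 2: just ask whether the mean of s2 exceeds the mean of s1\r\nH_mean=mean(s2)>mean(s1);\r\n<\/pre>\n<p>For example, [Ht,Hm]=simple_tests(100,0.5) returns one answer (0 or 1) from each test for a single simulated pair; averaging such answers over many pairs gives the rates used below.<\/p>\n<p>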
The bootstrapping code is:<\/p>\n<pre>\r\nfunction H=bootstrap(s1,s2,samples,alpha,method)\r\n% H is 1 if the 100(1-alpha)% percentile interval for the bootstrapped\r\n% difference in the statistic (method 1: mean, otherwise: median) excludes zero\r\n\r\na=zeros(1,samples);   % preallocate the bootstrap distribution\r\nfor i=1:samples\r\n    \r\n    % resample each set with replacement, keeping its original size\r\n    boot1=s1(ceil(rand(1,length(s1))*length(s1)));\r\n    boot2=s2(ceil(rand(1,length(s2))*length(s2)));\r\n    \r\n    if method==1\r\n        a(i)=mean(boot1)-mean(boot2);\r\n    else\r\n        a(i)=median(boot1)-median(boot2);\r\n    end\r\n    \r\nend\r\n\r\n% two-sided percentile interval\r\nCI=prctile(a,[100*alpha\/2,100*(1-alpha\/2)]);\r\n\r\n% conclude there is a difference if the interval lies wholly above or below zero\r\nH = CI(1)>0 | CI(2)<0;\r\n<\/pre>\n<p>I repeat the whole procedure (generate a pair, apply the four tests) 5000 times for each difference, d, and each sample size, m. Then I take the average answer from each test (where 1 means 'conclude the distributions are different' and 0 means 'don't conclude the distributions are different'). For d > 0 this average is the hit rate: the probability that the test reports a difference when there really is one. With d = 0.5, most of the tests detect the difference the majority of the time as long as the sample size is more than 50. For d = 0, the same average is the false alarm rate for each test (at each sample size).<\/p>\n<p>From these two rates you can calculate d-prime, a standard index of sensitivity (d' = z(hit rate) - z(false alarm rate), where z is the inverse of the standard normal cumulative distribution, norminv in MATLAB), and plot the result. 
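<\/p>\n<p>To make the hit rate, false alarm rate and d-prime concrete, here is a minimal sketch for a single test (the t-test) at a single sample size. It is an illustration under stated assumptions rather than the code behind the figure: the function name dprime_sketch is hypothetical, it draws separate no-difference pairs for the false alarms, and it assumes ttest2 and norminv from the Statistics Toolbox.<\/p>\n<pre>\r\nfunction dp=dprime_sketch(m,d,reps)\r\n% Hypothetical helper, not from the original post: save as dprime_sketch.m\r\nhits=zeros(1,reps);   % t-test answers when the true shift is d\r\nfas=zeros(1,reps);    % t-test answers when there is no shift\r\nfor i=1:reps\r\n    s1=randn(1,m)+exp(randn(1,m));\r\n    s2=randn(1,m)+exp(randn(1,m))+d;\r\n    hits(i)=ttest2(s1,s2);\r\n    n1=randn(1,m)+exp(randn(1,m));\r\n    n2=randn(1,m)+exp(randn(1,m));\r\n    fas(i)=ttest2(n1,n2);\r\nend\r\nH=mean(hits);   % hit rate\r\nFA=mean(fas);   % false alarm rate\r\n% d' = z(H) - z(FA); a rate of exactly 0 or 1 makes this infinite\r\ndp=norminv(H)-norminv(FA);\r\n<\/pre>\n<p>Calling dprime_sketch(50,0.5,5000) corresponds to one point on the t-test curve (one sample size) of the simulation described above. Replacing ttest2 with a call such as bootstrap(s1,s2,samples,0.05,1) would give the same calculation for a bootstrap test.<\/p>\n<p>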
Sttest, Smean, Sbootstrap and Sbootstrap2 are matrices that hold the proportion of runs on which each of the four tests gave a positive answer, for each sample size (columns, with the sample sizes stored in measures) and for the two differences, 0 and 0.5 (rows 1 and 2):<\/p>\n<pre>\r\n% row 2 (d = 0.5) gives the hit rate, row 1 (d = 0) the false alarm rate\r\nfigure(1);clf\r\nplot(measures,norminv(Sttest(2,:))-norminv(Sttest(1,:)),'k')\r\nhold on\r\nplot(measures,norminv(Smean(2,:))-norminv(Smean(1,:)),'r')\r\n%plot(measures,norminv(Smedian(2,:))-norminv(Smedian(1,:)),'c--')\r\nplot(measures,norminv(Sbootstrap(2,:))-norminv(Sbootstrap(1,:)),'m')\r\nplot(measures,norminv(Sbootstrap2(2,:))-norminv(Sbootstrap2(1,:)),'g')\r\nxlabel('Sample size')\r\nylabel('sensitivity - d prime')\r\nlegend('T-test','mean','bstrap-mean','bstrap-median')\r\n<\/pre>\n<p>Here is the result (click for larger):<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"5807\" data-permalink=\"https:\/\/idiolect.org.uk\/notes\/2012\/11\/26\/testing-bootstrapping\/mc_results_dprime\/\" data-orig-file=\"https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?fit=1200%2C901&amp;ssl=1\" data-orig-size=\"1200,901\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"mc_results_dprime\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?fit=300%2C225&amp;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?fit=580%2C435&amp;ssl=1\" tabindex=\"0\" role=\"button\" src=\"https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?resize=400%2C301\" alt=\"\" title=\"mc_results_dprime\" width=\"400\" height=\"301\" class=\"aligncenter size-full wp-image-5807\" srcset=\"https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?w=1200&amp;ssl=1 1200w, https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?resize=300%2C225&amp;ssl=1 300w, https:\/\/i0.wp.com\/idiolect.org.uk\/notes\/wp-content\/uploads\/2012\/11\/mc_results_dprime.png?resize=1024%2C768&amp;ssl=1 1024w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/a><\/p>\n<p>What surprised me was:<\/p>\n<ul>\n<li>That the t-test is more sensitive than the bootstrap when the mean is used as the test statistic<\/li>\n<li>How much more sensitive the bootstrap is than the other tests when the median is used as the test statistic<\/li>\n<li>How well the simple mean comparison does. I suspect there's some nuance I'm missing here, such as an unacceptably high false positive rate for smaller differences<\/li>\n<\/ul>\n<p><b>Update 28\/11\/12<\/b><br \/>\n-Fixed an inconsequential bug in the d-prime calculation<br \/>\n-Closer inspection shows that the simple mean comparison gives a ~50% false alarm rate (as it must: when d = 0, mean(s1) &lt; mean(s2) in about half of all runs), but its high hit rate offsets this. Does this suggest that d-prime isn't a wise summary statistic here?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Update: This post used an incorrect implementation of the bootstrap, so the conclusions don&#8217;t hold. See this correction This surprised me. I decided to try out bootstrapping as a method of testing if two sets of numbers are drawn from different distributions. 
I did this by generating sets of numbers of size m from two [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[9],"tags":[],"class_list":["post-5800","post","type-post","status-publish","format-standard","hentry","category-science"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5KQtW-1vy","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/posts\/5800"}],"collection":[{"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/comments?post=5800"}],"version-history":[{"count":16,"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/posts\/5800\/revisions"}],"predecessor-version":[{"id":5843,"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/posts\/5800\/revisions\/5843"}],"wp:attachment":[{"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/media?parent=5800"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/categories?post=5800"},{"taxonomy":"post_tag","emb
eddable":true,"href":"https:\/\/idiolect.org.uk\/notes\/wp-json\/wp\/v2\/tags?post=5800"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}