Understanding sampling

Thanks to Kat for this, from

People often do not have a good sense of the limitations of sample-based research. Warren Cordell, chief statistical officer at Nielsen for many years, devised a wonderful visual explanation for [the United States] Congress, which went as follows. The picture (below) is composed of several hundred thousand tiny dots (the population).

The three smaller pictures contain 250, 1,000 and 2,000 dots (the samples). They are ‘area probability’ samples of the original picture, because the dots are distributed in proportion to their distribution in the picture. If we think of homes [or persons, consumers] instead of dots, this is the sampling method used for most media research studies.

Now move back 30 inches or so. When the eye stops trying to read the dots, even the smallest sample provides a recognisable picture (you can use top-line data). But you would have trouble picking her out of a group of women based on the 250-dot sample (do not try reading demographic breaks). At 1,000 dots, if you squint to read the pattern of light and dark, you would recognise her in a group (now you can read major demographics). At 2,000 dots, you see her more clearly – but the real improvement is between 250 and 1,000 – an important point. In sampling, the ability to see greater detail is a ‘squared function’ – it takes four times as large a sample to see twice the detail. This is the strength and weakness of sample-based research. You get the general picture cheap, but precision costs a bundle.
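
The "squared function" can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes simple random sampling and the standard 95% margin-of-error formula for a proportion, which is a simplification of an area probability design, and the function name `margin_of_error` is made up for this example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n units.

    Assumes simple random sampling (no design effect), which is a simplification.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The three sample sizes used in the dot pictures.
for n in (250, 1000, 2000):
    print(f"n={n:>5}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions, going from 250 to 1,000 dots halves the margin of error (roughly 6.2 to 3.1 points), while doubling again to 2,000 only trims it to about 2.2 points.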

2 replies on “Understanding sampling”

The same may apply to brain cells. If Fisher information is the measure of accuracy, to halve your coding error you need to quadruple the number of brain cells, and to quarter your coding error you need sixteen times as many brain cells.
In general, coding error is proportional to 1/sqrt(numcells), given that the cells have identical tuning properties and their coding noise is not correlated between neurons.
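
The quadrupling rule can be written out as a short derivation. This is a sketch under the stated assumptions (identical tuning, uncorrelated noise, an unbiased decoder). The symbols N for the number of cells, I_1 for the Fisher information of a single cell, and \sigma_{\hat{s}} for the coding error are introduced here for illustration.

```latex
% Assumptions: N neurons with identical tuning and independent noise;
% I_1(s) is the Fisher information of one neuron about the stimulus s.
% Fisher information adds across independent neurons, and the
% Cramer-Rao bound limits the standard deviation of an unbiased decoder.
I_N(s) = \sum_{i=1}^{N} I_1(s) = N \, I_1(s)
\qquad\Longrightarrow\qquad
\sigma_{\hat{s}} \;\ge\; \frac{1}{\sqrt{I_N(s)}} = \frac{1}{\sqrt{N \, I_1(s)}}
```

So the smallest achievable coding error scales as one over the square root of the number of cells: four times the cells halves it, and sixteen times the cells quarters it.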

By coding error, I mean the standard deviation in the represented value of a variable. For example, a population of neurons coding a sound level of 50dB, when their noisy spike counts are read by an optimal decoder, may sometimes represent it in their spikes as 51dB, or 49dB, or 50dB. The variation in represented value has a standard deviation.
This standard deviation is proportional to the just-noticeable difference in the variable.
