Friday, October 30, 2009

Long Odds

by Richard Crews
Consider the curious statistical significance of the first game of the World Series. In over 100 years (including 20 sweeps), no team has ever swept the World Series after losing the first game.

Some statistical "anomalies" like that are amusing; their errors are obvious. But suppose you test positive for a rare disease, one that affects only one person in every 10,000. And the test the doctor used is highly accurate, that is, it gives the correct answer 99% of the time.

Is it time to panic? No. The odds are still about 99% that you DON'T have the disease. If 10,000 people were given the test, about 100 of them would test positive (because the test is wrong 1% of the time), but only one of these would actually have the disease.
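The arithmetic behind that answer can be checked with Bayes' rule. The sketch below is only an illustration: it assumes that "correct 99% of the time" means the test is right 99% of the time both for people who have the disease and for people who don't, and the function and parameter names are made up for this example.

```python
def chance_of_disease_given_positive(prevalence, sensitivity, specificity):
    """Probability of actually having the disease after a positive test (Bayes' rule)."""
    # Fraction of the whole population that is sick AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of the whole population that is healthy AND tests positive anyway.
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed figures from the example: 1 person in 10,000 is sick,
# and the test is right 99% of the time in both directions.
p = chance_of_disease_given_positive(
    prevalence=1 / 10_000,
    sensitivity=0.99,
    specificity=0.99,
)
print(f"Chance you have the disease: {p:.2%}")
print(f"Chance you don't:            {1 - p:.2%}")
```

Under these assumptions the chance of having the disease after a positive result comes out a little under 1%, which is where the "99% that you DON'T" figure comes from.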

Similarly, if the vast data banks of DNA now being collected are searched for the DNA from a crime scene--say from blood, saliva, or semen--and a match is found, should that person be arrested? No, not on the basis of that evidence alone. The DNA sample may match only one person in a million, but if the data bank contains 100 million people, then about 100 people in that data bank can be expected to match the crime-scene DNA.
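The same kind of back-of-the-envelope check works for the DNA example. This is a rough sketch, assuming every person in the data bank has the same independent one-in-a-million chance of matching; the function name is hypothetical.

```python
def expected_matches(database_size, one_in):
    """Expected number of people in the data bank who match by chance,
    if each person matches with probability 1/one_in."""
    return database_size / one_in


# Figures from the example: 100 million profiles, a one-in-a-million match.
matches = expected_matches(database_size=100_000_000, one_in=1_000_000)
print(f"Expected matching profiles: {matches:.0f}")

# If exactly one of those matches is the true source, a randomly chosen
# match is the right person only about 1 time in `matches`.
print(f"Chance a given match is the source: about 1 in {matches:.0f}")
```

The larger the data bank, the more chance matches pile up, which is why a match by itself says little about guilt.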

There are law-enforcement programs to match DNA evidence from hundreds of unsolved crimes against vast DNA data banks. And there are epidemiological studies to test populations of tens of thousands for rare diseases. Such efforts are ripe for false-positive findings.

The privacy of personal data is not a trivial matter. In this age of lightning-fast computers sifting through vast arrays of data, the emotional and social pain erroneously inflicted may far outweigh the benefits.