08 April 2005

A Tool That Counts: Basic Statistics for the Amateur Scientist

Part 3. Reasonable Expectations and Helpful Principles

Mark Hartwig, Ph.D.

Editor's Note: The original version of this 3-part series was first published as a major feature in Science Probe! (April 1992). The article received some nice reader feedback and college professors asked to reprint it. When Nature, one of the world's leading journals of science, reviewed Science Probe! in a feature about new science publications, this article was specifically cited by the reviewer, James Lovelock. We are grateful to Dr. Hartwig for allowing The Citizen Scientist to present this series based on his article.

When presented properly, statistics is a fascinating subject that can open up new ways of looking at the natural world. Moreover, the basic principles are actually quite simple and can enhance anyone's critical thinking skills. The purpose of this series is to provide a gentle introduction to the wide world of statistics and to show you how to use statistical tools and principles to improve your understanding of the world around you. In Part 1 (Compiling and Sorting Your Data) (TCS, 11 March 2005) we explored basic descriptive statistics. In Part 2 (Analyzing and Evaluating Your Data) (TCS, 25 March 2005) we looked at analyzing and evaluating data. Part 3 concludes the series.

Inferential Statistics

As you can see, statistics provides us with many ways to numerically describe a data set. But statistics involves much more than description. In fact, the most important aspect of statistics is not description but inference.

In scientific investigations, the purpose of collecting data is not simply to describe the sample being studied, but to make inferences about an entire class of objects or phenomena. For example, the point of testing a cancer drug on a sample of rats is not simply to see how it affects cancer in those particular rats, but to discover how it affects cancer in any rat. Given a well-designed and properly executed study, inferential statistics allows the researcher to draw conclusions about an entire population of objects or phenomena from a single sample.

At first glance, the problem of generalizing from a sample seems rather trivial: If you've done a good job of designing and executing your study, simply assume that your results are the same ones you would get if you had studied the entire population. Thus, if a team of researchers found that wonder drug X reduced the size of tumors in eight out of ten rats, compared to two out of ten in an untreated control group, they might simply assume that the drug could reduce similar tumors in 80 percent of all rats.

This approach is not unreasonable. Given the results of the study, 80 percent is obviously the best estimate of how any group of rats would respond. But how much confidence can we place in this figure? After all, the real world is a messy place, where repeated studies can yield varying results, even when conducted under essentially identical conditions. If we ran the experiment again, chances are we would get different results. Perhaps all ten rats would respond favorably to the drug… or maybe only two.
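To get a feel for how much repeated studies can wander, here is a minimal simulation sketch. It assumes, purely for illustration, that the drug's true response rate really is 80 percent and that each rat responds independently; the function name and the choice of 1,000 repeated studies are hypothetical and not part of the original article.

```python
import random
from collections import Counter

def simulate_studies(true_rate, n_rats, n_studies, seed=0):
    """Return the number of responding rats in each simulated study."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < true_rate for _ in range(n_rats))
        for _ in range(n_studies)
    ]

# Assumed for illustration: the true response rate really is 80 percent.
counts = simulate_studies(true_rate=0.8, n_rats=10, n_studies=1000)
tally = Counter(counts)
for responders in range(11):
    print(f"{responders:2d} of 10 responded: {tally[responders]:4d} studies")
```

Even with the underlying rate held fixed, the tally spreads across several different outcomes, which is the variation the paragraph above describes.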

So how can inferential statistics help us handle this problem? I've already hinted at the answer. It helps us not by magically revealing the true underlying value, but by helping us determine how much confidence we can have that the results of a given study actually reflect that value. Thus, in the cancer study above, the issue is not whether 80 percent is our best guess. Given the results of our study, it is our best guess. The issue is how much confidence we can have in that best guess. In short, how good a bet is it that the real value is 80 percent, rather than some other value?

The difference is subtle but important, because it allows researchers to harness the laws of probability to generalize beyond their own limited data sets. In particular, if a study is designed in such a way that the only uncontrolled influences are random ones (an unrealistic condition that researchers do their best to approximate), the laws of probability can be used to estimate how much the results of a study can be expected to vary by chance alone.

Reasonable Expectations

With this in mind, let's go back and take a fresh look at our cancer example. Instead of asking whether our result (80 percent) represents some underlying “true” value, it might be more fruitful to ask how much variation we can reasonably expect on the basis of chance. Is there a reasonable chance that another study would yield a result of, say, 50 percent?

More to the point, is there a reasonable chance that another study would show the treatment group doing no better than, or even worse than, the control group? If so, then it is possible to chalk up any differences between the treatment and control group to chance alone. Hence, there is no statistically significant difference between the two groups, despite the rather large value of 80 percent.

If, on the other hand, there is only a very small chance that the treatment group would have done as poorly as, or worse than, the control group, then it is possible to attribute this difference to something other than chance, such as the cancer drug. In this case, the difference would be statistically significant.
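One way to put a number on "a very small chance" is Fisher's exact test, sketched below for the ten-rat example. This is one standard technique chosen here for illustration, not necessarily the method the author had in mind. The sketch assumes there is no real drug effect and asks how often random assignment alone would place eight or more of the ten responding rats in the treatment group. The function name is hypothetical.

```python
from math import comb

def one_sided_fisher_p(treated_resp, treated_n, control_resp, control_n):
    """Chance of seeing at least `treated_resp` responders in the treated
    group if the drug had no effect (one-sided Fisher's exact test)."""
    total_resp = treated_resp + control_resp
    total_n = treated_n + control_n
    # Number of equally likely ways to pick which rats are "treated".
    all_splits = comb(total_n, treated_n)
    favorable = 0
    for k in range(treated_resp, min(total_resp, treated_n) + 1):
        # k responders and (treated_n - k) non-responders land in treatment.
        favorable += comb(total_resp, k) * comb(total_n - total_resp, treated_n - k)
    return favorable / all_splits

# The article's example: 8 of 10 treated rats improved, 2 of 10 controls did.
p_value = one_sided_fisher_p(8, 10, 2, 10)
print(f"one-sided p-value: {p_value:.4f}")  # about 0.0115
```

By the common (and somewhat arbitrary) convention of calling results significant when this chance falls below 0.05, the difference in the example would count as statistically significant, though with only ten rats per group the conclusion remains tentative.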

Helpful Principles

Space does not allow a full discussion of how to determine whether or not our research results are statistically significant. Besides, this is usually where statophobes retreat to a corner and start to gurgle or twiddle their lips. But there are certain helpful principles that can improve the way even non-statisticians examine their data.

First and foremost, it is helpful just to recognize that variation exists, and that both random and non-random factors can influence this variation.

Second, you should also keep in mind that, all other things being equal, variation is more pronounced with small samples than with large ones. The larger your sample, the more stable your results will be. They will be less subject to the possibility that another study would produce greatly different results. A corollary is that large samples are less likely to produce extreme results. For example, assuming that you have a fair coin, it's much more difficult to get all heads when you toss a coin 50 times than when you toss it only two or three times. Similarly, in our cancer study above, our obtained value of 80 percent for the treatment group (compared to 20 percent for the control group) is much less impressive than if our researcher had used 30 or 40 rats in each group instead of only 10.
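The effect of sample size can be sketched with two small calculations. The first gives the chance of tossing all heads with a fair coin. The second uses the usual standard-error formula for a proportion, sqrt(p(1 - p)/n), as a rough gauge of how much an observed 80 percent might wobble at different group sizes. The function names are hypothetical, and the formula is only an approximation for groups as small as ten.

```python
from math import sqrt

def prob_all_heads(n_tosses):
    """Chance that a fair coin lands heads on every one of n_tosses tosses."""
    return 0.5 ** n_tosses

def standard_error(proportion, n):
    """Approximate standard error of an observed proportion from n subjects."""
    return sqrt(proportion * (1 - proportion) / n)

for tosses in (2, 3, 50):
    print(f"all heads in {tosses:2d} tosses: {prob_all_heads(tosses):.3g}")

for rats in (10, 40):
    print(f"{rats} rats per group: 80% plus or minus about "
          f"{100 * standard_error(0.8, rats):.1f} percentage points")
```

Quadrupling the group from 10 to 40 rats cuts the standard error in half, which is why the same 80 percent figure deserves more trust when it comes from the larger group.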

Finally, no matter how large your sample or how good your study, it pays to remember that any results are still only probable. Hence, there's always a possibility that chance alone has produced effects mistakenly attributed to other factors. Returning once again to our hypothetical cancer study, let's assume that we had used 40 rats in each group instead of 10. This being the case, there is still a slight possibility (very slight) that our 80 percent figure is nothing more than an aberration produced by random factors. Perhaps there really is no difference between the treatment and control group.

The problem of random aberrations becomes particularly acute if you run lots of related tests on the same data set without adjusting your probability estimates. Unless you make these adjustments, the more tests you run, the greater your chances of coming across a spurious “statistically significant” result. So if you are examining large tables of correlations, or comparing experimental groups on lots of variables, don't make too much of just one or two “statistically significant” results. They could have happened simply by chance.
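A rough sketch of why running many tests inflates the risk follows. It assumes each test uses the conventional 0.05 threshold and, as a simplification, that the tests are independent of one another; related tests on the same data set usually are not, so treat the figures as indicative only. The Bonferroni correction shown is one common, conservative adjustment, named here as an example rather than as the author's recommendation, and both function names are hypothetical.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent tests comes out
    'significant' purely by chance, when no real effects exist."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha=0.05):
    """Stricter per-test threshold that keeps the overall risk near alpha."""
    return alpha / n_tests

for n in (1, 5, 20, 100):
    print(f"{n:3d} tests: {chance_of_false_alarm(n):.0%} chance of a spurious hit, "
          f"Bonferroni threshold {bonferroni_threshold(n):.4f}")
```

Under these assumptions, 20 unadjusted tests carry roughly a 64 percent chance of turning up at least one spurious "significant" result.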

This is also a good reason not to “poke around” in your data when you want to make inferences beyond your particular study. When you're using inferential statistics, poking around is like “peeking” at your data. Many times it's all too easy to discover a “statistically significant” result and report it as if your discovery were what you had set out to study in the first place.

Summary

Although it's difficult to write a suitably nuanced description of statistics in such a brief space, I hope this article has given you a basic idea of what statistics is all about. And if I haven't scared you away, perhaps you even have what it takes to become a statistician yourself.

This concludes the 3-part series "A Tool That Counts: Basic Statistics for the Amateur Scientist."

Acknowledgments

The Citizen Scientist and Dr. Hartwig are grateful to Richard D. McPeters and Arlin J. Krueger of NASA's Goddard Space Flight Center, members of the TOMS Ozone Processing Team, and the National Space Science Data Center for providing the measurements of total ozone made by the TOMS instrument aboard the Nimbus-7 satellite.

We are also happy to thank Robert Green of the National Oceanic and Atmospheric Administration and the Fresno-based Dobson instrument team for supplying additional measurements of ozone.


   
Copyright 2005 by Society for Amateur Scientists