A Tool That Counts: Basic Statistics
for the Amateur Scientist
Mark Hartwig, Ph.D.
Editor's Note: The original version of
this 3-part series was first published as a major feature
in Science Probe! (April 1992). The article received
some nice reader feedback, and college professors asked to
reprint it. When Nature, one of the world's leading
journals of science, reviewed Science Probe! in a
feature about new science publications, this article was specifically
cited by the reviewer, James Lovelock. We are grateful to
Dr. Hartwig for allowing The Citizen Scientist to
present this series based on his article.
When presented properly, statistics is a
fascinating subject that can open up new ways of looking at
the natural world. Moreover, the basic principles are actually
quite simple and can enhance anyone's critical thinking skills.
The purpose of this series is to provide a gentle introduction
to the wide world of statistics and to show you how to use
statistical tools and principles to improve your understanding
of the world around you. In Part
1 (Compiling and Sorting Your Data) (TCS, 11 March
2005) we explored basic descriptive statistics. In Part
2 (Analyzing and Evaluating Your Data) (TCS,
25 March 2005) we looked at analyzing and evaluating data.
Part 3 concludes the series.
Inferential Statistics
As Parts 1 and 2 showed, statistics provides us with
many ways to numerically describe a data set. But statistics
involves much more than description. In fact, the most important
aspect of statistics is not description but inference.
In scientific investigations, the purpose
of collecting data is not simply to describe the sample being
studied, but to make inferences about an entire class of objects
or phenomena. For example, the point of testing a cancer drug
on a sample of rats is not simply to see how it affects cancer
in those particular rats, but to discover how it affects cancer
in any rat. Given a well-designed and properly executed study,
inferential statistics allows the researcher to draw conclusions
about an entire population of objects or phenomena from a
single sample.
At first glance, the problem of generalizing
from a sample seems rather trivial: If you've done a good
job of designing and executing your study, simply assume that
your results are the same ones you would get if you had studied
the entire population. Thus, if a team of researchers found
that wonder drug X reduced the size of tumors in eight out
of ten rats, compared to two out of ten in an untreated control
group, they might simply assume that the drug could reduce
similar tumors in 80 percent of all rats.
This approach is not unreasonable. Given
the results of the study, 80 percent is obviously the best
estimate of how any group of rats would respond. But how much
confidence can we place in this figure? After all, the real
world is a messy place, where repeated studies can yield varying
results, even when conducted under essentially identical conditions.
If we ran the experiment again, chances are we would get different
results. Perhaps all ten rats would respond favorably to the
drug… or maybe only two.
So how can inferential statistics help us
handle this problem? I've already hinted at the answer. It
helps us not by magically revealing the true underlying value,
but by helping us determine how much confidence we can have
that the results of a given study actually reflect that value.
Thus, in the cancer study above, the issue is not whether
80 percent is our best guess. Given the results of our study,
it is our best guess. The issue is how much confidence
we can have in that best guess. In short, how good a bet is
it that the real value is at or near 80 percent, rather than
some other value?
The difference is subtle but important, because
it allows researchers to harness the laws of probability to
generalize beyond their own limited data sets. In particular,
if a study is designed in such a way that the only uncontrolled
influences are random ones (an idealized condition that
researchers do their best to approximate), the laws of probability
can be used to estimate how much the results of a study can
be expected to vary by chance alone.
Reasonable Expectations
With this in mind, let's go back and take
a fresh look at our cancer example. Instead of asking whether
our result (80 percent) represents some underlying “true”
value, it might be more fruitful to ask how much variation
we can reasonably expect on the basis of chance. Is there
a reasonable chance that another study would yield a result
of, say, 50 percent?
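The question above can be explored with a short calculation. The following is a minimal sketch in Python, not a method taken from the original article. It assumes, purely for illustration, that the drug's true response rate is exactly 80 percent and that each rat responds independently, so that the number of responders follows a binomial distribution. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k responders among n rats,
    when each rat responds independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k: int, n: int, p: float) -> float:
    """Probability of k or fewer responders among n rats."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Assumptions for illustration only: 10 rats, true response rate 0.8.
N_RATS = 10
TRUE_RATE = 0.8

# How often each possible result would turn up in a repeat study.
for k in range(N_RATS + 1):
    print(f"{k:2d} of {N_RATS} respond: {binomial_pmf(k, N_RATS, TRUE_RATE):.4f}")

# Chance that a repeat study shows 50 percent or fewer responding.
print(f"5 or fewer respond: {prob_at_most(5, N_RATS, TRUE_RATE):.4f}")
```

Under these assumptions, a repeat study would show exactly eight responders only about 30 percent of the time, and five or fewer about 3 percent of the time.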
More to the point, is there a reasonable
chance that another study would show the treatment group doing
no better than the control group, or even worse? If so, then
any difference between the treatment and control groups could
plausibly be chalked up to chance alone. In that case, the
difference between the two groups is not statistically
significant, despite the seemingly large figure of 80 percent.
If, on the other hand, there is only a very
small chance that the treatment group would have done as poorly
as, or worse than, the control group, then it is reasonable to
attribute this difference to something other than chance, such
as the cancer drug. In this case, the difference would be
statistically significant.
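As a rough illustration of how such a chance can be calculated, the sketch below applies Fisher's exact test to the counts in the rat example. The article does not name a specific test; Fisher's exact test is simply one common choice for small counts, and the function name and argument names here are hypothetical. The idea is to ask: if the drug made no difference, how often would at least eight of the ten responders end up in the treatment group purely by the luck of the draw?

```python
from math import comb


def fisher_one_sided(treated_yes: int, treated_no: int,
                     control_yes: int, control_no: int) -> float:
    """One-sided Fisher exact p-value for a 2x2 table of counts.

    Returns the probability, if the drug made no difference, that
    at least `treated_yes` of the responders fall in the treatment
    group when the rats are split into groups at random.
    """
    n_treated = treated_yes + treated_no
    total_yes = treated_yes + control_yes
    total_no = treated_no + control_no
    total = total_yes + total_no

    # Count the ways to fill the treatment group with k responders,
    # for every k at least as large as the one observed.
    most_possible = min(n_treated, total_yes)
    favorable = sum(
        comb(total_yes, k) * comb(total_no, n_treated - k)
        for k in range(treated_yes, most_possible + 1)
    )
    return favorable / comb(total, n_treated)


# Counts from the example: 8 of 10 treated rats respond, 2 of 10 controls.
p_value = fisher_one_sided(8, 2, 2, 8)
print(f"one-sided p-value: {p_value:.4f}")
```

For these counts the result is roughly 0.012. Whether that counts as "very small" depends on the threshold chosen beforehand; 0.05 is a widely used convention, not something specified in the article.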
Helpful Principles
Space does not allow a full discussion of
how to determine whether or not our research results are statistically
significant. Besides, this is usually where statophobes retreat
to a corner and start to gurgle or twiddle their lips. But
there are certain helpful principles that can improve the
way even non-statisticians examine their data.
First and foremost, it is helpful just to
recognize that variation exists, and that both random and
non-random factors can influence this variation.
Second, you should also keep in mind that,
all other things being equal, variation is more pronounced
with small samples than with large ones. The larger your sample,
the more stable your results will be. They will be less subject
to the possibility that another study would produce greatly
different results. A corollary is that large samples are less
likely to produce extreme results by chance alone. For example,
assuming that you have a fair coin, you are far less likely to
get all heads when you toss a coin 50 times than when you toss
it only two or three times. Similarly, in our cancer study above,
our obtained value of 80 percent for the treatment group (compared
to 20 percent for the control group) is much less impressive
than it would be if our researchers had used 30 or 40 rats in
each group instead of only 10.
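The effect of sample size can be sketched with two small calculations. The first is the coin example: the chance of all heads with a fair coin. The second uses the standard error of a proportion, a textbook formula that the article itself does not mention, as a rough measure of how much an observed percentage would wobble from study to study. The function names are hypothetical, and the 0.8 rate is assumed only for illustration.

```python
from math import sqrt


def prob_all_heads(n_tosses: int) -> float:
    """Chance that a fair coin lands heads on every one of n tosses."""
    return 0.5 ** n_tosses


def standard_error(rate: float, n: int) -> float:
    """Standard error of an observed proportion from a sample of size n."""
    return sqrt(rate * (1 - rate) / n)


# All heads becomes vanishingly rare as the number of tosses grows.
for n in (2, 3, 50):
    print(f"all heads in {n} tosses: {prob_all_heads(n):.3g}")

# Assumed response rate of 0.8: larger groups give steadier estimates.
for n in (10, 40):
    print(f"n = {n}: standard error = {standard_error(0.8, n):.3f}")
```

In this sketch, going from 10 rats to 40 cuts the standard error in half, from about 0.126 to about 0.063.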
Finally, no matter how large your sample
or how good your study, it pays to remember that any results
are still only probable. Hence, there's always a possibility
that chance alone has produced effects mistakenly attributed
to other factors. Returning once again to our hypothetical
cancer study, let's assume that we had used 40 rats in each
group instead of 10. This being the case, there is still a
slight possibility, though a very slight one, that our 80
percent figure is nothing more than an aberration produced by
random factors.
Perhaps there really is no difference between the treatment
and control group.
The problem of random aberrations becomes
particularly acute if you run lots of related tests on the
same data set without adjusting your probability estimates.
Unless you make these adjustments, the more tests you run,
the greater your chances of coming across a spurious “statistically
significant” result. So if you are examining large tables
of correlations, or comparing experimental groups on lots
of variables, don't make too much of just one or two “statistically
significant” results. They could have happened simply by chance.
This is also a good reason not to “poke around”
in your data when you want to make inferences beyond your
particular study. When you're using inferential statistics,
poking around is like “peeking” at your data. Many times it's
all too easy to discover a “statistically significant”
result and report it as if that discovery were what you had
set out to study in the first place.
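The danger of running many unadjusted tests on the same data can be put in rough numbers. The sketch below assumes the tests are independent and that each uses a 0.05 threshold; both are simplifying assumptions, and real tests on one data set are often correlated. It also shows the Bonferroni correction, one common adjustment that the article does not name specifically. The function names are hypothetical.

```python
def chance_of_false_alarm(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests comes out
    'significant' by luck when no real effect exists anywhere."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Stricter per-test threshold that keeps the overall chance of
    a false alarm at or below alpha."""
    return alpha / n_tests


for n in (1, 5, 20, 100):
    print(
        f"{n:3d} tests: chance of a spurious result = "
        f"{chance_of_false_alarm(n):.3f}, "
        f"adjusted threshold = {bonferroni_threshold(n):.4f}"
    )
```

With 20 unadjusted tests, the chance of at least one spurious "significant" result is about 64 percent under these assumptions.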
Summary
Although it's difficult to write a suitably
nuanced description of statistics in such a brief space, I
hope this article has given you a basic idea of what statistics
is all about. And if I haven't scared you away, perhaps you
even have what it takes to become a statistician yourself.
This concludes the 3-part series "A
Tool That Counts: Basic Statistics for the Amateur Scientist."
Acknowledgments
The Citizen Scientist and Dr. Hartwig
are grateful to Richard D. McPeters and Arlin J. Krueger of
NASA's Goddard Space Flight Center, members of the TOMS Ozone
Processing Team, and the National Space Science Data Center
for providing the measurements of total ozone made by the
TOMS instrument aboard the Nimbus-7 satellite.
We are also happy to thank Robert Green of
the National Oceanic and Atmospheric Administration and the
Fresno-based Dobson instrument team for supplying additional
measurements of ozone. 