A Tool That Counts: Basic Statistics for the Amateur Scientist
Part 1. Compiling and Sorting Your Data
Mark Hartwig, Ph.D.
Editor's Note: The original version of
this series was first published as a major feature in Science
Probe! (April 1992). The article received some nice reader
feedback and college professors asked to reprint it. When
Nature, one of the world's leading journals of science,
reviewed Science Probe! in a feature about new science
publications, this article was specifically cited by the reviewer.
We are grateful to Dr. Hartwig for allowing The Citizen
Scientist to present this series based on his article.
Although statistical analysis plays an important
role in scientific research, many scientists, amateur and professional
alike, are plagued by chronic “statophobia,” the fear of statistics.
Symptoms of this tragic malady include visible agitation in
the presence of Greek symbols and the avoidance of anything
having to do with statistics. The natural consequence of such
behavior is that most statophobes know far less about statistics
than they should.
When presented properly, however, statistics
is a fascinating subject that can open up new ways of looking
at the natural world. Moreover, the basic principles are actually
quite simple and can enhance anyone's critical thinking skills.
The purpose of this article is to provide a gentle introduction
to the wide world of statistics and to show you how to use
statistical tools and principles to improve your understanding
of the world around you.
Descriptive Statistics
For most of us, the word “statistics” evokes
images of census forms and thick government reports. We think
about traffic fatalities, violent crimes, disease, or what
have you. And we hope that we don't become “just another statistic.”
These images convey an important fact about
statistics: namely, that one of their most important functions
is to describe data. Put another way, statistics provide us
with methods that help us summarize, organize, and communicate
the results of our experiments or observations.
This is where statistics can be particularly
fun, especially if you're the curious type, because the methods
of descriptive statistics allow you to take reams of observations
and turn them into something understandable. They help you
poke around in your data and see what is there.
If you're a professional statistician, you're
probably cringing at the suggestion that readers can use statistics
to “poke around” in their data. Remember, though, that we're
talking descriptive statistics, where poking around is perfectly
acceptable. Once we get to inferential statistics, I'll be
sure to mention the appropriate warnings.
Soon we'll take a look at some methods you
can use to make sense out of a stack of data. Before we do,
however, I should point out a possible source of confusion.
As you might have guessed, the word “statistics” can be used
in different ways. In addition to denoting a field of study,
the word statistics is also the plural of statistic, a generic
term that can refer to either a single numeric data point
(e.g., the high temperature in Bangor, Maine, on June 16,
1988) or to a numeric figure estimated from several data
points (e.g., the average high temperature in Bangor, Maine,
for an entire year). Most of the time, the context will indicate
the meaning of the word. In those places where it doesn't,
I'll use the term statistical indices (plural of statistical
index) whenever I'm talking about estimated numeric figures.
As Easy as 1, 2, 3… or Excel or Quattro Pro
The simplest descriptive statistical technique
is one we use almost every day: counting. How many traffic
accidents were there on U.S. highways in 1988? On how many
days did Los Angeles get any precipitation from 1980 to 2000?
How many times did the city of Denver exceed the federal maximum
standards for air pollution from 1990 to 2000? In each case,
the answer is nothing more than a simple count. As simple
as counting may be, however, it can be a useful tool for organizing
your data.
For example, consider the data set in Table
1. These figures represent the total amount of ozone, in Dobson
units, measured over Fresno, California, from February 11,
1990 (Day 1) to February 10, 1991 (Day 365). Two sets of measurements
are given: one from a ground-based Dobson spectrophotometer
instrument in Fresno and one from the Total Ozone Mapping
Spectrometer (TOMS) aboard the Nimbus-7 satellite.
Organized the way they are in Table 1, the
ozone measurements don't tell us much. They're just numbers.
But let's do some counting. In particular, let's look at the
satellite readings. If we were to count every occurrence of
a satellite reading between 245 and 254 Dobson units, we would
find three such occurrences. Similarly, if we were to count
every satellite reading between 255 and 264 Dobson units,
we would find 20. And if we continued our counting, we would
eventually come up with the figures listed in Table 2, which
is called a frequency table.
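If you would rather let a computer do the counting, the procedure just described can be sketched in a few lines of Python. This is only a minimal illustration: it uses the satellite (TOMS) readings for Days 1 through 10 of Table 1 rather than the full year, the name `frequency_table` and its `start` and `width` parameters are assumptions made for this example, and it assumes no reading falls below `start`.

```python
from collections import Counter

# TOMS readings for Days 1-10 of Table 1 (a small sample, not the full year).
toms_sample = [308, 294, 316, 360, 431, 403, 394, 360, 376, 312]

def frequency_table(readings, start=245, width=10):
    """Count readings per interval: start..start+width-1, and so on.

    Assumes every reading is at least `start`.
    """
    # Map each reading to the index of the interval that contains it.
    counts = Counter((r - start) // width for r in readings)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        # Intervals with no readings are kept, with a count of zero.
        table.append((low, low + width - 1, counts.get(k, 0)))
    return table

for low, high, n in frequency_table(toms_sample):
    print(f"{low}-{high}: {n}")
```

Feeding in all 365 satellite readings instead of the ten-day sample should give the kind of counts listed in Table 2.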
Notice, by the way, that we have not counted
the occurrences of each distinct value. If we had done so,
Table 2 would be much larger than it is. Moreover, the number
of unique values is such that we would never find more than
a few occurrences of any one value. In fact, for many of the
values we would find only one or two occurrences, making it
more difficult to pick out clean patterns in the data.
As an experiment, try counting the number
of occurrences for each value. Or better yet, try working
with different cut points. Instead of using 10-point intervals
as we did here (e.g., 245-254, 255-264, and so on), try using
three-point intervals, or 20-point intervals, and see how
your results differ from those presented here.
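One way to run this experiment without a pencil is sketched below. It is a rough illustration only: the data are the TOMS readings for Days 1 through 10 of Table 1, and the helper name `count_by_width` and the starting point of 245 are assumptions chosen to match the intervals used here.

```python
from collections import Counter

# TOMS readings for Days 1-10 of Table 1 (a small sample, not the full year).
toms_sample = [308, 294, 316, 360, 431, 403, 394, 360, 376, 312]

def count_by_width(readings, width, start=245):
    """Return {interval lower bound: count} for intervals of the given width."""
    counts = Counter(start + ((r - start) // width) * width for r in readings)
    return dict(sorted(counts.items()))

# Compare three-point, 10-point, and 20-point intervals on the same readings.
for width in (3, 10, 20):
    table = count_by_width(toms_sample, width)
    print(f"width {width}: {len(table)} non-empty intervals, "
          f"largest count {max(table.values())}")
```

Narrow intervals tend to scatter the readings so thinly that few intervals hold more than one or two, while wide intervals pile them together and can hide detail.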
Note the pattern that emerges from our counting.
The largest number of occurrences (74) falls between 295 and
304 Dobson units. The next largest numbers fall on either
side, with 56 occurrences in the 305-314 range and 45 occurrences
in the 285-294 range. The numbers then decline fairly steadily
in either direction, with the exception of a blip in the 255-264
range (20 observations) and some minor fluctuations above
405 Dobson units.
The pattern becomes even clearer when we
put the data from Table 2 into a histogram, or bar graph,
such as the one shown in Fig. 1.
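A rough text version of such a histogram can be sketched directly from the counts in Table 2. The function name `text_histogram` and the choice of one `#` for every two occurrences are assumptions made for this illustration; any plotting or spreadsheet program would do the job just as well.

```python
# Interval lower bounds (245, 255, ..., 455) and counts copied from Table 2.
lower_bounds = list(range(245, 465, 10))
counts = [3, 20, 9, 33, 45, 74, 56, 28, 25, 22, 15,
          16, 8, 3, 3, 1, 0, 1, 2, 0, 0, 1]

def text_histogram(lower_bounds, counts, width=10, scale=2):
    """Build one line per interval, with one '#' per `scale` occurrences."""
    lines = []
    for low, n in zip(lower_bounds, counts):
        # Round up so that even a single occurrence shows a mark.
        bar = "#" * ((n + scale - 1) // scale)
        lines.append(f"{low}-{low + width - 1} | {bar} ({n})")
    return lines

for line in text_histogram(lower_bounds, counts):
    print(line)
```

The longest bar lands on the 295-304 interval, with the bars tapering off on either side, which is the same pattern the counts show.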
So even something as simple as counting can
help us turn a welter of numbers into something that makes
a little more sense. If we want to take things a step further,
we could look not only at one-way frequency tables but also
at two-way and multi-way tables. We could also look at various
procedures statisticians have developed for analyzing these
tables, such as log-linear models, Cramér's V, Cohen's kappa,
the phi coefficient, and so on. But for now, it's enough to
know that counting can be a very useful tool in the amateur
scientist's statistical bag of tricks.
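For the curious, here is a minimal sketch of what a two-way table might look like for these data: it counts days by satellite interval and ground interval together. It uses only Days 1 through 10 of Table 1, skips days with a missing ground reading, and the 50-unit interval width and the names `interval` and `two_way_table` are assumptions chosen to keep the example small.

```python
from collections import Counter

# (TOMS, ground) readings for Days 1-10 of Table 1.
# None marks a day with no ground reading ("-" in the table).
pairs = [(308, 308), (294, 305), (316, 316), (360, 376), (431, 450),
         (403, 406), (394, None), (360, 353), (376, None), (312, 321)]

def interval(reading, start=245, width=50):
    """Return the label of the interval containing the reading, e.g. '295-344'."""
    low = start + ((reading - start) // width) * width
    return f"{low}-{low + width - 1}"

def two_way_table(pairs):
    """Count days by (TOMS interval, ground interval), skipping incomplete days."""
    return Counter((interval(t), interval(g))
                   for t, g in pairs if t is not None and g is not None)

for (toms_bin, ground_bin), n in sorted(two_way_table(pairs).items()):
    print(f"TOMS {toms_bin}, ground {ground_bin}: {n}")
```

Days on which both instruments fall in the same interval show where the two sets of measurements agree, at least at this coarse level.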
Part 2 of this series will appear in the next issue of The
Citizen Scientist.
Day | TOMS | Ground | Day | TOMS | Ground | Day | TOMS | Ground | Day | TOMS | Ground | Day | TOMS | Ground | Day | TOMS | Ground
1 | 308 | 308 | 62 | 312 | 320 | 123 | 318 | 328 | 184 | 311 | 319 | 245 | 293 | 305 | 306 | 286 | 294
2 | 294 | 305 | 63 | 296 | 312 | 124 | 343 | 361 | 185 | 313 | 312 | 246 | 285 | 293 | 307 | 288 | 285
3 | 316 | 316 | 64 | 307 | 329 | 125 | 356 | 375 | 186 | 302 | 308 | 247 | 282 | 285 | 308 | 302 | -
4 | 360 | 376 | 65 | 329 | - | 126 | 368 | 373 | 187 | 305 | 309 | 248 | 286 | 291 | 309 | 309 | 313
5 | 431 | 450 | 66 | 372 | 369 | 127 | 328 | 325 | 188 | 296 | 306 | 249 | 284 | 285 | 310 | 320 | 327
6 | 403 | 406 | 67 | 338 | 341 | 128 | 319 | 319 | 189 | 313 | 314 | 250 | 285 | 293 | 311 | 303 | 309
7 | 394 | - | 68 | 336 | 341 | 129 | 331 | 334 | 190 | 314 | 316 | 251 | 287 | 299 | 312 | 322 | 327
8 | 360 | 353 | 69 | 339 | 337 | 130 | 314 | 328 | 191 | 331 | 310 | 252 | 312 | 312 | 313 | 368 | -
9 | 376 | - | 70 | 379 | 381 | 131 | 309 | 320 | 192 | 318 | 327 | 253 | 298 | 302 | 314 | 355 | 378
10 | 312 | 321 | 71 | 363 | 368 | 132 | 310 | 317 | 193 | 305 | 317 | 254 | 293 | 298 | 315 | 347 | 357
11 | 279 | 288 | 72 | 351 | 355 | 133 | 309 | 307 | 194 | 303 | 308 | 255 | 299 | 311 | 316 | 320 | 328
12 | 277 | 336 | 73 | 352 | 351 | 134 | 304 | 305 | 195 | 306 | 317 | 256 | 297 | 305 | 317 | 321 | 322
13 | 297 | 301 | 74 | 351 | 348 | 135 | 306 | 318 | 196 | 314 | 314 | 257 | 292 | 301 | 318 | 363 | 378
14 | 297 | 307 | 75 | 337 | 338 | 136 | 302 | 311 | 197 | 317 | 314 | 258 | 278 | 291 | 319 | 344 | 353
15 | 298 | 310 | 76 | 329 | 325 | 137 | 302 | 311 | 198 | 304 | 305 | 259 | 269 | 281 | 320 | 339 | 342
16 | 300 | 317 | 77 | 333 | 338 | 138 | 317 | 317 | 199 | 290 | 300 | 260 | 275 | 279 | 321 | 322 | 329
17 | 313 | 318 | 78 | 339 | - | 139 | 304 | 311 | 200 | 280 | 289 | 261 | 280 | 285 | 322 | 265 | 274
18 | 320 | 327 | 79 | 344 | - | 140 | 304 | 308 | 201 | 287 | 295 | 262 | 282 | 291 | 323 | 272 | 274
19 | 317 | 326 | 80 | 433 | - | 141 | 302 | 304 | 202 | 299 | 302 | 263 | 284 | 290 | 324 | 2-8 | 278
20 | 307 | 325 | 81 | 356 | 361 | 142 | 289 | 301 | 203 | 301 | 307 | 264 | 323 | 317 | 325 | 261 | 272
21 | 305 | 334 | 82 | 326 | 337 | 143 | 313 | 316 | 204 | 300 | 308 | 265 | 326 | 330 | 326 | 260 | 278
22 | 317 | 318 | 83 | 314 | 325 | 144 | 316 | 317 | 205 | 294 | 308 | 266 | 314 | 316 | 327 | 279 | 288
23 | 341 | 338 | 84 | 305 | 315 | 145 | 310 | 311 | 206 | 296 | 304 | 267 | 293 | 296 | 328 | 286 | -
24 | 359 | 347 | 85 | 300 | - | 146 | 313 | 306 | 207 | 299 | 304 | 268 | 262 | 270 | 329 | 315 | -
25 | 328 | 328 | 86 | 307 | - | 147 | 302 | 305 | 208 | 294 | 302 | 269 | 259 | 271 | 330 | 329 | 332
26 | 310 | 318 | 87 | 326 | 327 | 148 | 304 | 311 | 209 | 294 | 299 | 270 | 288 | 295 | 331 | 334 | 334
27 | 361 | 354 | 88 | 331 | 334 | 149 | 323 | 328 | 210 | 289 | 302 | 271 | 272 | 277 | 332 | 308 | 314
28 | 354 | 356 | 89 | 350 | 353 | 150 | 321 | 320 | 211 | 281 | 293 | 272 | 268 | 275 | 333 | 343 | 331
29 | 347 | 337 | 90 | 374 | 385 | 151 | 309 | 320 | 212 | 284 | 298 | 273 | 261 | 275 | 334 | 356 | 358
30 | 419 | 427 | 91 | 336 | 338 | 152 | 296 | 314 | 213 | 287 | 296 | 274 | 258 | 269 | 335 | 295 | 296
31 | 455 | 466 | 92 | 347 | 345 | 153 | 299 | 309 | 214 | 279 | 292 | 275 | 260 | 275 | 336 | 315 | 316
32 | 357 | 372 | 93 | 345 | 346 | 154 | 306 | 316 | 215 | 279 | 286 | 276 | 262 | 272 | 337 | 298 | 311
33 | 291 | 299 | 94 | 368 | 371 | 155 | 322 | 317 | 216 | 282 | 290 | 277 | 259 | 267 | 338 | 311 | 319
34 | 286 | 293 | 95 | 343 | 344 | 156 | 309 | 320 | 217 | 288 | 295 | 278 | 260 | 270 | 339 | 303 | 305
35 | 271 | 284 | 96 | 348 | 355 | 157 | 309 | 315 | 218 | 299 | 304 | 279 | 280 | 291 | 340 | 312 | 320
36 | 285 | 298 | 97 | 363 | 358 | 158 | 295 | 305 | 219 | 303 | 306 | 280 | 264 | 275 | 341 | 314 | 331
37 | 276 | 284 | 98 | 363 | 355 | 159 | 297 | 303 | 220 | 306 | 311 | 281 | 264 | 268 | 342 | 296 | 307
38 | 279 | 314 | 99 | 375 | 369 | 160 | 302 | 299 | 221 | 300 | 309 | 282 | 247 | 261 | 343 | 281 | 296
39 | 296 | 302 | 100 | 335 | 330 | 161 | 306 | 298 | 222 | 305 | 318 | 283 | 290 | 296 | 344 | 314 | 328
40 | 285 | 286 | 101 | 336 | 329 | 162 | 301 | 304 | 223 | 303 | 317 | 284 | 283 | 299 | 345 | 296 | 323
41 | 288 | 296 | 102 | 330 | 325 | 163 | 299 | 302 | 224 | 302 | 308 | 285 | 282 | 295 | 346 | 307 | 322
42 | 306 | 306 | 103 | 368 | 345 | 164 | 312 | 318 | 225 | 306 | 311 | 286 | 245 | 254 | 347 | 278 | 296
43 | 305 | 308 | 104 | 333 | 325 | 165 | 313 | 308 | 226 | 296 | 314 | 287 | 246 | 253 | 348 | 298 | 313
44 | 316 | 320 | 105 | 336 | 342 | 166 | 333 | 316 | 227 | 301 | 310 | 288 | 261 | 267 | 349 | 297 | 311
45 | 320 | 318 | 106 | 325 | 331 | 167 | 326 | 316 | 228 | 292 | 302 | 289 | 255 | 265 | 350 | 322 | 329
46 | 350 | 344 | 107 | 339 | - | 168 | 303 | 316 | 229 | 294 | 307 | 290 | 302 | 304 | 351 | 339 | 344
47 | 334 | 345 | 108 | 385 | 386 | 169 | 302 | 315 | 230 | 304 | 312 | 291 | 285 | 292 | 352 | 317 | 319
48 | 353 | 354 | 109 | 328 | 326 | 170 | 297 | 311 | 231 | 303 | 311 | 292 | 285 | 293 | 353 | 360 | 378
49 | 389 | 381 | 110 | 373 | 361 | 171 | 292 | 303 | 232 | 297 | 310 | 293 | 255 | 265 | 354 | 341 | 347
50 | 357 | 355 | 111 | 356 | 358 | 172 | 302 | 312 | 233 | 279 | 293 | 294 | 259 | 265 | 355 | 295 | 305
51 | 340 | 336 | 112 | 315 | 316 | 173 | 306 | 313 | 234 | 281 | 295 | 295 | 298 | 307 | 356 | 288 | 306
52 | 341 | 335 | 113 | 299 | 302 | 174 | 304 | 319 | 235 | 285 | 293 | 296 | 300 | 308 | 357 | 272 | 288
53 | 333 | 333 | 114 | 305 | 308 | 175 | 292 | 308 | 236 | 283 | 294 | 297 | 290 | 302 | 358 | 288 | 285
54 | 346 | 357 | 115 | 305 | 300 | 176 | 301 | 310 | 237 | 285 | 288 | 298 | 281 | 291 | 359 | 273 | 267
55 | 325 | 335 | 116 | 308 | 316 | 177 | 304 | 315 | 238 | 277 | 287 | 299 | 277 | 281 | 360 | 255 | 301
56 | 345 | 340 | 117 | 317 | 316 | 178 | 311 | 318 | 239 | 283 | 295 | 300 | 286 | 291 | 361 | 289 | -
57 | 370 | 389 | 118 | 305 | 309 | 179 | 310 | 316 | 240 | 298 | 308 | 301 | 261 | 271 | 362 | 277 | 285
58 | 353 | 342 | 119 | 300 | 309 | 180 | 300 | 317 | 241 | 304 | 314 | 302 | 265 | 281 | 363 | 283 | 308
59 | 337 | 336 | 120 | 304 | 310 | 181 | 304 | 312 | 242 | 298 | 308 | 303 | 260 | 269 | 364 | 301 | 319
60 | 329 | 344 | 121 | 320 | 313 | 182 | 311 | 306 | 243 | 288 | 301 | 304 | 262 | 268 | 365 | 304 | 327
61 | 325 | 337 | 122 | 328 | 339 | 183 | 294 | 311 | 244 | 293 | 301 | 305 | 285 | 284 | | |
Table 1. Total ozone (from the ground to
the top of the atmosphere) measured from 11 February 1990
to 10 February 1991 at Fresno, California. The TOMS data are
ozone measurements by the TOMS instrument on the Nimbus-7
satellite. The ground data are from a Dobson spectrophotometer
at Fresno, California. TOMS data are courtesy of the TOMS
ozone processing team at the Goddard Space Flight Center (GSFC).
Fresno data are courtesy of the National Oceanic and Atmospheric
Administration.
Satellite Ozone (Dobson units) | Number of Occurrences
245-254 | 3
255-264 | 20
265-274 | 9
275-284 | 33
285-294 | 45
295-304 | 74
305-314 | 56
315-324 | 28
325-334 | 25
335-344 | 22
345-354 | 15
355-364 | 16
365-374 | 8
375-384 | 3
385-394 | 3
395-404 | 1
405-414 | 0
415-424 | 1
425-434 | 2
435-444 | 0
445-454 | 0
455-464 | 1
Table 2. Frequency table of ozone measured
by satellite for 11 February 1990 to 10 February 1991 at Fresno,
California. Data courtesy of the TOMS ozone processing team
at the Goddard Space Flight Center (GSFC).