11 March 2005

A Tool That Counts: Basic Statistics for the Amateur Scientist

Part 1. Compiling and Sorting Your Data

Mark Hartwig, Ph.D.

Editor's Note: The original version of this series was first published as a major feature in Science Probe! (April 1992). The article received some nice reader feedback and college professors asked to reprint it. When Nature, one of the world's leading journals of science, reviewed Science Probe! in a feature about new science publications, this article was specifically cited by the reviewer. We are grateful to Dr. Hartwig for allowing The Citizen Scientist to present this series based on his article.

Although statistical analysis plays an important role in scientific research, many scientists, amateur and professional alike, are plagued by chronic “statophobia,” the fear of statistics. Symptoms of this tragic malady include visible agitation in the presence of Greek symbols and the avoidance of anything having to do with statistics. The natural consequence of such behavior is that most statophobes know far less about statistics than they should.

When presented properly, however, statistics is a fascinating subject that can open up new ways of looking at the natural world. Moreover, the basic principles are actually quite simple and can enhance anyone's critical thinking skills. The purpose of this article is to provide a gentle introduction to the wide world of statistics and to show you how to use statistical tools and principles to improve your understanding of the world around you.

Descriptive Statistics

For most of us, the word “statistics” evokes images of census forms and thick government reports. We think about traffic fatalities, violent crimes, disease, or what have you. And we hope that we don't become “just another statistic.”

These images convey an important fact about statistics: namely, that one of their most important functions is to describe data. Put another way, statistics provide us with methods that help us summarize, organize, and communicate the results of our experiments or observations.

This is where statistics can be particularly fun, especially if you're the curious type, because the methods of descriptive statistics allow you to take reams of observations and turn them into something understandable. They help you poke around in your data and see what is there.

If you're a professional statistician, you're probably cringing at the suggestion that readers can use statistics to “poke around” in their data. Remember, though, that we're talking descriptive statistics, where poking around is perfectly acceptable. Once we get to inferential statistics, I'll be sure to mention the appropriate warnings.

Soon we'll take a look at some methods you can use to make sense out of a stack of data. Before we do, however, I should point out a possible source of confusion. As you might have guessed, the word “statistics” can be used in different ways. In addition to denoting a field of study, the word statistics is also the plural of statistic, a generic term that can refer either to a single numeric data point (e.g., the high temperature in Bangor, Maine, on June 16, 1988) or to a numeric figure estimated from several data points (e.g., the average high temperature in Bangor, Maine, for an entire year). Most of the time, the context will indicate the meaning of the word. In those places where it doesn't, I'll use the term statistical indices (plural of statistical index) whenever I'm talking about estimated numeric figures.

As Easy as 1, 2, 3…or Excel or Quattro Pro

The simplest descriptive statistical technique is one we use almost every day: counting. How many traffic accidents were there on U.S. highways in 1988? On how many days did Los Angeles get any precipitation from 1980 to 2000? How many times did the city of Denver exceed the federal maximum standards for air pollution from 1990 to 2000? In each case, the answer is nothing more than a simple count. As simple as counting may be, however, it can be a useful tool for organizing your data.

For example, consider the data set in Table 1. These figures represent the total amount of ozone, in Dobson units, measured over Fresno, California, from February 11, 1990 (Day 1) to February 10, 1991 (Day 365). Two sets of measurements are given: one from a ground-based Dobson spectrophotometer in Fresno and one from the Total Ozone Mapping Spectrometer (TOMS) aboard the Nimbus-7 satellite.

Organized the way they are in Table 1, the ozone measurements don't tell us much. They're just numbers. But let's do some counting. In particular, let's look at the satellite readings. If we were to count every occurrence of a satellite reading between 245 and 254 Dobson units, we would find three such occurrences. Similarly, if we were to count every satellite reading between 255 and 264 Dobson units, we would find 20. And if we continued our counting, we would eventually come up with the figures listed in Table 2, which is called a frequency table.
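The counting described above is easy to hand to a computer. The following Python sketch is one possible way to do it, not the method used to prepare Table 2. It sorts the TOMS readings for Days 1 through 20 of Table 1 into 10-unit intervals starting at 245 Dobson units. The function name frequency_table and its parameters are invented for this illustration.

```python
from collections import Counter

# TOMS readings (Dobson units) for Days 1-20, copied from Table 1.
toms = [308, 294, 316, 360, 431, 403, 394, 360, 376, 312,
        279, 277, 297, 297, 298, 300, 313, 320, 317, 307]


def frequency_table(values, start=245, width=10):
    """Count the values falling in each interval [low, low + width - 1].

    Returns a list of (low, high, count) tuples, one per interval,
    from `start` up to the interval holding the largest value.
    Assumes whole-number readings, none smaller than `start`.
    """
    if min(values) < start:
        raise ValueError("a value lies below the first interval")
    # Interval number 0 covers start..start+width-1, number 1 the next, etc.
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        table.append((low, low + width - 1, counts.get(k, 0)))
    return table


for low, high, n in frequency_table(toms):
    print(f"{low}-{high}: {n}")
```

Given the full set of satellite readings instead of this 20-day excerpt, a function like this should produce counts like those in Table 2.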

Notice, by the way, that we have not counted the occurrences of each distinct value. If we had done so, Table 2 would be much larger than it is. Moreover, the number of unique values is such that we would never find more than a few occurrences of any one value. In fact, for many of the values we would find only one or two occurrences, making it more difficult to pick out clean patterns in the data.

As an experiment, try counting the number of occurrences for each value. Or better yet, try working with different cut points. Instead of using 10-point intervals as we did here (e.g., 245-254, 255-264, and so on), try using three-point intervals, or 20-point intervals, and see how your results differ from those presented here.
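If you would rather let a computer run this experiment, here is a rough Python sketch under a few assumptions: it uses only the TOMS readings for Days 1 through 20 of Table 1, it starts every set of intervals at 245 Dobson units, and the helper name bin_counts is made up for the example. It first counts each distinct value, then repeats the count with 3-, 10-, and 20-point intervals.

```python
from collections import Counter

# TOMS readings (Dobson units) for Days 1-20, copied from Table 1.
toms = [308, 294, 316, 360, 431, 403, 394, 360, 376, 312,
        279, 277, 297, 297, 298, 300, 313, 320, 317, 307]


def bin_counts(values, width, start=245):
    """Map each non-empty interval's lower bound to its number of values.

    Assumes whole-number readings, none smaller than `start`.
    """
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))


# One "interval" per distinct value: many bins, each nearly empty.
distinct = Counter(toms)
print(f"distinct values: {len(distinct)} bins, "
      f"largest count {max(distinct.values())}")

# Wider intervals give fewer bins with more readings in each.
for width in (3, 10, 20):
    counts = bin_counts(toms, width)
    print(f"width {width}: {len(counts)} non-empty intervals, "
          f"largest count {max(counts.values())}")
```

The trade-off is the one described above: very narrow intervals leave most counts at one or two, while very wide intervals can blur real features of the data.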

Note the pattern that emerges from our counting. The largest number of occurrences (74) falls between 295 and 304 Dobson units. The next largest numbers fall on either side, with 56 occurrences in the 305-314 range and 45 occurrences in the 285-294 range. The numbers then decline fairly steadily in either direction, with the exception of a blip in the 255-264 range (20 observations) and some minor fluctuations above 405 Dobson units.

The pattern becomes even clearer when we put the data from Table 2 into a histogram, or bar graph, such as the one shown in Fig. 1.
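A histogram does not require graphing software. As a minimal sketch, the Python below prints a sideways text histogram from the counts in Table 2, using one "#" per occurrence. The dictionary name table2 and the function text_histogram are names chosen for this illustration, and the output is only a rough stand-in for a properly drawn bar graph.

```python
# Counts from Table 2: lower bound of each 10-unit interval (Dobson units)
# mapped to the number of satellite readings in that interval.
table2 = {
    245: 3, 255: 20, 265: 9, 275: 33, 285: 45, 295: 74, 305: 56,
    315: 28, 325: 25, 335: 22, 345: 15, 355: 16, 365: 8, 375: 3,
    385: 3, 395: 1, 405: 0, 415: 1, 425: 2, 435: 0, 445: 0, 455: 1,
}


def text_histogram(freq, width=10):
    """Return one text line per interval: range, a bar of '#', and the count."""
    lines = []
    for low, n in sorted(freq.items()):
        lines.append(f"{low}-{low + width - 1} {'#' * n} {n}")
    return lines


print("\n".join(text_histogram(table2)))
```

The longest bar belongs to the 295-304 interval, and the bars shrink on either side of it, which is the pattern described in the text.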

So even something as simple as counting can help us turn a welter of numbers into something that makes a little more sense. If we wanted to take things a step further, we could look not only at one-way frequency tables but also at two-way and multi-way tables. We could also look at the various procedures statisticians have developed for analyzing these tables, such as log-linear models, Cramér's V, Cohen's kappa, the phi coefficient, and so on. But for now, it's enough to know that counting can be a very useful tool in the amateur scientist's statistical bag of tricks.

Part 2 of this series will appear in the next issue of The Citizen Scientist.

Day  TOMS  Ground   Day  TOMS  Ground   Day  TOMS  Ground   Day  TOMS  Ground   Day  TOMS  Ground   Day  TOMS  Ground
  1   308     308    62   312     320   123   318     328   184   311     319   245   293     305   306   286     294
  2   294     305    63   296     312   124   343     361   185   313     312   246   285     293   307   288     285
  3   316     316    64   307     329   125   356     375   186   302     308   247   282     285   308   302       -
  4   360     376    65   329       -   126   368     373   187   305     309   248   286     291   309   309     313
  5   431     450    66   372     369   127   328     325   188   296     306   249   284     285   310   320     327
  6   403     406    67   338     341   128   319     319   189   313     314   250   285     293   311   303     309
  7   394       -    68   336     341   129   331     334   190   314     316   251   287     299   312   322     327
  8   360     353    69   339     337   130   314     328   191   331     310   252   312     312   313   368       -
  9   376       -    70   379     381   131   309     320   192   318     327   253   298     302   314   355     378
 10   312     321    71   363     368   132   310     317   193   305     317   254   293     298   315   347     357
 11   279     288    72   351     355   133   309     307   194   303     308   255   299     311   316   320     328
 12   277     336    73   352     351   134   304     305   195   306     317   256   297     305   317   321     322
 13   297     301    74   351     348   135   306     318   196   314     314   257   292     301   318   363     378
 14   297     307    75   337     338   136   302     311   197   317     314   258   278     291   319   344     353
 15   298     310    76   329     325   137   302     311   198   304     305   259   269     281   320   339     342
 16   300     317    77   333     338   138   317     317   199   290     300   260   275     279   321   322     329
 17   313     318    78   339       -   139   304     311   200   280     289   261   280     285   322   265     274
 18   320     327    79   344       -   140   304     308   201   287     295   262   282     291   323   272     274
 19   317     326    80   433       -   141   302     304   202   299     302   263   284     290   324   2-8     278
 20   307     325    81   356     361   142   289     301   203   301     307   264   323     317   325   261     272
 21   305     334    82   326     337   143   313     316   204   300     308   265   326     330   326   260     278
 22   317     318    83   314     325   144   316     317   205   294     308   266   314     316   327   279     288
 23   341     338    84   305     315   145   310     311   206   296     304   267   293     296   328   286       -
 24   359     347    85   300       -   146   313     306   207   299     304   268   262     270   329   315       -
 25   328     328    86   307       -   147   302     305   208   294     302   269   259     271   330   329     332
 26   310     318    87   326     327   148   304     311   209   294     299   270   288     295   331   334     334
 27   361     354    88   331     334   149   323     328   210   289     302   271   272     277   332   308     314
 28   354     356    89   350     353   150   321     320   211   281     293   272   268     275   333   343     331
 29   347     337    90   374     385   151   309     320   212   284     298   273   261     275   334   356     358
 30   419     427    91   336     338   152   296     314   213   287     296   274   258     269   335   295     296
 31   455     466    92   347     345   153   299     309   214   279     292   275   260     275   336   315     316
 32   357     372    93   345     346   154   306     316   215   279     286   276   262     272   337   298     311
 33   291     299    94   368     371   155   322     317   216   282     290   277   259     267   338   311     319
 34   286     293    95   343     344   156   309     320   217   288     295   278   260     270   339   303     305
 35   271     284    96   348     355   157   309     315   218   299     304   279   280     291   340   312     320
 36   285     298    97   363     358   158   295     305   219   303     306   280   264     275   341   314     331
 37   276     284    98   363     355   159   297     303   220   306     311   281   264     268   342   296     307
 38   279     314    99   375     369   160   302     299   221   300     309   282   247     261   343   281     296
 39   296     302   100   335     330   161   306     298   222   305     318   283   290     296   344   314     328
 40   285     286   101   336     329   162   301     304   223   303     317   284   283     299   345   296     323
 41   288     296   102   330     325   163   299     302   224   302     308   285   282     295   346   307     322
 42   306     306   103   368     345   164   312     318   225   306     311   286   245     254   347   278     296
 43   305     308   104   333     325   165   313     308   226   296     314   287   246     253   348   298     313
 44   316     320   105   336     342   166   333     316   227   301     310   288   261     267   349   297     311
 45   320     318   106   325     331   167   326     316   228   292     302   289   255     265   350   322     329
 46   350     344   107   339       -   168   303     316   229   294     307   290   302     304   351   339     344
 47   334     345   108   385     386   169   302     315   230   304     312   291   285     292   352   317     319
 48   353     354   109   328     326   170   297     311   231   303     311   292   285     293   353   360     378
 49   389     381   110   373     361   171   292     303   232   297     310   293   255     265   354   341     347
 50   357     355   111   356     358   172   302     312   233   279     293   294   259     265   355   295     305
 51   340     336   112   315     316   173   306     313   234   281     295   295   298     307   356   288     306
 52   341     335   113   299     302   174   304     319   235   285     293   296   300     308   357   272     288
 53   333     333   114   305     308   175   292     308   236   283     294   297   290     302   358   288     285
 54   346     357   115   305     300   176   301     310   237   285     288   298   281     291   359   273     267
 55   325     335   116   308     316   177   304     315   238   277     287   299   277     281   360   255     301
 56   345     340   117   317     316   178   311     318   239   283     295   300   286     291   361   289       -
 57   370     389   118   305     309   179   310     316   240   298     308   301   261     271   362   277     285
 58   353     342   119   300     309   180   300     317   241   304     314   302   265     281   363   283     308
 59   337     336   120   304     310   181   304     312   242   298     308   303   260     269   364   301     319
 60   329     344   121   320     313   182   311     306   243   288     301   304   262     268   365   304     327
 61   325     337   122   328     339   183   294     311   244   293     301   305   285     284

 

Table 1. Total ozone (from the ground to the top of the atmosphere), in Dobson units, measured from 11 February 1990 to 10 February 1991 at Fresno, California. The TOMS data are ozone measurements by the TOMS instrument on the Nimbus-7 satellite. The ground data are from a Dobson spectrophotometer at Fresno, California. A dash marks a day with no ground reading listed. TOMS data are courtesy of the TOMS ozone processing team at the Goddard Space Flight Center (GSFC). Fresno data are courtesy of the National Oceanic and Atmospheric Administration.

 

Satellite Ozone    Number of
(Dobson units)     Occurrences
245-254              3
255-264             20
265-274              9
275-284             33
285-294             45
295-304             74
305-314             56
315-324             28
325-334             25
335-344             22
345-354             15
355-364             16
365-374              8
375-384              3
385-394              3
395-404              1
405-414              0
415-424              1
425-434              2
435-444              0
445-454              0
455-464              1

Table 2. Frequency table of ozone measured by satellite from 11 February 1990 to 10 February 1991 at Fresno, California. Data courtesy of the TOMS ozone processing team at the Goddard Space Flight Center (GSFC).

   
Copyright 2005 by Society for Amateur Scientists