14 November 2003
Data Acquisition
with Perl - #1
by Joseph DiVerdi
What's Up
In this installment
we will turn to the problem of acquiring experimental data using a Perl
program. Actually, the problem isn't only acquiring the data but what
to do with it once it has been acquired. In this case, we will be examining
a special type of experimental data, that is, event capture.
An event can
be one of many observables including, but not limited to,
- a lever actuated by an experimental
subject,
- the detection of a radioactive
decay or cosmic ray by a Geiger-Mueller tube,
- the detection of a lightning
strike by a photo-detector or radio-detector,
- the passing of an automobile
on a road as detected by a pneumatic tube,
- one of my cats passing through
its door by breaking a light path,
- someone in a household turning
on the bathroom light or opening the refrigerator door,
and so on.
The range of possibilities is limited only by our experimental imagination.
However, the task is the same in this type of acquisition problem: record
when the actual event occurs. The program described here will perform
this task in a convenient way.
Equally important
is the means by which the occurrence of an event is declared to the computer.
In the method I am describing here I have chosen a scheme which makes
use of the serial port as the interface between the computer and the outside
world. A circuit was developed, using junk-box parts, which accepts a
logic signal and emits a single character in ASCII format using the RS-232
protocol. The logic signal is the falling edge of a CMOS electrical signal
but could just as easily be any of a number of other electrical signals.
The character emitted is fixed in the circuit and has no significance
- the important quantity is when the character occurs because that time
signals that the chosen event has taken place. The software program waits
and waits and waits until it is notified that the character has been received
at the serial port and then it snaps into action by recording the time
that it received the notification.
A Little Hardware
The schematic
of the logic translator circuit is shown in the first figure. It consists
of three ICs and runs on a single 5VDC power supply. It draws a little
over 10mA so it can draw its power easily from the computer supply.
[Figure: schematic of the logic translator circuit]
The LM555
IC generates a square wave which serves as the baud rate generator; it
runs at sixteen times the desired baud rate. The IM6402 is a rather old
CMOS UART (Universal Asynchronous Receiver Transmitter) IC of which I
happen to have a handful. They are probably hard to obtain in this day
and age so if you're interested in building something like this be prepared
to improvise. Perhaps a more timely strategy is to use a PIC or STAMP
micro-controller to perform this function - they are inexpensive and plentiful
and easy to program. In any event (no pun intended) the UART generates
a bit stream corresponding to a single character whenever pin 23 is brought
to ground. The character is selected by the logic level of eight input
pins. I used a small 8-pole DIP switch to permit easy changes to the selected
character but have never used any value other than 00001010 (binary),
or 012 (octal), or 10 (decimal), or 0A (hexadecimal), or <LF>,
or "\n" (depending upon your number system or language) which
is a "new line" character in UNIX parlance. The MAX236 IC is a CMOS to
RS-232 convertor which only requires single +5VDC power. It has its own
internal positive and negative DC power convertors which permit it to
deliver true RS-232 drive levels. An image of the convertor follows.
[Figure: image of the convertor]
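The DIP switch value and the sixteen-times clock relationship described above can be checked with a few lines of Perl. This is only a sketch: the 9600 baud figure is an assumption for illustration (the rate actually used is not stated here), and uart_clock_hz is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The character selected by the DIP switch: a line feed, decimal 10 in ASCII.
my $char = "\n";
my $code = ord $char;

# The same value written in the four number systems mentioned in the text.
my $binary  = sprintf "%08b", $code;
my $octal   = sprintf "%03o", $code;
my $decimal = sprintf "%d",   $code;
my $hex     = sprintf "%02X", $code;

print "binary $binary, octal $octal, decimal $decimal, hex $hex\n";

# The baud rate generator runs at sixteen times the desired baud rate.
# ASSUMPTION: 9600 baud is a hypothetical rate chosen only for this example.
sub uart_clock_hz {
    my ($baud_rate) = @_;
    return 16 * $baud_rate;
}

printf "a %d baud link needs a %d Hz clock\n", 9600, uart_clock_hz(9600);
```

Whatever the real baud rate is, the serial port on the computer side must be configured to match it or the character will not be recognized.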
Looking at the Disk File Format
This is a
good time to describe exactly how the data will be organized on the disk
so that some of the programming logic will be more clear. There are a
few basic rules:
- All the data files will be contained in a single disk directory.
- Each data file will correspond to one calendar day's worth of events,
  irrespective of how many events occur in that day.
- Each data file's name will be the calendar date to which it corresponds.
  The file name will have the format: ccyy_mm_dd_UT.
  - ccyy - two decimal digits of century & two decimal digits of year (no
    Y3K problem here).
  - mm - two decimal digits of month with leading zero padding.
  - dd - two decimal digits of day with leading zero padding.
  - The string "UT" indicating that the date is in Coordinated Universal
    Time, basically the successor to Greenwich Mean Time.
  - Each of these fields is separated by the underscore _ character.
- Each data file's contents consist of one or more "comment" lines
  containing meta-information and one or more event records.
- Each event record exists in the appropriate file as a single line
  containing the time of occurrence of the event in the format:
  ccyy.mm.dd hh:mm:ss UT.
  - The first three fields are in the same format as the file name.
  - hh - two decimal digits of the hour, in twenty-four hour format with
    leading zero padding.
  - mm - two decimal digits of minute with leading zero padding.
  - ss - two decimal digits of second with leading zero padding.
  - The string "UT" indicating that the time is in Coordinated Universal
    Time, basically the successor to Greenwich Mean Time.
  - Note that these fields are separated by different characters.
These formats
may seem odd, convoluted, and generally perverse but they do serve to
organize the data in a form which is convenient, human-readable, compact
(while being human-readable), unique, and searchable. Since a new file
is written every day, a particular data file isn't open forever and separate
data mining or data reaping programs can go after a particular day's data
once without having to (re-)read other days' data. Since the data are
written as text, mere mortals can inspect the data without special utilities
or x-ray vision. Since the data have not been binned (beyond the one second
level) they can be analyzed and re-analyzed at any time over various time
windows. Here is a sample of the contents of a particular data file.
# Created with script version: 20011222
# file name: 2002.01.06_MT
2002.01.06 00:09:10 MT
2002.01.06 09:02:40 MT
2002.01.06 09:50:12 MT
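To illustrate why this text format is convenient for a separate "data reaping" program, here is a minimal sketch that skips the comment lines and counts the events in each hour of the day. The function name count_events_by_hour is hypothetical, and the sample lines are simply the ones shown above, held in memory rather than read from disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count events per hour of day from the lines of one data file.
sub count_events_by_hour {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # skip the "comment" (meta-information) lines
        next if $line =~ /^\s*#/;
        # ccyy.mm.dd hh:mm:ss followed by a time zone label such as UT
        if ($line =~ /^\d{4}\.\d{2}\.\d{2} (\d{2}):\d{2}:\d{2} \w+\s*$/) {
            $count{$1}++;
        }
    }
    return %count;
}

my @sample = (
    "# Created with script version: 20011222\n",
    "# file name: 2002.01.06_MT\n",
    "2002.01.06 00:09:10 MT\n",
    "2002.01.06 09:02:40 MT\n",
    "2002.01.06 09:50:12 MT\n",
);

my %by_hour = count_events_by_hour(@sample);
printf "%s:00  %d event(s)\n", $_, $by_hour{$_} for sort keys %by_hour;
```

Because the records have not been binned beyond one second, the same file could just as easily be re-read and counted per minute or per ten-minute window by changing the pattern.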
A Little Software
Now let's
take a look at the Perl code which will suck in those characters emitted
by the hardware and write disk files containing the timing of the events.
#! /usr/bin/perl
# ----------------------------------------------------------------------------------
# particle_log, by Joseph A. DiVerdi
# Copyright 2001 by La Famiglia DiVerdi
# Copyright 2002, 2003 by XTR Systems, LLC
#
# Program to slurp in data from the particle counter, process it into a
# standardized format, and archive it to disk.
#
# Revision History:
# created 1 Dec 2001 JAD
# ----------------------------------------------------------------------------------
# Includes and other external modules
use warnings;
use strict;
use Carp;
# ----------------------------------------------------------------------------------
# main execution module
my $version = "20031111";
# data_directory_name must end with a slash
my $directory_name = "/home/diverdi/html/event_data/";
my $port_name = "/dev/ttyS1";
# open the serial port for read
open INPUT, "<", $port_name or
die "Can't open serial port '$port_name': $!\n";
# define this variable to prevent "strict" complaints but leave it undefined
my $file_name;
# look for a line of serial data terminated with a <NL>
while (<INPUT>) {
    # save a copy of the current time which is when the event is received
    my $current_time = time;
    # check if the current file name is defined or if it is defined but it doesn't correspond to the current day
    if (!defined $file_name or $file_name ne format_file_name($current_time)) {
        # set up the now current log file name and full path
        $file_name = format_file_name($current_time);
        my $file_path = $directory_name . $file_name;
        # issuing an open on DATA will automatically close an existing open data file
        open DATA, ">>" . $file_path or
            die "Can't open log file '$file_path': $!\n";
        # change the permissions of the file to owner: read, write; group: read; world: none
        chmod 0640, $file_path;
        # set output flushing for the DATA file handle
        # that is do NOT buffer disk data, write it to disk immediately
        select DATA;
        $| = 1;
        # put the header in this data file if the file is still empty
        print DATA "# Created with script version: $version\n# file name: $file_name\n"
            unless -s $file_path;
    }
    # put the time the event occurred in the data file in a human readable format
    # (the built-in function is spelled gmtime)
    my @times = gmtime $current_time;
    printf DATA "%04d.%02d.%02d %02d:%02d:%02d UT\n",
        $times[5] + 1900, $times[4] + 1, $times[3], $times[2], $times[1], $times[0];
}
# ----------------------------------------------------------------------------------
sub format_file_name {
    # convert the supplied argument, in Unix time, into a nicely formatted string
    # such as: 2003_11_02_UT
    my @times = gmtime shift;
    return sprintf "%04d_%02d_%02d_UT", $times[5] + 1900, $times[4] + 1, $times[3];
}
# ----------------------------------------------------------------------------------
As in the
previous code examples of this series, the beginning of the program conforms
to some requirements and to some good programming standards. The first
line tells the operating system that this is a Perl program and to execute
it as such. There's an abbreviated comment section describing the function
of the program and its heritage - it has been abbreviated for this publication
and should contain more explanatory detail in general. There are a few
"includes" which provide a standardized and somewhat rigorous programming
environment (I need all the help I can get to make me write better code).
The "Carp" module is a new one for us. It provides a more detailed set
of error messages which you'll appreciate when (not if) something goes
wrong.
The first
executable statements are variable assignment statements. The version
number is identified using a calendar date format and is always written
to the disk file contents so that the file's format can be traced to a
particular program version. The directory name specifies the directory
which will contain the various data files. The port name contains the
name of the UNIX serial device where the characters will be received.
You'll note that we open, close, read, and write to a device in exactly
the same fashion as we operate on disk files which is fundamental to the
UNIX philosophy.
The first
open statement connects the program to the serial port; the
leading "<" character signifies that this connection will
be for reading (as opposed to writing). The rest of this pair of statements,
"open or die", is a very popular
Perl idiom which deserves a little attention because of that popularity.
It is known in the programming world as "short-circuiting" the "or" operator.
You see, the open statement returns a value of true or false depending
upon whether it was able to open the device or not. This return value
is the first argument of the boolean or operator. Since
the result of an or is true when either
or both of its arguments is true, if the first argument is
true then there is no need to evaluate the second
argument and it is never executed. If the first argument is false
(because the device couldn't be opened) then
the second argument needs to be evaluated to determine the result. The second
argument, however, is a die statement
which is actually the combination of a print and
an exit statement.
If it is executed then a message is reported and the program terminates
immediately.
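The short-circuit behavior is easy to see in isolation. In this small sketch the three subroutines are hypothetical stand-ins (they are not part of the data acquisition program) which simply record that they were called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @log;

# Hypothetical helpers that note when they are called.
sub succeed { push @log, "succeed"; return 1; }
sub fail    { push @log, "fail";    return 0; }
sub report  { push @log, "report";  return 1; }

succeed() or report();    # first argument is true, so report() is never called
fail()    or report();    # first argument is false, so report() is called

print join(", ", @log), "\n";
```

One practical point: the idiom is written with the low-precedence or rather than ||. With ||, a statement such as open INPUT, "<", $port_name || die ... would attach the die to $port_name instead of to the result of open, and a failed open would go unnoticed.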
The bulk of
the program is a while loop.
Note that the INPUT file
handle is the test argument. The neat feature of this construction
is that the execution stalls at this point until a character corresponding
to the end of a line (<NL>)
is received. So it just sits there, waiting for a character to arrive,
without consuming any processor time; when the character arrives,
the loop body executes. After the loop contents are executed the program
control returns to this point and it awaits a new character. The reason
for this behavior is beyond the scope of these articles but involves neato
programming principles such as signals and blocking
I/O.
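The one-pass-per-line behavior of the loop can be tried without any hardware. In this sketch an in-memory file handle stands in for the serial port (an assumption made purely so the example runs anywhere); it holds three newline characters, as if three events had already arrived.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the serial port: three newline characters held in memory.
my $incoming = "\n\n\n";
open my $port, "<", \$incoming or die "Can't open in-memory handle: $!\n";

my $events = 0;
while (<$port>) {
    # each pass through the loop corresponds to one newline-terminated line
    $events++;
}
close $port;

print "events seen: $events\n";
```

The difference with a real device is what happens when the data run out: the in-memory handle reports end-of-file and the loop ends, whereas a read on the serial port simply blocks until the next character shows up.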
The first
task inside the loop is to capture the current time, that is the time
when the event occurred. The time function
returns the current time as so-called UNIX time, that is the integer,
decimal number of seconds since the beginning of the UNIX epoch
which occurred on January 1, 1970 UT. This now very large number is a
very convenient way of keeping track of clock time but only because a
large number of functions are available to manipulate it. It is also important
to note that there are some limitations to the technique used here. First
of all, the timing of an event cannot be specified below the one second
level. There are high-resolution time functions which provide microsecond
resolutions but we'll save those for another time. Second, the basic,
out-of-the-box, run-of-the-mill UNIX is not real-time UNIX. Since UNIX
is a multi-tasking operating system it is possible that some other task
will not relinquish computer resources for some finite time which can
distort our timing and make it appear as if a particular event occurred
later than it actually did. There are variants known as Real-Time
UNIX which can be made strictly deterministic but I avoid these
techniques by ensuring that the data acquisition computer is reserved
for data acquisition and no one plays any games on it. This problem is
addressed by throwing hardware at it.
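For the curious, here is a rough sketch of what sub-second time stamps might look like using the Time::HiRes module, which ships with recent versions of Perl. The function name format_event_time_ms and the extra millisecond field are hypothetical; they are not part of the file format described earlier, and the multi-tasking caveat above still limits how much of that extra precision is meaningful.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# Format an epoch time, possibly fractional, as "ccyy.mm.dd hh:mm:ss.mmm UT".
sub format_event_time_ms {
    my ($epoch) = @_;
    my $whole  = int $epoch;                       # whole seconds
    my $millis = int(($epoch - $whole) * 1000);    # truncated milliseconds
    my @t = gmtime $whole;
    return sprintf "%04d.%02d.%02d %02d:%02d:%02d.%03d UT",
        $t[5] + 1900, $t[4] + 1, $t[3], $t[2], $t[1], $t[0], $millis;
}

# half a second after the beginning of the UNIX epoch
print format_event_time_ms(0.5), "\n";

# the current time, with a fractional part supplied by Time::HiRes
print format_event_time_ms(Time::HiRes::time()), "\n";
```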
The next series
of statements ensures that a disk file exists which corresponds to
the same day as the current event, that the disk file is open and ready for
writing, that the appropriate comment lines have been written to it,
and that the current event information is written to it immediately,
without buffering. Buffering is a technique used to make disk access
more efficient but it is a nuisance to us in this application.
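The effect of turning off buffering can be demonstrated with a small sketch. It is not the program's own code: it uses a lexical file handle and the autoflush method from IO::Handle instead of select and $|, and a throwaway temporary directory (an assumption for the example) stands in for the data directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);

# ASSUMPTION: a temporary directory stands in for the real data directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/2003_11_14_UT";

my $header = "# file name: 2003_11_14_UT\n";
my $record = "2003.11.14 00:00:01 UT\n";

open my $log, ">>", $path or die "Can't open log file '$path': $!\n";
$log->autoflush(1);    # do NOT buffer, write each print to disk immediately

# write the header only when the file is still empty
print {$log} $header unless -s $path;
print {$log} $record;

# the file is still open, yet its size already reflects both lines
my $size_while_open = -s $path;
print "bytes on disk while still open: $size_while_open\n";

close $log;
```

Without the autoflush line the size reported while the file is open would most likely be zero, because such a small amount of data would still be sitting in the output buffer. That is exactly the situation to avoid when another program may read the day's file at any moment.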
Actual Experimental Data
I have used
this setup, along with a pair of Geiger-Mueller tubes and a bunch more
electronics to capture radioactive events and cosmic ray events. Since
this installment is already longer than I would like, those experimental
data will appear in a separate article.
SAS Member Unix Accounts For
Learning Perl
If you're
an SAS member in good standing and are interested in trying out
some Perl programming but don't have access to it, I'll be happy to spare
a few CPU cycles on one of my servers and provide an account. From this
account you can edit and run Perl programs but must not do evil
network things. The whole story is spelled out on the application
page. Read the rules of engagement, fill in the form, and I'll do
the rest (including checking with SAS HQ to see if you've been naughty
or nice). You'll receive your login information via email shortly thereafter.
Joseph DiVerdi is keeping
an eye on those pesky cats using technology. Contact him at diverdi@xtrsystems.com.