Bar Codes for Specimen Data Management
(from Insect Collection News 1994, vol. 9, pages 2-4)
F. Christian Thompson
Systematic Entomology Laboratory, ARS, USDA
Washington, D. C. 20560

Systematic Entomology is built on massive collections of specimens and
associated data. Where other disciplines have collections of a few thousand
specimens, entomology collections typically contain millions. These numbers
mean greater problems, but are the source of greater promise. Terrestrial
arthropods provide more data points as there are more clades, species and
individuals with longer histories and broader variation. Terrestrial arthropods
are the glue that binds ecosystems together. So, for Society we need to
manage the data associated with our entomological specimens efficiently
and effectively, so we can benefit from information derived from them.
The world is changing: people are more interested in the environment and
worried about climate change, loss of biodiversity, and other matters for which
much of the scientific data are ultimately derived from museum specimens.
Appreciation of this has led to increased concern about, and unfortunately
regulations for, biological specimens. Authorities are now demanding that
the accession history of each specimen be documented to ensure that
every specimen was legally acquired (Lacey Act). Nations value
their biodiversity and are granted legal rights to it by the Convention
on Biological Diversity. Some are demanding, and more will demand, that
biodiversity information, if not the specimens from which it was derived,
be repatriated. These matters will have a great impact on entomology.
All biosystematic information is derived from specimens. Objective, scientific
results require that observations can be repeated. So, for biosystematics,
there is a need to tie the data derived from a specimen to that particular
specimen. Traditionally this has meant that specimens have unique identifying
numbers. Unfortunately, due to the high costs and the large number of specimens
involved, entomologists have been reluctant to individually number specimens.
Today, for legalistic reasons alone, entomologists must begin doing this.
Bar codes, while still expensive, allow for the identification of individual
specimens and greatly reduce the cost of subsequent data handling. Many
organizations have now begun bar-coding specimens as they are initially
labelled (prospective data capture, see Thompson 1990). However, there remains
a large backlog of existing specimens that do not have bar codes or other
unique identifiers. So, as these existing specimens are handled in the course
of research activities, they should be bar-coded so that the scientific
observations can be easily verified (retrospective data capture, Thompson
1990).
The problem of prospective data capture has been solved by one collection
(INBio; Janzen 1992). As new material is processed, bar-codes are attached
as part of the labeling process. The data on locality, time, collectors,
etc., are captured when the print order for the labels is generated. Now
anyone working with INBio may get these data electronically and need not
re-keyboard them. Some of us are working with INBio specimens and get these
data on floppies when we borrow the material. However, INBio is on the Internet,
and soon one should be able to get the data interactively. The INBio approach
is fast becoming the standard. The University of Georgia has adopted it
and the Bishop Museum is considering doing so. So this approach is recommended
to the Entomological Community. Billions of keystrokes will be saved in
the future by doing so now.
Gary Hevel estimates that about 100,000 specimens are labeled each year
for the USNM. At this rate, the annual costs for bar code labels would be
about $2,000. There are probably 60-100 characters per label. I estimate
that I extracted label data from more than 4,000 specimens this past year
for my research. Bar coding would have saved me a quarter million or more
keystrokes. Multiply that by the number of scientists using USNM material
and billions of keystrokes saved over the years is probably a conservative
estimate.
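The arithmetic behind that estimate can be checked with a short sketch. The
figures (4,000 specimens, 60 to 100 characters per label) come from the
paragraph above; the function and variable names are illustrative only.

```python
# Keystroke estimate for one researcher, using the figures quoted in the text.
# The function name and constants are illustrative, not part of any standard.

def keystrokes_saved(specimens: int, chars_per_label: int) -> int:
    """Keystrokes avoided when label data arrive electronically
    instead of being retyped from each specimen label."""
    return specimens * chars_per_label


SPECIMENS_PER_YEAR = 4_000        # labels extracted by one researcher in a year
CHARS_LOW, CHARS_HIGH = 60, 100   # estimated characters per label

low = keystrokes_saved(SPECIMENS_PER_YEAR, CHARS_LOW)
high = keystrokes_saved(SPECIMENS_PER_YEAR, CHARS_HIGH)

print(f"{low:,} to {high:,} keystrokes per researcher per year")
# prints "240,000 to 400,000 keystrokes per researcher per year"
```

The lower bound is close to the "quarter million" cited above. The total for
all users of USNM material depends on the number of researchers and years,
which the text does not give, so it is not computed here.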
The problem of retrospective data capture can be solved by using a similar
approach. We (Entomological Collections Network; Thompson 1990) endorsed
the view that retrospective data capture should be done as part of the research
process. When researchers study previously collected specimens, they capture
the specimen label data. Terry Erwin (and a few other scientists) has been
doing this for years for his various projects (Erwin 1976). Each specimen
that is handled, new or old, gets a unique ADP number that links the specimen
to Terry's electronic data record. In the past, scientists linked specimens
studied to their work with determination labels. Unfortunately, determination
labels did not UNIQUELY identify a specimen with individual observations
(these being, for example, a character state noted, measurement, etc.).
Combining the Erwin ADP number idea, the traditional determination label
and the INBio Bar code approach can generate a solution to the retrospective
data capture problem. As the researcher captures specimen label data from
old material (that is, material without bar codes), the researcher would
affix a standard bar code. To make this work effectively, the community and
organizations must set standards and policy. Such standards and policy are
outlined below with the resolution passed by ECN. The hardware needed to
implement this approach is also briefly described.
Bar Codes for Entomology would consist of a unique ALPHAbetic identifier
followed by a sequential number. The unique identifier is the key to the
organization and/or person that captured the data. Community standards for
such organizational identifiers exist and will be followed. USNM, for example,
has been accepted as the standard acronym (abbreviation) for the National Museum
of Natural History. This should be modified to USNM ENT to uniquely identify
the entomological collections. The Systematic Entomology Laboratory is uniquely
identified as USDA SEL. Terrestrial arthropods are small, so there isn't
much "real estate" associated with a specimen to which to attach
a bar code. Hence, for Entomology there are two important considerations
for Bar Codes: that they be as SMALL as possible and that there be only
ONE per specimen. The bar code known as Code 49 fulfills these requirements.
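As a rough illustration of such an identifier, the Python sketch below formats
and parses codes of the form "USNM ENT 00000123". The single-space separator,
the eight-digit zero padding, and the names SpecimenId, format_id and parse_id
are assumptions made for this example; the standard itself specifies only an
alphabetic identifier followed by a sequential number.

```python
import re
from typing import NamedTuple


class SpecimenId(NamedTuple):
    prefix: str   # organizational acronym, e.g. "USNM ENT" or "USDA SEL"
    number: int   # sequential number issued by that organization


# Assumed layout: one or more uppercase words, a space, then digits.
_ID_PATTERN = re.compile(r"^([A-Z]+(?: [A-Z]+)*) (\d+)$")


def format_id(prefix: str, number: int, width: int = 8) -> str:
    """Render an identifier as text; the zero-padded width is an assumption."""
    return f"{prefix} {number:0{width}d}"


def parse_id(text: str) -> SpecimenId:
    """Split scanned text into its organizational prefix and number."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a specimen identifier: {text!r}")
    return SpecimenId(match.group(1), int(match.group(2)))


print(format_id("USNM ENT", 123))      # prints "USNM ENT 00000123"
print(parse_id("USDA SEL 00004217"))   # prefix "USDA SEL", number 4217
```

Keeping the prefix and number as separate fields makes it easy to route a
scanned code to the organization that holds the associated data record.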
Organizations will have to accept the responsibility for specimen label
databases, seeing that their data standards conform to community standards
and that the data are accessible to all qualified users. At the moment,
there are various data models and standards for specimen label data. Essentially
these are all the same, allowing for storage of the basic data elements
ALREADY mandated by our ADP Standards for Systematic Entomology (locality
including coordinates, date, collector, and additional data as appropriate;
see Thompson 1990).
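A minimal sketch of a specimen label record holding those basic data elements
might look like the following. The class names, field names, and the sample
values are all hypothetical; they are not taken from the ADP Standards, which
should be consulted for the actual data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class LabelRecord:
    """One specimen's label data, keyed by its bar code identifier."""
    specimen_id: str                      # text encoded in the bar code
    locality: str
    collection_date: date
    collector: str
    latitude: Optional[float] = None      # decimal degrees, if known
    longitude: Optional[float] = None
    additional: Dict[str, str] = field(default_factory=dict)


class LabelDatabase:
    """In-memory stand-in for an organization's specimen label database."""

    def __init__(self) -> None:
        self._records: Dict[str, LabelRecord] = {}

    def add(self, record: LabelRecord) -> None:
        # Each identifier may point to one record only.
        if record.specimen_id in self._records:
            raise ValueError(f"duplicate identifier: {record.specimen_id}")
        self._records[record.specimen_id] = record

    def lookup(self, specimen_id: str) -> LabelRecord:
        return self._records[specimen_id]


# Hypothetical record; every value below is invented for illustration.
db = LabelDatabase()
db.add(LabelRecord(
    specimen_id="USNM ENT 00000001",
    locality="Example locality",
    collection_date=date(1993, 6, 15),
    collector="A. Collector",
    latitude=10.0,
    longitude=-84.0,
    additional={"habitat": "forest edge"},
))
print(db.lookup("USNM ENT 00000001").collector)   # prints "A. Collector"
```

Scanning a bar code then amounts to a single lookup, so the label never has
to be retyped once its record exists.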
A bar code scanner, bar code labels, and associated computer hardware and software
have been ordered for the Diptera Unit. Funds were provided by NMNH (Gomon).
The Diptera Unit will start both prospective and retrospective data
capture of flies. Building on this experience, we hope to expand the bar
codes to all of the entomological collection. However, if one has an urgent
project that would be facilitated by bar codes, please feel free to contact
Wayne Mathis or Chris Thompson. Perhaps your project can be worked into
the pilot project.
Sources of bar codes and bar code scanning equipment. The smallest bar code
in the public domain is Code 49. At present only one company (INTERMEC)
prints these bar codes and provides scanners able to read them. The approximate
cost of the initial order of 150,000 labels is about $2,700, with subsequent
orders some $500 less. The scanner and associated peripherals to attach
it to either a Macintosh or PC computer run about $2,200. The scanner is
attached between the keyboard and the computer, so it acts merely as an extension
of the keyboard. Check your local yellow pages for details on INTERMEC.
If you can't find a local sales office or they need further information,
then contact William McKenna, 3 Bala Plaza, Suite 117, Bala Cynwyd, PA 19004,
(215) 668-2075.
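These label prices can be cross-checked against the annual cost quoted earlier
for the USNM. The sketch below uses only figures from this article; it assumes
that a subsequent order is also for 150,000 labels, which the text does not
state, and the function name is illustrative.

```python
# Label cost check using figures quoted in the article.
INITIAL_ORDER_LABELS = 150_000
INITIAL_ORDER_COST = 2_700                 # dollars
REORDER_COST = INITIAL_ORDER_COST - 500    # "some $500 less"
SPECIMENS_PER_YEAR = 100_000               # USNM labeling rate cited earlier


def annual_label_cost(order_cost: float, order_labels: int, specimens: int) -> float:
    """Cost of labeling `specimens` at the per-label price of one order."""
    return order_cost * specimens / order_labels


first_year = annual_label_cost(INITIAL_ORDER_COST, INITIAL_ORDER_LABELS, SPECIMENS_PER_YEAR)
# Assumption: a reorder is the same size as the initial order.
later_years = annual_label_cost(REORDER_COST, INITIAL_ORDER_LABELS, SPECIMENS_PER_YEAR)

print(f"first year:  ${first_year:,.0f}")    # prints "first year:  $1,800"
print(f"later years: ${later_years:,.0f}")   # prints "later years: $1,467"
```

Both results are in line with the figure of about $2,000 per year given above.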
References
Erwin, L. J. M. 1976. Application of a computerized general purpose information management system (SELGEM) to natural history research data bank (Coleoptera: Carabidae). Coleopt. Bull. 30: 1-32.
Janzen, D. H. 1992. Information on the bar code system that INBio uses in Costa Rica. Insect Collection News 7: 24.
McGinley, R. 1993. Where's the management in collections management? Planning for improved care, greater use and growth of collections. Pp. 309-33. In Rose, C. L. et alia (eds), International Symposium and First World Congress on the preservation and conservation of Natural History Collections. Vol. 3, Madrid.
Thompson, F. C. (coordinator). 1990. Automatic Data Processing for Systematic Entomology: Promises and Problems. A report for the Entomological Collections Network. [48] pp. Washington.
Entomological Collections Network
Bar Code Standard Resolution
Whereas Society is increasingly concerned with biological diversity and
the sustainable use thereof;
Whereas Terrestrial arthropods provide the broadest and finest scale description
of the biosphere as they provide more data points as there are more clades,
species and individuals with longer histories and broader variation;
Whereas Terrestrial arthropods are the glue that binds ecosystems and therefore,
the biosphere, together;
Whereas entomological collections contain the largest and most diverse sample
of terrestrial arthropods and associated data;
Whereas entomological collections accept the responsibility to provide Society
with the critical information for the understanding and sustainable use
of biodiversity that their collections contain;
Whereas scientific information must be verifiable and therefore requires
that specimens be uniquely identified;
Therefore the Entomological Collections Network adopts the following standard
for the use of Bar Codes for the proper, effective and efficient management
of specimens and their associated data.
1) a bar code will be a unique identifier that consists of a string of
alphabetic characters that identifies the organization that created the
associated data record followed by a sequential number;
2) as bar code labels need to be as small as possible so as not to take
up too much space and must also encode sufficient data to uniquely identify
specimens, code 49 uniform symbology will be used;
3) organizations will maintain computer files of specimen associated data
that the bar codes uniquely identify, making the information available to
users following the appropriate community standards (such as the ASC Database
Policy);
4) organizations and individuals will respect bar code labels by leaving
them attached, by not covering them with other labels, and by using existing
bar codes, rather than adding new bar codes, so that only ONE bar code is
used per specimen and that bar code is always clearly visible;
5) finally, organizations and individuals will provide the originator (the
organization maintaining the computer files of associated data) of the bar code
with the scientific name and identifier, if so requested.
The above resolution was passed unanimously at the 1993 Annual Meeting of
the Entomological Collections Network, 12 December 1993, Indianapolis.
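Points 1 and 4 of the resolution can be illustrated with a small Python
sketch: a specimen that already carries a bar code keeps it, and only an
unlabeled specimen receives the next sequential number under the capturing
organization's prefix. The class name BarCodeRegistry, its method, and the
eight-digit format are hypothetical choices for this example.

```python
from typing import Optional


class BarCodeRegistry:
    """Issues sequential identifiers under one organizational prefix."""

    def __init__(self, prefix: str, start: int = 1) -> None:
        self.prefix = prefix
        self._next = start

    def identifier_for(self, existing_code: Optional[str]) -> str:
        """Return the identifier to use for the specimen in hand.

        An existing bar code is always reused, so a specimen never
        carries more than one; a new code is issued only when the
        specimen has none.
        """
        if existing_code:
            return existing_code
        code = f"{self.prefix} {self._next:08d}"   # assumed text layout
        self._next += 1
        return code


registry = BarCodeRegistry("USDA SEL")

# Old material without a bar code gets the next number in sequence.
print(registry.identifier_for(None))                  # prints "USDA SEL 00000001"

# A specimen already coded by another organization keeps its code.
print(registry.identifier_for("USNM ENT 00000042"))   # prints "USNM ENT 00000042"

print(registry.identifier_for(None))                  # prints "USDA SEL 00000002"
```

Reusing the existing code leaves the originating organization as the single
point of reference for that specimen's data record.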