The Diptera Site
Research & Collections
Bar Codes for Specimen Data Management

(from Insect Collection News 1994, vol. 9, pages 2-4)

F. Christian Thompson
Systematic Entomology Laboratory, ARS, USDA
Washington, D. C. 20560



Systematic Entomology is built on massive collections of specimens and associated data. Where other disciplines have collections of a few thousand specimens, entomology collections typically contain millions. These numbers mean greater problems, but they are also the source of greater promise. Terrestrial arthropods provide more data points because there are more clades, species, and individuals, with longer histories and broader variation. Terrestrial arthropods are the glue that binds ecosystems together. So, for the sake of Society, we need to manage the data associated with our entomological specimens efficiently and effectively, so that the information derived from them can be put to use.

The world is changing: people are more interested in the environment and worried about climate change, loss of biodiversity, and other matters for which much of the scientific data are ultimately derived from museum specimens. Appreciation of this has led to increased concern about, and unfortunately regulations for, biological specimens. Authorities are now demanding that the accession history of a particular specimen be documented to ensure that each and every specimen was legally acquired (Lacey Act). Nations value their biodiversity and are granted legal rights to it by the Convention on Biological Diversity. Some nations already demand, and more will demand, that biodiversity information, if not the specimens from which it was derived, be repatriated. These matters will have a great impact on entomology.

All biosystematic information is derived from specimens. Objective, scientific results require that observations can be repeated. So, for biosystematics, there is a need to tie the data derived from a specimen to that particular specimen. Traditionally this has meant that specimens have unique identifying numbers. Unfortunately, because of the high costs and the large number of specimens involved, entomologists have been reluctant to number specimens individually. Today, for legal reasons alone, entomologists must begin doing this. Bar codes, while still expensive, allow for the identification of individual specimens and greatly reduce the cost of subsequent data handling. Many organizations have now begun bar coding specimens as they are initially labeled (prospective data capture; see Thompson 1990). However, there remains a large backlog of existing specimens that lack bar codes or other unique identifiers. So, as these existing specimens are handled in the course of research activities, they should be bar coded so that the scientific observations can be easily verified (retrospective data capture; Thompson 1990).

The problem of prospective data capture has been solved by one collection (INBio; Janzen 1992). As new material is processed, bar codes are attached as part of the labeling process. The data on locality, time, collectors, etc., are captured when the print order for the labels is generated. Now anyone working with INBio can get these data electronically and need not re-keyboard them. Some of us are working with INBio specimens and receive these data on floppy disks when we borrow the material. Moreover, INBio is on the Internet, and soon one should be able to get the data interactively. The INBio approach is fast becoming the standard: the University of Georgia has adopted it, and the Bishop Museum is considering doing so. So this approach is recommended to the entomological community. Adopting it now will save billions of keystrokes in the future.
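The key point of the INBio approach is that one data-entry step feeds both the printed labels and the electronic records. The sketch below illustrates that idea only; it is not INBio's software. The names `CollectingEvent` and `make_print_order`, the prefix "ORG", and the eight-digit number layout are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class CollectingEvent:
    """Data shared by every specimen from one collecting event (hypothetical model)."""
    locality: str
    date: str
    collector: str


def make_print_order(event, n_labels, prefix, serial):
    """Build label text and electronic records for one print order.

    `prefix` is an organization code and `serial` an itertools.count that
    supplies sequential numbers; both layouts are assumptions.
    """
    label_lines = []
    records = {}
    for _ in range(n_labels):
        code = f"{prefix} {next(serial):08d}"   # one new bar code per specimen
        label_lines.append(f"{event.locality}; {event.date}; {event.collector} [{code}]")
        records[code] = event                   # the same data, captured electronically
    return label_lines, records


# Usage with placeholder data: three specimens from one event.
serial = count(1)
event = CollectingEvent("Locality A", "1 Jan 1994", "Collector B")
labels, records = make_print_order(event, 3, "ORG", serial)
print(labels[0])   # Locality A; 1 Jan 1994; Collector B [ORG 00000001]
```

Because the records dictionary is keyed by the same codes printed on the labels, a borrower can receive the records with the specimens and never re-type them.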

Gary Hevel estimates that about 100,000 specimens are labeled each year for the USNM. At this rate, the annual cost of bar code labels would be about $2,000. There are probably 60-100 characters per label. I estimate that I extracted label data from more than 4,000 specimens this past year for my research. Bar coding would have saved me a quarter million or more keystrokes. Multiply that by the number of scientists using USNM material, and billions of keystrokes saved over the years is probably a conservative estimate.
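The estimates above can be checked with a few lines of arithmetic. This sketch uses only figures from the article: the label price is the INTERMEC quote for an initial order ($2,700 per 150,000 labels) given in the section on sources; the function names are illustrative.

```python
def label_cost_per_year(specimens_per_year, order_price, order_size):
    """Annual cost of bar code labels at a given bulk-order price."""
    return specimens_per_year * order_price / order_size


def keystrokes_saved(specimens, chars_per_label):
    """Keystrokes avoided when label data need not be re-typed."""
    return specimens * chars_per_label


cost = label_cost_per_year(100_000, 2_700, 150_000)
low, high = keystrokes_saved(4_000, 60), keystrokes_saved(4_000, 100)
print(f"annual label cost: ${cost:,.0f}")          # annual label cost: $1,800
print(f"keystrokes saved: {low:,} to {high:,}")    # keystrokes saved: 240,000 to 400,000
```

At the initial-order price the labels come to about $1,800 a year, in line with "about $2,000", and 4,000 labels at 60-100 characters each give 240,000-400,000 keystrokes, i.e. a quarter million or more.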

The problem of retrospective data capture can be solved with a similar approach. We (Entomological Collections Network; Thompson 1990) endorsed the view that retrospective data capture should be done as part of the research process: when researchers study previously collected specimens, they capture the specimen label data. Terry Erwin (like a few other scientists) has been doing this for years for his various projects (Erwin 1976). Each specimen that is handled, new or old, gets a unique ADP (automatic data processing) number that links the specimen to Terry's electronic data record. In the past, scientists linked the specimens they studied to their work with determination labels. Unfortunately, determination labels did not UNIQUELY link a specimen to individual observations (for example, a character state noted or a measurement taken). Combining Erwin's ADP number idea, the traditional determination label, and the INBio bar code approach yields a solution to the retrospective data capture problem. As the researcher captures specimen label data from old material (that is, material without bar codes), the researcher affixes a standard bar code. To make this work, the community and organizations must set standards and policy. Such standards and policy are outlined below in the resolution passed by ECN. The hardware needed to implement this approach is also briefly described.
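The retrospective workflow can be pictured as a small record-keeping routine: a specimen without a bar code receives one from the researcher's own series, a specimen that already has one keeps it, and every observation is filed under that single code. This is a minimal sketch under stated assumptions; `Specimen`, `ResearchDatabase`, and the eight-digit number layout are hypothetical, and the observations are placeholders.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Specimen:
    label_text: str                    # data already on the specimen's old labels
    bar_code: Optional[str] = None     # None until a bar code is affixed


@dataclass
class ResearchDatabase:
    prefix: str                        # hypothetical code for the person or organization
    serial: count = field(default_factory=lambda: count(1))
    label_data: Dict[str, str] = field(default_factory=dict)
    observations: Dict[str, List[str]] = field(default_factory=dict)

    def capture(self, specimen: Specimen, observation: str) -> str:
        """Record label data and one observation, keyed by the specimen's bar code."""
        if specimen.bar_code is None:
            # Old material without a bar code: affix a new one from our own series.
            specimen.bar_code = f"{self.prefix} {next(self.serial):08d}"
        # An existing bar code is reused rather than replaced (one per specimen).
        self.label_data.setdefault(specimen.bar_code, specimen.label_text)
        self.observations.setdefault(specimen.bar_code, []).append(observation)
        return specimen.bar_code


# Usage: two observations on one old, previously un-coded specimen.
db = ResearchDatabase("USDA SEL")
old = Specimen("hypothetical old label text")
db.capture(old, "wing length 3.1 mm")
db.capture(old, "male")
print(old.bar_code, db.observations[old.bar_code])
```

Unlike a determination label, the bar code ties each individual observation to exactly one specimen, so any measurement can later be traced back and re-examined.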

Bar codes for entomology would consist of a unique ALPHAbetic identifier followed by a sequential number. The unique identifier is the key to the organization and/or person that captured the data. Community standards for such organizational identifiers exist and will be followed. USNM, for example, has been accepted as the standard acronym (abbreviation) for the National Museum of Natural History. This should be modified to USNM ENT to uniquely identify the entomological collections. The Systematic Entomology Laboratory is uniquely identified as USDA SEL. Terrestrial arthropods are small, so there isn't much "real estate" on a specimen to which a bar code can be attached. Hence, for entomology there are two important requirements for bar codes: that they be as SMALL as possible and that there be only ONE per specimen. The bar code known as Code 49 fulfills these requirements.
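The identifier scheme (alphabetic organization code, then a sequential number) is easy to check and generate in software. The sketch below assumes a textual layout the article does not fix: words of capital letters separated by single spaces, as in "USNM ENT", then a space and a decimal number, zero-padded to a hypothetical width of eight digits when issued. It concerns only the character string; rendering that string in the Code 49 symbology is left to the label-printing equipment.

```python
import re
from itertools import count
from typing import NamedTuple

# Assumed layout: organization code (e.g. "USNM ENT"), one space, sequential number.
ORG_CODE = re.compile(r"[A-Z]+(?: [A-Z]+)*")
BAR_CODE = re.compile(r"([A-Z]+(?: [A-Z]+)*) (\d+)")


class BarCode(NamedTuple):
    organization: str
    number: int


def parse(text):
    """Split a bar code string into its organization code and sequential number."""
    match = BAR_CODE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid specimen bar code: {text!r}")
    return BarCode(match.group(1), int(match.group(2)))


def issue(organization, start=1, width=8):
    """Return an endless iterator of sequential codes for one organization."""
    if ORG_CODE.fullmatch(organization) is None:
        raise ValueError("organization code must be alphabetic")
    return (f"{organization} {n:0{width}d}" for n in count(start))


codes = issue("USDA SEL")
print(next(codes))                   # USDA SEL 00000001
print(parse("USNM ENT 00012345"))    # BarCode(organization='USNM ENT', number=12345)
```

Keeping the organization code and the number separable is what lets any later user find the originator of a record from the bar code alone.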

Organizations will have to accept responsibility for specimen label databases, ensuring that their data conform to community standards and that the data are accessible to all qualified users. At the moment, there are various data models and standards for specimen label data. Essentially these are all equivalent, allowing for storage of the basic data elements ALREADY mandated by our ADP Standards for Systematic Entomology (locality, including coordinates; date; collector; and additional data as appropriate; see Thompson 1990).
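A record holding the basic data elements listed above might look like the following. This is a hypothetical sketch, not one of the existing data models: the field names, decimal-degree coordinates, and CSV as the exchange format are all assumptions chosen to show how a bar-code-keyed record can be shared with other users.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class SpecimenLabelRecord:
    bar_code: str                      # links the record to exactly one specimen
    locality: str
    latitude: Optional[float]          # decimal degrees (assumption); None if not on the label
    longitude: Optional[float]
    date: str                          # collecting date as recorded
    collector: str
    additional: str = ""               # other label data, as appropriate


def to_csv(records: List[SpecimenLabelRecord]) -> str:
    """Serialize records to CSV text so they can be exchanged with other users."""
    buffer = io.StringIO()
    names = [f.name for f in fields(SpecimenLabelRecord)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))    # None is written as an empty field
    return buffer.getvalue()


example = SpecimenLabelRecord("USNM ENT 00000001", "Locality A, County B",
                              38.9, -77.0, "1994-01-01", "Collector C")
print(to_csv([example]))
```

Any of the existing data models could be substituted; what matters for the standard is that each record is keyed by the specimen's bar code and carries the mandated elements.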

A bar code scanner, bar code labels, and associated computer hardware and software have been ordered for the Diptera Unit. Funds were provided by NMNH (Gomon). The Diptera Unit will start both prospective and retrospective data capture for flies. Building on this experience, we hope to extend bar coding to the entire entomological collection. However, if you have an urgent project that would be facilitated by bar codes, please feel free to contact Wayne Mathis or Chris Thompson. Perhaps your project can be worked into the pilot project.

Sources of bar codes and bar code scanning equipment. The smallest bar code in the public domain is Code 49. At present only one company (INTERMEC) prints these bar codes and provides scanners able to read them. The approximate cost of an initial order of 150,000 labels is about $2,700, with subsequent orders some $500 less. The scanner and the peripherals needed to attach it to either a Macintosh or a PC cost about $2,200. The scanner is attached between the keyboard and the computer, so it acts merely as an extension of the keyboard. Check your local yellow pages for details on INTERMEC. If you can't find a local sales office, or need further information, contact William McKenna, 3 Bala Plaza, Suite 117, Bala Cynwyd, PA 19004; (215) 668-2075.
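Because the scanner sits between keyboard and computer, a scan reaches the software as ordinary typed characters, so existing data-entry programs need no changes. The sketch below shows what that means for a program that looks up records: it simply reads lines of input. It assumes the scanner ends each scan with an Enter keystroke (a common configuration, not stated in the article); `read_scans` and the sample record are hypothetical.

```python
import io
from typing import Dict, Iterable, Iterator, Optional, Tuple


def read_scans(lines: Iterable[str], records: Dict[str, str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (bar code, record or None) for each line of keyboard input.

    A scan is indistinguishable from a line typed by hand; we assume the
    scanner terminates each scan with Enter.
    """
    for line in lines:
        code = line.strip()
        if code:                              # skip blank lines (e.g. a stray Enter)
            yield code, records.get(code)


# Simulated session; in real use one would pass sys.stdin instead of a StringIO.
records = {"USNM ENT 00000001": "hypothetical record"}
typed = io.StringIO("USNM ENT 00000001\n\nUSNM ENT 00000002\n")
for code, record in read_scans(typed, records):
    print(code, "->", record or "no record on file")
```

The same loop works whether the operator types the code or scans it, which is why the keyboard-wedge arrangement is the cheapest way to add bar codes to an existing system.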

References

Erwin, L. J. M. 1976. Application of a computerized general purpose information management system (SELGEM) to natural history research data bank (Coleoptera: Carabidae). Coleopt. Bull. 30: 1-32.

Janzen, D. H. 1992. Information on the bar code system that INBio uses in Costa Rica. Insect Collection News 7: 24.

McGinley, R. 1993. Where's the management in collections management? Planning for improved care, greater use and growth of collections. Pp. 309-33. In Rose, C. L., et al. (eds.), International Symposium and First World Congress on the Preservation and Conservation of Natural History Collections. Vol. 3. Madrid.

Thompson, F. C. (coordinator). 1990. Automatic Data Processing for Systematic Entomology: Promises and Problems. A report for the Entomological Collections Network. [48] pp. Washington.


Entomological Collections Network

Bar Code Standard Resolution


Whereas Society is increasingly concerned with biological diversity and the sustainable use thereof;

Whereas terrestrial arthropods provide the broadest and finest-scale description of the biosphere, as they provide more data points because there are more clades, species, and individuals, with longer histories and broader variation;

Whereas terrestrial arthropods are the glue that binds ecosystems, and therefore the biosphere, together;

Whereas entomological collections contain the largest and most diverse sample of terrestrial arthropods and associated data;

Whereas entomological collections accept the responsibility to provide Society with the critical information that their collections contain for the understanding and sustainable use of biodiversity;

Whereas scientific information must be verifiable and therefore requires that specimens be uniquely identified;

Therefore the Entomological Collections Network adopts the following standard for the use of bar codes for the proper, effective, and efficient management of specimens and their associated data.

1) a bar code will be a unique identifier that consists of a string of alphabetic characters that identifies the organization that created the associated data record, followed by a sequential number;

2) as bar code labels need to be as small as possible so as not to take up too much space, yet must encode sufficient data to uniquely identify specimens, Code 49 uniform symbology will be used;

3) organizations will maintain computer files of specimen associated data that the bar codes uniquely identify, making the information available to users following the appropriate community standards (such as the ASC Database Policy);

4) organizations and individuals will respect bar code labels by leaving them attached, by not covering them with other labels, and by using existing bar codes, rather than adding new bar codes, so that only ONE bar code is used per specimen and that bar code is always clearly visible;

5) finally, organizations and individuals will provide the originator of a bar code (the organization maintaining the computer files of associated data) with the scientific name and the identifier (the person who made the determination), if so requested.
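Items 1 and 5 together mean that the originator of any specimen can be recovered from its bar code alone, so determinations made elsewhere can be routed back. The sketch below groups determinations by originating organization under the assumption that the code is written as the organization code, a space, and the number (as in "USNM ENT 00000001"); `Determination`, `originator`, and `reports_for_originators` are hypothetical names, and the names of people are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Determination(NamedTuple):
    bar_code: str
    scientific_name: str
    identified_by: str      # the person who made the determination


def originator(bar_code: str) -> str:
    """Organization code: everything before the final space-separated number."""
    prefix, _, number = bar_code.rpartition(" ")
    if not prefix or not number.isdigit():
        raise ValueError(f"unrecognized bar code: {bar_code!r}")
    return prefix


def reports_for_originators(dets: Iterable[Determination]) -> Dict[str, List[Determination]]:
    """Group determinations by the organization that issued each bar code."""
    reports: Dict[str, List[Determination]] = defaultdict(list)
    for det in dets:
        reports[originator(det.bar_code)].append(det)
    return dict(reports)


dets = [
    Determination("USNM ENT 00000001", "Musca domestica", "A. Researcher"),
    Determination("USDA SEL 00000002", "Drosophila melanogaster", "B. Researcher"),
    Determination("USNM ENT 00000003", "Calliphora vicina", "A. Researcher"),
]
for org, items in reports_for_originators(dets).items():
    print(org, [d.bar_code for d in items])
```

Each originator then receives one list of names and identifiers to merge into its own files, as item 3 asks it to maintain.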

The above resolution was passed unanimously at the 1993 Annual Meeting of the Entomological Collections Network, 12 December 1993, Indianapolis.

Content by F. Christian Thompson
Please send questions and comments to Chris Thompson.
Last Updated: November 23, 2005 by Irina Brake