
Project Home

Most of these use cases will be constructed on the CollectionSpace project wiki.  While this page focuses explicitly on the University and Jepson Herbaria at UC Berkeley and their current system, SMaSCH, broader natural history collections needs have inevitably emerged and been folded in.  We will need to confirm with Herbaria staff that these findings are accurate.

Not all use cases are covered here; those omitted are primarily cases where there is already a clear fit between CollectionSpace and the Herbaria's needs.  For example, initial analysis suggests that basic object entry, acquisition and cataloguing are well covered in the CollectionSpace model for herbaria specimens.  In such cases, only questions or possible gaps are pulled out and defined.  For example, the Herbaria's workflows and data capture requirements around exsiccata and folios need further consideration.

Collection Objects and Cataloguing

Status: See the Herbaria use cases in the CollectionSpace wiki.

  • Exsiccata and folios: The Herbaria has some special needs around entities such as exsiccata and perhaps folios. As noted by Lam:

This was on Dick's desiderata, and he is the best person to answer this question, as my understanding is limited.  In SMASCH, it is the case that most accessions that have been entered only have one sheet.  Therefore, in that case, the collection object = the accession = the specimen sheet.  In the few cases of accessions with multiple sheets, the workaround is to enter each sheet as an accession record with a suffix, e.g. UC391612A, UC391612B.  The schema for SMASCH does not handle bound materials; each sheet would have to be entered as a separate accession record, and there's no way to keep track of what accessions are bundled together (in the current model for SMASCH).  In Arctos, this would be handled using lot count or parts, depending on what kind of data needed to be captured.  It's similar to what Specify has, but it is generic so that the same concept works across disciplines.

  • Containers and movement control: We should develop a use case around containers in a laboratory.  In the Arctos presentation, Gordon Jarrell talked about the way that Arctos facilitated tracking of tissue movement in the laboratory (e.g., moving a whole tray of tissues, each in a vial, from one refrigerator to another laboratory). This is an example of a batch update of location in the movement control module. See the Arctos definition of Containers.
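The batch location update described above can be sketched with nested containers. This is only an illustration under assumptions: the `Container` class, its methods, and the container names are hypothetical and are not taken from Arctos or CollectionSpace. The point is that moving the tray is one operation, and each vial's location is derived from its parent chain rather than stored per vial.

```python
from typing import List, Optional


class Container:
    """Hypothetical container: a refrigerator, a tray, or a vial."""

    def __init__(self, name: str, parent: Optional["Container"] = None) -> None:
        self.name = name
        self.parent: Optional["Container"] = None
        self.children: List["Container"] = []
        if parent is not None:
            self.move_to(parent)

    def move_to(self, new_parent: "Container") -> None:
        """Move this container, and implicitly everything inside it."""
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        new_parent.children.append(self)

    def location_path(self) -> str:
        """Derive the current location by walking up the parent chain."""
        parts = []
        node: Optional["Container"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


# Illustrative data only.
fridge_a = Container("Refrigerator A")
fridge_b = Container("Refrigerator B")
tray = Container("Tray 1", parent=fridge_a)
vials = [Container(f"Vial {i}", parent=tray) for i in (1, 2, 3)]

tray.move_to(fridge_b)  # one update relocates the tray and all three vials
print(vials[0].location_path())
```

A real movement control module would presumably also record who moved the container and when; that is omitted here.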

For some general requirements regarding natural history collections, see the Arctos Help page for Definitions and Standards for great information such as:


Use case(s) will document how the Herbaria sets up committees (individuals or groups) for identifying who collects, who annotates, and who identifies.  Note: A committee must exist before an accession is entered.

See the Arctos definitions for Agents.  See Verbatim Collector as an interesting use case.


Status: See the Herbaria use cases in the CollectionSpace wiki.

Note: In CollectionSpace, we can track work on the Vocabulary Service and Vocabulary Stories

While SMaSCH does not model taxonomic information as completely as is desired, one or more use cases should be developed that identify how taxonomic information is managed and used, how common names are mapped, how taxonomic name ordering is important, and so on.  The design goal will be to see if taxonomies are qualitatively different from the controlled vocabularies use cases being modeled for CollectionSpace.  Here are some use cases to be developed:

  • Collection management system stores taxonomy locally.
  • Collection management system refers to taxonomy information stored in a repository such as uBio or GlobalNames, employing services offered by those repositories.
  • Collection management system relies on accepted terms and relationships from an external provider for 99% of its specimens and determinations.  However, it employs local terms and identifications for specializations and where local scientists are performing research that might lead to refining the domain-specific taxonomy.
  • Taxonomic ordering: The order in which taxonomies present terms at different ranks (levels in the hierarchy) is important and not necessarily alphabetical.
  • When a specimen is identified (or re-identified) as belonging to a taxon (at any level in the hierarchical tree), the "object name" effectively becomes that taxon name (in whichever form of name is appropriate, e.g., the binomial name such as Homo sapiens, or the appropriate identification; see Taxonomic Identification below).
    When the curator or museum scientist is entering the object, s/he can search or browse the hierarchical taxonomic tree, expanding and collapsing levels until the appropriate identifier is located; the object is then associated with that taxon, and the user can annotate the identification with additional information.
  • When the user is changing the identification, the previous state is maintained somewhere.
    In this domain, the identification needs to be managed separately from the taxon ID or code.  That is, it's likely that the identification will need to be treated as a separate association between object and taxon (ID) given the amount of change that happens here.  (This needs elaboration by Susan or Lam.)
  • When searching for the appropriate taxon, the system must handle relationships of different terms, e.g., synonyms and common names, but it must be clear what is considered the accepted or valid term.
    Batch updates of identifications and changes in classification should be facilitated, ideally at the user interface level though this may be impractical and would require specific permissions.  E.g., Jim Patton wants to update some of the higher classification of mammals. He wants to move all the New World taxa now in the family Muridae to Cricetidae.
  • User (whether museum scientist or the public user) should be able to see the hierarchical taxonomic tree while viewing the specimen.
  • User should be able to browse or search the tree in order to find or explore the collection.  E.g., I'm looking at a specimen of this family-genus-species.  Now show me a list of all specimens in that same family.
  • Multiple classifications: Is it a requirement that multiple versions of the tree are needed in order to maintain history?
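Two of the requirements above, explicit (non-alphabetical) ordering of sibling taxa and batch reclassification with the previous state retained, can be sketched as follows. This is a minimal sketch under assumptions: the `Taxon` class, the `position` field, and `batch_reparent` are hypothetical names, and the genera are included only to illustrate the Muridae-to-Cricetidae example mentioned above.

```python
from typing import List, Optional, Tuple


class Taxon:
    """Hypothetical node in a locally stored taxonomic tree."""

    def __init__(self, name: str, rank: str,
                 parent: Optional["Taxon"] = None, position: int = 0) -> None:
        self.name = name
        self.rank = rank
        self.position = position  # explicit display order among siblings
        self.parent: Optional["Taxon"] = None
        self.children: List["Taxon"] = []
        if parent is not None:
            self.attach(parent)

    def attach(self, parent: "Taxon") -> None:
        """Detach from any current parent, then attach to the new one."""
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = parent
        parent.children.append(self)

    def ordered_children(self) -> List["Taxon"]:
        """Children in curated order, not alphabetical order."""
        return sorted(self.children, key=lambda t: t.position)

    def lineage(self) -> List[str]:
        """Names from the root of the tree down to this taxon."""
        names = []
        node: Optional["Taxon"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def batch_reparent(taxa: List[Taxon], new_parent: Taxon,
                   history: List[Tuple[str, Optional[str], str]]) -> None:
    """Move several taxa at once, logging (taxon, old parent, new parent)."""
    for taxon in taxa:
        old = taxon.parent.name if taxon.parent is not None else None
        history.append((taxon.name, old, new_parent.name))
        taxon.attach(new_parent)


# Illustrative data only.
rodentia = Taxon("Rodentia", "order")
muridae = Taxon("Muridae", "family", rodentia, position=1)
cricetidae = Taxon("Cricetidae", "family", rodentia, position=2)
peromyscus = Taxon("Peromyscus", "genus", muridae)
neotoma = Taxon("Neotoma", "genus", muridae)
mus = Taxon("Mus", "genus", muridae)

history: List[Tuple[str, Optional[str], str]] = []
batch_reparent([peromyscus, neotoma], cricetidae, history)
print(peromyscus.lineage())
```

Specimens identified to a moved genus need no update in this design, because their higher classification is derived from the tree. Whether a full prior version of the tree must also be kept is the open question in the last bullet above.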

Here are some service-based use cases that were identified by John Wieczorek (MVZ) in February 2007 in a discussion about a name resolver service:

  • Get all synonyms for name x at same rank as x
  • Get accepted valid name for name x
  • Get all accepted valid names for children of rank y for name x
  • Get accepted valid parent name for name x at rank y
  • Get full accepted valid name hierarchy of name x starting at the same rank as x
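The five queries above can be sketched against a small in-memory name table. This is an illustration under assumptions, not the interface of uBio, GlobalNames, or any existing resolver: `NameRecord`, `NameResolver`, and all method names are hypothetical, and the taxon names are nomenclatural placeholders ("Aus bus" and so on), not real taxa.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class NameRecord:
    name: str
    rank: str
    parent: Optional[str] = None    # parent name; set on accepted names
    accepted: Optional[str] = None  # accepted name, if this is a synonym


class NameResolver:
    def __init__(self, records: Iterable[NameRecord]) -> None:
        self._by_name = {r.name: r for r in records}

    def accepted_name(self, name: str) -> str:
        """Accepted valid name for name x."""
        rec = self._by_name[name]
        return rec.accepted or rec.name

    def synonyms(self, name: str) -> List[str]:
        """All other names at the same rank as x sharing x's accepted name."""
        target = self.accepted_name(name)
        rank = self._by_name[name].rank
        return sorted(r.name for r in self._by_name.values()
                      if r.name != name and r.rank == rank
                      and self.accepted_name(r.name) == target)

    def hierarchy(self, name: str) -> List[str]:
        """Accepted names from x's own rank upward."""
        chain = []
        current: Optional[str] = self.accepted_name(name)
        while current is not None:
            chain.append(current)
            parent = self._by_name[current].parent
            current = self.accepted_name(parent) if parent else None
        return chain

    def accepted_parent(self, name: str, rank: str) -> Optional[str]:
        """Accepted parent of x at rank y, or None if there is none."""
        for ancestor in self.hierarchy(name)[1:]:
            if self._by_name[ancestor].rank == rank:
                return ancestor
        return None

    def accepted_children(self, name: str, rank: str) -> List[str]:
        """Accepted names at rank y that descend from x."""
        root = self.accepted_name(name)
        return sorted(r.name for r in self._by_name.values()
                      if r.accepted is None and r.rank == rank
                      and r.name != root and root in self.hierarchy(r.name))


# Placeholder names only.
resolver = NameResolver([
    NameRecord("Aidae", "family"),
    NameRecord("Aus", "genus", parent="Aidae"),
    NameRecord("Xus", "genus", accepted="Aus"),
    NameRecord("Aus bus", "species", parent="Aus"),
    NameRecord("Aus cus", "species", parent="Aus"),
    NameRecord("Xus bus", "species", accepted="Aus bus"),
])
print(resolver.accepted_name("Xus bus"))
print(resolver.hierarchy("Xus bus"))
```

The sketch assumes a single classification with one accepted name per synonym; supporting multiple concurrent classifications would need an extra key on each record.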

Taxonomic Identification

Status: See the Herbaria use cases in the CollectionSpace wiki.

The assignment of taxonomic identity to a specimen could be (and probably should be) broken out as a separate service with the following capabilities:

  • a specimen with multiple identifications/determinations (e.g. based on field id, expert id, dna analysis)
  • an identification/determination with multiple scientific names (e.g. A X B, A or B, A and B) (not named hybrids)
  • an identification/determination that is a temporary/unpublished/working name (e.g. Bolitoglossa sp. nov. weird feet, Liolaemus sp. nov. neo G)
  • an identification/determination with modifiers (e.g., cf., aff., sp.)
  • it's fairly common for specimens to only have a "generic" id (e.g., identified only to family)
  • it is possible for a specimen to have multiple "current" determinations -- from the collector, from another scientist studying the specimen, from a DNA sequencing lab.  The convention (someone please confirm or update!) is that the collector is the system of record for the scientific determination.

Here is some additional schema information regarding an identification service:

  • specimen id number - id number of the specimen being identified
  • determination agent id -  id number of the person who made the determination (may include order if determined by group)
  • determination date - date determination was made
  • determination type - qualifies determination (e.g. field id, molecular data, ID of kin)
  • accepted determination flag - flags "accepted" identification
  • scientific name - scientific name for the identification, including modifiers, etc.
  • taxon id - id number of taxon name(s) used for the identification
  • determination remarks - additional notes/remarks
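The fields listed above could be carried by a record like the one below. This is a sketch under assumptions: the class and function names, the agent and taxon id values, the dates, and the "expert id" type label are hypothetical, and the rule that a newly accepted determination clears the flag on earlier ones is one possible policy rather than a stated requirement. Earlier determinations are kept, which covers the need to preserve previous identifications.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Determination:
    specimen_id: str             # id of the specimen being identified
    agent_ids: List[int]         # determiners, in order if a group
    determination_date: date
    determination_type: str      # e.g. "field id", "molecular data"
    scientific_name: str         # full name string, including modifiers
    taxon_ids: List[int]         # one or more taxa (e.g. for "A or B")
    accepted: bool = False
    remarks: str = ""


def add_determination(history: List[Determination], new: Determination,
                      make_accepted: bool = True) -> None:
    """Append a determination without deleting earlier ones."""
    if make_accepted:
        for earlier in history:
            if earlier.specimen_id == new.specimen_id:
                earlier.accepted = False
    new.accepted = make_accepted
    history.append(new)


def accepted_determination(history: List[Determination],
                           specimen_id: str) -> Optional[Determination]:
    """Return the determination currently flagged as accepted, if any."""
    for det in history:
        if det.specimen_id == specimen_id and det.accepted:
            return det
    return None


# Illustrative values only.
history: List[Determination] = []
add_determination(history, Determination(
    "UC391612A", [101], date(2007, 2, 1), "field id",
    "Abronia sp.", [5001]))
add_determination(history, Determination(
    "UC391612A", [102, 103], date(2008, 6, 15), "expert id",
    "Abronia umbellata ssp. breviflora", [5002],
    remarks="Re-identified from sheet"))
print(accepted_determination(history, "UC391612A").scientific_name)
```

Keeping `scientific_name` as a display string alongside a list of `taxon_ids` is what lets one determination refer to several taxa, or to a working name with no published taxon.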

Lam wrote:

Smasch handles the above except for determination agent, date, and type.  there are fields to keep track of modification date and modification agent, but not the identification authority.  As I recall from the demo, identification/taxonomy is handled similarly in Specify and SMASCH.  Arctos handles the above and also uses taxa formulae for creating the scientific names for the determination.

Research and system integration for biodiversity sciences

Status: In development.  See the Herbaria use cases in the CollectionSpace wiki.

There are a host of these that need to be developed.

  • Provide link to image of object stored in separate image dissemination system (e.g., CalPhotos)
  • Provide link to separate system that records protein sequences for a particular specimen (e.g., GenBank)
  • Display images stored in separate image dissemination system (e.g., CalPhotos)
  • Allow other systems to link to specific object record in CollectionSpace.  E.g., CalPhotos or GenBank can have a link that pulls up CollectionSpace record for a specific specimen.
  • Allow other systems to query object records in CollectionSpace by scientific name.  E.g., CalPhotos, ITIS, Encyclopedia of Life could query a CollectionSpace instance for all specimens whose scientific name is 'Abronia umbellata ssp. breviflora'
  • Integration with a field information management system.  E.g., the Moorea-Biocode project is developing a field collecting system (FIMS) that allows scientists from different research teams to collect specimens, take images, identify geolocations and collectors.  That information is currently gathered using a variety of means (Excel spreadsheets, digital camera image metadata, GPS) and goes through an import process into the Biocode database.  There are complications of course: The different kinds of information from one collecting event can come in separately.  Some information might need updating.  Field scientists are working in conditions that are challenging in many ways!
  • Integration with downstream laboratory information management system (LIMS for tissue sampling and genetic data).  E.g., the Moorea-Biocode project is partnering with Biomatters to integrate the Biocode specimen and field data with Geneious, a set of DNA and protein sequence analysis tools.  The process of creating tissue samples for laboratory analysis is managed in the Biocode database.  Matching up plates of tissues to specimens is a challenge.
  • Data export to what natural scientists call distributed databases.  Museums routinely ship data in certain formats to data aggregators to help provide data sets that cover larger regions or for the benefit of consortia.  Data formats and methods of accessing data are constantly in flux and vary from domain to domain and provider to provider.
  • Ability to import and host other data sets resulting from research or provided by other scientists.  E.g., the UC and Jepson Herbaria combine their collections data with other herbaria in California to create a searchable state-wide collection set.  This information is not hosted in SMaSCH (used by the UC and Jepson Herbaria) but in a separate set of tables that can be queried via a separate interface.
  • Essig currently hosts some data from other institutions in the Essig specimen system. This is a bit different from the UCJeps hosting of non-UCJeps data, because the Essig data from other institutions doesn't necessarily exist in digital form anywhere else. It's not stored in separate tables from the rest of the Essig data.
  • Data also will have to get exported to TAPIR providers (or whatever will come after TAPIR), so there has to be a way to export DarwinCore fields for this. Some data (UCMP) still has to be exported to a DiGIR provider. There are a lot of details here -- non-public data and fields can't be exported, so this all has to be configurable for each institution.
  • Related to this is the need to allow for metadata synchronization among systems that contain metadata for specific collected objects (e.g., between the collection management system, the digital asset management system, the data broker, GenBank, and so on).  This is a big challenge and is the subject of various major grant proposals.
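The configurable export described above (DarwinCore fields out, non-public records and fields withheld) might look like the sketch below. The local field names, the `public` flag, the function names, and all record values are assumptions made up for illustration; only the Darwin Core term names on the right-hand side of the mapping (`catalogNumber`, `scientificName`, `recordedBy`, `eventDate`, `locality`) come from the standard. It produces a flat CSV and does not implement the TAPIR or DiGIR protocols themselves.

```python
import csv
import io
from typing import Dict, FrozenSet, List

# Per-institution configuration: hypothetical local field -> Darwin Core term.
FIELD_MAP: Dict[str, str] = {
    "accession_number": "catalogNumber",
    "scientific_name": "scientificName",
    "collector": "recordedBy",
    "collection_date": "eventDate",
    "locality": "locality",
}


def to_darwin_core(records: List[dict], field_map: Dict[str, str],
                   withheld_fields: FrozenSet[str] = frozenset()) -> List[dict]:
    """Map local records to Darwin Core terms, dropping withheld data."""
    rows = []
    for rec in records:
        if not rec.get("public", True):
            continue  # non-public records are never exported
        rows.append({dwc: rec.get(local, "")
                     for local, dwc in field_map.items()
                     if local not in withheld_fields})
    return rows


def write_csv(rows: List[dict]) -> str:
    """Serialize exported rows as CSV text with a header line."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative records only.
records = [
    {"accession_number": "UC391612A",
     "scientific_name": "Abronia umbellata ssp. breviflora",
     "collector": "A. Collector", "collection_date": "1930-05-01",
     "locality": "Withheld site", "public": True},
    {"accession_number": "UC391612B",
     "scientific_name": "Abronia umbellata ssp. breviflora",
     "collector": "A. Collector", "collection_date": "1930-05-01",
     "locality": "Withheld site", "public": False},
]

rows = to_darwin_core(records, FIELD_MAP, withheld_fields=frozenset({"locality"}))
print(write_csv(rows))
```

Keeping the mapping and the withheld-field set as data rather than code is what would let each institution configure its own export.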

Field notebook integration

Status: In development

While not currently integrated into SMaSCH, the Herbaria, like other natural history museums, wants to integrate field notebook information with specimen information.  A use case should document this.

  • Integration with GReF
  • Management of field notebooks within the system (e.g., Arctos?)


Status: In development

See the Arctos definition of Projects.  Specify also has a Projects entity.


Status: In development

Arctos and Specify also contain information about permits, an increasingly important part of the Biocode project too.

Other file association and management

Sound files for bird calls and other animal vocalizations, video, and so on.

  • Managed in the collection management system
  • Images in something like TeraGrid (e.g., Arctos)
  • Images in CDL

Geospatial services and georeferencing

