Using the UCJEPS-1.8 version of CollectionSpace as a base, we are creating a prototype for Essig, loaded with CalBug project data from the Essig database.

Notes about this prototype

  • This is version 1.8 of CollectionSpace, so it is three to four versions old.  Therefore it lacks certain features (e.g., sorting of results, structured dates and fuzzy dates) and some performance improvements.
  • This is running on a lower end virtual machine with limited RAM.  Search and display are slow.
  • Versions 1.13 and 2.0 of CSpace focus on fine-tuning some features, fixing bugs, and improving performance.
  • The data imported are just over 55K specimen records provided by Joyce.  They are CalBug data records that are already in the EssigDB but only include the subset of fields currently identified for the CalBug cache.  For example, collector names are concatenated together into one comma-separated list.
  • Some data mappings are definitely temporary.  No fields were added, so some data went into notes fields.
  • Joyce, Pete and I took a quick tour of this on Monday and had a good discussion about some of the issues and next steps.

Some next steps

  • Add some fields to the Taxonomy authority to replace functionality in Species DB (e.g., "Species In Collection").
  • Add taxon rank to display on the results of searching the Taxonomy authority.
  • Add some fields to Cataloging for field collection and higher taxonomy.
    • minElevationMeters
    • maxElevationMeters
    • MaxErrorInMeters
    • HorizontalDatum (Oops, field for Geodetic Datum does exist in 1.8)
    • Class, Order, Family
  • For the full display name on a specimen, concatenate the taxon name and qualifier (e.g., "n. subsp."), both on import and during data entry.
  • In Taxonomy authority, create relationships to parent taxa.
  • Taxon authors or citations?  How to model and parse data.
  • Collection dates: Parse into structured date field when available (starting 1.13?)
  • When Place authority is available, map data?
  • Create public search and download interface.

Data

More notes about the data migration work are at Essig-CalBug data analysis

The 1.8 import service has the ampersand bug, so ampersands have to be replaced in the field collector name and field collector location values.
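
A minimal sketch of that ampersand workaround, written in Python for illustration (the actual work was done in Talend). The replacement word "and", the function name, and the dictionary keys are assumptions; the notes do not say what the ampersands were replaced with.

```python
def strip_ampersands(value: str, replacement: str = "and") -> str:
    """Replace each '&' with a plain-text word and normalize spacing."""
    # Pad the replacement so "Smith&Jones" becomes "Smith and Jones".
    replaced = value.replace("&", f" {replacement} ")
    # Collapse any doubled spaces introduced by the padding.
    return " ".join(replaced.split())


# Hypothetical record; the keys are illustrative, not real CSpace field names.
record = {
    "fieldCollector": "Smith & Jones",
    "fieldCollectionLocation": "Hwy 1&Bean Hollow Rd",
}
cleaned = {field: strip_ampersands(text) for field, text in record.items()}
```

Values without an ampersand pass through unchanged apart from whitespace normalization.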

The data are essentially Darwin Core, so we should capture the mappings and the process.

Extracted collector information, mapped to person authority, and loaded 1922 records.

Did not extract taxon authors.  For now, those are on the taxon authority screen under citation (or source?).

Extracted taxonomic names, mapped to taxon authority, and loaded 2831 records.  Loaded class, order, and family into a note field.

Extracted specimen data, mapped to UCJEPS-customized collectionobject, and loaded 53119 records in batches of 3000. The first batch (batch0) did not fully load because of problems with "<" characters.  Update: Reloaded batch0, and now record count is 55383 (plus 2 records created by hand).  I think some of batch0 still did not get created due to this back and forth but will look more closely later.
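
As a sketch of how the batching and the "<" problem could be handled together, the Python below splits records into batches of 3000 and escapes XML-special characters before a value is written into an import record. The function names are hypothetical, and whether the 1.8 import service accepts escaped entities is not confirmed by these notes.

```python
from xml.sax.saxutils import escape


def batches(records, size=3000):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_xml_field(tag, value):
    """Wrap a value in an element, escaping '&', '<' and '>' in the text."""
    return f"<{tag}>{escape(value)}</{tag}>"


# 7000 dummy records split into batches of 3000, 3000 and 1000.
batch_sizes = [len(batch) for batch in batches(list(range(7000)))]
example_field = to_xml_field("briefDescription", "length <5 mm")
```

Escaping at the point where XML is generated keeps the raw "<" out of the payload without altering the source data.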

Collection Date: Eventually, this will be a structured date field allowing a begin date and an end date to be entered (which is common for insect collections: set a trap and come back some days later).  For now, I concatenated the two sets of date fields and placed the result in the Brief Description field.  I had assumed that the end date would be blank for single-day collections, but in many records the start date and end date are identical, so the field often looks like: "Collected from (DMY): 1 5 1930 to (DMY) 1 5 1930".  That is something to fix later, or we can just wait for Collection Date to be handled by the structured date widget.
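
One possible fix for the duplicated date, sketched in Python: treat an end date that is blank or identical to the start date as a single-day collection. The function name and the "Collected on" wording are assumptions; the range format copies the string quoted above.

```python
def describe_collection_dates(start, end=None):
    """Build the Brief Description text from (day, month, year) tuples."""

    def fmt(dmy):
        return " ".join(str(part) for part in dmy)

    # A missing end date and an end date equal to the start date are
    # both treated as a single-day collection event.
    if end is None or end == start:
        return f"Collected on (DMY): {fmt(start)}"
    return f"Collected from (DMY): {fmt(start)} to (DMY) {fmt(end)}"


single_day = describe_collection_dates((1, 5, 1930), (1, 5, 1930))
date_range = describe_collection_dates((1, 5, 1930), (8, 5, 1930))
```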

Higher Taxonomy on specimens (class order family).  For Entomology, it is important to see the higher taxonomy even on the specimen record. For now, I concatenated class, order, and family, and stuck them in the Notes field on the Determination History list.
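
The concatenation might look like the Python sketch below. The labels, the "; " separator, and the function name are assumptions; the notes only say that the three values were concatenated into the Notes field.

```python
def higher_taxonomy_note(taxon_class, order, family):
    """Join class, order and family into one note string, skipping blanks."""
    parts = [("Class", taxon_class), ("Order", order), ("Family", family)]
    return "; ".join(
        f"{label}: {value.strip()}"
        for label, value in parts
        if value and value.strip()
    )


full_note = higher_taxonomy_note("Insecta", "Coleoptera", "Carabidae")
partial_note = higher_taxonomy_note("Insecta", "", "Carabidae")
```

Labeling each value makes the note readable even when one of the three ranks is missing from the source record.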

Determination qualifications: Significant data work went into determining whether the identification needed a qualifier like "sp.", "n. subsp." or other values.  Often, that text was in the data file in the species or subsp column by itself, but sometimes it was in the species column at the end of a name (e.g., "species_name (n. subsp. nr.)").  I handled some of those cases, but not all.  We need to check my assumptions about how this works in the mapping to Determination History.

Three routines were created in Talend to parse data out for CSpace fields.  These use similar logic for finding cases for parsing.

  • MakeTaxonDisplayName: Currently gets the core taxonomic name without the qualifier (e.g., "n. subsp.") and is used to create the taxonomy authority.  This approach should be modified to produce a display name that includes the qualifier.
  • MakeSpNote: Parses names to get the qualifier (e.g., "n. subsp.", "sp.")
  • GetRank: Determines the rank of the scientific name for the taxonomy authority
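
The shared parsing logic of these routines can be sketched as follows. This is an illustrative Python approximation, not the Talend code: the regular expression covers only the qualifier forms mentioned in these notes, the rank labels are assumptions, and "Aus" and "bus" are placeholder names.

```python
import re

# Matches a value that is only a qualifier ("sp.", "n. subsp.") or a name
# followed by one, optionally in parentheses ("bus (n. subsp. nr.)").
QUALIFIER_PATTERN = re.compile(
    r"^(?P<name>.*?)\s*\(?(?<![A-Za-z])"
    r"(?P<qual>(?:n\.\s*)?(?:subsp\.|sp\.)(?:\s*nr\.)?)\)?$"
)


def split_qualifier(text):
    """Return (name, qualifier); qualifier is '' when none is found."""
    match = QUALIFIER_PATTERN.match(text)
    if not match:
        return text, ""
    return match.group("name").strip(), match.group("qual")


def parse_name(genus, species="", subspecies=""):
    """Derive core name, qualifier, rank and display name from the columns."""
    sp_name, sp_qual = split_qualifier(species.strip())
    sub_name, sub_qual = split_qualifier(subspecies.strip())
    core = " ".join(p for p in (genus.strip(), sp_name, sub_name) if p)
    qualifier = sub_qual or sp_qual
    # Rank follows the most specific column that holds a real name.
    rank = "subspecies" if sub_name else "species" if sp_name else "genus"
    display = f"{core} {qualifier}".strip()
    return {"core": core, "qualifier": qualifier, "rank": rank, "display": display}


genus_only = parse_name("Aus", "sp.")
embedded = parse_name("Aus", "bus (n. subsp. nr.)")
```

Keeping the core name and the qualifier separate lets the same parse feed both the taxonomy authority (core name) and the specimen display name (core name plus qualifier).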

The following Talend jobs were created to parse the CalBug cache data

  • parsesciname: Uses the routines to add display name, qualifier, and rank to cache data
  • getscinames: Takes output from parsesciname and gets unique scientific names for taxonomy authority, creates CSIDs and refnames
  • getscinames_2: Created for v2.0, creates new forms of refname
  • maketaxonxml: Takes output from getscinames and creates import XML records for loading into taxonomy authority
  • maketaxonxml_v2: Removes ampersand bug workaround
  • getcollectors: Takes original cache file and gets unique collector names for person authority, creates CSIDs and refnames
  • getcollectors_v2: Creates new form of refname
  • makecollectorxml: Takes output from getcollectors and creates import XML records for loading into person authority
  • makecollectorxml_v2:
  • getcounties: Takes original cache file and parses out counties to populate county dropdown
  • getstateprovince: Takes original cache file and parses out states to populate state dropdown (always CA here)
  • getcountries: Takes original cache file and parses out countries to populate country dropdown (always United States here)
  • getinstitutions: Takes original cache file and parses out institutions to populate collections dropdown (e.g., EMEC, SDNHM)
  • getranks: Takes original cache file and parses out ranks to populate rank dropdown for taxonomy authority
  • makeobjectxml: Main input is output from parsesciname (cache plus display name, rank, qualifier); maps in refnames for collectors and taxonomy; creates XML records for import into collectionobject
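
The "get unique names, create CSIDs and refnames" step shared by getscinames and getcollectors could be approximated as below. This Python sketch is illustrative only: the refName template is a placeholder and is not CollectionSpace's actual refname format, the UUID-based identifier is an assumption, and all function and key names are hypothetical.

```python
import re
import uuid


def short_identifier(display_name):
    """Reduce a display name to letters and digits (a simplification)."""
    return re.sub(r"[^A-Za-z0-9]", "", display_name)


def build_authority_items(names, authority="person"):
    """Deduplicate names and attach an identifier and a placeholder refName."""
    items = {}
    for raw in names:
        name = " ".join(raw.split())  # normalize internal whitespace
        if not name or name in items:
            continue
        short_id = short_identifier(name)
        items[name] = {
            "csid": str(uuid.uuid4()),  # assumption: any unique ID works here
            "shortIdentifier": short_id,
            # Placeholder template, NOT the real CSpace refname syntax.
            "refName": f"urn:example:{authority}:item:name({short_id})'{name}'",
        }
    return items


collectors = build_authority_items(["J. Doe", "J.  Doe", "", "A. Roe"])
```

The resulting mapping from display name to refName is what a later job such as makeobjectxml would need in order to link specimen records to authority terms.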

App layer modifications

We are not doing full source control in this prototype but will take good notes to make it easier to port these minimal changes later.

our-tenant-tenant.xml (backed up to ~choffman)

Cataloging

Determination Qualifier dropdown in Determination History: Changed to values for Essig (e.g., sp.). <selector>taxonomic-identification-qualifier</selector>

Collections dropdown: For now, using this field for Essig DB institutions. <selector>object-identification-collection</selector>

Country dropdown: Recoded "United States" (from "USA"). <selector>collection-object-fieldLocCountry</selector>

Form dropdown: Changing to Essig values (initially Pinned and Liquid). <selector>object-description-form</selector>

State dropdown: Recoded "California" (from "CA"). <selector>collection-object-fieldLocState</selector>

County dropdown: Replaced all UCJEPS values with 79 unique counties from Essig. <selector>collection-object-fieldLocCounty</selector>

Taxonomy authority

Rank dropdown: Updated to match data values in Essig data. <field id="taxonRank" seperate_ui_container="true">

I am attempting to add TaxonRank to the list of summary fields that appear in search results.  I did add the "mini" attributes to the taxonRank field id element, but I need to do some work in the UI layer for that to take effect.

UI layer modifications

index.html

Changed some text. Removed information about the reader account, since the read-only template for Cataloging is not in place.

CatalogingTemplate.html

Changed field label for csc-object-identification-number-objects (UCJEPS "Number of Sheets") to "Number of Specimens".  In 1.8, this header is hard-coded.

Commented out many sections that are probably not needed by Essig.
