Using the UCJEPS-1.8 version of CollectionSpace as a base, we are creating a prototype for Essig, loaded with data from the Essig database produced by the CalBug project.
More notes about the data migration work are at Essig-CalBug data analysis.
The 1.8 import service has the ampersand bug, so ampersands have to be replaced in the field collector name and field collector location.
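A minimal sketch of the ampersand cleanup, for reference. The notes do not record what the ampersands were replaced with, so the replacement text "and" is an assumption, and the dictionary keys `field_collector` and `field_location` are hypothetical stand-ins for the real column names.

```python
def replace_ampersands(record, fields=("field_collector", "field_location"),
                       replacement="and"):
    """Return a copy of record with '&' replaced in the named fields only.

    The field names and the replacement text are assumptions for this sketch.
    """
    cleaned = dict(record)
    for name in fields:
        value = cleaned.get(name)
        if value:
            cleaned[name] = value.replace("&", replacement)
    return cleaned


row = {"field_collector": "Smith & Jones", "field_location": "Hill & Dale Ranch"}
print(replace_ampersands(row))
```

Fields not listed in `fields` are left untouched, so the same record can carry ampersands elsewhere if the import service tolerates them there.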
The data are basically Darwin Core, so we should capture the mappings and the process.
Extracted collector information, mapped to person authority, and loaded 1922 records.
Did not extract taxon authors. Temporarily, those are on the taxon authority screen under citation (or source?).
Extracted taxonomic names, mapped to taxon authority, and loaded 2831 records. Loaded class, order, and family into a note field.
Extracted specimen data, mapped to the UCJEPS-customized collectionobject, and loaded 53119 records in batches of 3000. The first batch (batch0) did not fully load because of problems with "<" characters. Update: Reloaded batch0, and the record count is now 55383 (plus 2 records created by hand). I think some batch0 records still were not created during this back and forth, but I will look more closely later.
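The batching and the "<" problem can be sketched as follows. This is not the Talend job that was actually used; it only illustrates splitting records into batches of 3000 and flagging records that contain "<" before a load is attempted. The function names are hypothetical.

```python
def batches(records, size=3000):
    """Yield consecutive slices of records, each at most `size` long."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def records_containing(records, char="<"):
    """Return the indexes of records where any string value contains `char`."""
    flagged = []
    for index, record in enumerate(records):
        if any(isinstance(value, str) and char in value
               for value in record.values()):
            flagged.append(index)
    return flagged


sample = [{"locality": "elev. < 500 ft"}, {"locality": "Berkeley"}]
for number, batch in enumerate(batches(sample)):
    print(f"batch{number}: {len(batch)} records, flagged {records_containing(batch)}")
```

Checking each batch before loading would make it easier to tell which records were skipped, rather than comparing record counts afterwards.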
Collection Date: Eventually, this will be a structured date field allowing a begin date and an end date to be entered (date ranges are common for insect collections: set a trap and come back some days later). For now, I concatenated the two sets of date fields and placed the result in the Brief Description field. I assumed the end date would be blank for single-day collections, but in many records the start date and end date are identical, so the field often looks like: "Collected from (DMY): 1 5 1930 to (DMY) 1 5 1930". That is something to fix later, or it can wait until Collection Date is handled by the structured date widget.
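One possible fix for the identical start and end dates is sketched here. It assumes each date arrives as a (day, month, year) tuple, and the single-date wording "Collected (DMY): ..." is my own suggestion, not something already in the loaded data. The range wording matches the text currently in Brief Description.

```python
def format_dmy(date):
    """Join the non-blank day, month, year parts with spaces."""
    return " ".join(str(part) for part in date if part not in (None, ""))


def collection_date_text(start, end=None):
    """Build the Brief Description text for a collection date or date range."""
    start_text = format_dmy(start) if start else ""
    end_text = format_dmy(end) if end else ""
    if not start_text and not end_text:
        return ""
    # A blank or identical end date is treated as a single collection date.
    if not start_text or not end_text or start_text == end_text:
        return f"Collected (DMY): {start_text or end_text}"
    return f"Collected from (DMY): {start_text} to (DMY) {end_text}"


print(collection_date_text((1, 5, 1930), (1, 5, 1930)))
print(collection_date_text((1, 5, 1930), (4, 5, 1930)))
```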
Higher Taxonomy on specimens (class, order, family): For Entomology, it is important to see the higher taxonomy even on the specimen record. For now, I concatenated class, order, and family and put them in the Notes field on the Determination History list.
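The concatenation is simple enough to show in a few lines. The space separator and the example taxon names are assumptions for illustration; blank ranks are skipped so the note does not end up with doubled separators.

```python
def higher_taxonomy_note(taxon_class, order, family, separator=" "):
    """Concatenate class, order, and family, skipping blank values."""
    parts = [(part or "").strip() for part in (taxon_class, order, family)]
    return separator.join(part for part in parts if part)


print(higher_taxonomy_note("Insecta", "Coleoptera", "Carabidae"))
```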
Determination qualifications: Significant data work went into determining whether the identification needed a qualifier like "sp.", "n. subsp." or other values. Often, that text was in the data file in the species or subsp column by itself, but sometimes it was in the species column at the end of a name (e.g., "species_name (n. subsp. nr.)"). I handled some of those cases, but not all. We need to check my assumptions about how this maps to Determination History.
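The two cases described above can be sketched like this. It is an illustration of the logic, not the Talend routine itself. The qualifier list contains only the values mentioned in these notes and is certainly incomplete; the real list should come from the data.

```python
import re

# Illustrative only: the full set of Essig qualifiers is not recorded here.
KNOWN_QUALIFIERS = {"sp.", "n. subsp.", "n. subsp. nr."}

# A name followed by one trailing parenthesised group.
TRAILING_PARENS = re.compile(r"^(?P<name>.*?)\s*\((?P<qualifier>[^()]*)\)\s*$")


def split_qualifier(value):
    """Split a species/subsp column value into (name, qualifier)."""
    text = (value or "").strip()
    # Case 1: the column holds only a qualifier.
    if text in KNOWN_QUALIFIERS:
        return "", text
    # Case 2: the qualifier is in parentheses at the end of a name.
    match = TRAILING_PARENS.match(text)
    if match and match.group("qualifier").strip() in KNOWN_QUALIFIERS:
        return match.group("name").strip(), match.group("qualifier").strip()
    # Anything else is left as a plain name with no qualifier.
    return text, ""


print(split_qualifier("species_name (n. subsp. nr.)"))
print(split_qualifier("sp."))
```

Values with a parenthesised group that is not a known qualifier are returned unchanged, which keeps unhandled cases visible for later review.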
Three routines were created in Talend to parse data out for CSpace fields. These use similar logic for finding cases for parsing.
The following Talend jobs were created to parse the CalBug cache data
We are not doing full source control in this prototype but will take good notes to make it easier to port these minimal changes later.
our-tenant-tenant.xml (backed up to ~choffman)
Determination Qualifier dropdown in Determination History (<selector>taxonomic-identification-qualifier</selector>) : Changed to values for Essig (e.g., sp.)
Collections dropdown: for now using this field for Essig DB institutions. <selector>object-identification-collection</selector>
Country dropdown: Recoded "United States" (from "USA"). <selector>collection-object-fieldLocCountry</selector>
Form dropdown: Changing to Essig values (initially Pinned and Liquid). <selector>object-description-form</selector>
State dropdown: Recoded "California" (from "CA"). <selector>collection-object-fieldLocState</selector>
County dropdown: Replaced all UCJEPS values with 79 unique counties from Essig. <selector>collection-object-fieldLocCounty</selector>
Rank dropdown: Updated to match data values in Essig data. <field id="taxonRank" seperate_ui_container="true">
I am attempting to add TaxonRank to the list of summary fields that appear in search results. I did add the "mini" attributes to the taxonRank field id element, but I need to do some work in the UI layer for that to take effect.
Changed some text. Removed information about the reader account, since the read-only template for Cataloging is not in place.
Changed field label for csc-object-identification-number-objects (UCJEPS "Number of Sheets") to "Number of Specimens". In 1.8, this header is hard-coded.
Commented out many sections that are probably not needed by Essig.