Using the UCJEPS-2.0 version of CollectionSpace as a base, we are creating a prototype for Essig, populated with CalBug project data drawn from the Essig database.
Notes about this prototype
- This is version 2.0 of CollectionSpace, which was released in December 2011. Version 2.1 is scheduled for release in February 2012. Version 2.0 includes features not available in the earlier prototype, such as advanced search and reporting.
- The customizations added for UCJEPS-2.0 were extensive and are all useful to Essig as well.
- This prototype runs on a low-end virtual machine with limited RAM, so search and display are slow.
- The imported data are just over 55K specimen records provided by Joyce. They are CalBug records that are already in the EssigDB, but they include only the subset of fields currently identified for the CalBug cache. For example, collector names are concatenated into a single comma-separated list.
- Some data mappings are definitely temporary. Because no fields were added, some data went into notes fields.
- Two sample reports were developed. They are available from the Run Report dropdown in the upper right of the main specimen (collection object) screen.
- To use Advanced Search, make sure to change the Record Type to Cataloging. The taxonomy search field is the third field down in the Field-based Search section. Its title (Name) is partially covered by a stray plus sign, which lets you perform an OR search on two or more scientific names.
- Media uploading works, but sometimes after you select the file to upload, its name does not appear in the Upload Media text box. Clicking the Upload button anyway should still upload the image correctly.
- There is a bug in the hierarchical relationships section of the taxonomy and other authorities, so relationships are not created correctly. This is supposed to be fixed in the next release.
Some next steps
- Add some fields to the Taxonomy authority to replace functionality in Species DB (e.g., "Species In Collection").
- Display taxon rank in the results of Taxonomy authority searches.
- Add some fields to Cataloging for field collection and higher taxonomy:
  - HorizontalDatum (Oops, a field for Geodetic Datum already exists in 1.8)
  - Class, Order, Family
- For the full display name on specimens, concatenate the taxon name and qualifier (e.g., "n. subsp."), both during import and during data entry.
- In Taxonomy authority, create relationships to parent taxa.
- Taxon authors or citations? Decide how to model and parse these data.
- Collection dates: We are now using the structured date widget, which allows a begin date and end date to be captured, as well as fuzzy dates such as "1906". However, the imported data are fairly ugly and need to be converted from their current format. Change the display date to the preferred form (e.g., 15.VI.2006).
- When the Place authority is available, map the place data to it.
- Create public search and download interface.
- Collecting event authority and labels
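The preferred display form (day, month as a Roman numeral, year, e.g., 15.VI.2006) is simple to produce once day, month, and year are separated. A minimal Python sketch, assuming the three parts arrive as separate values (as in the DMY date fields of the cache data); the function name is hypothetical, and this is not how the structured date widget itself builds its display string.

```python
from datetime import date

# Month numbers 1-12 mapped to upper-case Roman numerals.
ROMAN_MONTHS = ("I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII")


def format_collection_date(day, month, year) -> str:
    """Return a date in the preferred form, e.g. 15.VI.2006 (hypothetical helper).

    int() also accepts digit strings from the cache file, and building a
    datetime.date first rejects impossible dates such as 31.II.1930.
    """
    d = date(int(year), int(month), int(day))
    return f"{d.day}.{ROMAN_MONTHS[d.month - 1]}.{d.year}"


print(format_collection_date(15, 6, 2006))       # 15.VI.2006
print(format_collection_date("1", "5", "1930"))  # 1.V.1930
```

Validating through `datetime.date` means badly formed source dates fail loudly (with `ValueError`) instead of producing a plausible-looking but wrong label.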
More notes about the data migration work are on the Essig-CalBug data analysis page.
The data are essentially Darwin Core, so we should document the mappings and the process.
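One lightweight way to capture the mappings is a table keyed by Darwin Core term, plus a helper that separates mapped values from leftovers (in this prototype, leftovers would be candidates for notes fields). The sketch is illustrative only: the Darwin Core terms are standard, the fieldLoc* targets are inferred from the dropdown selectors in the app layer, and the remaining CollectionSpace field names are hypothetical placeholders, not the actual UCJEPS schema.

```python
from typing import Dict, Tuple

# Darwin Core term -> CollectionSpace field (partly hypothetical; see above).
DWC_TO_CSPACE: Dict[str, str] = {
    "county": "fieldLocCounty",
    "stateProvince": "fieldLocState",
    "country": "fieldLocCountry",
    "institutionCode": "collection",   # hypothetical field id
    "recordedBy": "fieldCollector",    # hypothetical field id
    "scientificName": "taxon",         # hypothetical field id
}


def map_record(dwc: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a Darwin Core record into mapped CSpace fields and unmapped terms."""
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for term, value in dwc.items():
        target = DWC_TO_CSPACE.get(term)
        if target is None:
            unmapped[term] = value  # no CSpace field yet, e.g. send to a notes field
        else:
            mapped[target] = value
    return mapped, unmapped
```

Keeping the table in one place lets it double as the written record of the mapping.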
Extracted collector information, mapped it to the person authority, and loaded 1922 records.
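Because collector names arrive as one comma-separated list per specimen, building person authority records requires splitting those lists and removing duplicates. A minimal sketch, assuming each input string is one specimen's collector field; the notes do not say the 1922 records were produced this way, and this naive split would wrongly break names written as "Surname, Initials".

```python
from typing import Dict, Iterable, List


def split_collectors(field: str) -> List[str]:
    """Split one comma-separated collector field into trimmed, non-empty names."""
    return [name.strip() for name in field.split(",") if name.strip()]


def unique_collectors(fields: Iterable[str]) -> List[str]:
    """Collect distinct collector names across specimens, in first-seen order."""
    seen: Dict[str, None] = {}
    for field in fields:
        for name in split_collectors(field or ""):
            seen.setdefault(name, None)  # dicts preserve insertion order
    return list(seen)


print(unique_collectors(["J. Doe, A. Smith", "A. Smith", ""]))
# ['J. Doe', 'A. Smith']
```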
Did not extract taxon authors. For now, they are on the taxon authority screen under Citation (or Source?).
Extracted taxonomic names, mapped them to the taxon authority, and loaded 2831 records. Loaded class, order, and family into a note field.
Extracted specimen data, mapped to UCJEPS-customized collectionobject, and loaded 53119 records in batches of 3000.
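Loading in fixed-size batches keeps each import file manageable. A generic sketch of the chunking, assuming the records are already in a list (writing each batch out as import XML is not shown):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], size: int = 3000) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` records."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


sizes = [len(b) for b in batches(list(range(53119)))]
print(len(sizes), sizes[-1])  # 18 2119
```

With 53119 records, this gives 17 full batches of 3000 plus a final batch of 2119.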
Collection Date: Eventually, this will be a structured date field that allows a begin date and end date to be entered, which is common for insect collections (set a trap and come back some days later). For now, I concatenated the two sets of date fields and placed the result in the Brief Description field. I did not properly handle the fact that, for many records in the data set, the start date and end date are identical (I had assumed the end date would be blank in that case), so the field often looks like: "Collected from (DMY): 1 5 1930 to (DMY) 1 5 1930". This can be fixed later, or we can wait for Collection Date to be handled by the structured date widget.
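The duplicated-date problem can be avoided by comparing the two dates before concatenating them. A sketch, assuming each date is a (day, month, year) triple of strings from the cache fields and that a blank end date is all empty strings; the "Collected on" wording for single dates is a hypothetical choice, while the two-date form reproduces the existing text.

```python
from typing import Optional, Sequence


def _dmy(parts: Sequence[str]) -> str:
    """Join day, month, year with spaces, as in the current Brief Description."""
    return " ".join(str(p).strip() for p in parts)


def brief_date_description(start: Sequence[str],
                           end: Optional[Sequence[str]] = None) -> str:
    """Build the Brief Description date text, dropping a redundant end date."""
    start_txt = _dmy(start)
    end_txt = _dmy(end) if end is not None else ""
    # Treat a missing, blank, or identical end date as a single-day collection.
    if not end_txt.strip() or end_txt == start_txt:
        return f"Collected on (DMY): {start_txt}"
    return f"Collected from (DMY): {start_txt} to (DMY) {end_txt}"
```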
Higher taxonomy on specimens (class, order, family): For Entomology, it is important to see the higher taxonomy even on the specimen record. For now, I concatenated class, order, and family and placed them in the Notes field of the Determination History list.
Determination qualifiers: Significant data work went into determining whether an identification needed a qualifier such as "sp.", "n. subsp.", or other values. Often that text was by itself in the species or subsp column of the data file, but sometimes it appeared at the end of a name in the species column (e.g., "species_name (n. subsp. nr.)"). I handled some of those cases, but not all. We need to check my assumptions about how this works in the mapping to Determination History.
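The two cases described above (a qualifier alone in the column, or a parenthesized qualifier at the end of a name) can be separated as follows. This Python sketch only illustrates the logic; the actual parsing lives in Talend routines, and the set of standalone qualifiers is a hypothetical, incomplete list taken from the examples in these notes.

```python
import re
from typing import Tuple

# Hypothetical, incomplete list of qualifiers that may fill a column on their own.
STANDALONE_QUALIFIERS = {"sp.", "n. subsp."}

# A name followed by a parenthesized qualifier at the very end,
# e.g. "species_name (n. subsp. nr.)".
TRAILING_QUALIFIER = re.compile(r"^(?P<name>.*?)\s*\((?P<qual>[^()]*)\)\s*$")


def split_qualifier(species_field: str) -> Tuple[str, str]:
    """Return (name, qualifier); either part may be an empty string."""
    text = species_field.strip()
    if text in STANDALONE_QUALIFIERS:
        return "", text
    match = TRAILING_QUALIFIER.match(text)
    if match:
        return match.group("name").strip(), match.group("qual").strip()
    return text, ""
```

Requiring the parentheses to close the string keeps a mid-name parenthetical, such as a subgenus, from being mistaken for a qualifier.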
Three routines were created in Talend to parse data out for CSpace fields. These use similar logic for finding cases for parsing.
- MakeTaxonDisplayName: Currently extracts the core taxonomic name without the qualifier (e.g., "n. subsp.") and is used to create the taxonomy authority. This approach should be modified to produce a display name that includes the qualifier.
- MakeSpNote: Parses names to get the qualifier (e.g., "n. subsp.", "sp.")
- GetRank: Determines the rank of the scientific name for the taxonomy authority
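A rough illustration of the other two routines, written in Python rather than as Talend (Java) routines: GetRank is approximated with zoological family-group suffixes and word count, and the display-name change proposed for MakeTaxonDisplayName simply appends the qualifier to the core name. Both heuristics are assumptions, not a description of the existing routines; for example, putting the qualifier after the name suits "sp." but not every qualifier.

```python
# Family-group suffixes from zoological nomenclature, checked longest first.
FAMILY_GROUP_SUFFIXES = (
    ("oidea", "superfamily"),
    ("idae", "family"),
    ("inae", "subfamily"),
    ("ini", "tribe"),
)
RANK_BY_WORD_COUNT = {1: "genus", 2: "species", 3: "subspecies"}


def get_rank(name: str) -> str:
    """Guess the rank of a scientific name (heuristic; may misclassify)."""
    words = name.split()
    if len(words) == 1:
        for suffix, rank in FAMILY_GROUP_SUFFIXES:
            if words[0].lower().endswith(suffix):
                return rank
    return RANK_BY_WORD_COUNT.get(len(words), "unknown")


def make_display_name(core_name: str, qualifier: str = "") -> str:
    """Join the core name and qualifier, e.g. 'Bembidion' + 'sp.'."""
    return " ".join(part for part in (core_name.strip(), qualifier.strip()) if part)
```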
The following Talend jobs were created to parse the CalBug cache data:
- parsesciename: Uses the routines to add display name, qualifier, and rank to cache data
- getscinames_2: Created for v2.0, creates new forms of refname
- maketaxonxml_v2: Removes ampersand bug workaround
- getcollectors_v2: Creates new form of refname
- getcounties: Takes original cache file and parses out counties to populate county dropdown
- getstateprovince: Takes original cache file and parses out states to populate state dropdown (always CA here)
- getcountries: Takes original cache file and parses out countries to populate country dropdown (always United States here)
- getinstitutions: Takes original cache file and parses out institutions to populate collections dropdown (e.g., EMEC, SDNHM)
- getranks: Takes original cache file and parses out ranks to populate rank dropdown for taxonomy authority
- makeobjectxml: Main input is output from parsesciname (cache plus display name, rank, qualifier); maps in refnames for collectors and taxonomy; creates XML records for import into collectionobject
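The get* jobs that feed dropdowns (getcounties, getstateprovince, getcountries, getinstitutions, getranks) share one pattern: read the cache file and collect the distinct non-empty values of one column. A Python sketch of that pattern, assuming the cache is a CSV file with a header row; the column names are hypothetical.

```python
import csv
import io
from typing import List, TextIO


def distinct_values(cache: TextIO, column: str) -> List[str]:
    """Return the sorted distinct, non-blank values of one cache column."""
    values = set()
    for row in csv.DictReader(cache):
        value = (row.get(column) or "").strip()
        if value:
            values.add(value)
    return sorted(values)


sample = io.StringIO("county,institution\nAlameda,EMEC\nMarin,SDNHM\nAlameda,EMEC\n,EMEC\n")
print(distinct_values(sample, "county"))  # ['Alameda', 'Marin']
```

Sorting gives a stable dropdown order. With a real file, pass `open(path, newline="")` instead of the `StringIO` sample.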
App layer modifications
Code for this version is in IST subversion.
Determination Qualifier dropdown in Determination History (<selector>taxonomic-identification-qualifier</selector>) : Changed to values for Essig (e.g., sp.)
Collections dropdown: for now using this field for Essig DB institutions. <selector>object-identification-collection</selector>
Country dropdown: Recoded "United States" (from "USA"). <selector>collection-object-fieldLocCountry</selector>
Form dropdown: Changed to Essig values (initially Pinned and Liquid). <selector>object-description-form</selector>
State dropdown: Recoded "California" (from "CA"). <selector>collection-object-fieldLocState</selector>
County dropdown: Replaced all UCJEPS values with 79 unique counties from Essig. <selector>collection-object-fieldLocCounty</selector>
Rank dropdown: Updated to match data values in Essig data. <field id="taxonRank" seperate_ui_container="true">
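Several of these changes amount to replacing a dropdown's option list with values drawn from the Essig data. A sketch that turns a list of labels into an options fragment with Python's ElementTree, assuming the app-layer configuration lists each choice as an option element with an id attribute inside an options element (verify against the actual UCJEPS configuration). The id scheme (lowercase, other characters collapsed to underscores) is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable


def option_id(label: str) -> str:
    """Hypothetical id scheme: lowercase, runs of other characters become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def options_xml(labels: Iterable[str]) -> str:
    """Build an <options> fragment with one <option> per label."""
    options = ET.Element("options")
    for label in labels:
        option = ET.SubElement(options, "option", id=option_id(label))
        option.text = label  # ElementTree escapes characters such as '&'
    return ET.tostring(options, encoding="unicode")


print(options_xml(["Pinned", "Liquid"]))
# <options><option id="pinned">Pinned</option><option id="liquid">Liquid</option></options>
```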
I am attempting to add TaxonRank to the list of summary fields that appear in search results.
UI layer modifications
Changed some text. Removed information about the reader account, since the read-only template for Cataloging is not yet in place.
Changed field label for csc-object-identification-number-objects (UCJEPS "Number of Sheets") to "Number of Specimens". In 1.8, this header is hard-coded.
Commented out many sections that are probably not needed by Essig.