Attendees

Joyce Gross, John Deck, Patrick Schmitz, Michael Black, Richard Millet, Susan Stone, Aron Roberts, Chris Hoffman


Agenda

Questions from Joyce and John (30 minutes), questions from Patrick and Richard (30 minutes), next steps (10 minutes)

Action items

  • Send existing CollectionSpace schema files (Richard and Aron?)
  • Send link for CollectionSpace API model (Richard and Aron?)
  • As they become available, circulate (a) REST calls for viewing PAHMA data loaded into a CollectionSpace instance, (b) links for viewing PAHMA data in the CollectionSpace user interface, and (c) anything else that can be reviewed. (Richard, Susan and Aron?)
  • Identify next steps to provide summaries back to Essig leadership. (Chris to follow up with Joyce and Patrick)

Summary notes

Museums want to see a fully functional system before making a decision. However, what we are deciding right now is a strategy and direction for committing our resources. No museum is being required to move to a new system at this point.

How long will it take to load data from an existing system into CollectionSpace? Depending on the quality of the data, it might take about a week of work, which should be comparable to migrating data into a system like Specify or Arctos. Of course, a lot of work might be needed to clean up the data. (We will be working on better estimates and approaches for data migration.)

How long will it take to customize CollectionSpace for natural history museums? It is hard to say right now. The Cambridge team will be responsible for much of the customization model, and the CollectionSpace team expects some work on this aspect of the platform to happen in December.

Patrick reviewed the documentation for the existing Essig specimen database and discussed issues with the CollectionSpace project leads. Label printing looked like a potential gap, although the other functions and capabilities are in scope for CollectionSpace. Naturally, moving to a new system will surface some challenges. For example, the flexibility of data import and noisy data could be issues (though these should be less of a problem for later deployments than for earlier ones). On label printing, the Essig system offers a good example that can help inform CollectionSpace design. Another CollectionSpace implementation will be at the Walker Art Center, which has some complex label printing requirements.

How long will it take to build out all the functionality found in Arctos or Specify? Patrick: maybe 2 years from today, depending on resources. (However, replacing the existing functionality of some of the BNHM systems should take less time. As we learn more from the PAHMA deployment and early experiments with Herbaria data, we will pass those lessons on.)

CollectionSpace is also in active discussions with other open source projects that provide functionality BNHM museums need, e.g., for archives and libraries. CollectionSpace will not include the full capabilities of a system like Archon, but there will likely be some kind of strong integration between CollectionSpace and an archival management system.

When will we be able to see real data in CollectionSpace? Once release 0.3 is out and PAHMA data have been loaded into a 0.3 instance, which should be in about 3 weeks. More details on timing to come.

Latitude/longitude controls: The existing Essig system has some useful capabilities here. When lat/long data are entered, the system first checks whether they are in one of several accepted formats. If the format is acceptable, the value is automatically converted to decimal degrees and stored in a separate field; otherwise, the user receives an error message. The system stores both the verbatim and decimal versions of the lat/long, as well as the datum for the determination (now usually WGS84). CollectionSpace will have similar capabilities: lat/long will not be stored as simple text. The Cambridge team is developing a model for validation using JavaScript and other tools. We don't know the schedule for that work yet, but CollectionSpace will have robust data validation and conversion capabilities, and the existing Essig system can serve as a model for what CollectionSpace needs. There is also active design and development work right now on dimensions and measurements.
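The Essig-style behavior described above (check the entry format, convert to decimal degrees, keep the verbatim text and datum) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the Essig or CollectionSpace implementation (the Cambridge work uses JavaScript). The two accepted non-decimal formats, the `Coordinate` record, and the names `to_decimal` and `parse_point` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed formats besides plain decimal degrees (hypothetical choices):
# "DD MM SS.s H" or "DD MM.mm H", where H is a hemisphere letter N, S, E or W.
DM_DMS_PATTERN = re.compile(
    r"^(\d{1,3})\s+(\d{1,2}(?:\.\d+)?)(?:\s+(\d{1,2}(?:\.\d+)?))?\s*([NSEW])$",
    re.IGNORECASE,
)


@dataclass
class Coordinate:
    verbatim: str   # the text exactly as the user entered it
    decimal: float  # the converted value in decimal degrees
    datum: str      # datum for the determination, e.g. "WGS84"


def to_decimal(text: str, axis: str) -> float:
    """Convert a latitude (axis="lat") or longitude (axis="lon") to decimal degrees.

    Raises ValueError with a message suitable for showing to the user.
    """
    limit = 90.0 if axis == "lat" else 180.0
    hemispheres = "NS" if axis == "lat" else "EW"
    s = text.strip()
    try:
        value = float(s)  # already decimal degrees, e.g. "-122.2585"
    except ValueError:
        match = DM_DMS_PATTERN.match(s)
        if match is None:
            raise ValueError(f"Unrecognized {axis} format: {text!r}") from None
        deg, mins, secs, hemi = match.groups()
        hemi = hemi.upper()
        if hemi not in hemispheres:
            raise ValueError(f"Hemisphere {hemi} is not valid for {axis}") from None
        minutes = float(mins)
        seconds = float(secs) if secs else 0.0
        if minutes >= 60 or seconds >= 60:
            raise ValueError(f"Minutes and seconds must be under 60: {text!r}") from None
        value = int(deg) + minutes / 60 + seconds / 3600
        if hemi in "SW":
            value = -value  # southern and western hemispheres are negative
    if not -limit <= value <= limit:  # also rejects "nan" and "inf"
        raise ValueError(f"{axis} out of range: {text!r}")
    return value


def parse_point(lat_text: str, lon_text: str, datum: str = "WGS84") -> tuple[Coordinate, Coordinate]:
    """Validate both values, keeping verbatim and decimal forms plus the datum."""
    return (
        Coordinate(lat_text, to_decimal(lat_text, "lat"), datum),
        Coordinate(lon_text, to_decimal(lon_text, "lon"), datum),
    )
```

For example, `parse_point("37 52 17.8 N", "122 15.51 W")` keeps both strings as entered and stores decimal values of about 37.8716 and -122.2585, while an entry such as `"37 52 17.8 E"` for latitude is rejected with an error message. Keeping the verbatim text means a bad conversion can always be re-checked against what was originally recorded.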

System support and monitoring: Where are the PAHMA MySQL data? During this development and prototype deployment phase, the PAHMA MySQL data and the PAHMA CollectionSpace software instance are housed on a virtual server at Slicehost (a cloud-based provider of virtual machines). How do notification and support happen? Those capabilities are, of course, still being built. However, there will likely be some email notification and, eventually, robust monitoring tools. (In the meantime, we can do what we do now: write cron jobs that notify a support person if server load is too high, if available disk space is too low, or if a system is not responding to queries. The CollectionSpace team is working on estimates of support staff requirements and on recommendations.)
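The interim cron-job approach mentioned above could look something like the following minimal sketch. It assumes a Unix host (for `os.getloadavg`) and relies on cron's standard behavior of mailing any output a job prints to the `MAILTO` address, which requires working mail delivery on the server. The thresholds, health-check URL, script path, and email address are hypothetical placeholders, not values from the PAHMA or Slicehost setup.

```python
# Cron-style health check. Example crontab entry (hypothetical path/address):
#   MAILTO=support@example.org
#   */5 * * * * /usr/bin/python3 /opt/monitor/check_server.py
# cron mails anything a job prints to MAILTO, so this script prints only
# when something looks wrong. All thresholds and the URL are placeholders.
import os
import shutil
import urllib.request

MAX_LOAD_1MIN = 8.0                    # alert above this 1-minute load average
MIN_FREE_DISK_FRACTION = 0.10          # alert when less than 10% of disk is free
HEALTH_URL = "http://localhost:8080/"  # service endpoint to probe


def find_problems(load_1min: float, free_fraction: float, responding: bool) -> list[str]:
    """Return human-readable problem descriptions (empty if all is well)."""
    problems = []
    if load_1min > MAX_LOAD_1MIN:
        problems.append(f"Server load is high: {load_1min:.2f}")
    if free_fraction < MIN_FREE_DISK_FRACTION:
        problems.append(f"Free disk space is low: {free_fraction:.0%}")
    if not responding:
        problems.append(f"No response from {HEALTH_URL}")
    return problems


def is_responding(url: str, timeout: float = 10.0) -> bool:
    """True if the URL returns a successful HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def main() -> None:
    load_1min = os.getloadavg()[0]  # 1-minute load average (Unix only)
    usage = shutil.disk_usage("/")
    free_fraction = usage.free / usage.total
    for problem in find_problems(load_1min, free_fraction, is_responding(HEALTH_URL)):
        print(problem)


if __name__ == "__main__":
    main()
```

Keeping the threshold logic in `find_problems`, separate from the system calls, makes the alert rules easy to test and adjust without touching the server.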

Data import: It is currently expected that by release 1.0, CollectionSpace will include tools to help import and load data from Excel or CSV files. Data loading is an important task, and CollectionSpace will have good capabilities here. There will probably be a tool that lets you map the columns in your data file to CollectionSpace's data model. How will validation happen on data load? This is still being determined, but validation tools for bulk data loading are planned for CollectionSpace. We talked about the Biocode model, in which the entire spreadsheet is rejected and error messages are reported to the user. The CollectionSpace team is also looking at how to do validation in the ETL (extract-transform-load) process.
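The column-mapping idea and the Biocode-style "reject the whole spreadsheet" validation discussed above can be illustrated with a short sketch. This is not a CollectionSpace tool or its data model: the spreadsheet headers, the target field names in `COLUMN_MAP`, the required fields, and the validation rules are all hypothetical examples.

```python
import csv
import io

# Hypothetical mapping from spreadsheet headers to target field names.
# These are illustrative only, NOT actual CollectionSpace schema fields.
COLUMN_MAP = {
    "Catalog No": "objectNumber",
    "Scientific Name": "taxon",
    "Collection Date": "collectionDate",
}
REQUIRED = {"objectNumber", "taxon"}


def load_all_or_nothing(csv_text: str) -> tuple[list[dict], list[str]]:
    """Map columns and validate every row, returning (records, errors).

    All-or-nothing: if any row fails, no records are returned and every
    error message is reported together, so the user can fix the whole file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [h for h in COLUMN_MAP if h not in (reader.fieldnames or [])]
    if missing:
        return [], [f"Missing column(s): {', '.join(missing)}"]
    records, errors, seen = [], [], set()
    # Spreadsheet row numbers, counting the header as row 1.
    for row_no, row in enumerate(reader, start=2):
        # Short rows yield None for absent cells, so treat None as empty.
        record = {field: (row[header] or "").strip() for header, field in COLUMN_MAP.items()}
        for field in sorted(REQUIRED):
            if not record[field]:
                errors.append(f"Row {row_no}: {field} is empty")
        number = record["objectNumber"]
        if number and number in seen:
            errors.append(f"Row {row_no}: duplicate objectNumber {number}")
        seen.add(number)
        records.append(record)
    return ([], errors) if errors else (records, [])
```

Collecting every error before rejecting the file, rather than stopping at the first bad row, matches the notion of returning a full set of messages to the user; an ETL-stage approach could instead reject or quarantine individual rows.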

Status of CollectionSpace development: Chris will look into a way to summarize development status for the Berkeley team.

Taxonomy: John and Joyce asked how taxonomy information will be modeled and stored. In fact, Patrick is working on controlled vocabularies right now, though the next release will include only simple flat vocabularies. CollectionSpace will have a very powerful engine for controlled vocabularies, and part of its design was based on a review of the Herbaria's requirements for taxonomies and taxonomic identification. CollectionSpace will feature multiple graphs and relationships, flexibility, type-ahead data entry, and search. How will different kinds of taxonomic trees be represented (e.g., primatologists use superfamily a lot, but entomologists do not)? That is still to be determined. CollectionSpace will allow for all the customizations needed, and it will be easy to copy the schema and model for controlled vocabularies between instances. We will need to work with the BNHM community and the informatics developers to determine how best to implement these capabilities.
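Because the notes leave open how differing taxonomic trees will be represented, the following is only one possible illustration, not the CollectionSpace design. In this sketch, each term in a hierarchical controlled vocabulary carries its rank as data rather than the ranks being fixed schema columns, so one tree can include superfamily while another skips it. A simple prefix search stands in for type-ahead entry. The `Term` and `Vocabulary` classes and their methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    name: str
    rank: str                       # e.g. "order", "superfamily", "family"
    parent: Optional["Term"] = None

    def lineage(self) -> list[str]:
        """Return "rank: name" entries from the root down to this term."""
        chain, node = [], self
        while node is not None:
            chain.append(f"{node.rank}: {node.name}")
            node = node.parent
        return list(reversed(chain))


class Vocabulary:
    """A hypothetical hierarchical controlled vocabulary keyed by term name."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, name: str, rank: str, parent: Optional[str] = None) -> Term:
        # Raises KeyError if the named parent has not been added yet.
        term = Term(name, rank, self.terms[parent] if parent else None)
        self.terms[name] = term
        return term

    def suggest(self, prefix: str, limit: int = 10) -> list[str]:
        """Type-ahead: case-insensitive prefix match over term names."""
        p = prefix.lower()
        return sorted(n for n in self.terms if n.lower().startswith(p))[:limit]
```

Since ranks are just strings on each term, adding a rank that one discipline uses and another does not requires no schema change; a real system would also need synonyms, rank ordering rules, and validation, which this sketch omits.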