The Bancroft Library is one of the largest and most heavily used research libraries of rare and unique materials in the West. It actively collects the records of poets, writers, scientists, documentary photographers, businesses, organizations, the university, and more. A growing number of materials in these collections are now born digital, requiring highly specialized and rapidly evolving tools and processes in order to secure, describe, preserve, and provide researcher access while maintaining data integrity, security, and authenticity.
Assistant Director Mary Elings and Digital Archivist Kate Tasker will talk about establishing the Bancroft’s Born-Digital Collections program to properly steward these research materials. They will discuss the major components of the program, its collaborative nature, and the unique technologies involved. They will touch on digital forensics, data recovery, data screening and analysis, working with obsolete media, secure access, and more. They will discuss the current status of this developing program and some of the major challenges ahead.
When: Thursday, 14 December from 12 - 1pm
Where: 200C Warren Hall, 2195 Hearst St (see building access instructions on parent page).
What: Born Digital Research Archives: Technologies and Challenges
Presenting: Mary Elings (Asst Director); Kate Tasker (Digital Archivist)
Prior to the meeting please review:
Optionally, other posts in the ERS Born-Digital Access Blog Series might also be interesting to the group.
Presenting: Mary Elings, Kate Tasker (Bancroft Library)
Aaron Culich, Research IT
Amy Neeser, Library / Research IT (RDM)
Andy Lyons, UC Div of Ag Resources
Barbara Gilson, SAIT (emeritus)
Camille Crittenden, CITRIS
Chris Hoffman, Research IT
Jason Christopher, Research IT
Jenn Stringer, RTL
John Lowe, Research IT
Kortney Rupp, Library (Chemistry)
Meaggan Leavitt, ETS
Nico Tripcevich, ARF
Patrick Schmitz, Research IT
Perry Willets, CDL
Rick Jaffe, Research IT
Ron Sprouse, Linguistics
Steve Masover, Research IT
Many formats and many media come into the Bancroft (floppy disks, flash drives, 20-year-old computers, CDs, DVDs, memory cards, Zip disks, etc.; photos, text files, word processing files, databases). One observation: this is actually not so different from the past, with its vellum, onion-skin paper, parchment, paper, etc. However, these digital formats are more fleeting and fragile than older media.
0.2 TB processed (examined, arranged, catalogued/described, de-duplicated) vs. 18.9 TB total holdings. For comparison, the Hubble Telescope generates 10 TB of data to archive per year. The average collection is about 140 GB, though this is rough and there are outliers ranging from 1 MB to 1 TB. 5 hours per GB is a very rough and preliminary metric for how long it takes to process a collection; again, this varies greatly.
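To get a feel for the scale these figures imply, here is a back-of-the-envelope sketch in Python. It uses only the rough numbers quoted above; the decimal conversion (1 TB = 1000 GB) and the idea of applying the preliminary 5 hours/GB rate uniformly to all unprocessed holdings are assumptions made for illustration, not something the presenters stated.

```python
# Rough figures quoted in the notes; all are approximate.
HOURS_PER_GB = 5             # "very rough and preliminary" processing rate
TOTAL_TB = 18.9              # total born-digital holdings
PROCESSED_TB = 0.2           # holdings processed so far
AVERAGE_COLLECTION_GB = 140  # rough average collection size
GB_PER_TB = 1000             # assumption: decimal units


def processing_hours(size_gb, hours_per_gb=HOURS_PER_GB):
    """Estimate processing time for a given amount of data."""
    return size_gb * hours_per_gb


# Time for one average-sized collection.
average_hours = processing_hours(AVERAGE_COLLECTION_GB)

# Time for everything not yet processed, assuming the same rate applies.
backlog_gb = (TOTAL_TB - PROCESSED_TB) * GB_PER_TB
backlog_hours = processing_hours(backlog_gb)

print(f"Average collection: about {average_hours:.0f} hours")
print(f"Unprocessed backlog: about {backlog_hours:.0f} hours")
```

Under these assumptions an average collection comes to roughly 700 hours and the unprocessed 18.7 TB to roughly 93,500 hours. Since the rate varies greatly between collections, this is an order-of-magnitude illustration only.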
Forensic Recovery of Evidence Device (FRED): used to read many kinds of media. The techniques preserve file names, folder hierarchies, date stamps, etc. These are the same kinds of techniques the FBI uses to preserve digital evidence. Staffing: 1.5 FTE, some part time, plus some student help, but this staff is spread over additional projects, not just processing these collections.
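As a loose illustration of the kind of information such techniques preserve (file names, folder hierarchy, date stamps), the following Python sketch builds a simple manifest for a directory tree and adds a checksum per file. This is not how FRED or any of the forensic tools work internally; it assumes the files are already available as an ordinary copy on disk, and the function names `build_manifest` and `find_duplicates` are hypothetical. The checksums also allow a basic form of the de-duplication mentioned earlier in these notes.

```python
import hashlib
import os
from collections import defaultdict
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record relative path, size, modification time, and checksum per file."""
    manifest = []
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit folders in a stable order
        for name in sorted(files):
            full_path = os.path.join(folder, name)
            info = os.stat(full_path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            manifest.append({
                "path": os.path.relpath(full_path, root),  # keeps the hierarchy
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(),
                "sha256": sha256_of(full_path),
            })
    return manifest


def find_duplicates(manifest):
    """Group paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for entry in manifest:
        by_hash[entry["sha256"]].append(entry["path"])
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

A manifest like this can be re-generated later and compared against the original to check that nothing has changed. Note that reading files from a mounted copy can itself change access times, which is one reason forensic workflows typically work from write-blocked media or disk images instead.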
Tools used in the course of processing: an Internet Archive tool, ePADD (email processing), Forensic Toolkit (FTK), BitCurator (open source; being developed at UNC Chapel Hill with grant funding), DROID
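Of the tools listed, DROID identifies file formats from byte signatures inside the files rather than from file extensions, drawing on the PRONOM registry. The Python sketch below shows the general idea with a tiny, hand-picked table of well-known leading byte sequences. The table and the function names are hypothetical and cover only a few formats; this is not DROID's signature set or its matching logic.

```python
# A few well-known leading byte sequences ("magic numbers").
# Illustrative only; real signature registries are far larger and more precise.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by .docx, .xlsx, .odt)"),
]


def identify(header):
    """Return a format label for the first bytes of a file, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path):
    """Read the first few bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Signature-based identification matters for collections like these because file extensions on old media are often missing, wrong, or meaningless to current software.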
Migrating to more preservable file formats: a manual process. Adobe Bridge, Photoshop, etc.
Aspiration: provide a laptop in a secure environment in the UCB Library reading room for viewing digital collections, with blocked USB ports and DeepFreeze to ensure that it is restored to a pristine state after each researcher's use. With one laptop, digital collections would be loaded serially as researchers need them. Another possibility is a virtual reading room (cf. the UC Irvine blog post in this meeting's pre-reading list: Born-Digital and in the Virtual Reading Room).
Ingest by the Library includes recovery of deleted files. This is not necessarily what a donor wants, and it certainly presents a secure storage problem. Telling donors about the process sometimes spurs them to pre-organize and select what they actually want to donate, rather than throwing it all over the Library's transom.
Many factors go into the decision of what to process first: fragility of media, user demand, curatorial input, etc.
Cloud-based content (e.g., Google Drive, Flickr): recovering this is still very much a work in progress. An archivist at the University of York is working on this, but the act of downloading already alters the content: where does the revision history go? How can the environment a writer was working in be re-created, and what if that environment is a cloud environment? Work on this is going on in corporate contexts as well, but this is (deliberately) out of public view, invisible to university librarians.
Would like to work with people at the beginning of their careers to plan & organize for eventual deposit of their materials in the Bancroft. UCB does have retention and disposition guidelines; guidelines are being developed to further this kind of planning.
NLP tools that can redact the content of collections while giving researchers a meta-view of what a collection contains would help them decide whether a deeper look (and possibly a visit from another institution/location) is likely to be worthwhile.
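A very small Python sketch of the redact-and-summarize idea follows. It swaps in simple pattern matching for the NLP tools mentioned in the discussion: it masks email addresses and US-style phone numbers and then reports counts instead of content. The patterns, labels, and function names are all hypothetical; a real tool would more likely use named-entity recognition and would need far more careful handling of sensitive material.

```python
import re
from collections import Counter

# Hypothetical patterns standing in for real NLP-based detection.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each match with its label; return the text and match counts."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        text, replaced = pattern.subn(f"[{label}]", text)
        counts[label] += replaced
    return text, counts


def meta_view(documents):
    """Summarize a collection without exposing any document content."""
    totals = Counter()
    for text in documents:
        _, counts = redact(text)
        totals.update(counts)
    return {"documents": len(documents), "redactions": dict(totals)}


sample = ["Write to jane@example.org or call 510-555-0100.", "No personal data."]
print(redact(sample[0])[0])
print(meta_view(sample))
```

The summary gives a remote researcher a sense of a collection's size and of how much sensitive material it holds without revealing the material itself. A fuller meta-view might add topics, date ranges, or correspondent counts.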