This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.
The line between the humanities and the human sciences has always been fuzzy, with researchers across a wide spectrum of disciplines often documenting the same individuals and materials but with different outcomes in mind. The promise of the IT revolution is not only that data can be shared easily but that it becomes possible to get credit for such "publishing." The problem for researchers, both those with materials already in hand and those in search of materials, is that there is no common infrastructure for sharing and searching. DATAstor will be that infrastructure.
It is still early in the digitization efforts of researchers and archives who either currently maintain a collection or are looking to develop one. Just as importantly, few of them have the kind of budget to develop infrastructure on their own. By providing an infrastructure that allows individuals and organizations not only to catalogue their materials but also to make those materials accessible, DATAstor will become a common interface that will speed up the work of processing and analyzing documents and artifacts:
The Mellon Foundation's various initiatives have laid a solid foundation for the future of scholarly publishing in the humanities with JSTOR (as well as its contemporary cousin, Project Muse), and for the future of the scholarly research process with Zotero. The latter is particularly interesting because we now have a solid tool for individual research that will, I believe, shortly also offer the possibility for making research "social," creating new forms of collaboration and innovation heretofore the product of combing through footnotes or chasing someone down at a conference. (More on this in a moment.)
Humanities scholarship is the study of complex artifacts in the service of understanding human nature. What humanists need are these artifacts as well as the variety of information "clouds" (culture, history, biography) that surround them in order better to understand how the artifacts refract/reveal human nature. The kinds of artifacts humanities scholars work with vary by discipline. In some, the data is widely available as already published texts; in others, the data for their research is not as readily available but still secured in various kinds of archives — museums, yes, but also local courthouses.
ARTstor provides a solid foundation for scholars researching materials found in conventional arts collections. But what about those humanists who create their own data? There are scores of verbal and material items that will never grace the pages of most books and will never be catalogued in any collection. These are the focus of documentary efforts by scholars in the humanities disciplines of folklore studies and oral history, fields which blend over into the human sciences of linguistics, anthropology, psychology, and cognitive science.
In the future, they will have DATAstor.
The contents of DATAstor will be materials collected by humanities field researchers, which can include, but are not limited to, the following:
These materials will be available as entries within a database organized by essential fields, such as those in the Dublin Core. Non-verbal artifacts will be described in text and will, I hope, be accompanied by a suitable recording: an audio recording of a story, an image of a house, the image of a hand-written letter drawn from a personal collection.
Each entry will, I think, best be limited to a single artifact, however that may be described (e.g., one story from a longer conversation), and so there will need to be some way to indicate not only that an item belongs to a larger collective but also where it falls in any sequence within that collective.
Researchers will be able to search DATAstor through any parameter or across all parameters, of course, but one of the key things DATAstor will have to ensure is that there is a way to discern not only the original creator/performer of an item but the researcher who recorded or entered the item into the database.
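To make the shape of such an entry concrete, here is a minimal sketch in Python, offered as an illustration rather than a design. The field names borrow loosely from Dublin Core (title, creator, contributor, date, format); the class name Entry, the is_part_of and sequence fields, the two helper functions, and all of the sample data are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One artifact per entry, described with Dublin Core-style fields."""
    identifier: str
    title: str
    creator: str                       # original creator/performer of the item
    contributor: str                   # researcher who recorded or entered it
    date: str                          # ISO 8601 date of the recording
    format: str                        # e.g. "audio/wav", "image/tiff"
    is_part_of: Optional[str] = None   # identifier of the larger collective, if any
    sequence: Optional[int] = None     # position within that collective, if any


def in_sequence(entries, collective_id):
    """Return the entries of one collective, ordered by sequence number."""
    members = [e for e in entries if e.is_part_of == collective_id]
    return sorted(members, key=lambda e: e.sequence if e.sequence is not None else 0)


def search(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [e for e in entries
            if all(getattr(e, name) == value for name, value in criteria.items())]


# Hypothetical sample data: two stories from one conversation, plus a letter.
entries = [
    Entry("conv-01/2", "Second story", "Narrator A", "Researcher B",
          "2008-04-01", "audio/wav", is_part_of="conv-01", sequence=2),
    Entry("conv-01/1", "First story", "Narrator A", "Researcher B",
          "2008-04-01", "audio/wav", is_part_of="conv-01", sequence=1),
    Entry("letter-07", "Hand-written letter", "Correspondent C", "Researcher D",
          "2008-05-12", "image/tiff"),
]

stories = in_sequence(entries, "conv-01")          # the conversation, in order
by_researcher = search(entries, contributor="Researcher B")
```

Keeping creator and contributor as separate fields is what lets a search distinguish the person who told a story from the researcher who recorded it.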
ARTstor does not itself house materials, in the sense of creating or owning collections, but simply acts as middleware between collections and subscribers. The quality of content available through ARTstor is dependent upon the extant quality review processes already in place, as is the case for materials found through JSTOR.
DATAstor will, in order to assure the quality of its contents (and not simply be another Flickr), have to depend upon a similar set of relationships. There are a few databases already out there that have a peer-review process in place, e.g. the Child Language Data Exchange System (http://childes.psy.cmu.edu/), but not a lot. One of the goals of DATAstor will be to develop an easy-to-deploy, open source database system that institutions and organizations can host. By developing the software for others to use and populate, DATAstor will be able to establish standards and conventions that will make its job as middleware a lot simpler and will, in the process, encourage the assessment of data creation and publication as a scholarly paradigm.
A scholar of American social history has just returned from an interview with an individual involved in designing the Higgins boat, the signature landing craft of WW2. She has a recording of the interview, captured on an Edirol R-09 and thus sitting on an SD card as a WAV file, and she has a scan of a sketch the individual had kept of an early version of the craft, captured on her laptop as a high-resolution TIFF file using a portable Canon LED scanner.
As a member of the Oral History Association, she has access to the on-line archive the Association maintains of oral history materials. She fires up her web-browser and accesses the archive using the friendly, Zotero-like interface. She clicks on the collection that bears her name and types in her password to authenticate herself as a content creator. She types in the various metadata suggested by the extant fields in the database, including the individual interviewed and the date and place of the interview (the place given both by name, as Gretna, Louisiana, and as geo-coordinates). As content creator, she knows the application will automatically fill in those fields that associate her own part in the process with the data she is publishing.
The first record she creates is for the interview, which she outlines from memory, with a notation that the text is just that, and she then uploads the WAV file, knowing that the DATAstor application will, when it makes the file available to others, automatically transform the file into a compressed version with proper watermarking. Since this is research early in the process of a book she is working on, she marks the data to remain private for the next year, whereupon others will have access to the recording. She, however, is more than happy to have the metadata made public, because someone else working in an adjacent field may come across this entry and contact her with a question which could lead to an interesting dialogue or with some information she could use right away.
The second record is for the sketch, and so she tells the DATAstor UI to duplicate the previous entry's metadata into a new entry with the TIFF file attached.
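The two behaviours in this scenario, public metadata with a temporarily private file and a new record duplicated from an earlier one, might be sketched as follows. This is only an illustration in Python: the Record class, its field names, the public_view and duplicate_with functions, and the dates and file names are all hypothetical. The sketch also assumes that a duplicated record inherits the embargo of its source, which the scenario does not specify.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    title: str
    interviewee: str
    place: str
    recorded_on: date
    attachment: str                        # file name of the uploaded media
    embargo_until: Optional[date] = None   # None means the file is public now


def public_view(record, today):
    """What a visitor sees: metadata always, the file only once the embargo ends."""
    view = {
        "title": record.title,
        "interviewee": record.interviewee,
        "place": record.place,
        "recorded_on": record.recorded_on.isoformat(),
    }
    if record.embargo_until is None or today >= record.embargo_until:
        view["attachment"] = record.attachment
    return view


def duplicate_with(record, **changes):
    """Copy a record's metadata into a new record, overriding the given fields."""
    return replace(record, **changes)


# Hypothetical dates and file names, for illustration only.
interview = Record(
    title="Interview on the design of the Higgins boat",
    interviewee="(name withheld)",
    place="Gretna, Louisiana",
    recorded_on=date(2008, 6, 1),
    attachment="interview.wav",
    embargo_until=date(2009, 6, 1),        # private for one year
)
sketch = duplicate_with(
    interview,
    title="Sketch of an early version of the craft",
    attachment="sketch.tiff",
)
```

During the embargo, public_view(interview, date(2008, 12, 1)) returns the descriptive fields without an "attachment" key, so someone in an adjacent field can still find the entry and get in touch.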
Hundreds of miles from our scholar, and with a different set of concerns about data, the archivist for the Urban Appalachian Center in Cincinnati, Ohio has opened up a box of papers left to the Center by an eminent sociologist who had been about to throw away a lifetime's worth of research until someone told her that the UAC would be able to make the research available to others. The box contains letters, field notes, and photographs — some of which have careful annotations about who is in the photograph and others that do not.
In cooperation with a local university, the UAC has a dedicated server with the DATAstor application on it and a reasonable amount of room for attached files. After considering the possible uses of the materials and the limitations of their current infrastructure, our archivist decides to scan and upload the images, to OCR what he can of the texts (some of the field notes are typed, yay!), and then to do the rest himself, reading hand-written letters and notes and typing them in as he has time.
Later in the week, the photos are at least up, with some annotations still to type in, and our archivist has poked at a few letters and other documents, but there really isn't that much time. He has just finished figuring out how many months it's going to take if he dedicates an hour a day to the task when in walks a young linguist, who is interested in the written works of Appalachians. Through the scholarly grapevine, she has heard about the eminent sociologist's materials being here and wonders if she can look through the collection in hopes of discovering a few letters from some Appalachians that the scholar worked with. They would be a real boon to her research into the differences between oral and written discourse among ethnic minorities.
The archivist is glad to let her take the box from his desk. As our young scholar reaches down to pick it up, she notices the DATAstor UI on his screen. "I'd be happy to input any materials not yet catalogued," she says. "I'll be typing up whatever I use anyway, since I'm using a particular XML format my advisor has recently shown me."
"I'll set you up with a limited user account on our system," he replies. As she sits down at a nearby table, he goes to an administrator page of the DATAstor application and enters information for her that gives her limited abilities to enter content, and, just as importantly, alerts him to the new content for him to review before making it public.
Later, when she leaves with some photocopies of documents, he gives her her login information for the UAC's DATAstor application and reminds her that the account will automatically expire in 30 days.
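The account arrangement in this scenario, a limited contributor whose submissions wait for the archivist's review and whose access lapses after 30 days, could be sketched roughly as below. The LimitedAccount and Archive classes, their method names, the username, and the dates are assumptions invented for the example; nothing here describes an actual DATAstor implementation.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class LimitedAccount:
    username: str
    created_on: date
    lifetime_days: int = 30   # the 30-day expiry from the scenario

    def is_active(self, today):
        """True until the account's lifetime has elapsed."""
        return today < self.created_on + timedelta(days=self.lifetime_days)


@dataclass
class Archive:
    pending: list = field(default_factory=list)     # awaiting the archivist's review
    published: list = field(default_factory=list)   # visible to the public

    def submit(self, account, item, today):
        """Queue an item for review; expired accounts are refused."""
        if not account.is_active(today):
            raise PermissionError(f"account {account.username!r} has expired")
        self.pending.append(item)

    def approve(self, item):
        """Move a reviewed item from the pending queue to the public list."""
        self.pending.remove(item)
        self.published.append(item)


# Hypothetical usage: the visiting linguist enters a letter, the archivist approves it.
account = LimitedAccount("visiting_linguist", created_on=date(2008, 9, 1))
archive = Archive()
archive.submit(account, "letter-01", today=date(2008, 9, 10))
archive.approve("letter-01")
```

Nothing a limited user enters reaches the published list without passing through approve, which is the review step the archivist relies on.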