
This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.


The documentation on this page is excerpted from the page Bamboo Book Model - CMIS Binding and Fedora Repository Implementation.

Bamboo Book Model

The Bamboo Book Model represents a codex book as a hierarchy of typed containers and objects with a predictable structure. Each node in the hierarchy has predictable metadata attributes.

The head node, the BambooBook object, contains three nodes:

  1. a pages node – a bag of page nodes, for which order is inferred from a page number attribute on each page node; 
  2. a contents node – a bag of contents objects; and, 
  3. a sources node – a bag of source objects.

Of these three, the pages branch of the hierarchy is the most fully developed. Each page node is a container that comprises a set of objects, each a manifestation of the page: plain text, page image, HTML text, TEI, and Morphadorned text (http://morphadorner.northwestern.edu). No manifestation of a page is required. Any manifestation present in the page node declares its type and the MIME type of its content, and includes its content.
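
The hierarchy described above can be sketched as a few plain data classes. This is an illustration only, not the project's code: the class names, the field names, and the `ordered_pages` helper are assumptions introduced here. It shows the two points the text makes: pages are held as an unordered bag whose order is inferred from a page number attribute, and a page may carry any number of manifestations, including none.

```python
from dataclasses import dataclass, field


@dataclass
class Manifestation:
    """One rendering of a page: declares its type and MIME type, carries its content."""
    kind: str        # e.g. "plaintext", "image", "xhtml", "tei", "morphadorned"
    mime_type: str
    content: bytes


@dataclass
class Page:
    seq: int  # page number attribute used to infer order
    manifestations: list[Manifestation] = field(default_factory=list)  # none required


@dataclass
class BambooBook:
    pages: list[Page] = field(default_factory=list)       # a bag: stored order is not meaningful
    contents: list[object] = field(default_factory=list)  # content maps (not modeled in Phase I)
    sources: list[object] = field(default_factory=list)   # source objects

    def ordered_pages(self) -> list[Page]:
        """Infer reading order from each page's number attribute."""
        return sorted(self.pages, key=lambda page: page.seq)


book = BambooBook(pages=[
    Page(seq=2, manifestations=[Manifestation("plaintext", "text/plain", b"second page")]),
    Page(seq=1),  # a page with no manifestations is still valid
])
print([page.seq for page in book.ordered_pages()])
```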

We envisioned a Contents container that would include various views of the book. A table of contents would provide the traditional part, chapter, and section map onto the pages of the book. Other maps, e.g. list of illustrations, word index, etc. would also be possible. While these maps could be rendered for human readers, the function of the content maps is to make these traditional structures available to software. The content map need not be limited to traditional structures, of course. Each map would declare its type, and each map type would have a predictable structure. No content maps were modeled in Phase I.

The Book Model normalizes book content by transforming content from different content repositories with possibly disparate formats into a set of common formats. In some cases information is lost in the transformation. For example, information encoded in the typographical features of a page of text is lost in the plain text manifestation. In some cases information is added. A Morphadorned text, for example, is richly annotated. 
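
As a small illustration of information lost in normalization, the sketch below reduces an HTML fragment to plain text using only the Python standard library. It is not the transformation the project used; the `TextOnly` class and `to_plain_text` function are hypothetical names. The emphasis markup in the input has no counterpart in the plain text result.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and discard all markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def to_plain_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


html_page = "<p>Sing, <i>goddess</i>, the <b>wrath</b> of Achilles</p>"
plain = to_plain_text(html_page)
print(plain)  # the italic and bold distinctions are gone
```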

We envisioned placing source documents, images, and so on in the Sources container for two reasons: so that it would be possible to inspect the state of a text prior to the normalizing transformations we applied, and so that the source document would be available when commonly used tools can readily be applied to it, as with a TEI-encoded text. Typing and provenance metadata would be applied to the objects within the Sources container. Although source content is provided by the CI Hub, content type and provenance metadata were not modeled; consequently, those metadata are not yet provided for source content.

The Contents and Sources containers were not implemented in the CMIS binding of the Book Model. The Pages container was implemented in the CMIS binding, including the page manifestations noted above.

Implementation through Binding to CMIS

We implemented the Bamboo Book Model by binding it to the CMIS API. A software binding for an abstract model like the Bamboo Book Model is a mapping of the structural elements of the model onto the data structures and methods of a piece of software. Often, these data structures and methods are expressed in an API. Content Management Interoperability Services (CMIS) is an API designed to represent common create, read, update, and delete operations on folder-like and file- or document-like objects. CMIS also provides for the definition of attributes of objects through the extension of its base object classes. Binding the Book Model to CMIS allows us to leverage software built for content management to operate on the structure and content of the codex book.
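
To make the idea of a binding concrete, here is a minimal sketch in which Book Model operations are expressed as folder and document operations. The `InMemoryStore` class is a toy stand-in written for this illustration, not a CMIS client; its method names only loosely mirror the CMIS services createFolder, createDocument, and getChildren. The `create_book` and `add_page` functions are likewise hypothetical. The type identifiers are the ones declared in the binding.

```python
import itertools


class InMemoryStore:
    """Toy stand-in for a repository of folders and documents (not a real CMIS client)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}    # object id -> {"type", "props", "content"}
        self.children = {}   # folder id -> list of child object ids

    def _add(self, parent_id, type_id, props, content=None):
        obj_id = f"obj-{next(self._ids)}"
        self.objects[obj_id] = {"type": type_id, "props": dict(props), "content": content}
        if parent_id is not None:
            self.children[parent_id].append(obj_id)
        return obj_id

    def create_folder(self, parent_id, type_id, props):
        folder_id = self._add(parent_id, type_id, props)
        self.children[folder_id] = []
        return folder_id

    def create_document(self, parent_id, type_id, props, content):
        return self._add(parent_id, type_id, props, content)

    def get_children(self, folder_id):
        return list(self.children[folder_id])


# The binding: each Book Model operation becomes folder/document operations.
def create_book(store, title):
    return store.create_folder(None, "bamboo:book", {"dc:title": title})


def add_page(store, book_id, seq, plain_text):
    page_id = store.create_folder(book_id, "bamboo:page", {"bamboo:seq": seq})
    store.create_document(page_id, "bamboo:page-plaintext", {"bamboo:seq": seq},
                          plain_text.encode("utf-8"))
    return page_id


store = InMemoryStore()
book_id = create_book(store, "Example book")
page_id = add_page(store, book_id, 1, "First page text")
print(store.get_children(book_id))
```

Because the model is expressed entirely in terms of folders, documents, and their properties, any software that can browse a folder hierarchy can browse a book.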

Binding details

Here is an example of extending the base CMIS folder class to serve as the root container of the BambooBook. We extended the base class by adding four properties:

  1. dc:issued captures when the object was obtained from a source repository by the Collections Interoperability Hub (CI Hub)
  2. bamboo:source and bamboo:source-url capture the repository from which the object was obtained and the identifier of the object in the repository
  3. bamboo:owner captures who requested that the object be obtained through the CI Hub

The bamboo:book object type is an extension of bamboo:folder, which has properties relevant to any bamboo container: all attribute types declared in Dublin Core. This code snippet illustrates how CMIS types are declared in the Apache Chemistry OpenCMIS tool we used for our local object store. (See below.)

 

CMIS Binding of the BambooBook
<?xml version="1.0" encoding="UTF-8"?>
<cmisra:type xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/"
	xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
	xsi:type="cmis:cmisTypeDocumentDefinitionType"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:dcterms="http://purl.org/dc/terms/"
	xmlns:bamboo="http://bamboo.at.northwestern.edu/types">
  <cmis:id>bamboo:book</cmis:id>
  <cmis:parentId>bamboo:folder</cmis:parentId>
  <cmis:displayName>Book</cmis:displayName>
  <cmis:queryName>BOOK</cmis:queryName>
  <cmis:description>Folder containing all content relating to a specific book or text</cmis:description>
  <cmis:baseId>cmis:folder</cmis:baseId>
  <cmis:creatable>true</cmis:creatable>
  <cmis:fileable>true</cmis:fileable>
  <cmis:queryable>true</cmis:queryable>
  <cmis:fulltextindexed>false</cmis:fulltextindexed>
  <cmis:includedInSupertypeQuery>true</cmis:includedInSupertypeQuery>
  <cmis:versionable>false</cmis:versionable>
	
  <cmis:propertyStringDefinition>
    <cmis:id>dc:issued</cmis:id>
    <cmis:localName>issued</cmis:localName>
    <cmis:displayName>issued</cmis:displayName>
    <cmis:queryName>issued</cmis:queryName>
    <cmis:description>Dublin Core Issued date</cmis:description>
    <cmis:propertyType>string</cmis:propertyType>
    <cmis:cardinality>single</cmis:cardinality>
    <cmis:updatability>readwrite</cmis:updatability>
    <cmis:inherited>false</cmis:inherited>
    <cmis:required>true</cmis:required>
    <cmis:queryable>true</cmis:queryable>
    <cmis:orderable>false</cmis:orderable>
  </cmis:propertyStringDefinition>
  
  <cmis:propertyStringDefinition>
    <cmis:id>bamboo:source</cmis:id>
    <cmis:localName>source</cmis:localName>
    <cmis:displayName>Source Repository</cmis:displayName>
    <cmis:queryName>source</cmis:queryName>
    <cmis:description>Bamboo Source Repository</cmis:description>
    <cmis:propertyType>string</cmis:propertyType>
    <cmis:cardinality>single</cmis:cardinality>
    <cmis:updatability>readwrite</cmis:updatability>
    <cmis:inherited>false</cmis:inherited>
    <cmis:required>true</cmis:required>
    <cmis:queryable>true</cmis:queryable>
    <cmis:orderable>false</cmis:orderable>
  </cmis:propertyStringDefinition>
	
  <cmis:propertyStringDefinition>
    <cmis:id>bamboo:source-url</cmis:id>
    <cmis:localName>source-url</cmis:localName>
    <cmis:displayName>Source URL</cmis:displayName>
    <cmis:queryName>source-url</cmis:queryName>
    <cmis:description>Bamboo Source URL</cmis:description>
    <cmis:propertyType>string</cmis:propertyType>
    <cmis:cardinality>single</cmis:cardinality>
    <cmis:updatability>readwrite</cmis:updatability>
    <cmis:inherited>false</cmis:inherited>
    <cmis:required>true</cmis:required>
    <cmis:queryable>true</cmis:queryable>
    <cmis:orderable>false</cmis:orderable>
  </cmis:propertyStringDefinition>
  <cmis:propertyStringDefinition>
    <cmis:id>bamboo:owner</cmis:id>
    <cmis:localName>owner</cmis:localName>
    <cmis:displayName>Owner</cmis:displayName>
    <cmis:queryName>owner</cmis:queryName>
    <cmis:description>Owner</cmis:description>
    <cmis:propertyType>string</cmis:propertyType>
    <cmis:cardinality>single</cmis:cardinality>
    <cmis:updatability>readonly</cmis:updatability>
    <cmis:inherited>true</cmis:inherited>
    <cmis:required>true</cmis:required>
    <cmis:queryable>true</cmis:queryable>
    <cmis:orderable>true</cmis:orderable>
  </cmis:propertyStringDefinition>
	
</cmisra:type>
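
A type definition of this shape can be inspected with any namespace-aware XML parser. The following sketch uses the Python standard library to pull out the type id, its parent, its base type, and the ids of its declared properties. It is not the OpenCMIS loader, and the `summarize_type` function is a name invented for this example; the embedded XML is an abbreviated copy of the definition above.

```python
import xml.etree.ElementTree as ET

NS = {"cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/"}

# Abbreviated copy of the bamboo:book definition (two of its four properties).
TYPE_XML = """\
<cmisra:type xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/"
             xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/">
  <cmis:id>bamboo:book</cmis:id>
  <cmis:parentId>bamboo:folder</cmis:parentId>
  <cmis:baseId>cmis:folder</cmis:baseId>
  <cmis:propertyStringDefinition>
    <cmis:id>dc:issued</cmis:id>
    <cmis:required>true</cmis:required>
  </cmis:propertyStringDefinition>
  <cmis:propertyStringDefinition>
    <cmis:id>bamboo:owner</cmis:id>
    <cmis:updatability>readonly</cmis:updatability>
    <cmis:required>true</cmis:required>
  </cmis:propertyStringDefinition>
</cmisra:type>
"""


def summarize_type(xml_text):
    """Return the type id, parent, base type, and declared property ids."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.findtext("cmis:id", namespaces=NS),
        "parent": root.findtext("cmis:parentId", namespaces=NS),
        "base": root.findtext("cmis:baseId", namespaces=NS),
        "properties": [
            prop.findtext("cmis:id", namespaces=NS)
            for prop in root.findall("cmis:propertyStringDefinition", NS)
        ],
    }


summary = summarize_type(TYPE_XML)
print(summary)
```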

 

A number of object types are declared:

| Object type | Extends | Properties | Contains |
|---|---|---|---|
| bamboo:book | bamboo:folder | dc:issued, bamboo:source, bamboo:source-url, bamboo:owner | 0..n bamboo:pages |
| bamboo:page | bamboo:folder | dc:issued, bamboo:seq, bamboo:page_name, bamboo:label | 0..n bamboo:page-documents |
| bamboo:folder | cmis:folder | dc:identifier, dc:date, dc:contributor, dc:coverage, dc:creator, dc:description, dc:format, dc:language, dc:publisher, dc:rights, dc:source, dc:subject, dc:title, dc:relation, dc:type | |
| bamboo:page-document | bamboo:document | dc:issued, bamboo:seq, bamboo:page_name, bamboo:label | |
| bamboo:page-image | bamboo:page-document | | |
| bamboo:page-image-jp2 | bamboo:page-document | | |
| bamboo:page-plaintext | bamboo:page-document | | |
| bamboo:page-xhtml | bamboo:page-document | | |
| bamboo:page-morphadorned | bamboo:page-document | | |
| bamboo:page-tei | bamboo:page-document | | |
| bamboo:page-thumb150 | bamboo:page-document | | |
| bamboo:document | cmis:document | dc:identifier, dc:date, dc:contributor, dc:coverage, dc:creator, dc:description, dc:format, dc:language, dc:publisher, dc:rights, dc:source, dc:subject, dc:title, dc:relation, dc:type | |

 

This table reports on the actual implementation as it stood in March 2012. The code contains provisional classes for source documents and a logical contents view, but from the perspective of March 2013, these seem to be thoughts not fully formed. Similarly, we might have expected to see a bamboo:pages object type that is contained by bamboo:book and serves as a container for bamboo:page objects. Implementation lagged behind the evolving ideas around how to model BambooBooks.
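
The declared types form a small inheritance hierarchy rooted in the two CMIS base types. As a sketch, the parent relationships listed in the table of object types can be written down as a mapping and walked to find the chain of ancestors of any type. The `PARENT` mapping restates the table; the `ancestry` and `base_type` functions are hypothetical helpers, not part of the implementation.

```python
# Parent of each declared type, as listed in the table of object types.
PARENT = {
    "bamboo:folder": "cmis:folder",
    "bamboo:document": "cmis:document",
    "bamboo:book": "bamboo:folder",
    "bamboo:page": "bamboo:folder",
    "bamboo:page-document": "bamboo:document",
}

# Every page manifestation type extends bamboo:page-document.
for suffix in ("image", "image-jp2", "plaintext", "xhtml",
               "morphadorned", "tei", "thumb150"):
    PARENT[f"bamboo:page-{suffix}"] = "bamboo:page-document"


def ancestry(type_id):
    """Return the chain from a type up to its CMIS base type."""
    chain = [type_id]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def base_type(type_id):
    """Return the CMIS base type (cmis:folder or cmis:document) of a type."""
    return ancestry(type_id)[-1]


print(ancestry("bamboo:page-tei"))
print(base_type("bamboo:book"))
```

Walking the chain makes the container/content split visible: books and pages resolve to cmis:folder, while every page manifestation resolves to cmis:document.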

Alternate bindings

Binding the Book Model to CMIS makes sense if 

  1. the source book object is obtained in pieces through distinct calls to some API, and 
  2. the user wishes to browse, inspect, or otherwise operate on parts of a book without necessarily committing to retrieving the whole book.

As it turns out, when we first began working with the HathiTrust API, we laboriously collected each book one page at a time. This generated a large number of hits on the HathiTrust API, which HathiTrust interpreted as harvesting. They recommended that we retrieve each book not piecemeal but as a whole, packaged in a single archive. So the first condition was not met for HathiTrust. Curiously, the first condition continued to hold for TEI-encoded texts for which we obtained page images from a commercial source: the TEI for a book was retrieved as a single file, but the page images were not. There were no page images for Perseus texts; pages were created from a single text object based on a convention for treating certain divisions in the text as "pages", which might give one the impression that we did not know much about classical texts. Of course, we were squeezing the Perseus classical texts into the pages construct because our client object viewer expected texts chunked as pages. That is, we assumed that the second condition above always held.

But suppose the second condition does not typically hold. Then one might imagine a less chatty binding in which a book is modeled in the manifest and metadata of a content package.
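
One way to picture such a package-based binding is a single archive that holds the page content alongside a manifest describing the book. The sketch below is purely illustrative: the manifest layout (a `manifest.json` with `title` and `pages` keys), the file naming, and the `build_package` and `read_package` functions are all assumptions made here, not a format Project Bamboo defined. It uses only the Python standard library.

```python
import io
import json
import zipfile


def build_package(title, pages):
    """Pack page texts and a manifest into one in-memory zip archive."""
    manifest = {"title": title, "pages": []}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for seq, text in pages:
            name = f"pages/{seq:08d}.txt"
            archive.writestr(name, text)
            manifest["pages"].append({"seq": seq, "plaintext": name})
        archive.writestr("manifest.json", json.dumps(manifest))
    return buffer.getvalue()


def read_package(data):
    """Return the title and the page texts in reading order from one archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        ordered = sorted(manifest["pages"], key=lambda page: page["seq"])
        texts = [archive.read(page["plaintext"]).decode("utf-8") for page in ordered]
    return manifest["title"], texts


package = build_package("Example book", [(2, "second page"), (1, "first page")])
title, texts = read_package(package)
print(title, texts)
```

The whole book arrives in one transfer, and the structure that the CMIS binding expressed through folders is carried by the manifest instead.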
