
This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.


Initial rough draft 11 Nov 2010. Reformatted 15 Nov 2010

Functional Definition of Collection Interoperability

This outline describes collection interoperability in terms of the functionality it supports. The outline also suggests objectives against which, as the Bamboo technical infrastructure is realized over time, we can assess the degree to which it supports collection interoperability.

At the top level of the outline, 3 functions describing degrees of Collection Interoperability are identified (Aggregation; Interaction with Distributed Collections & Content; and Enrichment of Surrogates, Enrichment of Objects, Creation of New Derivatives), and 2 functions defining high-level classes of potential Collection Interoperability services are described (Mediation / Remediation and Object Integrity / Versioning). Each of these areas of functionality has prerequisites, e.g., additional lower-level functionality, services, and standard/protocol implementations. See also the Europeana Data Model Primer for an interesting implementation that touches on a number of these kinds of issues.

1. Aggregation

Collections and items maintained in disparate and separately managed repositories will be gathered together into new Bamboo-specific arrangements and/or aggregations with which scholars can interact and over which scholarly services can be implemented. There are various classes of use cases here, each involving a different degree of aggregation and implying at least 3 different classes of aggregate:

  • Harvested Proxies: Metadata and/or other kinds of resource surrogates -- e.g., for new search and discovery services, etc. Users/services interact with these proxies which include links to distributed resources and collections held in external repositories.
  • Harvested Content: Copies of collections and/or items are gathered into Bamboo "owned" work spaces in order to support more sophisticated use cases that require access to content in aggregate.
  • Virtually Aggregated: The objects, collections and/or items remain distributed but are made to appear to the user/service to be part of, or contained within, a cohesive Bamboo-defined collection made available through a single, consistent interface. This is a just-in-time model that supports more sophisticated services and use cases that require access to content in aggregate, but only at specific points in work flows (i.e., not all the time).

Aggregation functionality in turn depends on underlying functionality and adherence to standards:

1.1. Identifiers

Source (collection, repository) identifiers are essential to support initial harvesting of metadata and/or content objects, or to support maintained connections to content maintained remotely. Abstract collection or repository identifiers can be useful, as keys to human readable collection descriptions, but URLs by which to access collection/repository dissemination services will be essential (e.g., OAI provider baseURLs, RSS URLs, etc.).

Metadata record and item object identifiers will be required for harvesting materials for aggregations and to enable de-duplication, versioning, etc. When harvesting metadata and/or items, these identifiers can be local and opaque, though for compatibility with linked data, HTTP-based URIs are preferred. However, when creating virtual aggregations of distributed content, persistent HTTP URIs that de-reference to representations of resources will be essential (e.g., DOIs, Handles, OpenURLs, etc.).

Harvesting and aggregation of previously disparate metadata and/or content into Bamboo also implies a scheme or schemes for minting Bamboo-specific identifiers and provenance; harvest date plus original in-situ object identifier pairs then become referential only, unless active, ongoing real-time replication of outside repositories is contemplated.
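As a minimal sketch of the minting idea above: the `urn:bamboo:` prefix, the class names, and the `mint_identifier` helper are all hypothetical and not part of any Bamboo specification. A Bamboo-specific identifier could be paired with a provenance record that keeps the harvest date and original identifier for reference only:

```python
import uuid
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    """Referential record of where a harvested item came from."""
    source_base_url: str      # e.g., an OAI provider baseURL
    original_identifier: str  # identifier as used in the source repository
    harvest_date: date


@dataclass(frozen=True)
class BambooRecord:
    bamboo_id: str
    provenance: Provenance


def mint_identifier(prov: Provenance) -> BambooRecord:
    # Hypothetical scheme: a name-based UUID derived from the source URL and
    # the original identifier, so re-harvesting the same item yields the same
    # Bamboo identifier (which helps with de-duplication).
    name = f"{prov.source_base_url}#{prov.original_identifier}"
    return BambooRecord(f"urn:bamboo:{uuid.uuid5(uuid.NAMESPACE_URL, name)}", prov)
```

Because the identifier here depends only on the source and the original identifier, a later harvest of the same item maps to the same Bamboo identifier while the provenance record carries the new harvest date. A random identifier per harvest would be an equally plausible design.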

1.2. Description

Description should be as appropriate to service and/or user need, as determined by the use cases supported. Some metadata will be provider-supplied, i.e., giving attributes of collections and resources that cannot be discerned automatically by inspection of the collection/resource. Equivalent / similar elements from multiple namespaces should be supported for core descriptive attributes, with required, required-if-applicable, recommended-if-applicable, and optional properties clearly defined. Properties that can be determined automatically from inspection of collections and/or items should be generated at time of aggregation and should be preferred to provider-supplied values for these attributes. We can also anticipate Bamboo-generated attributes. See, for example, the Europeana Semantic Elements (ESE) set, or the more forward-looking Europeana Data Model (EDM).
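The preference rule above can be illustrated with a short sketch. The property names, the `REQUIRED` set, and both helper functions are assumptions made for illustration; they are not drawn from ESE, EDM, or any Bamboo schema:

```python
# Hypothetical set of required properties, for illustration only.
REQUIRED = {"title", "identifier"}


def merge_description(provider: dict, derived: dict) -> dict:
    """Combine provider-supplied and automatically derived properties.

    Derived values win on conflict, reflecting the preference for
    properties determined by inspection at time of aggregation.
    """
    merged = dict(provider)
    merged.update(derived)
    return merged


def missing_required(record: dict) -> set:
    """Return the required properties that are absent or empty."""
    return {name for name in REQUIRED if not record.get(name)}


provider = {"title": "Letters, 1850-1860", "format": "text/html"}
derived = {"format": "application/xhtml+xml", "size_bytes": 48213}
record = merge_description(provider, derived)
```

In this example `record["format"]` takes the derived value, and `missing_required(record)` reports that `identifier` still needs to be supplied.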

1.3. Initial Harvesting/Aggregation Protocols

Protocols will vary by the kind of resource being aggregated and the anticipated frequency of update and re-harvesting. Thus OAI-PMH and RSS for metadata aggregation; CMIS, SWORD2, JSR, etc. for resource harvesting; replication protocols for virtual aggregations.
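For the metadata case, a minimal OAI-PMH sketch might look like the following. It only builds `ListRecords` request URLs and parses a response held in a string; the base URL, the sample response, and the helper names are illustrative assumptions, and a real harvester would also need HTTP fetching, error handling, and support for deleted records:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    # In OAI-PMH the resumptionToken is an exclusive argument: when it is
    # present, it is sent with the verb alone.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    return ids, (token or None)  # an empty token element means "no more pages"


# Made-up response fragment, used here in place of a live provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until the response carries no token.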

1.4. (re)Dissemination

Interfaces must be defined for determining the state and availability of aggregated (or virtually aggregated) resources and for obtaining the proxy and/or resource. Versioning protocols will be important here, e.g., Memento, etc.
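To make the Memento reference concrete: in that protocol a client asks for the version of a resource nearest a given time by sending an `Accept-Datetime` request header. The sketch below only builds that header; the `memento_headers` helper is hypothetical, and the service that would receive the request is left out entirely:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(when: datetime) -> dict:
    """Build request headers asking for the version nearest `when`.

    `when` should be a timezone-aware datetime; it is converted to UTC
    because Accept-Datetime carries an HTTP-date expressed in GMT.
    """
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


headers = memento_headers(datetime(2010, 11, 11, tzinfo=timezone.utc))
# headers["Accept-Datetime"] is "Thu, 11 Nov 2010 00:00:00 GMT"
```

An interface for ascertaining the state of an aggregated resource could accept a datetime in this form and report which version, if any, was available at that point.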

1.5. Actionable Rights Ontology

Bamboo will need to know what can be done with each collection or object aggregated. At a minimum a simple set of rights classes should be adopted or adapted, e.g., from Creative Commons, or see ESE.
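One way to make such rights classes actionable is a lookup from class to permitted actions. The class names and actions below are invented for illustration; they are not the Creative Commons licenses or the ESE rights values:

```python
from enum import Enum


class Action(Enum):
    DISPLAY = "display"
    COPY = "copy"                  # harvest content into a Bamboo work space
    DERIVE = "derive"              # create transformed or enriched derivatives
    REDISTRIBUTE = "redistribute"


# Hypothetical rights classes mapped to the actions they permit.
RIGHTS_CLASSES = {
    "open": {Action.DISPLAY, Action.COPY, Action.DERIVE, Action.REDISTRIBUTE},
    "no-derivatives": {Action.DISPLAY, Action.COPY, Action.REDISTRIBUTE},
    "metadata-only": {Action.DISPLAY},
}


def is_permitted(rights_class: str, action: Action) -> bool:
    # Unknown rights classes permit nothing, so the check fails closed.
    return action in RIGHTS_CLASSES.get(rights_class, set())
```

A service would call `is_permitted` before copying or transforming an object; for example, `is_permitted("no-derivatives", Action.DERIVE)` is `False`.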

1.6. Monitoring use (transaction logging)

1.7. Illustrative Use Cases supported by Aggregation Functionality

1.7.1. –

2. Interaction with Distributed Collections & Content

A step up from simple virtual aggregations are virtual aggregations involving objects that are themselves complex or compound, including compound objects having components distributed across disparate repositories (e.g., an article that has an XHTML copy of the text in repository A, a TEI copy of the text in repository B, figures in repositories C & D, and data tables in repository E).
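The article example above can be sketched as a simple data structure. The class names, the roles, and the `example.org` hosts are all placeholders; a real implementation would more likely rely on a packaging standard than on ad hoc classes:

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Component:
    role: str        # e.g., "text", "figure", "data"
    media_type: str
    uri: str         # persistent HTTP URI in the holding repository


@dataclass
class CompoundObject:
    identifier: str
    components: list = field(default_factory=list)

    def repositories(self) -> set:
        """Hosts across which this object's components are spread."""
        return {urlparse(c.uri).netloc for c in self.components}


article = CompoundObject(
    "urn:example:article-1",
    [
        Component("text", "application/xhtml+xml", "http://repo-a.example.org/article-1.xhtml"),
        Component("text", "application/tei+xml", "http://repo-b.example.org/article-1.xml"),
        Component("figure", "image/png", "http://repo-c.example.org/fig-1.png"),
        Component("figure", "image/png", "http://repo-d.example.org/fig-2.png"),
        Component("data", "text/csv", "http://repo-e.example.org/table-1.csv"),
    ],
)
```

Here `article.repositories()` returns five distinct hosts, which is the situation a virtual aggregation has to present as a single object.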

2.1. Identifiers

2.2. Description

2.3. Packaging Standards

I'm thinking here of METS, OAI-ORE, BagIt, etc.

2.4. Actionable Rights Ontology

2.5. Monitoring use (transaction logging)

2.6. Illustrative Use Cases supported by Distributed Collections & Content Functionality

2.6.1. –

3. Enrichment of Surrogates, Enrichment of Objects, Creation of New Derivatives

We can anticipate that not all Bamboo interactions with content will be passive. Metadata will be enriched with new relationship information (e.g., to annotations and ontological tags). Objects may be enhanced (e.g., image enhancement). New derivatives will be created (transformed metadata, transformed / translated metadata or resource, part-of-speech annotated resource, text resources run through XTF-style processes to create Lucene full-text indexes to facilitate analyses, etc.). One can imagine use cases whereby search and discovery of resources not managed by Bamboo is enhanced through services that Bamboo uses to generate new keywords or facets for resources. New derivatives and enhanced information may be maintained by Bamboo (in which case how they are tied to the distributed resource is critical) or may be passed back to the source repository for ingestion.

3.1. Identifiers

3.2. Description

3.3. Reposting / Bi-directional Replication Protocols

3.4. Derivative / Enrichment Ontology

3.5. Derivative Rights Ontology

3.6. Illustrative Use Cases supported by Enrichment / New Derivative Functionality

3.6.1. –

4. Mediation / Remediation

We can anticipate that Bamboo may provide facilities to allow content contained in disparate, distributed collections to interact with tools which are (to Bamboo) black boxes. Such actions may be relatively passive for Bamboo components, e.g., facilitating discovery / enumeration of objects to analyze and then passing these opaque (to Bamboo) objects and associated opaque (to Bamboo) metadata on to external tools, possibly receiving back results to return to the source and/or user. In other instances Bamboo will remediate content (e.g., transform from flavor X TEI to flavor Y TEI) before passing it on to the tool.

4.1. Identifiers

4.2. Description

4.3. Provenance Protocols

4.4. Derivative Ingestion

4.5. Mediation Ontology

4.6. Illustrative Use Cases supported by Mediation / Remediation Functionality

4.6.1. –

5. Object Integrity / Versioning

While much of this will rightly be left to content provider repositories, interacting broadly with distributed collections and enriching or creating new content moves some of this beyond the individual repository boundary and into the realm of Collection Interoperability. When working with distributed resources not "owned" by Bamboo, it can be critical that scholars be able to count on the persistent availability of the "virtualized" resources they use. We can anticipate that Bamboo may provide facilities and services to help scholars verify the integrity over time of the resources they use and to track changes in (versioning of) these resources. This is closely related to digital preservation and so likely will take advantage of the PREMIS metadata schema, format registries, and emerging protocols like Memento (which can facilitate identifying and even retrieving versions of Web resources that have changed).
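Integrity verification of this kind usually comes down to comparing a stored digest with a freshly computed one. The sketch below assumes SHA-256 and uses hypothetical helper names; the outline itself does not specify an algorithm:

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest of the content, suitable for recording at harvest time."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """True if the content still matches the digest recorded earlier."""
    return sha256_of(data) == recorded_digest.lower()


original = b"<TEI>...</TEI>"
recorded = sha256_of(original)                       # stored with provenance
unchanged = verify_fixity(original, recorded)        # True
changed = verify_fixity(original + b" ", recorded)   # False
```

A failed check does not say what changed, only that something did; a versioning protocol would be needed to locate the earlier state of the resource.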

5.1. Identifiers

5.2. Description

5.3. Provenance (PREMIS) Protocols

5.4. Checksum utilities, etc.

5.5. Illustrative Use Cases supported by Object Integrity / Versioning Functionality

