This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.

This document is a modestly edited artifact of a bottom-up approach by technical team members to define the nature and scope of software development work to be proposed for a phase two of the Bamboo Technology Project. While this proposal was not ultimately funded, the Classics Capabilities Map embodied in the table that makes up most of this wiki page is included to open a technical planning angle on implementation of the goals and capabilities described in overview and other linked documents on the page Curation of Digital Materials - planning for future development. It identifies where in the Ecosystem architecture the technical team planned to implement each capability, and indicates which capabilities would have been supported by work completed in phase one of the Bamboo Technology Project.

Curation and Exploration in the Classics Use Case

Three use cases were analyzed and broken out into a set of capabilities listed in the leftmost column of the matrix below.

The three use cases (attached to this page as PDFs and linked in the list below) are:

For each capability, the matrix describes relevance to Classics scholarship, how the capability maps to Project Bamboo infrastructure, and reference implementation(s) by which Project Bamboo would demonstrate functionality and infrastructure integration.

In high overview, as the diagram shows, the ecosystem of scholar-facing tools and infrastructure includes:

  1. Independent and commercially-developed tools already in use by Classics scholars, such as the oXygen XML Editor and the Alpheios tool set.
  2. Son of Suda On-Line (SoSOL) as it has evolved (and continues to evolve) for application to Perseus-held content as an environment in which annotations are applied to content and vetted/edited/approved in defined workflows modeled on classroom supervision and peer-reviewed publication.
  3. A Bamboo-built Research Environment, implemented atop Drupal and Fedora, from which scholars will select materials on which to work locally, and in which scholars publish materials for collaborative sharing and/or vetting/approval and uptake by the annotated content's originating repository; publication in the RE context is of medium-term duration (see "Institutional repositories" below).
  4. The Bamboo Services Platform (a.k.a. "Shared Services Platform"), responsible for making available IAM, content access/interoperability, analytical, and transformation services.
  5. External providers of content: content to be annotated (annotation targets) and/or annotations on content (annotation bodies). These include, in the Reference Implementation case (Perseus), the facility for uptake and incorporation of vetted Bamboo-curated content.
  6. External providers of identity (campus and social media IdPs)
  7. Institutional repositories for long-term storage of scholarly data

Bamboo infrastructure leveraged in support of Classics use cases would be:

  • IAM services to handle institutional authentication by Shibboleth and/or authentication by Social Media identity providers; as well as basic group and role based authorization scenarios
  • Models for transmission of text objects (Text Object Model) and for recording and managing annotations and associated metadata (Scholarly Data Management), implemented in a Bamboo Research Environment (see below)
  • Content access (CI Hub/Repository), Vocabulary, and Transformation Services
  • Analytical services such as Morphology, Syntactic Annotation, Places-Texts and to-be-selected text-reuse identification services.
  • Utility services such as Notification and Caching

The Bamboo-built Research Environment would present a user interface for login and associated IAM function; content and catalog access; access to other services hosted on the Bamboo Services Platform; and some user-facing helper widgets (e.g., to present metadata terms from a vocabulary service). Most importantly, the capability to store and serve content locally for use in scholarly workflows (using the to-be-developed Project Bamboo Text Object Model), together with annotations and associated metadata stored and served in models created for the purpose of Scholarly Data Management, will be implemented on, and accessible to scholars via, the Research Environment.
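As a rough illustration of what "store and serve annotations and associated metadata locally" could mean, here is a minimal Python sketch. The Text Object Model and Scholarly Data Management models were still to be developed, so every name below (`Target`, `Annotation`, `LocalStore`, the `urn:example:` identifiers) is a hypothetical placeholder rather than part of any Bamboo design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Target:
    """What is being annotated: a resource pinned to one version (hypothetical)."""
    resource_uri: str  # placeholder identifier for a text object
    version: str       # placeholder version label for that text object


@dataclass
class Annotation:
    """An annotation body attached to a versioned target, with basic metadata."""
    target: Target
    body: str
    creator: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class LocalStore:
    """In-memory stand-in for a Research Environment's local annotation store."""

    def __init__(self) -> None:
        self._items: List[Annotation] = []

    def save(self, annotation: Annotation) -> None:
        self._items.append(annotation)

    def for_target(self, resource_uri: str, version: Optional[str] = None) -> List[Annotation]:
        """Return annotations on a resource, optionally limited to one version."""
        return [
            a for a in self._items
            if a.target.resource_uri == resource_uri
            and (version is None or a.target.version == version)
        ]
```

Pinning each annotation to a target version, as sketched here, is one simple way to keep annotations meaningful when the underlying text is later corrected.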

SoSOL will be leveraged as the workflow engine for managing the curation process and serving as the middleware between the Drupal research environment and external annotation and resource editors like oXygen and Alpheios, etc. It would also be able to call services like the extant (phase one) Morphology Service or a Named Entity Identity service, etc., to automatically seed annotations for further manual curation.
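The curation workflow described above (automated seeding followed by manual vetting) can be sketched as a small state machine. This is not SoSOL's actual implementation or API; the states, transitions, and the `seed_annotations` helper are assumptions made purely for illustration, and the analyzer is a stand-in for a call to something like the Morphology Service.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


# Allowed transitions (hypothetical; a real workflow would be defined per community).
TRANSITIONS = {
    State.DRAFT: {State.SUBMITTED},
    State.SUBMITTED: {State.APPROVED, State.REJECTED},
    State.REJECTED: {State.DRAFT},
    State.APPROVED: set(),
}


class CuratedAnnotation:
    """An annotation that moves through review states and records its history."""

    def __init__(self, body, source):
        self.body = body
        self.source = source  # "manual", or the name of the seeding service
        self.state = State.DRAFT
        self.history = [State.DRAFT]

    def advance(self, new_state):
        """Move to new_state if the workflow allows it, else raise ValueError."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


def seed_annotations(tokens, analyzer, service_name):
    """Create draft annotations from an automated analyzer for later manual review."""
    return [CuratedAnnotation(analyzer(tok), source=service_name) for tok in tokens]
```

Seeded annotations start in the same draft state as manual ones, so automated output passes through the same review steps before approval.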

It is noteworthy that the contemplated Reference Implementations include services that support "text reuse" use cases. Use cases of this type are of broad interest across domains of humanist scholarship, and will be easily understood by scholars outside the Classics as applicable to work on other corpora and periods of human culture.
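For readers unfamiliar with the term, one very simple way to detect text reuse is to compare the word n-grams two passages share. The Python sketch below uses Jaccard overlap of word trigrams; it is a generic illustration only and does not represent how eTraces, Proteus, or any service considered for Bamboo works. The function names are invented for this example.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams in text, lowercased, punctuation ignored."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_score(a, b, n=3):
    """Jaccard overlap of word n-grams: 0.0 means nothing shared, 1.0 identical sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # too short to form any n-gram
    return len(sa & sb) / len(sa | sb)
```

Because nothing here is specific to Greek or Latin, the same comparison applies unchanged to other corpora, which is the sense in which text-reuse services generalize beyond Classics.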

[Diagram: Phase2Architecture]


TBD: Describe teams and sequencing in the matrix below (Steve, 16 Aug 2012)


The Teams involved... column should, in the first instance, be filled in with the names of functional teams. Which institutions and individuals staff those teams will emerge from discussions between potential partners to this work. Below is a list of functional teams to be used to populate this column. As of 16 Aug 2012, this is a WIP – please edit/augment/suggest!

  • BSP-core: BSP-hosted core/utilities services
  • Schol-Svcs: BSP-hosted Scholarly Services (may be several sub-teams here)
  • CI-Hub: BSP-hosted CI Hub
  • IAM: Identity and Access Management
  • RE-UX: Research Environment (UI/UX/Drupal focus)
  • RE-Store: Research Environment (Object Store/Fedora focus)
  • SDM: Scholarly Data Modeling
  • Text-Obj: Text Object Modeling
  • Classics-RI: Classics RI team (SoSOL, tool integration focus)
  • Operations: Operations (sys admin, load testing, etc.)

IMO anything with an RI component to it (a.k.a. "everything") implies a functional user team engaged to use what Project Bamboo delivers in its iterative dev cycles.

Capability

Relevance to Classics scholarship

Relation/Map to PB Infrastructure

P1 Support? (Y / N)

Implied P2 Infrastructure Extension / Development

Reference Implementation(s) to demonstrate Infrastructure Support of this Capability

Notes on sequencing and resources

Identify User

Need to associate a user with their annotations.

BSP Service: centrally-hosted

Y

Keep IAM components current in a changing technology space.

SoSOL or Research Environment to be Shibboleth SP to authenticate via Institution or Social IdP

 

Authenticate User

Allows identification within an institutional context and association with a role.

RE Function: client responsibility

Y

Fine-grained attributes from institutions are NOT part of phase 2; Local attributes from research environments may be available.

SoSOL or Research Environment to be Shibboleth SP to authenticate via Institution or Social IdP

 

Identify User Roles

Enables fine-grained access to institutional resources; enables use of roles within a workflow to support chain of authority.

BSP Service: centrally-hosted (clients may also manage roles, including inquiry on BSP-managed group affiliations)

Y

Use of institutionally-managed roles (e.g., to enable access to institutional resources) is out of scope for Phase 2; use of simple institutional affiliation, and roles within a workflow, is provided in P1.

SoSOL to use roles asserted by Bamboo Group Service

 

Authorize Access to Text Resource

Allows us to make resources which have access restrictions available for annotation. E.g. limit access to classroom resource.

BSP Service: centrally-hosted (clients may also inquire on BSP-determined policy decisions)

Y

P1: Yes if in local repository, No if in CI Hub; P2 would extend AuthZ to CI Hub.

Research Environment to serve as policy enforcement point to restrict access to resources. Open Question: to what extent does policy enforcement need to fall through to SoSOL? To restrain scope, we may need to support only limited curation for access-restricted resources (e.g. where resource is not kept with annotation via the content package) in P2.

 

Synchronize group memberships across the ecosystem

Enables use of roles to support fine grained access to annotations.

BSP Service: centrally-hosted "master" store of groups

RE Function: client responsible for synchronizing local group management with "master"

Y

Will have for Drupal at end of P1;

Additional research environment integration may be undertaken in P2 (?)

SoSOL to be able to call Grouper API

 

Acquire Resources for Annotation

Need the ability to provide generalized access to annotatable resources from an open-ended set of providers whose content adheres to a set of standard formats (EpiDoc, TEI-Analytics, ...) served by a set of standard APIs (CTS, ..)

BSP Service: centrally-hosted CI Hub for selected repositories

RE Function: client responsible for file-upload capabilities

Y

P1 support in RE is file-upload and from repositories for which there are CI Hub adapters;

P2 build-out: facilitate support for repositories for specific additional repository types, such as CTS; Eventually we should be able to retrieve annotations themselves as first class resources for annotation.

Use Perseus FRBR Catalog (Drupal XC interface) + SoSOL + CI Hub + Syntactic Annotation Service to provide browse interface for resources available for annotation. SoSOL provides passage-level retrieval via CTS support. Reference implementation will draw resources from CI Hub and Linked Open Resources (including Google Books, WorldCat, et al.).

Initial Ref Impl iteration may be P1 repositories only but later iterations (i.e. once supported text/object models and APIs are documented) should demonstrate ability to retrieve from other providers which adhere to those standards.

Serve Text Resources Available for Annotation to User

Users need to be able to see the target of their annotation.

BSP Service: centrally-hosted

Y

Yes for P1 repositories and for repositories supported by the P1 Syntactic Annotation service;

P2 build out is reconciling the 2 approaches and extending.

Extend P1 Drupal interface ("repository browser") for resource display;

See also Acquire Resources for Annotation.

Identify Annotation Target in Text Resource

Need to be able to identify the following types of targets for annotation: named entities; parallel text; syntactic structures (e.g. words and sentences); citations; scholia; bibliographic metadata.

BSP Service: centrally-hosted presentation of annotation targets in text object model

RE Function: client responsible for presenting content in text object model to users (UI)

Integrated Tools: domain-specific tools responsible for consuming and displaying annotation targets in a UI appropriate to scholarly workflow

N

P2 text object modelling; Curation at a distance - identify likely targets based on data/training. Would like to create an annotation as a first-class object (that can, itself, be annotated).

Initial iteration extends P1 demonstration for retrieving and preparing texts identified by citation for syntactic annotation to support CTS via the v 1.1.0 Syntactic Annotation Service.

Subsequent iterations support retrieval of additional targets by leveraging FRBR metadata exposed as triples to identify, e.g., parallel texts for translation alignment; digital images for OCR; and related resources such as scholia and commentary.

Final iteration must employ automated methods for target identification that leverage existing annotations based upon training (maybe markup projection and/or named entity identification or text reuse).

[removed from archive version: link to teleconference notes]

Serve Controlled Vocabularies/Annotation Properties

Need to be able to classify annotations according to various standard ontologies. Examples include FRBRoo, CIDOC, CTS verbs, ...

BSP Service: centrally-hosted

N

P2 model and build (or adopt/adapt) a vocabulary service to serve namespaced vocabularies, etc.

Drupal widget which calls the BSP vocab service; Widget launched from SoSOL at the point at which an annotation is saved or submitted to allow the user to populate some metadata fields on the annotation from available vocabularies.

[removed from archive version: link to teleconference notes]

Serve Available Annotation Body Resources

Need to be able to draw on various sources of annotation resources, many of which identify linkable URIs as stable identifiers, including Gazetteers like Pleiades, CTS Inventories, Prosopographical Resources, etc.

BSP Service: centrally-hosted service of annotation body resources in text object model

RE Function: client responsible for presenting resources in text object model to users (UI)

Integrated Tools: domain-specific tools responsible for consuming annotation body resources in a UI appropriate to scholarly workflow

N

P2: settle on a representation (OAC) and ability to ingest resources that comply with the representation; provide stable URIs for annotations that don't have them (limited media types; note distinction between "stable" and "permanent"); (P1 Syntactic Annotation service has a start on this; P2 decide how to build out)

Drupal widget to provide browse/selection interface for selected annotation body resources; Widget launched from SoSOL at the point at which an annotation is created/edited.

 

Store Annotation

Need to be able to save work-in-progress on annotations.

RE Function: client responsibility

Integrated Tools: responsible for storing W.I.P. to point where scholar wishes to 'promote' to RE store. (?)

N

P2: provide local store; see also serve annotations.

Use git via SoSOL and Fedora via RE. SoSOL to RE is a 'publication event' of work in progress that may be ready for some level of review or collaborative sharing. Technically this would be a similar publication event to an institutional Fedora repository.

 

Store Annotation State/History

Need to be able to provide a curation log to support eventual chain of authority and annotation filtering.
See also 'Store Annotation'.

RE Function: client responsibility

Integrated Tools: Record W.I.P. and package for "promotion" to RE store.

N

P2: associate annotation state and history with annotation; see also store and serve annotations and provenance

Use git via SoSOL and Fedora via RE. SoSOL to RE is a 'publication event' of work in progress that may be ready for some level of review or collaborative sharing. Technically this would be a similar publication event to an institutional Fedora repository.

Provision of curation log (record and make available) is in P2 scope. Mining and visualizing, etc. is not in P2 scope.

Associate Annotation with Specific Version of Annotation Target/Resource

Resources being annotated are rarely static (e.g. a text may be corrected for typos or ocr scanning errors, underlying markup structure may change, etc.).

RE Function: text object model implemented in RE associates annotation to version of target.

Integrated Tools: this is a domain specific client responsibility.

N

P2: text object model; Scholarly Data Management (SDM = describing metadata on and relationships between objects); versioning. Note that this does NOT include the ability to update the resource and automatically patch existing annotations onto the updated resource.

In the RI we will provide one example of domain-specific support for this in SoSOL for CTS-enabled texts. Users will be able to select a passage (citation) for editing, and by leveraging the content package which contains the full resource and the text inventory, SoSOL will merge the edited passage back into the full text at the point at which the change is finalized (i.e. approved and committed back to the master SoSOL repository). If unable to merge automatically, the SoSOL interface offers manual edit/review and an option to preserve the passage separately if necessary. Note that this support is limited to the scope of curation within the Bamboo/SoSOL platform, but the goal is at least to demonstrate one approach to handling this problem which may be able to serve as a blueprint for others.

 

Aggregate annotations and related resources

See 'Associate Annotation with Specific Version of Annotation Target/Resource'; Also supports creation of digital editions;

RE Function: client responsibility to package content

N

P2: text object and content package modeling

Content package which preserves annotation, target resources, and relevant metadata (e.g. citation structure via text inventory, etc.)

 

Serve Annotation History

Need to be able to provide a curation log to support eventual chain of authority and annotation filtering.

RE Function: client responsibility

N

P2: scholarly database management model/implementation (inspector/curation log); support for ePortfolios (??). History should be available as RSS (or equivalent) feeds.

Expose API to existing SoSOL feature. Provide Drupal interface for viewing.

Provision of curation log (record and make available) is in P2 scope. Mining and visualizing, etc. is not in P2 scope. (as noted above)

Deliver Annotations to an Annotation Server (Text Resource Provider or Institutional Repository)

Supports use of annotations to curate resources (both manually and via automated methods) and to augment domain-specific knowledge bases.

BSP Service: centrally-hosted CI Hub responsible for serving packaged content

N

P2: Bi-directional collection adapters and/or supporting data model/api; content package modeling. Likely to be a classic publish-subscribe design.

Develop a service at Perseus which subscribes to filtered annotations on its resources and makes them available in the main Perseus interface to that resource. May be either as an inline correction, or as stand-off markup.

Perseus service is an RI on the Perseus repository that would apply or be extensible to other domains, e.g., domain-specific logic to ingest into a Fedora repository.

Deliver annotated resource to a content repository (Text Resource Provider or Institutional Repository)

Supports submission of an annotated text (where annotations may be in-line corrections and/or stand-off annotations) for ingestion by either: a repository that provided the resource that has been annotated; or a repository in which an altered/augmented derivative will be stored and/or published.

BSP Service: centrally-hosted CI Hub responsible for serving packaged content

N

P2: Bi-directional collection adapters and/or supporting data model/api; content package modeling. Likely to be a classic publish-subscribe design.

Develop a service at Perseus which subscribes to annotated 'editions' (or 'versions') of its resources and makes them available in the main Perseus interface to that resource. Annotations on the resource may be either as an inline correction, or as stand-off markup.

Perseus service is an RI on the Perseus repository that would apply or be extensible to other domains, e.g., domain-specific logic to ingest into a Fedora repository.

Authorize Access to Annotation Resources

Supports pedagogical applications (e.g. for assessment); Enables user-ownership of their scholarship. Supports ePortfolios.

RE Function: client responsibility

BSP Service: offers policy decision service to clients that wish to utilize it

Y/N

P1: ACL on RE instances is a Phase One deliverable.

P2: modeling text objects, annotations, and scholarly data objects will define the objects to which ACL will be applied.

Drupal interface to enable user to apply an access control policy to their annotations which enables/restricts their parent institution's ability to pull them to the institutional repository for storage and inclusion in an ePortfolio. Apply access restrictions to publish/subscribe model.

Develop a service at Perseus which subscribes to annotations obeying this policy and imports them to the institutional repository or LMS.

See note in previous row, Deliver Annotations to an Annotation Server...

Preserve Annotation Provenance

Enables annotation filtering based on authority; Supports assessment of scholarship over space and time (knowledge acquisition).

RE Function: client responsibility to store provenance.

BSP Service(s): contributes provenance data (including stable user identifiers, service/tool metadata, etc.) to provenance records.

Y/N

P1: Items in RE storage are identified as to creator, creation time, updater, updated time; Morphology Service records engine that generated morphological analysis.

P2: Enhance provenance description with modeling (of scholarly data) that describes tools, setting, et al. that are germane to recording provenance for a scholarly process that involves multiple authors and algorithmic processes, as well as relevant timestamps, and relevant relationships to other annotations.

See above under Deliver and Authorize Access to Annotations. The services implemented should leverage annotation provenance to decide what to do with the annotations (e.g. only pull and integrate annotations which meet certain criteria applied to their provenance).

SoSOL must be modified to apply the appropriate SDM model to recorded annotations.

See note in prior row, Deliver Annotations to an Annotation Server...

Expose annotations for a given annotation target

Enables annotation filtering based on resource. (e.g. all annotations for a given text/author/work/citation; all annotations of a specific named entity; etc.)

(Note: CTS supports identification of span that crosses citation boundaries)

RE Function: client responsibility to store scholarly data objects.

BSP Service: centrally-hosted CI Hub responsible for serving packaged content

N

P2: Provide an API that would permit an annotation server to retrieve (filtered) annotations on (filtered) resources of interest. Model the orchestration of obtaining text, obtaining annotations on text, applying annotations to text, and serving annotated text. Modeling could include both annotations that change text and annotations that sit next-to-text; this is a scoping question.

See above under Deliver and Authorize Access to Annotations. The services implemented should leverage this capability to pull and apply annotations to selected resources. Reference implementation will require and support CTS URN for annotation targets. Anything else may be out of scope.

See note in prior row, Deliver Annotations to an Annotation Server...

Generate Annotation Automatically

Manual curation is not scalable. Must be able to support automatic analyses that then can be curated (either automatically or manually).

BSP Service(s): centrally-hosted

Integrated Tools: domain-specific tools may contribute automated annotation capability

Y

P1: morphology service capability and places and texts

P2: need to expand to multiple engines and annotation types.

Possible P2 implementations:
Morphology Service and/or Annotation Service Extension to support MST Parser, MALT and Anna for automatic annotation of syntax/dependency structure.

MGIZA -based service for automatic translation alignment; Proteus service for named entity identification?

A service that supports text reuse (e.g., eTraces or Proteus) is a likely candidate here because it would be easily applicable to domains beyond Classics scholarship.

[removed from archive version: link to teleconference notes]

 Cf. text reuse use case linked above on this page. Research into services appropriate for investment is a TBD.

Note: there's a proposal to Don Waters re: Proteus in the pipeline, from Northeastern and U Mass. Not sure when. Worth knowing when considering whether to cite in Prospectus/Proposal.

Provide GUI for creating/editing Annotations

Must be able to support specialized user interfaces per annotation type; GUI preferences are often personal and idiosyncratic; Rapid pace of change in GUI development requires a platform that can integrate and adapt. Examples of some GUIs specific to classical scholarship: SoSOL, WiSSKi, TPEN, oXygen, Alpheios, ...

Not in scope for Bamboo P1/P2

N

P2: Bamboo doesn't do GUI but enables integration of one or more GUIs

SoSOL already offers export of resources for working in local or web-based editors, plus a minimal text entry interface for working with text and annotations. Extend this to provide a RESTful API with CRUDL access.

The RI will demonstrate this with the Alpheios tools for syntactic annotation and text alignment.

 

Notify User of Annotation State Changes

Feedback is essential to encouraging continued user engagement and contribution.

RE Function: monitor events and invoke notification service(s) appropriately

BSP Service: centrally-hosted notification management

Other repositories: responsible for exercising appropriate notification APIs

Y/N

P1: Centrally-hosted notification service permits any service to send e-mail notices upon occurrence of any event.

P2: Build a service that notices changes on the RE's store and constructs & sends notifications of those events; build a service that permits an Annotation Server to notify a Bamboo RE that a set of annotations has been accepted and the subscribers to those annotations are to be notified.

Cf. Serve Annotation History above re: notifications by means other than e-mail.

SoSOL already supports email notifications of various workflow state changes. Need to explore ways of integrating this with Bamboo Notification Services, maybe for extending notifications to Bamboo users not directly associated with the annotation but within the user's Bamboo group?
 
Also, build a hook in the Perseus service described under Deliver Annotation Resources that calls a Bamboo service to notify users of incorporation of their annotation into Perseus.

NOTE: Change in state of annotations can include "storage about to expire" -- so people can be notified they need to put their work elsewhere if it hasn't been taken up (for example) by the annotation target's original repository.

Notify User of Resource State Changes

Feedback is essential to encouraging continued user engagement and contribution.

Same as prior row.

Y/N

Same as prior row.

Same as prior row.

SJM (8 Aug): same as prior row

Perform Transformations (various)

Must be able to support multiple standard formats for resources (texts and annotations).

BSP Service(s): centrally-hosted

Y/N

P1: Syntactic Annotation Service; CI Hub transformation of content to Book Model.

P2: CI Hub transformation to Text Object Model; extending Pepper Modules to support additional annotation types.

Additional transformations may come into play, to be discovered. The relevance of the pipelining support built into the platform (Apache Camel) is also to be explored.

Exercised by implementations for Acquire and Serve Target Resources; may also be exercised by Deliver Annotation Resources (e.g. the case where a syntactic annotation is made and preserved by the RE in Format X and we leverage the Annotation Service and Pepper Modules to transform to Perseus format for import into the Alpheios/Perseus treebank repository).

NOTE: There may well be IP (rights) issues associated with production of derivatives through transformation. What Bamboo's role is with respect to enforcing or recording such rights/issues is TBD.
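Several rows of the matrix above anticipate "a classic publish-subscribe design" for delivering annotations, with subscribers pulling only annotations whose provenance meets certain criteria. The following minimal, in-process Python sketch shows that pattern. The `AnnotationHub` class, the dictionary layout of an annotation, and the "approved" provenance rule are all hypothetical; no such Bamboo or Perseus API is implied.

```python
from collections import defaultdict


class AnnotationHub:
    """Minimal in-process publish-subscribe hub keyed by target resource."""

    def __init__(self):
        # resource identifier -> list of (predicate, callback) pairs
        self._subscribers = defaultdict(list)

    def subscribe(self, resource, callback, predicate=lambda ann: True):
        """Register callback for annotations on resource that satisfy predicate."""
        self._subscribers[resource].append((predicate, callback))

    def publish(self, annotation):
        """Deliver annotation to matching subscribers; return how many received it."""
        delivered = 0
        for predicate, callback in self._subscribers.get(annotation["target"], []):
            if predicate(annotation):
                callback(annotation)
                delivered += 1
        return delivered


# Usage: a repository subscribes only to annotations marked "approved"
# (a hypothetical provenance rule chosen for this example).
received = []
hub = AnnotationHub()
hub.subscribe(
    "urn:example:text-1",
    received.append,
    predicate=lambda ann: ann["provenance"]["state"] == "approved",
)
hub.publish({"target": "urn:example:text-1", "body": "typo fix",
             "provenance": {"creator": "alice", "state": "approved"}})
hub.publish({"target": "urn:example:text-1", "body": "draft note",
             "provenance": {"creator": "bob", "state": "draft"}})
```

After these two calls only the approved annotation has been delivered. A production design would sit behind network services and would also need the access-control and rights considerations noted in the matrix.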
