




This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.


The descriptions of curation workflows were proposed by Martin Mueller and Greg Crane, with participation of Robert Morrissey, Neil Fraistat, and Trevor Muñoz. They were completed on 12 Aug 2011, with a minor amendment on 15 Aug 2011. These workflows are an analysis and 'remediation' of an essay written by Martin Mueller describing curation of Early Modern English texts.

The discussion of integration patterns and implied capabilities that follows was developed by Bob Taylor, Bruce Barton, Steve Masover, Tim Cole and Travis Brown, as part of planning for a second phase of technical development, in early September 2011.

 


Introduction

 

This recipe lets contributors collaborate on curation of digital surrogates of printed books. These digital surrogates have been mass-produced either through re-keying or optical character recognition (OCR) and the transcriptions are often imperfect or incomplete, sometimes both. The curation tasks in this workflow allow contributors to:

  • identify and merge texts which represent versions or editions of the same work
  • supply missing characters, words or passages of text
  • emend characters or passages that have been incorrectly transcribed
  • identify and describe structural and linguistic features of texts in machine-actionable ways to connect them to other texts
  • add longer, free-form annotations

Completion or correction tasks involve direct changes to published data which are reviewed and incorporated by contributors with appropriate qualifications. Other kinds of annotations may be incorporated into the published data but may also remain private to individual researchers.

Allowing contributors to collaboratively curate imperfectly or incompletely transcribed old books helps researchers by raising the quality of digital surrogates to a level that will gain acceptance by scholarly communities while reducing the time cost of curatorial labor to any individual contributor. A well-designed workflow directs the crowd, or more accurately individuals in the crowd, to problems that need solving and that match their skills. This reduces the time cost of getting to a particular problem and focuses the attention of curators on the task that requires human intervention. It also relieves them of the burden of keeping track of what they have done: the system does the work of reporting the curatorial act.

Workflow description

A. Collection curation (metadata management)

  1. User investment: high; authentication req.; ongoing
  2. Workflow: separate curation process (on collection)
  3. Tools:
    1. forms for editing/adding bibliographical metadata
    2. alignment tool for indicating that texts represent different manifestations of a single work
    3. metadata reconciliation tool (correcting discrepancies between aligned text metadata)
    4. tools for searching and selecting terms from controlled vocabularies
    5. ontology tools for managing the relationships between works, expressions, manifestations, and items (FRBR Group 1 entities)
  4. Steps:
    1. The contributor searches or browses the repository and discovers multiple results that appear to relate to the same work.
    2. The contributor uses an ontology-backed alignment tool to group duplicate entries
    3. The contributor may also examine and correct metadata for aligned texts, e.g., replacing variant spellings of an author’s name with the authorized heading from an authority file. Different metadata elements may be available depending on a text’s position in the ontology: for example, a work could have multiple valid publishers and publication dates across multiple manifestations.
    4. A contributor with appropriate privileges is made aware of the proposed alignment:
      1. through sorting the curation log by type of action
      2. by observing flags in the list of texts that appear in browsing or searching views
    5. Depending on the contributor’s reputation (a “score” based on previous curation actions or whitelisting by editorial board, etc.):
      1. The alignment of texts could be approved immediately
      2. The proposed alignment could be flagged for community review
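
The routing in step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the Bamboo design: the threshold value and the names `Contributor`, `ProposedAlignment`, and `route_alignment` are hypothetical, since the workflow does not specify how a reputation score is computed or compared.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold; the workflow does not say how reputation is scored.
AUTO_APPROVE_THRESHOLD = 50


@dataclass
class Contributor:
    user_id: str
    reputation: int = 0        # "score" based on previous curation actions
    whitelisted: bool = False  # e.g. whitelisted by an editorial board


@dataclass
class ProposedAlignment:
    work_label: str
    text_ids: List[str]        # repository entries believed to be the same work
    proposed_by: Contributor
    status: str = "proposed"


def route_alignment(alignment: ProposedAlignment) -> str:
    """Approve at once for trusted contributors, otherwise flag for review."""
    contributor = alignment.proposed_by
    if contributor.whitelisted or contributor.reputation >= AUTO_APPROVE_THRESHOLD:
        alignment.status = "approved"
    else:
        alignment.status = "flagged_for_review"
    return alignment.status
```

A real system would presumably also write the outcome to the curation log; that step is omitted here to keep the sketch short.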

B. Intermittent/En passant curation

  1. User investment: varies; ideally authentication wouldn’t be required for the most basic actions such as flagging an error (without correction) to capture the most casual users. Actual emendations may require authentication since these may alter readings of the text and thus its perceived scholarly value.
  2. Workflow: minimal additions to / distractions from reading interface
  3. Tools:
    1. split-screen facsimile / transcription viewer;
    2. simple editing interface for transcription;
    3. moderation system for:
      1. logging curatorial acts;
      2. approving or rejecting emendations;
      3. reviewing authenticated users’ performance
  4. Content:
    1. Bamboo Book Model: minimal assumptions about lower-level representation of the content
  5. Steps:
    1. While viewing a text, a contributor comes across an error in the transcription
    2. Anonymous contributors (not authenticated) can simply flag the appropriate line (or word) for review
    3. Contributors who have authenticated can emend the transcription
    4. Depending on a contributor’s previous actions, the proposed emendation:
      1. can be immediately incorporated into the proposed transcription and the curatorial act is recorded in the curation log, noting the user who made the change, the type of curation action, and the timestamp
      2. can be kept private and sent for review by contributors with more permissions
    5. Contributors with moderation privileges can:
      1. Approve the proposed emendation; approval of emendations triggers an entry in the curation log and increases the “reputation score” of the original contributor
      2. Reject the proposed emendation; the original contributor’s score does not increase
      3. Create a fork indicating two valid variants of a section of text
    6. Reviewing emendations increases the reputation score of moderators
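
Steps 5 and 6 can be sketched as a small moderation routine. This is an illustration only: the class and field names are hypothetical, and the increment of one point per action is an assumption, since the workflow says only that scores increase. The "fork" option in step 5.3 is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Emendation:
    contributor: str
    line_ref: str          # hypothetical address of the emended line
    old_text: str
    new_text: str
    status: str = "pending"


class Moderation:
    def __init__(self) -> None:
        self.reputation: Dict[str, int] = {}
        self.log: List[dict] = []

    def _bump(self, user: str) -> None:
        # Assumed increment of 1; the source does not give a scoring rule.
        self.reputation[user] = self.reputation.get(user, 0) + 1

    def review(self, moderator: str, emendation: Emendation, approve: bool) -> None:
        if approve:
            emendation.status = "approved"
            # Approval triggers a curation log entry: who, what, and when.
            self.log.append({
                "user": emendation.contributor,
                "action": "emendation",
                "line_ref": emendation.line_ref,
                "timestamp": datetime.now(timezone.utc),
            })
            self._bump(emendation.contributor)
        else:
            # Rejection leaves the contributor's score unchanged.
            emendation.status = "rejected"
        # Reviewing, whatever the outcome, raises the moderator's score.
        self._bump(moderator)
```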

C. Cascading curation

  1. User investment: medium; authentication req. but possibly single-event
  2. Workflow: takes user out of the reading interface into curation process
  3. Tools:
    1. encoding normalization tool;
    2. tools for tokenization and linguistic annotation;
    3. search index;
    4. orthographic / PoS query interface;
    5. keyword-in-context (KWiC) result viewer;
    6. batch editor;
    7. moderation system
  4. Content:
    1. Bamboo Book Model: but content must either be normalized, tokenized, and tagged in advance, or able to have these operations performed on it when the user begins the task
  5. Steps:
    1. While reading a text (and possibly offering corrections en passant), an authenticated contributor notices a pattern of errors (e.g., fhall for shall or tlie for the)
    2. The user clicks a link to enter a “batch correction” interface
    3. In this interface the user is able to search for instances of the error pattern across the corpus he or she is working with
    4. The results of this search are given in a paged KWiC list
    5. The user specifies a correction and selects a subset of the results in the list to batch correct
    6. The batch correction is submitted for immediate incorporation or review according to the same process as above for intermittent curation
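
The search, select, and correct steps above can be sketched over a simple list of tokens. This is a minimal illustration, assuming exact-match search on whitespace-separated tokens; the names `Hit`, `kwic`, and `batch_correct` are hypothetical, and a real implementation would run against the search index and token IDs rather than an in-memory list.

```python
from typing import List, NamedTuple, Set


class Hit(NamedTuple):
    position: int  # index of the matching token in the text
    left: str      # context to the left of the keyword
    keyword: str
    right: str     # context to the right of the keyword


def kwic(tokens: List[str], pattern: str, width: int = 3) -> List[Hit]:
    """Return keyword-in-context hits for every exact match of `pattern`."""
    hits = []
    for i, token in enumerate(tokens):
        if token == pattern:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append(Hit(i, left, token, right))
    return hits


def batch_correct(tokens: List[str], hits: List[Hit],
                  selected: Set[int], correction: str) -> List[str]:
    """Apply `correction` to the selected hits only; the input is not modified."""
    corrected = list(tokens)
    for n in selected:
        corrected[hits[n].position] = correction
    return corrected


tokens = "we fhall go and they fhall stay".split()
hits = kwic(tokens, "fhall")
# The user reviews the KWiC list and selects only the first hit.
corrected = batch_correct(tokens, hits, {0}, "shall")
```

Returning a new list rather than editing in place reflects the review step: the proposed batch can be held as a candidate until it is approved.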

D. Connection curation (sequential)

  1. User investment: high; authentication req.; ongoing, multi-event task
  2. Workflow: separate curation process
  3. Tools:
    1. split-screen facsimile / transcription viewer;
    2. more advanced interface for editing, which may include:
      1. keyboard shortcuts;
      2. auto-completion (including tag-completion) based on previous edits;
      3. suggestions based on data from curation log
    3. Editing or correction interface for more advanced encodings:
      1. mapping surface spellings to standardized lemmata and parts of speech,
      2. identifying multi-word expressions,
      3. determining sentence boundaries,
      4. identifying passages written in other languages,
      5. identifying names of people and places,
      6. identifying citations (e.g. Biblical references),
      7. identifying longer quotations or passages that are shared across different texts,
      8. identifying passages marked as spoken in written texts,
      9. marking structural boundaries of chapters, scenes, and other subdivisions of texts
    4. Interface for selecting best page images from multiple candidates
    5. moderation system
  4. Content:
    1. Bamboo Book Model: content may be normalized, tokenized, and tagged in advance, but the user will often wish to customize these preparatory steps
  5. Steps:
    1. While reading a text, an extension to the editing interface allows the user to enter richer kinds of annotations or markup. The details of this interface are dependent on the schemas the project is using and the kinds of tasks that are supported

Recipe for curation of full-text transcriptions of books

Scope

This recipe covers curation of full-text transcriptions of books whether transcribed by manual re-keying or OCR. This workflow does not address the re-integration of curated texts with the holdings of contributing repositories or the long-term storage of materials after that point. Contributing back curated data is an area where further work is required.

Pre-processing

Texts will be tokenized and tagged with parts of speech (PoS) on demand. If a contributor works on a text that has not been previously curated, pre-processing will occur. If a text has been previously curated, tokenization and PoS tagging will not happen again; instead, the prepared text will be served to the curation interface.
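
The on-demand behavior described above amounts to preparing each text once and serving the stored result afterwards. A minimal sketch follows, assuming an in-memory store keyed by a text identifier; `PreprocessingCache` and `naive_tag` are hypothetical names, and the tagger is a placeholder standing in for a real PoS tagger.

```python
import re
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]  # (surface form, PoS tag)


def naive_tag(word: str) -> str:
    """Placeholder tagger; a real system would call a trained PoS tagger."""
    return "NUM" if word.isdigit() else "WORD"


class PreprocessingCache:
    def __init__(self, tagger: Callable[[str], str] = naive_tag) -> None:
        self._tagger = tagger
        self._prepared: Dict[str, List[Token]] = {}
        self.runs = 0  # counts how often pre-processing actually ran

    def get(self, text_id: str, raw_text: str) -> List[Token]:
        """Tokenize and tag on first request; serve the prepared text afterwards."""
        if text_id not in self._prepared:
            self.runs += 1
            words = re.findall(r"\w+", raw_text)
            self._prepared[text_id] = [(w, self._tagger(w)) for w in words]
        return self._prepared[text_id]
```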

Interface

The workflows described in this document will require a variety of interfaces:

  • the basic reading/viewing interface with checkboxes or other minimally-distracting mechanisms that allow contributors to “flag” a line as requiring correction or completion
  • a list interface perhaps based on search or browse interfaces with drag and drop capability to allow contributors to indicate linkages between duplicate entries. Mechanisms for labeling with terms from the ontology could use drop-down boxes or similar
  • a tabular interface similar to Freebase or Google Refine for reconciling metadata about aligned texts
  • Editable text boxes for supplying emendations or completions in a modified version in a side-by-side view that presents page images and transcriptions
  • A gallery like view that allows contributors to see page images and transcriptions in a matrix and select the highest-quality version
  • A batch editing view that allows contributors to search for recurring errors, see them in a keyword-in-context view and make corrections to many locations in the text at once
  • A more complex curation view for more substantial emendation and annotation, which may include keyboard shortcuts, auto-complete (including tag completion) based on data from curation log and customized markup schemas. The basic model for this involves selecting a range of text and selecting from a number of preset options (tags, gazetteer entries) to annotate that range.

The Curation Log

The key to quality assurance lies in some basic features of modern computers that are very familiar to IT professionals but are still likely to strike ordinary scholars as quite magical. It is easy for computers to keep track of thousands of users and millions of curatorial acts they perform, logging each one carefully as who did what and when. It is equally possible for such computers to divide a large corpus into its individual words and treat each word as a distinct token with its unique ID. This can be done with corpora running into billions of words.

From the computer’s perspective, the engagement of a user with a piece of text is a transaction that results in a log entry to the effect that at a particular moment in time a userID with certain properties changed, deleted, or added properties associated with one or more wordIDs. Algorithmically produced curation can also be logged in this fashion. The userID in that case is that of a machine running an algorithm. The record of such transactions is a curation log that may run into many millions of records. Think of a vast digital expansion of the multivolume Berichtigungsliste or “correction list” that Greek papyrologists have kept of their editorial work for almost a century.
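
A log entry of the kind described above can be sketched as an immutable record of who did what, to which words, and when. The field names and the `CurationLog` class below are hypothetical; the source specifies only that an entry ties a userID, a moment in time, an action, and one or more wordIDs together.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    user_id: str                            # a person, or a machine running an algorithm
    action: str                             # "change", "delete", or "add"
    word_ids: Tuple[str, ...]               # the tokens affected
    properties: Tuple[Tuple[str, str], ...] # the properties changed, deleted, or added
    timestamp: datetime


class CurationLog:
    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def record(self, user_id: str, action: str,
               word_ids: Iterable[str], properties: Dict[str, str]) -> LogEntry:
        entry = LogEntry(user_id, action, tuple(word_ids),
                         tuple(sorted(properties.items())),
                         datetime.now(timezone.utc))
        self._entries.append(entry)  # append-only: entries are never edited
        return entry

    def history(self, word_id: str) -> List[LogEntry]:
        """All curatorial acts, human or algorithmic, that touched one word."""
        return [e for e in self._entries if word_id in e.word_ids]


log = CurationLog()
log.record("user:42", "change", ["w1001"], {"spelling": "shall"})
log.record("machine:tagger-v1", "add", ["w1001", "w1002"], {"pos": "verb"})
```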

The curation log is thus the fundamental management tool for maintaining quality control. It can in principle support quite different organizational choices for editorial review, whether highly centralized or widely distributed. In practice, it is likely that the best results will be found in distributed systems that give substantial control over editorial decisions to existing scholarly “data communities” in various scholarly societies and their sub-committees or interest groups. There are substantial technical problems that need to be solved in order to maintain a robust and flexible infrastructure that will enable such data communities to work independently while staying in touch. Such an infrastructure might be backed by a version control system like git to avoid reimplementing parts of this functionality.

Integration patterns

Integration here refers to the presentation and coordination of functionality to the scholar. We describe several patterns that vary along a scale ranging from a highly integrated but narrowly specialized curation application (Workbench Pattern), to a bundle of applications largely selected by the individual scholar but including a specialized tool for creating, storing, and retrieving annotations (Notebook Pattern), and on to infrastructure for managing the information used in and the scholarly artifacts produced through curation while placing the responsibility to identify suitable curation tools entirely with scholars (Plumbing Pattern). See below for a fuller description of each pattern and notes about their strengths and weaknesses.

Capabilities

In our discussion of curation workflows we will occasionally refer to a capability needed to support a curation step. Later in this document we list general capabilities of a text curation application, realized as some combination of tools and services, along with notes about how these might be realized or supported in Bamboo's infrastructure. These include but are not limited to annotation and related processes, document management, and permissions management. Some capabilities, like "document editing", are not explicitly listed. Our aim is to call out those capabilities that are assumed but not described in the workflows below.

Impact of integration patterns on curation workflows

In all workflows we assume a curatorial review process in which the reporter's reputation plays a role in the evaluation of candidate errors and the acceptance of proposed corrections. Of course, for "errors" we should read "annotations" of whatever kind the curation process is managing.

Intermittent curation

In the Workbench Pattern, a specialized book viewer application allows a suitably privileged reader who spots a suspected error in a page transcript to verify immediately that the error has occurred by consulting the facsimile image of the page and to flag the error or "make" the correction in the transcript. The correction is posted in an annotation service, and curators watching the annotation service evaluate the annotation and make the correction.

In the Notebook Pattern, a reader is working with a document with some level of scholarly adornment using tools appropriate to the task at hand, when he spots a suspected error. He may or may not be able to verify that error in the transcript underlying the document. Still, he records an annotation in the Scholarly Notebook he keeps open on his desktop, using an addressing scheme shared between the local tools and the Scholarly Notebook. The annotation is posted in an annotation service. Curators watching the annotation service evaluate the annotation and make the correction.

In the Plumbing Pattern, developers of specialized scholarly text tools have added functionality to their tools that reports errors to the annotation service as these are identified. Such tools have a "mark as possible error" function. As in the Notebook Pattern, curators watch the annotation service.

Cascading curation

The Workbench application in intermittent curation, the specialized book viewer, has been extended to invoke the analytical tools used in cascading curation. These tools include keyword-in-context searching for error candidates and automated candidate error detection.

In cascading curation in the Notebook Pattern, the scholar would use his or her own local text search or concordance tools (or the search capabilities of the source collection) to identify "batches" of similar errors. It is not desirable to require the user to copy and paste the address and correction for every error individually in a large batch correction task, so cascading curation in the Notebook Pattern would rely on support in the local tools for harvesting addresses from search results.

In the Plumbing Pattern, the developer of a corpus query or concordance tool would add functionality to their tool to submit batches of correction annotations to the Bamboo annotation services, which would be monitored by curators as in the other workflows.

Connection (sequential) curation

In the Workbench Pattern the specialized book viewer would provide facilities (possibly through a plugin architecture) that would support specific kinds of more advanced annotation (identification of multi-word expressions, named entities, quotations, etc.).

In the Notebook Pattern, the scholar would annotate the phenomena of interest (such as named entities) with his or her chosen tools. These may include automated natural language processing tools or an XML editor such as oXygen. The Scholar's Notebook application would read the edited document and submit the annotations to the Bamboo annotation service.

In the Plumbing Pattern, the developer of an annotation tool (which may be an automated NLP tool or an interface that allows users to create annotations, for example) would incorporate functionality that would allow users to submit annotations directly to the Bamboo annotation service.

Collection curation

In the Workbench Pattern a specialized version of a Work Space local collection management tool would allow users to provide corrections to text metadata, indicate duplicates, or create links to an external ontology or authority file.

A simplified form of collection curation in the Scholarly Notebook Pattern might look very similar to intermittent curation: the user notices a metadata error while working with a text and enters the correction in the Notebook application. More advanced forms of collection curation (duplicate identification and reconciliation to external ontologies) would present challenges for this pattern.

In the Plumbing Pattern the developer of a tool such as Eighteenth-Century Book Tracker would add functionality that would allow the user to submit metadata corrections and links to external ontologies or authorities directly to the Bamboo annotation service.

Hard problems common to all workflows

As we plan our approach to implementing support for these workflows, we should also decide at what level we want to tackle hard problems. Are we looking to build a perfectly general solution to a problem? Are we looking to build a demonstration of one or more approaches to the problem? We list here several of the challenges that have become apparent in our discussions around these workflows.

Addressability and Robust Intra-document linking.

See Mike Witmore's blog post on massively addressable texts and Phelps and Wilensky on Robust Intra-document Locations.

When texts are stable, it is practical to address sections of text, e.g. a word, a page, a chapter, at any level of scale, as Mike Witmore says. Texts being curated are by definition unstable. Methods for pointing into a document to particular locations degrade in performance, as Phelps and Wilensky note, as the document changes. Yet, curation requires precision, especially if the work is done collectively where, for example, changes are suggested by citizen curators working at a distance and approved by others at a later time.

Similarly, annotation through reference to points in a text, as in the curation by linked reference pattern, implies rigid designation to fixed points in a text, however much the context around that point may shift.
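
To make the problem concrete, here is one simple way a pointer into a changing text might be made more tolerant of edits: store the surrounding context alongside the position, and fall back to a context search when the position no longer matches. This is only a sketch under stated assumptions. It is not the algorithm of Phelps and Wilensky, and the names `Anchor`, `make_anchor`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Anchor:
    index: int                # token position when the anchor was made
    token: str
    before: Tuple[str, ...]   # tokens immediately preceding
    after: Tuple[str, ...]    # tokens immediately following


def make_anchor(tokens: List[str], index: int, width: int = 2) -> Anchor:
    return Anchor(index, tokens[index],
                  tuple(tokens[max(0, index - width):index]),
                  tuple(tokens[index + 1:index + 1 + width]))


def resolve(anchor: Anchor, tokens: List[str]) -> Optional[int]:
    """Find the anchored token in a possibly edited text, or None if lost."""
    def score(i: int) -> int:
        if tokens[i] != anchor.token:
            return -1
        before = tuple(tokens[max(0, i - len(anchor.before)):i])
        after = tuple(tokens[i + 1:i + 1 + len(anchor.after)])
        return (before == anchor.before) + (after == anchor.after)

    # Fast path: the text has not shifted around the anchor.
    if anchor.index < len(tokens) and score(anchor.index) == 2:
        return anchor.index
    # Otherwise prefer the best context match, then the nearest position.
    best = max(range(len(tokens)),
               key=lambda i: (score(i), -abs(i - anchor.index)),
               default=None)
    if best is None or score(best) < 1:
        return None
    return best


original = "the king shall die and the queen shall live".split()
anchor = make_anchor(original, 7)   # the second "shall"
edited = ["Lo,"] + original         # an insertion shifts every position by one
```

In this example `resolve(anchor, edited)` recovers the second "shall" at its new position. If the anchored word itself is emended, this sketch loses the location, which illustrates why the problem is hard.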

Persistence and publication of the scholarly artifacts of curation.

To riff on "you break it, you buy it": "you fix it, you own it." But Bamboo does not intend to become the alternative repository for high-quality texts. In addition to clean texts, curatorial scholarship produces annotations expressing unresolved assertions about variant readings and so on. Where should the enriched texts and scholarly apparatus live?

Grading candidate annotations by correctness and utility

See Schmitz, Patrick. The CONCUR framework for community maintenance of curated resources.

Here we mean grading as in sorting apples by quality and size. As the volume of candidate error corrections grows — as we hope it would with citizen curation — we anticipate the need to sort candidates algorithmically into priority queues for review and acceptance. We list reputation management among the capabilities implied in curation. Reputation is one dimension one might consider. Utility, however that is defined, is another. How much effort should we expend on tuning queue management, and how generalizable is this, given that utility is specific to a domain?
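
A priority queue of this kind can be sketched with a weighted combination of the two dimensions named above. The linear weighting, the 0-to-1 scales, and the names `priority` and `ReviewQueue` are assumptions made for illustration; how utility should be defined for a given domain is exactly the open question.

```python
import heapq
import itertools
from typing import List, Tuple


def priority(reputation: float, utility: float,
             reputation_weight: float = 0.5) -> float:
    """Combine reputation and utility (both assumed to lie in 0..1)."""
    return reputation_weight * reputation + (1.0 - reputation_weight) * utility


class ReviewQueue:
    def __init__(self, reputation_weight: float = 0.5) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()  # tie-breaker: earlier candidates first
        self._weight = reputation_weight

    def add(self, candidate: str, reputation: float, utility: float) -> None:
        score = priority(reputation, utility, self._weight)
        # heapq is a min-heap, so negate the score to pop the highest first.
        heapq.heappush(self._heap, (-score, next(self._counter), candidate))

    def next_for_review(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = ReviewQueue(reputation_weight=0.5)
queue.add("candidate A", reputation=0.9, utility=0.2)
queue.add("candidate B", reputation=0.4, utility=1.0)
queue.add("candidate C", reputation=0.1, utility=0.1)
```

Exposing the weight as a parameter keeps the tuning question visible: a domain that trusts its contributors might raise it, while one that cares most about high-impact fixes might lower it.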

Integration patterns discussion

Workbench Pattern or reaching for emacs

In this pattern all functionality in the curation workflow is available within a single application. The scholar is reading a document to curate it or for some other purpose, notices an error, pops up the facsimile, annotates the error and the proposed correction, returns to reading, and so on. A curator is just someone who, using the same tool, can review proposed corrections (along with citizen curator reputations) and can accept the correction. The tool writes to the curation log.

The workbench calls the backing services that make collaboration possible.

With respect to user experience this is tight integration. The scholar is given little choice in which tools to use. The tool is likely limited to a narrow range of curatorial functions.

Strengths

A single application presents a rich environment in which to provide user interfaces well-tuned to specific curation tasks. For a user who finds this single environment well-suited to her task, it is convenient to perform the full range of curation tasks within a single, familiar context. Writers use environments of this type when they compose using a fully-featured word processing program; software developers are familiar with environments of this kind if they are frequent users of Integrated Development Environments (IDEs).

Weaknesses

It is difficult (and therefore expensive) to generalize an environment to support a variety of tasks and functions. It is even more difficult to do so while maintaining a low 'barrier to entry' for those who must become familiar with an environment in order to use it.

A 'deep and narrow' application suited to operate on a single corpus obtained from a single repository, and to fully support a tightly constrained set of simple curation tasks (e.g., suggest corrections to algorithmically-generated OCR), would be more easily realized than one suited to operate on multiple corpora pulled from multiple repositories in the service of a variety of simple and complex curation tasks.

To implement a generalizable environment 'from scratch' would be far beyond Project Bamboo's means (we are not equipped to build applications as complex as MS-Word or Eclipse).

To realize a 'single application' as a set of plug-ins or widgets that can be deployed to a container – such as a Work Space platform – begins to bleed into other integration patterns: particularly, a "Scholar's Notebook Pattern," as described below, with some 'under the hood' benefit realized by running diverse components in a single application framework. However, these 'benefits' would not deliver on the substantive promise of a single application: the 'seamless' user experience of a cohesively designed, fully-implemented, soup-to-nuts curation environment. As Martin Mueller put it in his essay Collaboratively Curating Early Modern English Texts, "It is a challenging task for interface designer to build an environment that supports 'curation en passant.'"

The sum of these constraints suggests that achievable results that follow this pattern are likely to promise more than Project Bamboo can actually deliver.

Scholar's Notebook Pattern

In this pattern, the user goes about her scholarly business using best-of-breed tools (or, more likely, the tools she can use most effectively). She notices an error. She has been using her Bamboo work space to manage her content and it may be that some of the tools she is using run in the work space. She has her Scholar's Notebook open in her Bamboo work space. She records the error, location of the error, encoding level, error type, and the proposed correction. Scholar's Notebook is backed by an annotation/assertion store. Let's imagine that the location can be harvested easily from the tools she is using.

In another context curators are working with texts to correct them. The backing annotation/assertion store has a recommendation engine that assigns value to the correction based on a calculation of importance of source in the corpus, type of correction, and recommender's reputation. Possible corrections are organized by value rank and gathered into bundles by source text. The curator loads Scholar's Notebook and selects curation mode. A queue of suggestions tuned to her expertise is presented. The curator picks off suggestions from the queue, makes the correction, and grades the recommendation; that grade contributes to the recommender's reputation. One could imagine levels of approval.
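
The bundling step described above can be sketched as grouping suggestions by source text and ranking each bundle by value. Everything specific here is an assumption for illustration: the multiplicative value formula, the weight tables, the default weight of 0.5, and the names `Suggestion`, `value`, and `bundle_by_source` do not come from the Bamboo design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Suggestion(NamedTuple):
    source_text: str
    correction_type: str
    recommender_reputation: float  # assumed to lie in 0..1
    description: str


# Hypothetical weights for importance of source and type of correction.
SOURCE_IMPORTANCE = {"text-A": 1.0, "text-B": 0.3}
TYPE_WEIGHT = {"missing passage": 1.0, "single character": 0.4}


def value(s: Suggestion) -> float:
    """Assumed formula: the product of the three factors named in the text."""
    return (SOURCE_IMPORTANCE.get(s.source_text, 0.5)
            * TYPE_WEIGHT.get(s.correction_type, 0.5)
            * s.recommender_reputation)


def bundle_by_source(suggestions: Iterable[Suggestion]) -> Dict[str, List[Suggestion]]:
    """Gather suggestions into per-text bundles, highest value first."""
    bundles: Dict[str, List[Suggestion]] = defaultdict(list)
    for s in suggestions:
        bundles[s.source_text].append(s)
    for items in bundles.values():
        items.sort(key=value, reverse=True)
    return dict(bundles)


suggestions = [
    Suggestion("text-A", "single character", 0.9, "s1"),
    Suggestion("text-A", "missing passage", 0.5, "s2"),
    Suggestion("text-B", "missing passage", 0.8, "s3"),
]
bundles = bundle_by_source(suggestions)
```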

The correction tool is tuned to making corrections at the encoding level at which the curator is working.

Workflow integration in this pattern occurs in the heads of the participants.

Strengths

A wide range of tools already familiar to scholars (and citizen-scholars) remain at front-and-center in the work of obtaining, analyzing, examining, addressing, and emending elements of texts of interest, presenting no added 'barrier to entry' in any aspect of a curation workflow other than recording a curation event. This allows the fullest possible freedom in selection of tools best suited to particular scholarly and exploratory tasks, as well as user preferences, except insofar as participants in curation workflows need to adjust to new and 'out of flow' steps and tools in order to accomplish the pivotal task of curating.

This pattern has the further advantage to Project Bamboo of clearly limiting responsibility and investment to a value-adding set of functionality that is closely bound to modeling, storing, and harvesting records of curation events.

Weaknesses

The need to adapt to a new tool and/or process that takes a scholar, reader, or curator out of her familiar contexts is a disincentive to participation in collaborative curation. As Martin Mueller put it in his essay Collaboratively Curating Early Modern English Texts, "The easier it is to switch between exploration and curation the easier it will be to engage scholars in the work of collaborative curation."

Another weakness is that there are limits on how loosely integrated the tools and the Notebook can be; for example, they need to share at least an addressing scheme, since the Notebook must be able to understand the "location of the error" that the user reports.

Plumbing Pattern or annotation reference resolution

In this pattern, the key piece of glue that drives workflow is an annotation reference resolver. Given a text context, the resolver can gather annotations/assertions in the neighborhood of that context by calling a method on the backing annotation service's API.
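
A resolver of this kind can be sketched as a thin function over the annotation service. The sketch assumes token offsets as the addressing scheme and a single `find` method on the service; both are hypothetical, as are all the names below, since the actual service API is not defined here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Annotation:
    target_text: str  # identifier of the annotated text
    start: int        # token offset, inclusive
    end: int          # token offset, exclusive
    body: str


class AnnotationService(Protocol):
    """Hypothetical interface of the backing annotation service."""
    def find(self, target_text: str) -> List[Annotation]: ...


class InMemoryService:
    """Stand-in for the backing service, for demonstration only."""
    def __init__(self, annotations: Iterable[Annotation]) -> None:
        self._annotations = list(annotations)

    def find(self, target_text: str) -> List[Annotation]:
        return [a for a in self._annotations if a.target_text == target_text]


def resolve_neighborhood(service: AnnotationService, target_text: str,
                         position: int, radius: int = 10) -> List[Annotation]:
    """Gather annotations overlapping the window around `position`."""
    lo, hi = position - radius, position + radius
    return [a for a in service.find(target_text)
            if a.start <= hi and a.end > lo]


service = InMemoryService([
    Annotation("text-A", 5, 6, "possible error"),
    Annotation("text-A", 14, 20, "quotation"),
    Annotation("text-A", 100, 102, "named entity"),
    Annotation("text-B", 5, 6, "possible error"),
])
nearby = resolve_neighborhood(service, "text-A", position=10, radius=5)
```

Because the resolver depends only on the `find` method, any tool that can produce a text identifier and a position could call it, which is the loose coupling this pattern aims for.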

Our work in this pattern is to provide the backing service and simple demonstration clients.

Integration in this pattern amounts to passing around annotation references among services and tools. This is a looser approach to integration. Functionality is distributed across tools and services. Some of the tools run in work spaces or use work space functionality to manage content.

Strengths

This pattern permits annotations managed by Bamboo services to be put to a nearly boundless range of uses. Some of these uses can be manifested in Bamboo services and research environments, but none need be.

Also, this pattern offers the greatest advantage to Project Bamboo with respect to limiting responsibility and investment to a value-adding set of integratable functionality that is (a) closely bound to modeling, storing, and harvesting records of curation events; yet, (b) not bound to any particular application or interface, with the exception of simple demonstration clients. It enables developers of a wide range of tools already familiar to scholars to add value to their own software by presenting Bamboo-enabled annotation management within their own interfaces and workflows.

Weaknesses

The utility of Project Bamboo's investment is dependent not only on adoption by interested scholars, citizen-scholars, and curators, but also by the developers of tools these individuals employ. While the risk implied by this dependency can be mitigated through adoption by Bamboo-built services and research environments, it is important to note that this form of hedging may begin to bleed into the Workbench or Scholar's Notebook integration patterns described above.

Plumbing Pattern variation: curation by linked reference

In this pattern the user is reading a text or viewing metadata for a text. He recognizes an error or variant spelling/reference to an entity. Rather than propose a correction, he inserts a link from the erroneous or variant text or metadata to an authoritative ontology or reference supporting the proposed correction. 

Subsequently a curator determines whether to make a correction or to formalize the link. The latter action, for example, could be taken in the instance that the variant spelling of a name matches what was printed but is not the generally recognized (normalized) way of spelling the name. A link could also be left without further correction if there remains uncertainty in the community as to whether the current transcription is in error. 

Links might also be used to collocate or address relationship issues useful for organizing and facilitating discovery/use, such as to relate a manifestation to an expression or an item (instance) to a manifestation. Though a bit beyond the scope of the reference documents, these are in actuality important facets of curation.

Strengths

Same as Plumbing Pattern or annotation reference resolution, above.

Weaknesses

Same as Plumbing Pattern or annotation reference resolution, above.

Implied capabilities

Capability

Description

Delivery in Bamboo

Annotation

A application to support scholarly annotation.  Minimally, there is a backend annotation store that holds annotations generated and consumed by a group of collaborating scholars.  We imagine clients of this backend annotation store that allow scholars (whether faculty or citizen-scholars) to add annotations of texts, and clients that allow other scholar to review annotations. Annotations may be applied to addressable elements of a textual object. Annotation store must support classing (rdf typing) of annotations, structured annotation bodies, annotation body by reference (URI), annotation properties such as creator, time/date created, etc.

The annotation store could be implemented or proxied as a BSP service; it could be hosted on a Bamboo work space; it will be called by clients running on a Bamboo work space. Bamboo must support multiple annotation stores (most probably external to Bamboo) and must support retrieval/filtering of annotations by target identity, annotation class, creator identity, created date, etc.
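The annotation requirements above can be sketched as a record type plus a filter that spans several stores. This is an assumption-laden illustration, not a Bamboo or BSP API: the `Annotation` fields and the `find_annotations` function are hypothetical names, and each store is modeled simply as an iterable of annotations.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain
from typing import Iterable, List, Optional


@dataclass
class Annotation:
    target: str                     # URI of the addressable text element
    annotation_class: str           # RDF type of the annotation
    creator: str                    # identity of the annotating scholar
    created: datetime
    body: Optional[dict] = None     # structured body, or
    body_uri: Optional[str] = None  # body by reference (URI)


def find_annotations(stores: Iterable[Iterable[Annotation]], *,
                     target: Optional[str] = None,
                     annotation_class: Optional[str] = None,
                     creator: Optional[str] = None,
                     created_since: Optional[datetime] = None) -> List[Annotation]:
    """Return annotations from all stores that satisfy every filter given."""
    matches = []
    for ann in chain.from_iterable(stores):
        if target is not None and ann.target != target:
            continue
        if annotation_class is not None and ann.annotation_class != annotation_class:
            continue
        if creator is not None and ann.creator != creator:
            continue
        if created_since is not None and ann.created < created_since:
            continue
        matches.append(ann)
    return matches
```

A call such as `find_annotations([store_a, store_b], creator="alice")` would return one scholar's annotations regardless of which store holds them.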

Approval/acceptance

This is a mechanism for marking a suggested correction as accepted. A client UI and a backing store are implied. Approval/acceptance has a context: what is accepted by a scholar for her working set of materials may not be accepted by the curators of an 'authoritative' repository (or vice versa).

Possibly an extension of the annotation capability.

Curation log

 

 

Reputation management

A reputation is calculated on accepted vs rejected suggested corrections. Reputation, like approval/acceptance, also has context, e.g., reputation for making corrections acceptable to a given repository or set of curators. Should attributes available in the Bamboo profile also be available to curators viewing suggested corrections?

Linked to Bamboo Person and Profile Services. Open question: are reputations managed at the ecosystem level or in a curation application that is limited to a particular domain?
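One way to read "accepted vs rejected" with context is as an acceptance ratio kept per contributor and per context. The source does not specify a formula, so the ratio, the `reputations` function, and the tuple layout below are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (contributor, context, accepted), where context might be a repository or curator group
Decision = Tuple[str, str, bool]


def reputations(decisions: Iterable[Decision]) -> Dict[Tuple[str, str], float]:
    """Return the share of accepted suggestions per (contributor, context)."""
    counts = defaultdict(lambda: [0, 0])  # [accepted, rejected]
    for contributor, context, accepted in decisions:
        counts[(contributor, context)][0 if accepted else 1] += 1
    return {key: acc / (acc + rej) for key, (acc, rej) in counts.items()}


history = [
    ("alice", "repo-a", True),
    ("alice", "repo-a", True),
    ("alice", "repo-a", False),
    ("alice", "repo-b", False),
]
scores = reputations(history)
```

Keying the score on context means the same contributor can hold different reputations with different repositories, which is the distinction the open question above turns on.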

Linked data

Reference to external authorities, ontologies, participation in virtual collections that span Bamboo boundaries, instantiation of relationships (e.g., between expression and manifestation). 

Bamboo document stores and CI connectors must accommodate linked data attributes. Work Space and Services should leverage values.

Document location

Where the working copies of documents live.

In the Phase I model, documents are stored in a Work Space object store. The CI cache is like an HTTP cache: it is transparent to users and exists only to reduce communication and network traffic between the CI Hub and the repositories to which the CI Hub mediates access.

Document retrieval

How documents are retrieved from a source repository.

Via the CI Hub to a Work Space object store.

Document editing

This is an application. (Nothing in the workflows implies real-time collaborative editing at a distance.)

This application may or may not run directly in the Bamboo ecosystem. However, it obtains the objects it works on through the ecosystem and stores the revised versions of the document back to the ecosystem.

Permissions/policy management

Who can view, edit, approve, etc. Most likely this involves groups of users, with permissions associated with a group.

BSP IAM: Persons, Policies, Groups services. Object-level permissions (policies) in Work Spaces may be managed locally. However, Work Space users and groups are synced to the BSP Persons and Groups services. Permissions and policies applicable to a repository whose owners / managers / curators evaluate and may accept artifacts of Bamboo-enabled curation are managed locally to that repository; modes by which a repository may participate in a 'curation mining' process of this kind are TBD and are likely to depend on a repository's particular requirements.
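Group-based permission checks of the kind described here can be sketched in a few lines. The group names, the user-to-group mapping, and the `is_allowed` function are hypothetical and do not represent the BSP Persons, Policies, or Groups service interfaces.

```python
from typing import Dict, Set

# Hypothetical policy: permissions are attached to groups, not to individual users.
GROUP_PERMISSIONS: Dict[str, Set[str]] = {
    "readers": {"view"},
    "contributors": {"view", "edit"},
    "curators": {"view", "edit", "approve"},
}

# Hypothetical membership, as it might look after syncing with a groups service.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"curators"},
    "bob": {"readers", "contributors"},
}


def is_allowed(user: str, action: str,
               user_groups: Dict[str, Set[str]] = USER_GROUPS,
               group_permissions: Dict[str, Set[str]] = GROUP_PERMISSIONS) -> bool:
    """A user may perform an action if any of the user's groups grants it."""
    return any(action in group_permissions.get(group, set())
               for group in user_groups.get(user, set()))
```

Unknown users and unknown groups fall through to a denial, so the sketch fails closed.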

Version/change management

 

This is implemented on the local Work Space object store.

Pattern search across corpus

The application of analytic tools to a corpus to identify instances of an error pattern.

External services proxied by the BSP (in Phase One). In Phase Two analytic tools may be hosted on the BSP.
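Pattern search across a corpus can be illustrated with a regular expression applied to every document. This is a toy sketch under stated assumptions: the corpus is an in-memory mapping of hypothetical document ids to text, and `find_error_pattern` and `Hit` are illustrative names, not a Bamboo analytic tool.

```python
import re
from typing import Dict, Iterator, NamedTuple


class Hit(NamedTuple):
    doc_id: str  # which document the match was found in
    offset: int  # character offset of the match within that document
    text: str    # the matched text


def find_error_pattern(corpus: Dict[str, str], pattern: str) -> Iterator[Hit]:
    """Yield every occurrence of a suspected error pattern in the corpus."""
    regex = re.compile(pattern)
    for doc_id, text in corpus.items():
        for match in regex.finditer(text):
            yield Hit(doc_id, match.start(), match.group(0))


# Example: "tbe" is a plausible mis-transcription of "the".
corpus = {
    "doc-a": "tbe cat and tbe dog",
    "doc-b": "the end",
}
hits = list(find_error_pattern(corpus, r"\btbe\b"))
```

Each hit carries a document id and offset, which is enough to address a suggested correction or annotation at the matching span.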
