This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.


Enhancing Scanned Texts with Context


A demonstration of how scanned texts can be enhanced with links to contextually relevant resources. Using the output of an optical character recognition (OCR) process, line and word locations can be determined, allowing interactive selection and highlighting of references to people and places. These references can be detected automatically or manually added as annotations. Part of the Contexts and Relationships: Ireland and Irish Studies project.
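As a rough illustration of how a detected reference might be tied back to word locations on the page, the sketch below maps a character range in the page text to the OCR words that overlap it. The data shapes (`text`, `x`, `y`, `w`, `h`, and a reference with `offset` and `length`) are assumptions made for this example, not the demonstrator's actual format.

```javascript
// Hypothetical OCR words for one line: text plus a pixel bounding box.
const words = [
  { text: "Daniel", x: 40, y: 100, w: 62, h: 18 },
  { text: "O'Connell", x: 108, y: 100, w: 90, h: 18 },
  { text: "spoke", x: 204, y: 100, w: 50, h: 18 },
  { text: "in", x: 260, y: 100, w: 16, h: 18 },
  { text: "Dublin", x: 282, y: 100, w: 60, h: 18 },
];

// Join the words with single spaces and remember the character span
// that each word occupies in the joined text.
function indexWords(wordList) {
  let offset = 0;
  const spans = wordList.map((word) => {
    const span = { start: offset, end: offset + word.text.length, word };
    offset = span.end + 1; // skip the joining space
    return span;
  });
  return { text: wordList.map((w) => w.text).join(" "), spans };
}

// Return the words whose character spans overlap the reference's range.
function boxesForReference(spans, ref) {
  const refEnd = ref.offset + ref.length;
  return spans
    .filter((s) => s.start < refEnd && s.end > ref.offset)
    .map((s) => s.word);
}

const page = indexWords(words);

// Hypothetical reference, as an entity detector or a manual annotation
// might describe it: a type plus a character range in the page text.
const name = "Daniel O'Connell";
const reference = {
  type: "Person",
  offset: page.text.indexOf(name),
  length: name.length,
};

const highlighted = boxesForReference(page.spans, reference);
console.log(highlighted.map((w) => w.text));
```

The same lookup would serve both automatically detected references and manual annotations, since each reduces to a character range over the page text.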


This demonstrator shows how natural language processing (NLP) techniques, in conjunction with search technology, can help scholars automatically identify contextual content for the texts they are studying. The process starts with scanning the texts (for the example used in the demonstrator, the scanning was done by our collaborators at Queen's University Belfast, and the scans are being included in JSTOR). We transformed the TIFF[1] page images and XML OCR output we received from Belfast into PNG[2] images and JSON[3] for fast, lightweight viewing in a web browser. The scrolling reading interface is built with JavaScript, again using the Yahoo User Interface libraries[5]. Because the OCR output contains the bounding boxes of the recognized words, we can simulate highlighting and selection of words on the scanned image. Named entity detection is done using OpenCalais[4].
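The sketch below suggests one way word bounding boxes can drive simulated selection: scale each box from scan pixels to the displayed image size, then find the word under a pointer position. The JSON layout (`width`, `height`, `words` with `x`, `y`, `w`, `h`) and the function names are hypothetical, chosen only for illustration; the demonstrator's real data format is not documented here.

```javascript
// Hypothetical JSON for one page: scan dimensions in pixels plus word boxes.
const pageData = {
  width: 2400,
  height: 3600,
  words: [
    { text: "Belfast", x: 300, y: 600, w: 240, h: 60 },
    { text: "Dublin", x: 600, y: 600, w: 210, h: 60 },
  ],
};

// Convert a box from scan coordinates to displayed-image coordinates.
function scaleBox(box, scale) {
  return {
    left: box.x * scale,
    top: box.y * scale,
    width: box.w * scale,
    height: box.h * scale,
  };
}

// Find the word under a point given in displayed-image coordinates,
// or null when the point falls outside every word box.
function wordAt(page, displayedWidth, px, py) {
  const scale = displayedWidth / page.width;
  const found = page.words.find((word) => {
    const b = scaleBox(word, scale);
    return (
      px >= b.left &&
      px < b.left + b.width &&
      py >= b.top &&
      py < b.top + b.height
    );
  });
  return found || null;
}

// The page image is shown at half size (1200 px wide), so scale is 0.5.
const hit = wordAt(pageData, 1200, 160, 310);
const miss = wordAt(pageData, 1200, 10, 10);
console.log(hit ? hit.text : null, miss);
```

In a browser, the scaled box could be used to position a semi-transparent element over the page image, which gives the appearance of highlighting text in what is really a picture.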



Video demonstration

This page contains a link to a video demonstrating the features of the demonstrator (click "see the demo video" at the bottom of the page).

Live demonstrator

The live demonstrator is also available for exploration.

Additional Description

This demonstrator was created by Ryan Shaw for the work of two projects funded by the Institute of Museum and Library Services (IMLS): Contexts and Relationships: Ireland and Irish Studies and Bringing Lives to Light: Biography in Context.
