This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.

Tools to Aid Search, Review and Citation of 19th Century Newspapers

Please fill in the following metadata about this narrative (and delete this line when finished!):

Collection Date: January 9, 2009
Scholar #1 Info: (if more than one scholar's process is described, copy this set for each scholar)

  • Name: Clai Rice
  • Email:
  • Title: Assistant Professor, English
  • Institution/Organization: University of Louisiana, Lafayette
  • Field of Study/Creative Endeavor:

Collector Info (can be the same as "Scholar" above):

  • Name: Same as above
  • Email:
  • Title:
  • Institution/Organization:

Notes on Methodology:

Please briefly describe the collection methods used (e.g. "self report", "questionnaire", "ethnographic interview")


The scope section is provided by the collector, with input from the scholar(s), and attempts to estimate the scope of the group that performs the processes described: How broadly do the practices described in this narrative apply to others in same field, in related fields, etc?

  1. In the opinion of the scholar, who participates in the process the story describes?
    (e.g. "just this scholar", "many people in the scholar's field of inquiry", "all academics", etc.)
  2. What is this process intended to accomplish for the scholar?
  3. Who is the intended audience of the processes described?
  4. Is this the only process the scholar uses to accomplish his/her goals?
  5. What "shared services" would help transform the story into something of more benefit for the scholar or his/her audience?  What process or processes in the story could be automated?

One project I am embarking on now is a "distant reading" project (Moretti, Graphs, Maps and Trees, 1). I am interested in patterns of diffusion in American newspaper poetry of the late nineteenth century. It was common for newspapers to reprint poems (and other small items, such as jokes or stories) from other newspapers. I have been wondering lately whether there are any geographical, chronological, or formal patterns to this dispersal. Do poems appear first in larger papers and then disperse to smaller ones? Is there an overall geographical pattern, like dispersal from east to west? Do poems on certain topics, or cast in certain forms, gain preference? To study this, I simply locate a poem in a newspaper, then search for it in other newspapers, noting the date and location of the papers and examining any significant textual alterations (titles are commonly quite variable). All of this information is going into a database, with the goal of creating a geographical map-based display that will allow users to track individual poems, groups of poems, authors, topics, and newspapers of origin (which papers print original poems that are frequently reprinted?).
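As a rough illustration of the chronological question raised above, the sketch below orders the known printings of one poem by date and reports how many days each followed the earliest one. The record fields (`paper`, `place`, `date`) and the sample entries are hypothetical placeholders, not data from the project.

```python
from datetime import date


def diffusion_order(reprints):
    """Sort printings of one poem by date and give each one's lag in days
    from the earliest known printing."""
    if not reprints:
        return []
    ordered = sorted(reprints, key=lambda r: r["date"])
    first = ordered[0]["date"]
    return [(r["paper"], r["place"], (r["date"] - first).days) for r in ordered]


# Invented placeholder records, for illustration only.
sample = [
    {"paper": "Paper B", "place": "Town B", "date": date(1885, 3, 20)},
    {"paper": "Paper A", "place": "City A", "date": date(1885, 3, 1)},
]

for paper, place, lag in diffusion_order(sample):
    print(paper, place, lag)
```

The lag values could later be joined with each paper's location or circulation size to test whether poems tend to move from larger to smaller papers or from east to west.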

The current portion of the research requires access to full-text databases of nineteenth-century newspapers. ProQuest Historical Newspapers is the most reliable, but contains only 11 papers, all major dailies. What makes this project possible is the rapid development of full-text archives for genealogy research. These archives are developed from microform, and the full-text OCR is very unreliable. Currently the two fullest archives are NA and GB, but there are numerous smaller databases as well. So my process is to locate a poem (as soon as the procedure is set I will work from one large daily, covering a month at a time), select 2-3 word search phrases, then search NA and GB for them. On the result lists I have to verify each hit visually because both databases are notoriously incorrect on dating. I must search on multiple strings because of the unreliable text. And I can't do a single search across both databases; there is no search aggregator. Currently I do not keep a copy of each hit PDF because of file sizes, and some poems have 50 reprints spanning a decade.
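The phrase-selection step described above could be sketched roughly as follows. This is only one possible approach, assuming that a "search phrase" means a run of two or three consecutive words and that non-overlapping phrases are preferred, so that a single OCR error spoils at most one search string. The function name and parameters are hypothetical, and the sample line is just a stand-in.

```python
import re


def candidate_phrases(text, words_per_phrase=3, max_phrases=5):
    """Return short, non-overlapping word windows to use as search strings."""
    # Keep only alphabetic words (and apostrophes); punctuation is dropped.
    words = re.findall(r"[A-Za-z']+", text.lower())
    phrases = []
    last_start = len(words) - words_per_phrase
    for start in range(0, last_start + 1, words_per_phrase):
        phrases.append(" ".join(words[start:start + words_per_phrase]))
        if len(phrases) == max_phrases:
            break
    return phrases


# Stand-in line of verse, used only to show the output shape.
line = "Listen, my children, and you shall hear of the midnight ride"
print(candidate_phrases(line))
```

Each returned phrase would be submitted as a separate search, and any hit found by at least one phrase would still need visual verification.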

Current tools include the browser, newspaper databases, and a text editor. Later I will be using a database, probably MySQL, with a web interface. The online newspaper databases all have authentication procedures that frequently interrupt searching or make it more time-consuming. The ideal tool would be a search aggregator for the different databases, one that would return hits in a uniform format. Also helpful would be an onscreen OCR that would allow rapid text searching of graphic PDFs. Even if it worked only 50% of the time, it would save a good deal of time overall. One way to do this would be to adapt something like Zotero's ability to create entries from the current page view. On one click it could grab and search the PDF; then, after visual verification was complete, another click would cause it to store the PDF and create a bibliography entry. The data could then be dumped into another database as needed for analysis and display.
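To make the "uniform format" idea concrete, here is a minimal sketch of how hits from two databases might be normalized into one record type and merged. It does not call any real service: the raw field names (`paper`, `city`, `date`, `title`, `location`, `published`), the two date formats, and the sample rows are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hit:
    source: str             # which database returned the hit
    newspaper: str
    place: str
    issue_date: date
    verified: bool = False  # set True only after visually checking the page


def from_na(raw):
    """Normalize a row from the first (hypothetical) result format."""
    return Hit("NA", raw["paper"], raw["city"], date.fromisoformat(raw["date"]))


def from_gb(raw):
    """Normalize a row from the second (hypothetical) format, dated M/D/YYYY."""
    month, day, year = (int(part) for part in raw["published"].split("/"))
    return Hit("GB", raw["title"], raw["location"], date(year, month, day))


def aggregate(na_rows, gb_rows):
    """Merge both result lists into one list of Hit records, oldest first."""
    hits = [from_na(r) for r in na_rows] + [from_gb(r) for r in gb_rows]
    return sorted(hits, key=lambda h: (h.issue_date, h.newspaper))


# Invented placeholder rows, for illustration only.
na_rows = [{"paper": "Example Gazette", "city": "City A", "date": "1885-03-20"}]
gb_rows = [{"title": "Sample Tribune", "location": "Town B", "published": "3/1/1885"}]

for hit in aggregate(na_rows, gb_rows):
    print(hit.issue_date, hit.source, hit.newspaper, hit.place)
```

The `verified` flag reflects the two-click workflow described above: a hit would be stored as unverified first, then marked once the page image had been checked and the bibliography entry created.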

Sections below have not been completed


Please provide some keywords that will allow us to group or cluster related stories--or aspects of stories.

1. Was this story collected for a particular Bamboo working group?  If so, please include, as keywords, the appropriate group(s).

  • Education
  • Institutional Support
  • Scholarly Networking
  • Shared Services
  • Scholarly Narratives
  • Tools and Content Partners

2. Suggested keywords: Does this narrative contain elements that could be mapped to these keywords?  If so, please indicate which ones and briefly describe the mapping.  Add any additional keywords in #3. (These are global keywords from this page keywords)

3. Please list additional keywords here:


Please include the text, documents, media, or other material which comprise this narrative

Other Comments:


