


SchNar-0014 - Tools to Aid Search, Review and Citation of 19th Century Newspapers

This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.


Tools to Aid Search, Review and Citation of 19th Century Newspapers

Collection Date: January 9, 2009
Scholar #1 Info:

  • Name:
  • Email:
  • Title:
  • Institution/Organization:
  • Field of Study/Creative Endeavor:

Collector Info (can be the same as "Scholar" above):

  • Name: Clai Rice
  • Email:
  • Title:
  • Institution/Organization:

Notes on Methodology:

Collected via the Workshop III Needs Statement Activity

Scope

Keywords

1. Was this story collected for a particular Bamboo working group?  If so, please include, as keywords, the appropriate group(s).

2. Suggested keywords: Does this story contain elements that could be mapped to these keywords?  If so, please indicate which ones and briefly describe the mapping.  Add any additional keywords in #3. (These are the global keywords from the Keywords page.)

3. Please list additional keywords here:

4. Related Stories: Are there parts of the story that relate to other collected stories? Please provide title(s) and link to the story page. 

Story

One project I am embarking on now is a "distant reading" project (Moretti,
Graphs, Maps, Trees, 1). I am interested in patterns of diffusion in
American newspaper poetry of the late nineteenth century. It was common for
newspapers to reprint poems (and other small items, such as jokes or
stories) from other newspapers. I have been wondering lately if there are
any geographical, chronological, or formal patterns to this dispersal. Do
poems appear first in larger papers and then disperse to smaller ones? Is
there an overall geographical pattern, such as dispersal from east to west?
Do poems on certain topics, or cast in certain forms, gain preference? To
study this, I simply locate a poem in a newspaper, then search for it in
other newspapers, noting the date and location of each paper and examining
any significant textual alterations (titles are commonly quite variable).
All of this information is going into a database, with the goal of creating
a geographical map-based display that will allow users to track individual
poems, groups of poems, authors, topics, and newspapers of origin (which
papers originated the most frequently reprinted poems?).
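A minimal sketch of how the planned database might be organized follows. It uses Python's standard-library sqlite3 module as a stand-in for MySQL. The table and column names (newspaper, poem, appearance) are hypothetical, and the demo rows are placeholders rather than research data. Each printing is stored as its own row, together with the title as printed. That lets one query list a poem's reprints in date order, which is the raw material for the diffusion questions above (larger to smaller papers, east to west).

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE newspaper (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT,
    state TEXT,
    latitude REAL,   -- for the map display
    longitude REAL
);
CREATE TABLE poem (
    id INTEGER PRIMARY KEY,
    canonical_title TEXT,
    author TEXT
);
-- One row per printing, printed_title keeps the (often variable) title.
CREATE TABLE appearance (
    poem_id INTEGER NOT NULL REFERENCES poem(id),
    newspaper_id INTEGER NOT NULL REFERENCES newspaper(id),
    pub_date TEXT NOT NULL,  -- ISO 8601 (YYYY-MM-DD) so text order is date order
    printed_title TEXT,
    PRIMARY KEY (poem_id, newspaper_id, pub_date)
);
"""


def diffusion_sequence(conn, poem_id):
    """List every printing of one poem as (pub_date, paper, state), earliest first."""
    return conn.execute(
        """
        SELECT a.pub_date, n.name, n.state
        FROM appearance AS a
        JOIN newspaper AS n ON n.id = a.newspaper_id
        WHERE a.poem_id = ?
        ORDER BY a.pub_date
        """,
        (poem_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows only, to exercise the query.
    conn.executemany(
        "INSERT INTO newspaper VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "Paper A", "City A", "NY", None, None),
         (2, "Paper B", "City B", "OH", None, None)],
    )
    conn.execute("INSERT INTO poem VALUES (1, 'Sample Poem', NULL)")
    conn.executemany(
        "INSERT INTO appearance VALUES (?, ?, ?, ?)",
        [(1, 2, "1885-04-02", "A Variant Title"),
         (1, 1, "1885-03-14", "Sample Poem")],
    )
    for row in diffusion_sequence(conn, 1):
        print(row)
```

Because the databases' dates are unreliable, pub_date here is meant to hold the date as confirmed by visual checking, not as reported by the search result.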

The current portion of the research requires access to full-text databases
of nineteenth-century newspapers. ProQuest Historical Newspapers is the most
reliable, but contains only 11 papers, all major dailies. What makes this
project possible is the rapid development of full-text archives for
genealogy research. These archives are developed from microform, and their
full-text OCR is very unreliable. Currently the two fullest archives are
NewspaperArchive.com (NA) and GenealogyBank.com (GB), but there are numerous
smaller databases as well. So my process is to locate a poem (once the
procedure is settled, I will work from one large daily, covering a month at
a time), select 2-3 word search phrases, and then search NA and GB for them.
I have to verify each hit on the result lists visually, because both
databases are notoriously unreliable in their dating. I must search on
multiple strings because of the unreliable text. And I cannot run a single
search across both databases; there is no search aggregator. Currently I do
not keep a PDF copy of each hit, because of file sizes: some poems have 50
reprints spanning a decade.
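Parts of the phrase-selection and checking steps could be automated. The sketch below assumes plain text is available for each candidate page, for example from a database's own OCR layer. It first cuts a poem into consecutive 2-3 word phrases. It then scores a page by how many of those phrases appear approximately in its text, using difflib.SequenceMatcher from the Python standard library. The function names and the 0.8 and 0.5 thresholds are illustrative assumptions, not tested values. The sample poem line is invented. A flagged page would still need the visual check described above.

```python
import difflib
import re

WORD = re.compile(r"[a-z']+")


def words_of(text):
    """Lower-case word tokens; digits and punctuation (common OCR noise) act as separators."""
    return WORD.findall(text.lower())


def search_phrases(poem_text, n=3):
    """Cut a poem into consecutive n-word search phrases; a leftover single word is dropped."""
    words = words_of(poem_text)
    phrases = []
    for start in range(0, len(words), n):
        chunk = words[start:start + n]
        if len(chunk) >= 2:  # single words are too common to search on
            phrase = " ".join(chunk)
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


def best_match_ratio(phrase, ocr_text):
    """Highest SequenceMatcher ratio between the phrase and any same-length word window."""
    phrase = " ".join(words_of(phrase))
    size = len(phrase.split())
    ocr_words = words_of(ocr_text)
    best = 0.0
    for i in range(len(ocr_words) - size + 1):
        window = " ".join(ocr_words[i:i + size])
        best = max(best, difflib.SequenceMatcher(None, phrase, window).ratio())
    return best


def likely_reprint(poem_text, ocr_text, threshold=0.8, min_fraction=0.5):
    """Flag a page when at least min_fraction of the poem's phrases roughly match its text."""
    phrases = search_phrases(poem_text)
    if not phrases:
        return False
    matched = sum(best_match_ratio(p, ocr_text) >= threshold for p in phrases)
    return matched / len(phrases) >= min_fraction


if __name__ == "__main__":
    poem = "The rain is falling on the hill tonight"
    page = "Tbe rain is fal1ing on the hill to-night"  # simulated OCR errors
    print(search_phrases(poem))  # ['the rain is', 'falling on the', 'hill tonight']
    print(likely_reprint(poem, page))  # True
```

Matching works on word windows, so OCR errors that split or merge words (here "fal1ing" becomes two tokens) still lower individual scores. Scoring several phrases per poem is what compensates, much as searching on multiple strings does by hand.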

Current tools include the browser, the newspaper databases, and a text
editor. Later I will be using a database, probably MySQL, with a web
interface. The online newspaper databases all have authentication procedures
that frequently interrupt searching or make it more time-consuming. The
ideal tool would be a search aggregator for the different databases, one
that would return hits in a uniform format. Also helpful would be on-screen
OCR that would allow rapid text searching of graphic PDFs. Even if it worked
only 50% of the time, it would save a good deal of time overall. One way to
do this would be to adapt something like Zotero's ability to create entries
from the current page view. One click could grab and search the PDF; then,
after visual verification was complete, another click would store the PDF
and create a bibliography entry. The data could then be exported into
another database as needed for analysis and display.
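The aggregator described above could rest on one idea. Each database gets a small adapter that turns its own result format into a shared record. The sketch below assumes such adapters can be written, but no public search API for NA or GB is assumed. The searcher and normalizer functions are therefore hypothetical placeholders, and the demo uses in-memory fakes. The Hit fields are likewise an assumed uniform format. Results are de-duplicated on newspaper name and date, so a printing found by several phrases, or in both databases, is listed once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    """One search result in a uniform format, whichever database returned it."""
    source: str
    newspaper: str
    place: str
    pub_date: str  # as reported by the database; still needs visual checking
    page_url: str


# Hypothetical adapter types: a searcher runs one phrase against one database and
# yields raw result dicts in that database's own shape; a normalizer maps one raw
# dict to a Hit. Real searchers would also have to handle each site's login.
Searcher = Callable[[str], Iterable[dict]]
Normalizer = Callable[[dict], Hit]


def aggregate(phrases: Iterable[str],
              searchers: Dict[str, Searcher],
              normalizers: Dict[str, Normalizer]) -> List[Hit]:
    """Run every phrase against every source and merge the results into one list.

    Hits are keyed on (lower-cased paper name, date), so a printing found by
    several phrases, or in more than one database, is kept once (first seen).
    """
    merged = {}
    for phrase in phrases:
        for name, search in searchers.items():
            for raw in search(phrase):
                hit = normalizers[name](raw)
                merged.setdefault((hit.newspaper.lower(), hit.pub_date), hit)
    return sorted(merged.values(), key=lambda h: h.pub_date)


if __name__ == "__main__":
    # In-memory fakes standing in for two databases with different field names.
    def fake_a(phrase):
        return [{"title": "Paper A", "loc": "City A, NY",
                 "date": "1885-03-14", "url": "https://example.org/a/1"}]

    def fake_b(phrase):
        return [{"paper": "paper a", "city": "City A", "state": "NY",
                 "issue": "1885-03-14", "link": "https://example.org/b/9"},
                {"paper": "Paper B", "city": "City B", "state": "OH",
                 "issue": "1885-04-02", "link": "https://example.org/b/10"}]

    normalizers = {
        "A": lambda r: Hit("A", r["title"], r["loc"], r["date"], r["url"]),
        "B": lambda r: Hit("B", r["paper"], f"{r['city']}, {r['state']}",
                           r["issue"], r["link"]),
    }
    for hit in aggregate(["the rain is", "falling on the"],
                         {"A": fake_a, "B": fake_b}, normalizers):
        print(hit.pub_date, hit.newspaper, hit.place, hit.source)
```

Matching on a lower-cased paper name is crude. The same paper often appears under variant names in different databases, so a working tool would probably need a table of name variants. Each Hit could also seed the bibliography entry mentioned above.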

Other Comments:


Link

Notes