This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.

Comparing Versions of a Text in a Digital Corpus

Collection Date: March 9, 2009
Scholar #1 Info: (if more than one scholar's process is described, copy this set for each scholar)

  • Name: Eric Greene
  • Email:
  • Title: Doctoral Student, Buddhist Studies
  • Institution/Organization: University of California, Berkeley
  • Field of Study/Creative Endeavor: Chinese Buddhism

Collector Info (can be the same as "Scholar" above):

  • Name: Rich Meyer
  • Email:
  • Title: Project Bamboo Program Manager
  • Institution/Organization: University of California, Berkeley
  • Name: Connor Riley
  • Email:
  • Title: Graduate Student Researcher, School of Information
  • Institution/Organization: University of California, Berkeley

Notes on Methodology:

The collectors recorded this interview, delineated the various workflows discussed in it, and wrote them up using quotes from the interview. These write-ups were then reviewed and edited by the interviewee before being posted.


The scope section is provided by the collector, with input from the scholar(s), and attempts to estimate the scope of the group that performs the processes described: How broadly do the practices described in this narrative apply to others in the same field, in related fields, etc.?

  1. In the opinion of the scholar, who participates in the process the story describes?
    (e.g. "just this scholar", "many people in the scholar's field of inquiry", "all academics", etc.)
  2. What is this process intended to accomplish for the scholar?
  3. Who is the intended audience of the processes described?
  4. Is this the only process the scholar uses to accomplish his/her goals?
  5. What "shared services" would help transform the story into something of more benefit for the scholar or his/her audience?  What process or processes in the story could be automated?


Please provide some keywords that will allow us to group or cluster related stories--or aspects of stories.

1. Was this story collected for a particular Bamboo working group?  If so, please include, as keywords, the appropriate group(s).

  • Scholarly Narratives

2. Suggested keywords: Does this narrative contain elements that could be mapped to these keywords?  If so, please indicate which ones and briefly describe the mapping.  Add any additional keywords in #3. (These are global keywords from this page keywords)

3. Please list additional keywords here:


When conducting research in early Chinese Buddhism, most scholars rely on one particular digital corpus (compiled by CBETA) that makes one standard version of the Buddhist canon (the Taisho edition) searchable. However, many other versions of the canon have been produced over the centuries.  These texts were originally printed from carved woodblocks, some of which have been scanned (as photographs).
I primarily use the CBETA project's search tools because they have digitized the Taisho canon, which is the best edition: it is a critical edition that notes variant readings among approximately four different versions of the original texts. However, the CBETA search engine searches only the main text of the Taisho canon and not the variants, which are recorded in footnotes. These variants are easy to consult when simply reading the digital CBETA version, but they cannot be searched.
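To illustrate what searching the variants could look like, here is a minimal sketch in Python. It assumes a hypothetical data model in which each passage carries its main text plus footnoted variants; the names `Passage`, `Variant`, `search`, and the witness labels are invented for illustration and do not describe CBETA's actual data format or search engine. Latin letters stand in for Chinese characters.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variant:
    lemma: str      # characters as printed in the main text
    reading: str    # characters found in the other witness
    witness: str    # hypothetical label for the edition carrying the reading


@dataclass
class Passage:
    ref: str        # hypothetical passage identifier
    main: str       # main-text reading
    variants: List[Variant] = field(default_factory=list)


def search(passages, term):
    """Return (ref, witness) pairs for passages whose text contains term.

    The witness is "main" when the main text matches, or the variant's
    witness label when the term appears only after substituting a
    footnoted reading for its lemma.
    """
    hits = []
    for p in passages:
        if term in p.main:
            hits.append((p.ref, "main"))
        for v in p.variants:
            # Rebuild the passage as the other witness would read it.
            alternate = p.main.replace(v.lemma, v.reading, 1)
            if term in alternate and term not in p.main:
                hits.append((p.ref, v.witness))
    return hits


# Placeholder data (assumed, not real canon text).
passages = [
    Passage("p1", "ABCDE", [Variant("CD", "XY", "witness-2")]),
    Passage("p2", "FGHIJ"),
]
print(search(passages, "BX"))  # found only in the variant reading of p1
print(search(passages, "GH"))  # found in the main text of p2
```

The point of the sketch is that a variant hit is reported together with the witness it comes from, so a reader can tell at once that the term is absent from the main text.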

Further, because the Buddhist texts collected in CBETA and other collections were first transcribed from woodblock prints and have since been revised and reprinted, there are cases where characters have been altered in a way that changes the meaning of the text. Errors can also arise when characters are modernized from their older forms. When reading through digitized texts, I may find sections that seem confusing or ambiguously worded. If I want to make any scholarly assumptions based on the content of the text, I need to go back to the woodblock source and confirm that the text is correct.
It can be very difficult to get access to the primary source material, and even when the woodblock source has been located, there is no simple way of cross-referencing between the woodblock print and the printed text. Ideally, I would verify readings against the primary source (i.e. the woodblocks) in every case, but at present I have access to only certain resources and can spend the time verifying the printed text only in cases of confusion.  Some original woodblock prints have been digitized, but the scans are not linked to the search tools.  It would be very helpful if each search result were returned with a copy of the scanned text. The eventual scanning of all known versions of the canon is, of course, a desideratum. Finally, there are also manuscript versions of many of the texts from the canon. Many of the most important collections are in the process of digitization (International Dunhuang Project). Ideally, search engines such as CBETA would also be able to cross-reference easily with known manuscript copies of a given text.


I would like to see the problem of varying text versions handled in a better way so that I could see if versions of a text differed around a term I searched for and more easily compare versions. Tracking terms across versions of a text would be a useful tool to have at my disposal.
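One way to picture "seeing whether versions differ around a searched term" is the following minimal sketch. It assumes two versions are available as plain strings and uses Python's standard `difflib.SequenceMatcher` to align them; the function name `differences_near` and the `radius` parameter are hypothetical, and Latin letters again stand in for Chinese characters.

```python
from difflib import SequenceMatcher


def differences_near(base, other, term, radius=5):
    """List places where `other` departs from `base` within `radius`
    characters of the first occurrence of `term` in `base`.

    Returns None if the term does not occur in the base version.
    """
    pos = base.find(term)
    if pos < 0:
        return None
    lo = max(0, pos - radius)
    hi = min(len(base), pos + len(term) + radius)
    matcher = SequenceMatcher(None, base, other, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Keep only edits that touch the window [lo, hi) in the base text.
        if i1 < hi and i2 >= lo:
            diffs.append((tag, base[i1:i2], other[j1:j2]))
    return diffs


# Placeholder versions (assumed): they differ in one character near "HI".
base = "ABCDEFGHIJKLMNOP"
other = "ABCDEFXHIJKLMNOP"
print(differences_near(base, other, "HI"))
print(differences_near(base, other, "NO", radius=2))
```

Run over every available version, a report like this would flag exactly those search hits where a closer look at the other editions is warranted.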

Additionally, I would like to have primary sources integrated as part of the search process. Being able to read scans of the woodblock or manuscripts alongside the printed text, or quickly compare a printed passage against the woodblock or manuscript source simply by clicking a location on the digitized printed text would be very helpful. In a particular specialized case, I may be reading a manuscript version of a text in which I identify a term of interest. In the event that the term does not actually appear in the woodblock version of the text, I would still want to be able to link to the section within the woodblock where the term should exist, but does not.
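The case of linking to the place where a term "should exist, but does not" can be sketched as an alignment problem. The example below is a rough illustration only: it assumes both versions are transcribed as plain strings and that a hypothetical table records the character offset at which each scanned page begins. The names `map_offset`, `scan_for`, `page_starts`, and the image file names are invented for illustration.

```python
from bisect import bisect_right
from difflib import SequenceMatcher


def map_offset(src, dst, offset):
    """Map a character offset in `src` to the closest corresponding offset in `dst`."""
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if i1 <= offset < i2:
            if tag == "equal":
                return j1 + (offset - i1)
            # This text has no exact counterpart; point at where it would sit.
            return j1
    return len(dst)


def scan_for(offset, page_starts, page_images):
    """Pick the scanned page whose starting offset most closely precedes `offset`."""
    return page_images[bisect_right(page_starts, offset) - 1]


# Placeholder texts (assumed): "WXYZ" occurs in the manuscript only.
manuscript = "ABCDEFWXYZGHIJKL"
woodblock = "ABCDEFGHIJKL"
page_starts = [0, 8]                          # hypothetical page boundaries
page_images = ["scan-001.png", "scan-002.png"]  # hypothetical file names

target = map_offset(manuscript, woodblock, manuscript.find("WXYZ"))
print(target, scan_for(target, page_starts, page_images))
```

Here the term is missing from the woodblock transcription, yet the alignment still yields the offset between the surrounding shared text, which is enough to open the right scanned page.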

Other Comments:

The information below was compiled while transcribing the interview, to make sure no pieces were missing.  If it is unhelpful, please disregard.


Discover item of interest (name, unusual character or word)
Select appropriate search engine (CBETA, SAT, etc.)
Search for desired term
Parse search results
Read returned texts
Read existing notes on differences between versions
Identify sections of the text which may require referencing original woodblock
Locate source for original woodblock
Find reference within the woodblock text

Ingredients: Tools and Content

CBETA or other digital corpora search tool
Chinese language Input Method Editor for Windows
Siku Quanshu or other Chinese literature collections
Digital scans of woodblock prints


