This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.

SN-0033 ePhilology and Memographies

Collection Date:
Scholar #1 Info: (if more than one scholar's process is described, copy this set for each scholar)

  • Name: Gregory Crane
  • Email: moved to restricted page
  • Title: Professor of Classics; Winnick Family Chair of Technology and Entrepreneurship; Editor-in-Chief, Perseus Project
  • Institution/Organization: Tufts University
  • Field of Study/Creative Endeavor: Classics

Collector Info (can be the same as "Scholar" above):

Notes on Methodology:

This narrative is excerpted from a pre-publication draft of "Conclusion: Cyberinfrastructure, the Scaife Digital Library and Classics in a Digital Age" by Christopher Blackwell (Furman University) and Gregory Crane (Tufts University). The paper has since been published in Digital Humanities Quarterly v3 n1 (Winter 2009). The draft was provided in response to an e-mail request from the collector that referenced Professor Crane's 4/6 presentation at Project Bamboo Workshop 1d (Princeton) and other published materials.


Scope

The scope section is provided by the collector, with input from the scholar(s), and attempts to estimate the scope of the group that performs the processes described: How broadly do the practices described in this story apply to others in the same field, in related fields, etc.?

  1. In the opinion of the scholar, who participates in the process the story describes? (e.g. "just this scholar", "many people in the scholar's field of inquiry", "all academics", etc.) "classical studies and the humanities in general" (from the source paper's introduction)
  2. What is this process intended to accomplish for the scholar? Address "the question of what research questions we can pursue that would not have been feasible without collections that are, if not exhaustive, at least large enough to be representative of the published record available in print." (from the source paper's introduction)
  3. Who is the intended audience of the processes described? Ultimately, the readers/consumers of scholarly communication/publication.
  4. Is this the only process the scholar uses to accomplish his/her goals? The processes in this story are largely prospective rather than possible at present. One current area of Professor Crane's work is the Perseus Digital Library.
  5. What "shared services" would help transform the story into something of more benefit for the scholar or his/her audience?  What process or processes in the story could be automated? Cf. SN-0047 Services for eClassics

Keywords

Please provide some keywords that will allow us to group or cluster related stories--or aspects of stories.

1. Was this story collected for a particular Bamboo working group?  If so, please include, as keywords, the appropriate group(s).

  • Shared Services

2. Suggested keywords: Does this story contain elements that could be mapped to these keywords? If so, please indicate which ones and briefly describe the mapping. Add any additional keywords in #3. (These are the global keywords listed on the keywords page.)

3. Please list additional keywords here:

4. Related Stories: Are there parts of the story that relate to other collected stories? Please provide title(s) and link to the story page. 

SN-0047 Services for eClassics

Story

[...] ePhilology emphasizes the role of the linguistic record in producing and organizing ideas and information about the ancient world. [...] Memographies allow philologists to explore vast topics far too large for individual scholars in print culture.

[...] North American English-language newspapers printed perhaps 50 billion words each year in the late 1860s. If we simply analyzed these newspapers, we could open up whole new lines of inquiry, tracking a range of topics: Which newspapers reprinted stories from which? What sorts of things did people say about slavery over the course of time in newspapers from different parts of the country with different party affiliations? What poetry and fiction appeared in these newspapers? What products were advertised? All of these are eminently tractable problems: we don't need perfect transcriptions or perfect services to begin identifying the trends behind these topics. If we begin to think about 19th-century newspapers in other languages around the world, the challenges and opportunities become even greater.
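The first of these questions, which newspapers reprinted stories from which, lends itself to a small illustration. A common technique for finding reprinted text (not one the authors specify) compares the sets of overlapping word n-grams ("shingles") in two articles and scores them with Jaccard similarity; tolerant matching of this kind is one reason imperfect OCR transcriptions can still be useful. The newspaper names, snippets, shingle size n=3, and threshold 0.3 below are all hypothetical choices for a sketch, not a definitive implementation.

```python
"""Sketch: flag probable reprints between newspaper articles by word-shingle overlap.

Assumptions (hypothetical, for illustration only): articles are plain strings,
the shingle size is n=3 words, and a Jaccard score >= 0.3 counts as a likely reprint.
"""
import re
from itertools import combinations


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams in a lowercased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_reprints(articles: dict[str, str], n: int = 3,
                    threshold: float = 0.3) -> list[tuple[str, str, float]]:
    """Compare every pair of articles and keep pairs scoring at or above the threshold."""
    sets = {name: shingles(text, n) for name, text in articles.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 3)))
    return pairs


if __name__ == "__main__":
    # Hypothetical snippets; "paper_b" reprints "paper_a" with a one-word change.
    articles = {
        "paper_a": "The steamer arrived at the wharf this morning with news from Europe",
        "paper_b": "The steamer arrived at the wharf yesterday morning with news from Europe",
        "paper_c": "Prices for cotton fell sharply in the markets of New Orleans",
    }
    print(likely_reprints(articles))  # [('paper_a', 'paper_b', 0.538)]
```

Comparing every pair is quadratic in the number of articles; at the scale of billions of words, approximate methods such as MinHash with locality-sensitive hashing are the usual way to find candidate pairs before exact scoring.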

Clearly we can begin to pursue topics that require analysis of much more data than any human being can see, much less contemplate. We can begin to trace topics that have a life in human tradition that goes beyond any single period or immediate context. Such topics have lives of their own. We can now write histories or (to pursue the metaphor of living things) biographies of these topics. The evolutionary biologist Richard Dawkins coined the term meme in 1976 to describe the cultural counterpart to biological genes: memes include any thoughts or behaviors that can be passed from one person to another; examples include "thoughts, ideas, theories, gestures, practices, fashions, habits, songs and dances." The term meme provides a useful concept because it stresses the autonomy of ideas as they circulate through our biological brains and storage technologies. The concept of a meme allows us to consider both information about a historical topic that existed in the material world (e.g., the life of the historical Alexander the Great) and topics that have a life of their own (e.g., Alexander as a hero of Iranian folk tales). We use the term memography to describe the history of a meme within a larger body of material. [...]

The biological Plato, likewise, vanished more than two thousand years ago, but his writings have been copied ever since and the historical Plato continues to exist as a topic of discourse. Scholars could, in print culture before the advent of searchable texts, laboriously track down many Platonic testimonia, e.g., the explicit quotations and most obvious allusions to particular passages in Plato. German classicists have begun to apply text mining algorithms to search for quotations and allusions that previous generations missed. If we wanted to understand the role of Plato and the ways in which others have quoted and used his dialogues, we would need to work in every language where Plato was influential. This would include not only such common languages of classical philology as Latin, English, French, German and Italian, but virtually every European language that left behind a substantial body of written discourse. If we then consider that Plato has had a major presence within Islamic thought and realize that we will need to consider Arabic and Persian as well, it quickly becomes clear that no single scholar can create from the primary sources a global overview of Plato's influence from antiquity through the present. The nineteenth-century newspapers mentioned above are just one more body of sources that shed light on who said what about Plato.

In an age of very large collections, we can, however, begin to design systems that will provide automatic visualizations of topics such as Plato and Plato's works.

  • Named entity analysis finds passages that refer to Plato the philosopher, filtering out those passages that refer to other figures of the same name (e.g., the Athenian comic poet named Plato).
  • Quotation identification finds direct quotations and paraphrases of passages in Plato.
  • Cross-language information retrieval extends named entity and quotation identification to multiple languages (e.g., Arabic, Chinese, Latin, English, French, German, Italian, and Russian and other languages for which major cross-lingual resources are available).
  • Text mining identifies words and phrases that appear in conjunction with references to and quotations of Plato. These words and phrases allow us to discover common ideas associated with Plato across different genres and periods.
  • Machine translation links similar words and phrases associated with Plato in multiple languages, identifying cross-lingual cultural units.
  • Visualization systems allow readers to track, for example, where and how often Plato's Republic has been discussed, what passages have been most examined, and what sorts of things people have said about Plato, whether in Berlin or the Iranian university city of Qom.
  • Customization and personalization services then provide individual analysts with relevant materials in languages that they understand as well as machine translation and interactive translation support services to help them with languages in which they have little or no fluency. Thus, the system might present scholars of Islamic thought with translations of Plato and translation support geared to their particular knowledge of Greek.
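The text-mining step in the list above ("words and phrases that appear in conjunction with references to and quotations of Plato") can be approximated, in its simplest form, by counting the words that fall within a fixed window of each mention. The sketch below assumes that the named entity step has already resolved every "Plato" token to the philosopher; the five-word window, the regex tokenizer, and the tiny stopword list are hypothetical simplifications, and a real system would work with lemmatized tokens in many languages.

```python
"""Sketch: count words that co-occur with a target name within a token window.

Assumptions (hypothetical): mentions of the target have already been
disambiguated, tokens are lowercase alphabetic words, and a tiny stopword
list stands in for real linguistic preprocessing.
"""
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "on", "to", "his", "is", "was"}


def cooccurrences(passages: list[str], target: str, window: int = 5) -> Counter:
    """Count non-stopword tokens within `window` tokens of each target mention."""
    target = target.lower()
    counts: Counter = Counter()
    for passage in passages:
        tokens = re.findall(r"[a-z]+", passage.lower())
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Neighbors on both sides of the mention, excluding the mention itself.
            for neighbor in tokens[lo:i] + tokens[i + 1:hi]:
                if neighbor != target and neighbor not in STOPWORDS:
                    counts[neighbor] += 1
    return counts


if __name__ == "__main__":
    passages = [
        "Plato describes justice in the Republic.",
        "Commentators on Plato debated the Republic and the ideal city.",
        "The soul, Plato argues, is immortal.",
    ]
    print(cooccurrences(passages, "Plato").most_common(1))  # [('republic', 2)]
```

Raw co-occurrence counts favor frequent words; association measures such as pointwise mutual information are a common next step for surfacing terms that are distinctive to a topic rather than merely common.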

Each of the above and similar processes is analogous to the sensors by which scientists track data in the material world. Each of the above processes will produce noise as well as a usable signal. The results will not, of course, be scholarship, but rather data within which patterns can emerge to stimulate scholarship; in the end, human beings will have to contemplate what the systems have found. They will refine the questions that they ask, contemplate the results again, and then repeat their analysis in an iterative process. But, despite all the noise within the system, we will quickly start to see patterns about who has said what at various times about which passages of Plato in a variety of languages. [...]
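One way to put a number on the "noise as well as a usable signal" produced by any of these processes is to have a human judge a random sample of the automatically produced hits and estimate the process's precision with a confidence interval. The sketch below uses the Wilson score interval, a standard choice for a binomial proportion but not one the authors prescribe; the detector output, sample size of 50, and the 41 correct judgments are hypothetical.

```python
"""Sketch: estimate the precision of an automated detector from a judged sample.

Assumptions (hypothetical): a human has marked each sampled hit as correct
(True) or noise (False); the Wilson score interval at ~95% (z = 1.96)
summarizes the uncertainty of the estimate.
"""
import math
import random


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion."""
    if n == 0:
        raise ValueError("need at least one judged item")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def sample_for_review(hits: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample (without replacement) of hits for review."""
    rng = random.Random(seed)
    return rng.sample(hits, min(k, len(hits)))


if __name__ == "__main__":
    hits = [f"passage-{i}" for i in range(1000)]  # hypothetical detector output
    to_review = sample_for_review(hits, 50)
    # Hypothetical human judgments: 41 of the 50 sampled hits were correct.
    judgments = [True] * 41 + [False] * (len(to_review) - 41)
    low, high = wilson_interval(sum(judgments), len(judgments))
    print(f"precision ~ {sum(judgments) / len(judgments):.2f}, "
          f"95% CI [{low:.2f}, {high:.2f}]")
```

Repeating the sample after each refinement of the questions or tools gives the iterative process described above a simple, comparable measure of whether the signal is improving.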

No one will ever be able to see, much less read and contemplate over time, the primary sources underlying broad topics such as the history of Latin over two thousand years or even the reception of Plato. Of course, this is hardly new: no living humanist publishing on major canonical authors such as Homer or Shakespeare can claim to have read and pondered more than a subset of conventional published scholarship in the conventional languages of European and American scholarship. But the rise of large collections and emergent systems with which to analyze those collections allows us to shift our stance away from the limits of what we can read with our two eyes and towards the challenges of working with machines that can scan large bodies of material and then (as we will see through the discussion of Plato's challenge below) allow us to focus in detail on passages in more languages and from more contexts than was possible before. [...]

A memography contains elements that are deeply traditional in form and general purpose, even if it represents an engagement between author, reader and source materials so quantitatively broader in scope as to constitute a radical change. [...]

Characteristics of a memography include:

  • Citation: A memography contains citations between statements and the evidence on which they are based. A memography differs from a traditional monograph because in a memography we know that authors have only been able to scrutinize a subset of the evidence cited. Citations in a memography include versioned queries: we can thus see what evidence was available at the time when the memography was completed and how that evidence has subsequently changed as new sources come on-line, existing analytical tools become more powerful or wholly new services emerge.
  • Scale: A project becomes a memography as its scope brings in more primary materials than a single human author can effectively analyze. Topics so vast that authors in print culture needed to focus their work on synthesizing specialized studies and could not base their work primarily upon the primary sources would be subjects for memographies. The author must depend upon techniques such as sampling and automated analyses. A memography of George Washington would, for example, require, as one foundational dataset, the relative frequency of references to George Washington in multiple periods, genres, languages and cultural contexts. Such figures would require automated named entity analysis applied to very large collections. The memography would include a human author's assessment of the accuracy of the automatically generated data.
  • Heterogeneity: Memographies include not only more content than authors can review but content that assumes more categories of background knowledge than individual authors can expect to acquire. Such barriers can be language, cultural background, mathematics and any other topic. The history of mechanics could thus justify a memography because it requires not only a substantial understanding of mathematics and physics but sources produced over millennia and across Europe, North Africa and the Middle East in Greek, Latin, Arabic and every European language. Memographies thus require scalable, automated systems that can provide customized background information with which readers can examine and manually analyze any given object referenced. Thus, readers without training in Arabic but familiar with other languages and with the underlying scientific contexts can use automated morphological analyses, links to an on-line dictionary, and existing translations in languages that they do understand to pull apart Arabic source texts and determine which words are used in particular contexts to describe key concepts.
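The "relative frequency of references" named under Scale as a foundational dataset is, at its core, a normalization: mention counts from an automated named entity step divided by the size of each slice of the collection. The sketch below keys slices by (period, language) and reports mentions per million words; the per-million unit, the slice labels, and all counts are hypothetical, and genre or cultural context could be added as further key components in the same way.

```python
"""Sketch: relative frequency of references to a person across corpus slices.

Assumptions (hypothetical): an upstream named-entity step has already counted
mentions per slice; totals are word counts for the same slices; frequencies
are reported per million words so slices of different sizes are comparable.
"""
from collections import defaultdict


def per_million(mentions: dict[tuple[str, str], int],
                totals: dict[tuple[str, str], int]) -> dict[tuple[str, str], float]:
    """Return mentions per million words for each (period, language) slice."""
    freqs = {}
    for key, total in totals.items():
        if total > 0:  # skip empty slices rather than divide by zero
            freqs[key] = mentions.get(key, 0) * 1_000_000 / total
    return freqs


def by_period(freqs: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Regroup frequencies as period -> {language: frequency} for comparison."""
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for (period, language), value in freqs.items():
        table[period][language] = value
    return dict(table)


if __name__ == "__main__":
    # Hypothetical counts, not real data.
    mentions = {("1800s", "English"): 120, ("1800s", "French"): 15,
                ("1900s", "English"): 300}
    totals = {("1800s", "English"): 2_000_000, ("1800s", "French"): 500_000,
              ("1900s", "English"): 10_000_000, ("1900s", "French"): 4_000_000}
    for period, row in sorted(by_period(per_million(mentions, totals)).items()):
        print(period, {lang: round(v, 1) for lang, v in sorted(row.items())})
```

Because the mention counts come from an automated and noisy process, figures like these are the kind of generated data whose accuracy the memography's author would still need to assess, for example by judging a sample of the underlying hits.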

Whether we are producing or reading (or both), most memographies will force us to interrogate primary materials from more contexts, linguistic, cultural or both, than we can expect to have studied in detail. The most powerful memes will work their way across time, genre, language and culture, and it is this very quality that leaves a trail too long and complex for any single human mind. We must look to machines that can find and preprocess material relevant to a given meme within immense bodies of data.

Other Comments:
