
We have submitted a workshop paper for the ACM DocEng 2013 workshop on digital humanities: DH-CASE.

Update: paper as submitted is attached as a PDF here: DH Case 2013 Submission

A 400-word abstract needs to be submitted by June 15, 2013.

Abstract as submitted:

In this paper, we describe Berkeley Prosopography Services (BPS), a new set of tools for prosopography - the identification of individuals and study of their interactions - in support of humanities research. The BPS tools include 1) functionality to import TEI documents and convert to our data model, 2) a disambiguation engine to associate names to persons based upon configurable heuristic rules, 3) an assertion model that supports flexible researcher curation and tracks provenance, 4) social network analysis and 5) graph visualization tools to analyze and understand social relations, and 6) a workspace model supporting exploratory research and collaboration. We contrast the BPS model that uses configurable heuristic rules to other approaches for automated text analysis, and explain how our model facilitates interpretation by humanist researchers. We describe the significance of our curation model that improves upon traditional curation and annotation as a fact-based model by adding a more flexible model in which researchers assert conclusions or possibilities, allowing them to override automated inference, to explore ideas in what-if scenarios, and to formally publish and subscribe-to asserted annotations among colleagues, and/or with students. We detail the architecture and our implementation of the tools as a set of reusable web services and web application UI. We present an initial evaluation of researchers’ experience using the tools to study corpora of cuneiform tablets, and describe plans to expand the application of the tools to a broader range of corpora.
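
For illustration, the curation model described in the abstract can be pictured as an assertion record that carries provenance and can override an automated inference. The following is a minimal sketch in Python; the class, field, and function names are hypothetical and do not reflect the actual BPS data model or API.

    # Hypothetical sketch of an assertion with provenance; not the BPS schema.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Assertion:
        name_instance_id: str      # the name occurrence being interpreted
        person_id: str             # the person the researcher associates with it
        kind: str                  # e.g. "conclusion" or "possibility"
        asserted_by: str           # researcher identity, for provenance
        asserted_at: datetime      # when the assertion was made
        overrides_inference: bool  # whether it supersedes the automated result

    def effective_person(inferred_person_id, assertions):
        """Prefer the most recent overriding assertion; otherwise keep the
        automated inference. A what-if workspace could simply vary which
        assertions are passed in."""
        overriding = [a for a in assertions if a.overrides_inference]
        if overriding:
            return max(overriding, key=lambda a: a.asserted_at).person_id
        return inferred_person_id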


The deadline for the full paper is June 22, 2013.

 Topics in the Call that seem to relate to or resonate with our work:

  • Multi-level annotations in textual corpora 
    • Kind of - we annotate people, activities, dates, roles, etc., but this is only sort of multi-level

Q: does multi-level apply to multiple annotations per person, once the name instances are disambiguated? The corpora we have been working with are, to date, treated as discrete units, and within them we have made the assumption that each NRAD has only one role in an activity. Even continuing to accept and work with that, once there are collaborations between scholars who explore the interconnections of individuals across space (at roughly the same time), we may encounter the need to mark multiple roles for an individual in building up his social network profile. At this point, are we entering the realm of "multi-level"?

  • Collaborative platforms for digital text annotation and existent solutions  
    • Cf: Our publish/subscribe model for assertions

This topic and that of "Annotation and ownership" are probably mutually informative.

  • The metadata dialogue: crosswalk in annotating digital textual resources
    • This goes to our total workflow, I think (warts and all), but also to the issues of TEI for prosopography

Agreed. Keeping in mind that this is corpus (and consortium) specific, I could easily see this having applicability to other umbrella prosopographical projects. The CDLI catalogue as a source of text/object metadata for Oracc-aligned projects: what benefits or limitations has this presented? Does this have bearing on this topic, or is that more of an issue for consideration in building collaborative projects?

  • Annotation and markup in the humanities: techniques and technologies
    • Workflow and tools are what we're about
  • Linked data and Cultural Heritage: possibilities and perspectives in the interchange between digital/textual annotated objects
    • Just getting into this, really

cf. LAWDI (#lawdi)

  • What is a text? The differing interpretations of what constitutes a text within different DH communities
    • Interesting, especially as we get to minimal and damaged examples
  • OAC. The Open Annotation Collaboration. Utility and case studies in the DH domain 
    • Not really us, at least not yet
  • Archives, Libraries and Museums. The DH role and approach to cultural heritage 
    • Laurie?
  • Annotation and ownership: Annotation in a cross-community context
    • This is a big issue for us, along with the provenance of assertions and the evolution of ideas
Areas we will include/discuss

Much of this draws on what we worked out on the whiteboard, but I have added some stuff on the areas I know something about.

  • Project profile, drawing from other work. Include something about our collaboration as a model, etc.
    • Cuneiform studies and digitization
    • DH, combining humanist and IT cultures to really collaborate
    • Goals of project (we should aim high here, and perhaps look forward more than back).
  • Project context - peers, relationship to field
    • Tradition of prosopography
    • Traditions of NLP - minor point? May not be a good idea to say much about this, other than to note that we are not innovating on NLP.
    • Projects like prosop.org
    • SNA toolkits - note and cite our libs, but describe how we wrapped them in GraphML and built them as a RESTful service (a rough sketch follows this outline).
    • OAC as representation with possibility of provenance. Does not solve problem of dynamism in the text. Cite robust linking by Wilensky, et al.?
    • Scholarly workspaces - Perseus, etc. 
  • Workflows around annotation
    • Lemmatization with Oracc tools
    • NLP markup for roles, activities, etc. 
    • Assertions on dates, on people, etc. 
    • Workspaces, settings and rules for model, what-if scenarios
  • Annotation standards we use:
    • ATF, Oracc, purpose, history, utility, issues
    • TEI, how we determined our markup, extent to which it is a standard, issues:
      • human markup vs. machine (automated) markup - who is creating it, how, and why, and how this impacts the utility of the markup
      • human interpretation vs. machine interpretation, nailing down semantics, understanding specifics
  • Our assertion model, and proposed functionality
    • Basic functionality and purpose
      • Accept/reject automated conjectures
      • Add additional information
      • Formalize model for disagreement and discussion
    • Publish/subscribe support. 
      • Use-cases for research
      • Use-cases for teaching
      • Provenance of ideas, tracking influence, metering
  • ??? (May not fit in this one) Pluggable, abstract rules for disambiguation (see the sketch after this outline)
    • Expose UI for user to control, expose parameters
    • Conform to simple model for disambiguation
      • shift, boost, or discount
      • intra-doc or inter-doc (cross-corpus)
    • Core generic types (parameterized, but general across many corpora)
      • Name qualification
      • Date comparisons
      • Role matrix
    • Base rule classes can be extended to other specific features of roles, activities, etc. 
      • Place of activity
      • Life-roles
      • Custom knowledge, e.g., a given family/clan might have a focus, or a taboo on certain activities
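
As noted in the SNA toolkits item above, the graph side of the toolchain amounts to wrapping an SNA library so that it emits GraphML from a RESTful endpoint. A rough sketch of that idea follows; the use of networkx and Flask, the /graph path, and the example data are assumptions for illustration only, not the actual BPS services.

    # Illustrative only: build a small co-occurrence graph and serve it as GraphML.
    import networkx as nx
    from flask import Flask, Response

    app = Flask(__name__)

    def build_graph(person_links):
        """person_links: iterable of (person_a, person_b) co-occurrence pairs."""
        g = nx.Graph()
        g.add_edges_from(person_links)
        return g

    @app.route("/graph")
    def graph_as_graphml():
        # In the real workflow the links would come from disambiguated corpora;
        # a fixed example stands in for them here.
        g = build_graph([("P1", "P2"), ("P2", "P3")])
        xml = "\n".join(nx.generate_graphml(g))
        return Response(xml, mimetype="application/xml")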
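
The pluggable rule model outlined in the last item above can likewise be sketched as a small class hierarchy: each rule conforms to a simple boost-or-discount adjustment of a candidate weight, core generic rules are parameterized, and base classes can be extended with corpus-specific knowledge. All names below are hypothetical and are not the BPS implementation.

    # Hypothetical sketch of the shift/boost/discount rule model.

    class DisambiguationRule:
        """Adjusts the weight of one (name instance, candidate person) pair."""
        def __init__(self, weight=1.0):
            self.weight = weight  # a parameter the UI could expose to the user

        def adjust(self, current, name_instance, person):
            raise NotImplementedError

    class PatronymBoost(DisambiguationRule):
        """Core generic rule: boost when the name qualification (here, the
        patronym) agrees with the candidate person's recorded father."""
        def adjust(self, current, name_instance, person):
            if name_instance.get("patronym") == person.get("father"):
                return current * self.weight  # e.g. weight > 1.0 boosts
            return current

    class ClanTabooDiscount(DisambiguationRule):
        """Custom-knowledge rule: discount candidates from a clan with a taboo
        on the activity recorded for this name instance."""
        def __init__(self, clan, taboo_activity, weight=0.2):
            super().__init__(weight)
            self.clan, self.taboo_activity = clan, taboo_activity

        def adjust(self, current, name_instance, person):
            if person.get("clan") == self.clan and name_instance.get("activity") == self.taboo_activity:
                return current * self.weight  # weight < 1.0 discounts
            return current

Whether a rule is intra-doc or inter-doc (cross-corpus) would simply be another property of the rule, governing whether it consults only the current document or candidates drawn from other documents.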