Task List | Templates
June 1, 2015
5. hire for DH Mellon (with CTP)
November 19, 2013
Exploratory meeting with Babylotec project/team from Berlin (Vorderasiatisches Museum, Cornelia Wunsch and Tobias Schmidt)
November 5, 2013
October 9, 2013:
Laurie, Niek, Patrick, Asad Ahmed (UC Berkeley NES, Islamicist)
We invited Asad to share with us how he uses prosopographical data and SNA in his research, and to explore whether he might take part in BPS's development of support for bringing data from databases into the intake process.
The main points:
October 2, 2013:
Laurie, Niek, Patrick
Agenda: consider various areas for development and support, to sustain continuous development and implementation of BPS
Time-line to be developed:
September 25, 2013:
June 4, 2013:
May 14, 2013:
May 7, 2013:
March 4, 2013:
February 12, 2013:
November 5, 2012:
October 29, 2012:
October 22, 2012:
October 15, 2012:
PLS: ECAI paper deadline?
Tying up loose ends:
Family trees: how to produce GraphML? Directed graph of nodes and arcs (a father can have multiple sons, but a son cannot have multiple fathers; how should mothers be represented?)
LEP: work on creating some GraphML
Teleconference: LEP to initiate via GoToMeeting; DS suggests Google Hangout. For the Europeans, 5-6 or 8-9; suggest W/F starting Jan. 22.
Checklist of outstanding threads: DS, by end of next week
DS: help with checking in to the service?
LEP: make sure Davide is on the wiki staff list
PLS: affiliate DS to continue in VSPA
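The family-tree question above can be sketched as a small GraphML writer. This is a minimal sketch, not BPS code: it assumes Python's standard `xml.etree.ElementTree`, person IDs as plain strings (the `P1`, `P2`... values are placeholders), and a hypothetical `relation` edge attribute whose values are `father` or `mother`. Edges run from parent to child in a directed graph, mothers are represented as a second edge type rather than a separate kind of node, and the one-father-per-son constraint is checked before any XML is written.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)  # serialize GraphML as the default namespace


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def family_tree_graphml(links):
    """Build a directed GraphML family tree.

    links: (parent_id, child_id, relation) tuples, relation in {'father', 'mother'}.
    Edges point from parent to child.
    """
    links = list(links)
    fathers = {}
    for parent, child, relation in links:
        if relation not in ("father", "mother"):
            raise ValueError(f"unknown relation: {relation}")
        if relation == "father":
            # A father may have many sons, but a son has only one father.
            if fathers.setdefault(child, parent) != parent:
                raise ValueError(f"{child} already has father {fathers[child]}")

    root = ET.Element(q("graphml"))
    # GraphML requires <key> declarations before the <graph> element.
    ET.SubElement(root, q("key"), {"id": "rel", "for": "edge",
                                    "attr.name": "relation", "attr.type": "string"})
    graph = ET.SubElement(root, q("graph"), {"id": "family", "edgedefault": "directed"})

    # Emit each person once, in first-seen order.
    people = dict.fromkeys(p for parent, child, _ in links for p in (parent, child))
    for person in people:
        ET.SubElement(graph, q("node"), {"id": person})
    for i, (parent, child, relation) in enumerate(links):
        edge = ET.SubElement(graph, q("edge"),
                             {"id": f"e{i}", "source": parent, "target": child})
        ET.SubElement(edge, q("data"), {"key": "rel"}).text = relation
    return ET.tostring(root, encoding="unicode")
```

For example, `family_tree_graphml([("P1", "P2", "father"), ("P4", "P2", "mother")])` returns a GraphML string that GraphML readers such as Gephi should be able to load. Treating mothers as a labelled edge keeps a single node type, which may or may not suit later SNA queries.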
October 12, 2012: I-School presentation
We invite you to look at our presentation:
September 24, 2012
Timeline for I-School:
LEP contact DS for Oct. 1, 4 contributions
September 18, 2012
Planning for I-School and ECAI talks:
Secondary goal: Davide's presentation: progress made, how requirements were gathered, tools needed, how designs were refined, what was learned about performance, challenges of integration, what to continue to look for going forward. (Wind up by declaring it done, or by identifying the research questions that come out of or relate to this? Reusability: how much? Modeled correctly for other corpora? Other kinds of questions? Linked data? Architecture challenges.)
Consider other meetings/venues to present/publish
September 11, 2012
September 6, 2012
Draft of abstract completed for the October presentation and the ECAI conference; posted to Google Docs.
Patrick to review and contribute for meeting Sept. 11
August 27, 2012
plan paper for October presentation to I-School and ECAI conference (Dec. 7-9 at Cal)
10 pages: (breakdown)
Davide and Laurie began abstract
August 14, 2012
Review Davide's charter and milestones
BPS Internship: August review of milestones
Original goals and the current state of achievement:
• Creation of a flexible and interactive visualization environment for historical social data
• Creation of a Social Network Analysis framework to provide researchers with new and relevant insights about the research domain.
1. Aug. 30:
2. Sept. 17:
3. Oct 12:
4. Oct 23:
5. Oct 30:
Remaining project tasks and associated action points:
August 8, 2012
Mid-internship demo sets out to:
Patrick spoke about where we are and what remains to be implemented. He laid out the course and progress of Davide's internship. He has approached the apprenticeship as an opportunity for Davide to develop project development and management skills that will serve his professional development. To meet these goals, the beginning of the internship included a great deal of background reading and much discussion of the architecture and conceptual foundations of BPS. Once the overall picture was understood, considerable attention was given to decoupling the components from the overall architecture. This grew out of, and contributes to, the project's requirement that the architecture be corpus-agnostic and generalizable. Learning how to develop and deploy stacks on the local hardware required a fair amount of time at the start. Early on, Davide and Laurie also met without Patrick, giving Davide the opportunity to interact directly with a humanities researcher. Patrick spent a great deal of time on these fundamental steps and thus paved the way for Davide to move into the phase of developing the components that will function in the SNA reasoner. This meant, however, that Patrick was not able to devote as much time as he had hoped to working through the tasks on the issues list, many of which grew directly out of the April 2011 workshop. BPS was also targeted by a number of known spammers, and addressing the resulting security issues took considerable time. Nevertheless, incremental progress in BPS functionality has been made, and the newest additions appeared just before the workshop; Laurie will review the features in the next week or two.
Laurie gave a brief lead-up to the demo. She explained that she hand-selected the texts with which Davide worked: documents from the Nana-iddin family archive (most of which has been published in Doty 1978). A subset of these texts was chosen because it contains known connections among the persons mentioned in the corpus and presents evidence from a variety of text types. These texts would provide the maximum payoff in demonstrating results while keeping the data easily accessible for Davide. At this point, BPS is not able to process the TEI, so Davide hand-generated the (corpus-agnostic) GraphML that feeds the components of the SNA reasoner. She pointed out that while Davide is working locally, the demonstration runs as a web service; that is, the visualizations are generated by the software and are not mock-ups. The demonstration will show that we have the capability to handle the data correctly and to generate useful and meaningful outputs.
Davide began his presentation with comments about which metrics he chose to present. In response to Niek's questions, he clarified that all of the metrics and computations conform to standard SNA practice. He made clear that several of the metrics can be computed in a variety of ways, but he opted to use only one method, although the architecture would allow other computational modules to be implemented should a researcher so desire.
In presenting the graph, he showed the GraphML he created from the text information. This is the only step in the demo where data was inserted manually. Once the GraphML was produced, he ran the libraries of computation modules and the Gephi UI rendered the graphs. The central graph illustrated the centrality of Nana-iddin (the relative sizes of the nodes indicated the degree of each node). The ability to choose different layouts of the same graph and to focus on document views was shown. Niek raised the question of how the central graph (and particularly the document view) improved the researcher's toolkit. During this discussion, Davide dove into the code and implemented the ability to generate searches for requests on "Neighborhood", "Slave sale", and "Ownership", among others. Choosing any of these searches returned graphs of the relationships over the entire demo corpus. Of particular interest was the graph returned on Slave Sales, which produced four distinct graphs (see the center graph in the second row in the photo below). Not only did this demonstrate the functionality of the SNA parser, but it was also a testament to Davide's understanding of how to implement the research needs of humanities scholars.
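The per-relationship searches described above amount to filtering the edge list by transaction type and then examining what remains. The sketch below is a plain-Python illustration, not Davide's modules or any Gephi API: it assumes edges arrive as hypothetical `(person_a, person_b, relation)` tuples, treats the graph as undirected, measures degree as the number of distinct neighbours, and finds the separate graphs in a result (such as the four distinct Slave Sale graphs) as connected components.

```python
from collections import defaultdict, deque


def build_graph(edges, relation=None):
    """Undirected adjacency sets from (person_a, person_b, relation) edges.

    If relation is given, keep only edges of that type (e.g. "slave sale").
    """
    adj = defaultdict(set)
    for a, b, rel in edges:
        if relation is None or rel == relation:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degrees(adj):
    """Degree of each node, counted as the number of distinct neighbours."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def components(adj):
    """Connected components found by breadth-first search; each is a set of nodes."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return comps
```

Counting distinct neighbours ignores repeated transactions between the same two people; if node size should reflect every shared document, a multigraph edge count would be the alternative.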
Davide pointed out that further development will take place in an iterative mode: testing, developing and retesting. This means that not all functionality will be completed, but progress will continue in a way that keeps the project architecture, development and functionality integrated and mutually supporting.
Next week: realign Davide's charter and milestones
September: another demo with increased functionalities
Images from the demo:
Davide demonstrates how requests for different relationships produce different graphs. He implemented these search capabilities on the fly, in the course of a discussion about the utility of the overall graph and the need for BPS to provide "value added" to the humanities researcher. This is a clear indication of what the SNA graphs can easily demonstrate and how they will aid the humanities researcher.
Laurie explores the graphs.
August 6, 2012
*May 21, 2012*
*Davide: settling in: reimbursements
Davide about ready to run stacks locally
PLS finishing up clean-up. Better handling for names: TEI modelling for names, with a name glossary at the beginning. Was only using normalized names, so now changing the processing. Concern if TEI from other projects is in different formats.
Flagging errors: willing to absorb certain PN errors.
Logging is primitive: would like more standard log output, specifically a parse log (for rebuilt TEI). Would like to see publication data incorporated into output.
Small things to work on:
--- marriages: not all persons are being handled (2 nested persons); pre-process
--- need markup of scribes: pre-processing
--- right edge: PNs are not witnesses
LEP: how to treat ina aszabi
LEP: ask S. Tinney about roles in ATF if output to TEI
LEP: basic rules for wives, false witnesses, scribes, kings' names, neighbors + P nos. where they occur: child page added following the meeting with this information
need to build simple GraphML for DS to work on
PLS will work on specific use cases with DS.
DS and LEP to collect views. Send DS Caroline's workshop paper. To meet for initial round of discussions this week.
*May 14, 2012*
Meeting May 14:
*Davide access to corpus TEI
*What to assign student worker (C Bravo): clean data for Patrick, making notes of odd syntactic problems
marriage needs to be documented as "gamonym"
roles such as relationships "wife", "his brother"
PLS and DS will pre-process the TEI to deal with certain patterns, e.g. nested patterns. This is not a temporary solution, because different projects will introduce their own idiosyncrasies that have to be dealt with; corpus-specific processing belongs in the initial layer of the architecture. Will do this in XSLT.
PLS will add DS to the wiki
List of fiddly things to fix: PLS and LEP to go through notes and compile into wiki or project-tracking software
consider adding feedback mechanism per document within workspace of BPS
Keeping the problems list on a wiki run by Berkeley creates access problems for other projects.
LEP: firmer measures of what questions are asked, what the must/should/nice-to-haves are, and how visualization has to work (how family trees are presented and what they show)
Process for raising and resolving issues in terms of assessment of tools. Set up documentation for tracking this process: a combination of high-level loose specs and then user stories. DS will look into the possibility of an issue-tracking program. Look at Google Docs. DS likes Pivotal Tracker (good for back-tracing, etc.). Looking eventually to move to GitHub.
LEP: pull out software from NEH seminar in LA
*May 4, 2012*
Davide Semenzin arrived the week of May 4. Meetings with Patrick: review development environments and background materials for those environments, introductions to IST staff and to Warren Hall, set up work space. Meeting with Laurie: review text-processing procedures against error logs produced by Patrick.
*February 14, 2012*
Third meeting of Patrick, Laurie and Davide Semenzin, an MA student from Utrecht University (NL), who will spend six months (May - October 2012) at Berkeley working on the development of the SNA reasoner and graph viz of BPS for his master's project. Davide is a student of Professor Sjaak Brinkkemper and Marco Spruit. Following a series of emails and a Skype conference between Davide and Patrick in Fall 2011, the BPS team agreed that the prolonged period of work Davide is able to devote to development for BPS and his interest in the SNA and viz components mesh well with our current development needs.
Davide met with Patrick and Laurie in order to familiarize himself with the humanities component as well as the conceptual framework of the BPS project and architecture. In three meetings, Davide drafted the initial Project Charter statement. On his return to the NL, he will meet with his advisers and refine the project statement in conjunction with their guidance and in consultation with Patrick and Laurie.
First milestone will be the submission of the Project Plan/Charter on April 1.
*August 30, 2011*
Laurie and Patrick review prioritized list from Spring workshop. There are a number of low-cost-in-time/resources items that Patrick will be able to develop. Laurie will provide the information and data necessary.
Before addressing the priorities list: Patrick mentioned that HTML5 is getting a lot of discussion and its graph capabilities may be useful for us. He will continue to investigate. Agreed that viz is important to develop soon.
1. Roles: Laurie will provide templates for the various types of texts with indications of characteristic language that enables identification of individual roles and the nature of the activity documented in the text.
2. Broken/Damaged Names: require template for name morphology. Possible NLP area
3. robust IDs: referring to the IDs of PNs in the TEI. Anytime the TEI in an Oracc project changes, the signature changes. For the most part, this should not be a problem as the persons and line order don't change that much. However, for projects not using TEI generated through Oracc, and in certain situations in Oracc TEI, the name instances in a given line may change. PLS is thinking about adopting robust linking, using structures and patterns. This would accommodate situations where, if the number of name citations were to change in a particular line of a document, the robust linking would recognize that there are no longer 3 instances of that name citation and would flag the entry.
4. Place names: one of the priorities for workshop participants was the ability to identify places other than the transaction-location. The easy case is when another place name is mentioned in the text: the ATF lemmatizes for that and can be identified in the TEI. The difficult case is where places are alluded to by the presence of people known to have come from some place other than where the transaction occurred. As this is a high-cost project, it is necessary to determine whether this will produce something other than information that is nice to know.
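The robust-linking idea in item 3 can be sketched by keying each name citation on its content (document, line, normalized name) rather than on a generated position, and comparing citation counts between two imports of the same TEI. This is a minimal sketch under those assumptions; the tuple layout and function names are hypothetical, not PLS's design.

```python
from collections import Counter


def citation_counts(citations):
    """Count name citations per (document, line, normalized name).

    citations: iterable of (doc_id, line_ref, normalized_name), one tuple per
    occurrence of a name in that line.
    """
    return Counter(citations)


def flag_changed_lines(old_citations, new_citations):
    """Flag entries whose citation count changed between two imports.

    Returns sorted (doc_id, line_ref, name, old_count, new_count) tuples; links
    anchored to these lines should be reviewed rather than silently re-pointed.
    """
    old, new = citation_counts(old_citations), citation_counts(new_citations)
    # Counter returns 0 for missing keys, so added or vanished names are caught too.
    return [(*key, old[key], new[key])
            for key in sorted(set(old) | set(new))
            if old[key] != new[key]]
```

If a line that held three citations of a name now holds two, the entry comes back as `(doc, line, name, 3, 2)`, which is the "no longer 3 instances" case described above.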
Graphviz: This is an area for which we are going to look for a development partner. Laurie will contact Tim Tangherlini (UCLA) for leads to presenters at the Aug. 2010 NEH digital workshop (in LA) who might be interested in developing Graphviz in conjunction with our probabilistic/assertion-based architecture.
Began identifying projects that would be appropriate for student researchers:
*NLP to identify roles. Included here is the problem of accommodating database input from researchers and how to convert db into TEI, as an initial (or separate) phase, and then run the TEI into the BPS tools. This recognizes that db would produce sparse TEI. Problems with db conversion into TEI would be identification/correction of errors in pattern matching for roles.
*NLP to identify broken/missing names: requires morphology of names, name authority index and output of existing names
*May 3, 2011*
Outcomes from tally of priorities from workshop of April 8-9: (pdf of worksheet)
Priority will be given to those items that contribute to establishing basic functionality of all components of the system. Low-hanging fruit (those items that are low-cost in terms of development time) will be done, regardless of how central they are to basic tools functioning (these are items that contribute to usability and interface that may or may not be critical to tool operation).
The model for the tools starts at the corpus level. The data from that corpus will be directed in two ways: one to the problem of disambiguation (on the machine level) and the other to the matter of assertions (and probabilistic modeling), central to the humanists' interaction with the data. Achieving disambiguation and some functionality with assertions will lead to graph and SNA output. Preference will be given to those tasks that most directly lead to the implementation of a functional, if basic, set of tools. The features identified by workshop participants as having highest priority/interest will be added on top of basic functionality and UI development.
*April 15, 2011*
Task List and Wish List resulting from workshop of April 8-9:
Item related to UI:
links desirable for each document
Item related to Reasoner process:
Names and Names Authority:
*Short names (nicknames) and alternate ("other") names:
1. Patterns of orthographic variants of a name
2. Common nicknames
3. Individuals known by two names
*Damaged and missing names:
Yoram wants help with working with these. Patrick said help with missing name problem depends on presence of some qualifications on the citation.
Damaged names: Patrick introduced workshop participants to Levenshtein distances. By computing these, we may be able to identify possible matches. Need to understand the units of Levenshtein (probably not letters in romanized transliteration--PLS) and the conventions for marking damaged morphemes in Oracc.
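A sketch of computing Levenshtein distance over signs rather than romanized letters, in line with the caution above. It assumes ATF-like transliterations in which `-` and `.` separate signs, `{...}` marks a determinative, square and half brackets mark damage, and `x` stands for an unreadable sign that is treated as a free wildcard. All of these conventions are assumptions that need checking against Oracc practice, and the transliterations in the test are illustrative only.

```python
import re

WILDCARD = "x"  # assumed transliteration of an unreadable sign


def signs(translit):
    """Split an ATF-style transliteration into sign tokens.

    Assumptions: '-' and '.' separate signs, '{...}' is a determinative token,
    and [ ] / half brackets are damage markers that can be dropped.
    """
    cleaned = re.sub(r"[\[\]⸢⸣]", "", translit)
    return re.findall(r"\{[^}]*\}|[^\-.{}\s]+", cleaned)


def sign_distance(a, b):
    """Levenshtein distance counted in signs; WILDCARD matches any sign at no cost."""
    s, t = signs(a), signs(b)
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, si in enumerate(s, 1):
        cur = [i]
        for j, tj in enumerate(t, 1):
            match = si == tj or WILDCARD in (si, tj)
            cur.append(min(prev[j] + 1,                         # delete si
                           cur[j - 1] + 1,                      # insert tj
                           prev[j - 1] + (0 if match else 1)))  # substitute
        prev = cur
    return prev[-1]
```

Candidate restorations for a damaged citation could then be ranked with `sorted(authority_forms, key=lambda f: sign_distance(damaged, f))`, where `authority_forms` is a hypothetical list of spellings taken from the name authority.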
*Name Authority support:
In forename refs., value of n attribute is the normalized name, whereas nymRef points to name orthography. Desirable to pull variants from nymList and put into document listing. Move normalized name to the first column and put "name orthography" in second column
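The reordered document listing can be sketched as follows, assuming a simplified, hypothetical TEI layout (`forename` carrying `n` and `nymRef`, and `nym` entries whose `form` children hold the variants) rather than the exact markup Oracc emits. Each row puts the normalized name first, the name orthography second, and the variants pulled from the nym list third. The sample values in the test are made up, not corpus data.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def name_listing(tei_xml):
    """Rows of (normalized name, name orthography, variants), one per forename.

    Assumed layout: <forename n="normalized" nymRef="#id">orthography</forename>
    and <nym xml:id="id"> entries whose <form> children list the variants.
    """
    root = ET.fromstring(tei_xml)
    # Map each nym's xml:id to its list of variant spellings.
    variants = {nym.get(XML_ID): [form.text for form in nym.iter(f"{TEI}form")]
                for nym in root.iter(f"{TEI}nym")}
    rows = []
    for forename in root.iter(f"{TEI}forename"):
        nym_id = (forename.get("nymRef") or "").lstrip("#")
        rows.append((forename.get("n"), forename.text, variants.get(nym_id, [])))
    return rows
```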
*Growing consensus that names are global.
1. Allow years to be a timespan, and then make the nrads derive from those. Note that the computation of earliest and latest must derive from the basis-timespan. This lets the father range slide around more for a given citation of a son, and would let all slide around more for a document with uncertain date.
*Corpus management in Workspace:
1. Must be able to refresh the corpus, essentially re-importing the one we started from
2. Be able to just dump the corpus, to get back to the initial state. This is setCorpus(null), and make sure it works.
3. Looks like there may be concurrency problems. Need to figure out how to synchronize the service context caches, so we do not step on one another. Main thing should be the maps. Need to think about the others. Consider allowing single session on each workspace, and on each corpus - i.e., single write-access session.
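Item 3's single write-access session per workspace can be sketched as a small lock table. This is a language-neutral idea shown in Python with a hypothetical `WorkspaceLocks` class; it is not the BPS service API, and the same shape would apply to corpora.

```python
import threading


class WorkspaceLocks:
    """Grant at most one write-access session per workspace (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table itself
        self._writers = {}              # workspace_id -> session_id holding write access

    def acquire_write(self, workspace_id, session_id):
        """Return True if session_id now holds (or already held) write access."""
        with self._guard:
            holder = self._writers.setdefault(workspace_id, session_id)
            return holder == session_id

    def release_write(self, workspace_id, session_id):
        """Release write access; only the current holder can release it."""
        with self._guard:
            if self._writers.get(workspace_id) == session_id:
                del self._writers[workspace_id]
                return True
            return False
```

A second session asking for the same workspace is refused rather than blocked, so the UI could tell the user the workspace is in use; the holder can re-acquire its own lock. Read-only sessions are not tracked here.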
Laurie: review treatment of damaged morphemes and rendering of broken names in Oracc---requires review of current standards and consultation with Oracc Steering Committee (Veldhuis, Tinney, Robson)
*May 13, 2010*
TEI of mini-corpus generated for tools testing. Mini-corpus of 20 texts selected for known family ties and generational continuity. All texts belong to family group published by L.T. Doty in Journal of Cuneiform Studies, "The Nana-iddin family".
*February 4, 2010*
January 20, 2010
December 8, 2009
incorporate CDL into BPS. most of NLP for HBTIN being done by lemmer. some of nlp and parameter material is corpus specific.
TEI review for names, or for roles, either before processing or as process files. flexibility of processing---discard multiple clans, error reports.
lemmatizer interface not emacs: ?
LP: think about implications of multiple corpora, rules for the lemmer for each, and how we work with corpora. Perhaps watch workspaces in close environments to particular corpora. Separate feeds to keep contexts.
workspace: model raw corpus, lemmed corpus, how to build lemmed; have to model the whole process and incorporate them. Additions to tweaking
lemming process: how to lemm a text, what's happening at the deeper level. What's the priority for development? Web interface.
November 30, 2009:
November 24, 2009:
November 17, 2009:
PLS worked on user management on the Berkeley Prosopography Services website:
November 10, 2009:
Based on the witness template and text structure (clearly defined sections of main text and witness list):
1. rather than default role being unknown, probably more efficient to set default as principal or witness.
2. as roles are refined (e.g., seller or buyer), can make more specific
Cuneiformists would normally understand “principal” to be the buyer or seller. But other roles will be labeled “principal” by default that most cuneiformists would not consider “principals”: guarantors, clearers of claims, and neighbors of property being leased or sold. This should not present a problem as long as corpus curators and users understand that this is a default distinction, that more specific roles will be added to the parses, and/or that roles that require too much fiddling to label automatically can be designated more specifically by hand.
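The default-then-refine scheme can be sketched as two small tables: a default role per text section and a map from specific roles to the default they refine. Section names and the role set are hypothetical placeholders pending the role taxonomy. As a design choice in this sketch, a specific role is accepted only when it is consistent with the section default, so a stray "seller" in the witness list stays "witness".

```python
# Hypothetical section names mapped to the default role for persons found there.
DEFAULT_BY_SECTION = {"main": "principal", "witnesses": "witness"}

# Illustrative refinements: each specific role and the default it refines.
REFINES = {"seller": "principal", "buyer": "principal",
           "guarantor": "principal", "neighbor": "principal"}


def assign_role(section, specific=None):
    """Default role by section, refined when a consistent specific role is known."""
    default = DEFAULT_BY_SECTION.get(section, "unknown")
    if specific is not None and REFINES.get(specific) == default:
        return specific
    return default


def generic_role(role):
    """Collapse a specific role back to its default, e.g. for coarse queries."""
    return REFINES.get(role, role)
```

Keeping the refinement map separate means a query for all "principals" still finds sellers and buyers via `generic_role`, which matches the note's point that the default distinction stays meaningful after more specific roles are added.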
Task: develop simple taxonomy of roles
seller: (probably not useful to submark)
To designate the scribe: encapsulate UMBISAG in the entire person marker. This will account for any variation in the placement of the optional professional designations that appear in a fully designated name.
There are no texts with multiple scribes, so we are dealing with a single-scribe design.
build table for chronology
month names and lengths
Parker & Dubberstein: OCR?
king’s names/date ranges
Arsacid period dates
Parthian period dates
November 3, 2009:
Review PLS and LEP wiki-pages added for project development.
PLS most pressing need: latest XML from Steve Tinney. (still dealing with problem of text presenting multiple ancestors for single individual)
Laurie to consider:
1. how text corpus clean-up impacts fundamental assumptions: pay attention to those features
2. the development of family tree user stories for the graph builder, keeping in mind:
a. the steps from the most primitive construction to what the final product might look like
b. how much weight to place on user needs for visualization vs. interactivity
3. templates for markers for roles
1. LP to contact Tinney (UPenn) for: (done 11-9-2009)
4. PLS: identify dependencies for user/service/ui stories. develop better task list.
Next meeting: November 10.