This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.


Tools and Repository Partners


Questions and concerns

Plenary notes

Group 4

  • Questions of identity - better identity management could permit a variety of research access to materials that might otherwise look like threats to the hosting installations (e.g., multiple hits from counting tools on data sets)
    Possibly related: how might we negotiate with owners of archival collections who are protective of their sources? How might we create a sense of trust for sharing materials?

Group 12

  • "Stuff" - should this be renamed "tools and content" to represent the things we care about, not just technology?
    Sub-question: commercial partners, license, IP-restricted content - relates to our need to get at stuff
  • Tool interoperability; want to specify not only tools but also environments - how well do they play with each other?
  • Incentives for sharing



Group 4 notes

Notes:
How do we use these to advance research?
Adopt some of the methods from genetics/biology
The humanities may not be able to adopt this: humanists believe they can work without technology, whereas the sciences engage with technology throughout the process of becoming part of the discipline.

Where is the value for the faculty?

How can we evolve with the system?
Scholars have not identified what they are using technology for.

2 questions:
How do we handle identities to allow research-type access to repositories etc. (access that might otherwise look like a malicious attack)?
How might Bamboo help archives that are protective of their sources feel secure enough to open them up for use through Bamboo?



Group 12 notes

  • Same question: sustainability
  • What about a focus on stuff, elevating the stuff element so that it's not just about tools? Or tools and archives?
  • OR, rather than building tools, create a list of the things that a tool must be able to do
  • If we're talking about content partners, are we also talking about commercial partners, and how much can we build? Will we have trouble with licensed content and end up with access only to open source content?
  • Need to make sure that participation doesn't depend simply on the financial resources backing institutions
  • Commercial developers have more capital to invest in useful services; in the future, content itself is going to be less important to these companies than the services they can provide
  • Bamboo: to what extent does it play with or against this trend?
  • OR, how can we leverage industry? Can we work with, say, Google? Or with coding or commercial industries working on 3-D models of the book? But, in the end, we need to give them deliverables
  • How can we build up something like Google Scholar so that it's more comprehensive? How can we work WITH industry rather than against it?
  • Need also to think about incentives for not just building tools, but also sharing them

TR: What about a focus on stuff? Elevate the stuff element. Is "Tools and archives" a "friendlier" title for this? Or "Tools & content providers"? "Repository" describes a mechanical level, and what concerns scholars is the content itself.
TR: How do we search into materials that are in a "locked access" repository - commercial, copyright-protected stuff? How do we annotate those materials?
TR: License holders have more capital to invest in services that add real value. Might content become less important than services that add value to content? And how should PB "battle" or otherwise engage with these efforts so as to both give to the commercial entities and get from them?
TR: Separate metadata from content.
TR: CDL repository ...
TR: How can we get tools to integrate better with the content?
TR: What are the common characteristics, aspects, or capabilities of tools that will enable practices of interest ... could PB describe such characteristics, aspects, or capabilities as a contribution to toolmakers?
TR: Citation/provenance as a way to create incentives



Risks, rewards and plan

Tools and Content

Plenary notes - risks and rewards (Tools and Content)

Convergence - similar to social networking

Priorities and scope

  • Thinking about how to further collaboration/conversation in DH
  • Building a discovery/use layer composed of a tool registry and a content/resource registry
    • Expose things for discovery and describe what they're built on
  • Integration strategies - identify what standards we need and advocate for them
  • Developing strategies that cut across various communities
  • Making it really easy for registries to be used; building up use cases, case studies, and stories contributed by all the people involved (humanities scholars, librarians, IT) - get use cases from all their perspectives
  • Finding out about things that didn't work, too

Risks

  • Build something that doesn't meet the needs of the end users (too complex, not what anyone's looking for, people say they don't need this) - have to do ethnographic research: how do people actually work? How much time can they invest in learning something without a clear-cut path to success?
  • Scholarly adoption of known/emerging resources and tools?
  • How do we know people will find these productive? Don't want to see Bamboo as a waste of scholars' time
  • Incentivization - how do you do it for content providers/users while avoiding duplicative efforts?
  • Evolution as virtual community - lacking face-to-face
  • Possibility of Bamboo being like bamboo in gardens - it just doesn't go away
  • On a regional basis, every few years, a face-to-face interaction

Rewards

  • Broad access
  • Could be great efficiencies, great economies of scale
  • New research opportunities that are truly interdisciplinary
  • MySQL - community of users from broad array of disciplines; all working towards common goal
  • Great boon to actually be able to talk to people developing those kinds of tools
  • Tool developers want to work with humanities data
  • Marketplace of ideas that rewards risk-taking (as long as it's not too high)
  • People are not willing to take these risks early in their careers; if there are obvious rewards, we can leapfrog research in the humanities



Plenary notes - plan (Tools and Content)

Milestones: not a lot of time in three months, so what's achievable?
Thought about creating a repository in order to capture what's already out there (what we scoped out in the morning - a discovery and use layer)
Different kinds of integration strategies
"Straw men" that people can comment on and react to, to get discussion going
Clear needs for demonstrators: a demonstrator wiki that would allow us to collect the items for the registry (tools, which could also be tagged by domain/activity/themes/degree of interoperability)
Wiki would allow the community to collect what's already out there
How can we cluster the themes that are already there, integrate them into use cases, and list the kinds of use cases people are pursuing with those tools?
Demonstrator that would create a template for use cases and case studies
Agreement in the group - very important to not only capture tools and available content, but also the workflow that's attached to those tools - what do people actually do with that? Can capture scholarly activity more precisely and define it by domain
How the tools connect to different scholarly activities
Demonstrators: template that will allow us to define the typical problem - not just a list of tools, but what are they trying to solve? Can they be reused, reconfigured in a specific scholarly process?
Time to get beyond the one-off project - how do you get this into the heads of stubborn, local programmers?
Will this whole deliberative process help us get beyond it, or just create more layers?
How would you divide humanists ("a very squishy term")? A simple matrix:
"Is the object of your attention predominantly pre-digital, or predominantly born digital?"
"Text-based, visual, time-based?"
Each of the six cubbyholes has connections with other things
Someone working with a high-res image of the Beowulf manuscript might talk to people using high-resolution mammograms
Time commitments: who's actually going to do what we're going to do in 3 months?
How do we get the institutional buy-in?
Do I have to ask my boss if I want to be in a working group? No. Can she tell me not to be on it? No. It's different if you're in IT, but that's another problem.



Group notes (Tools and Content)

Questions that need to be addressed to shape this direction

  • "Stuff" - should this be renamed "tools and content" to represent the things we care about, not just technology?
  • Sub-question: commercial partners, license, IP-restricted content - relates to our need to get at stuff
  • Tool interoperability; want to specify not only tools but also environments - how well do they play with each other?
  • Incentives for sharing
  • Questions of identity - better identity management could permit a variety of research access to materials that might otherwise look like threats to the hosting installations (e.g., multiple hits from counting tools on data sets)
  • Possibly related: how might we negotiate with owners of archival collections who are protective of their sources? How might we create a sense of trust for sharing materials?



Flipcharts (Tools and Content)


W2-Tools-1
Scope
1) Discovery & use layer
  • Tool registry
  • Content/resource registry
2) Integration
  • Interoperable tools
  • Advocacy for standards
3) Developing cross-cutting community
  • Simple/transparent roadmap
  • Use cases, case studies, stories
    (Know by example/models)
    (Template for NARRATIVES including viewpoints from scholar/users, developers, content providers, ASSESSMENT, EVALUATION)

W2-Tools-2
RISKS

  • Standardization doesn't meet needs (which are very nuanced and different among users)
  • Scholarly adoption of known & emerging resources & tools
  • Incentives for content providers & resource users
  • Avoid duplicative efforts, knowledge of existing initiatives
  • Loss of opportunity by evolving EXCLUSIVELY as a virtual environment

W2-Tools-3
REWARDS

  • Broad access/discovery, broad & deep
  • Integration
  • Efficiency
  • Economy
  • Shared knowledge, marketplace of ideas
  • Interdisciplinarity
  • New research opportunities

W2-Tools-4
SHARING/DISCOVERY

  • Registry: Content - assets, data
  • Registry: Tools
  • Scholar networks: knowledge sharing via USER feedback about content & tools interaction
  • Registry: standards - environmental sow (question) and registry
  • Scholar networks: use cases/case studies
  • Discovery and use layers - Functionality: metasearch/query architecture, use/re-use environments, contextualize content who/what/where?

W2-Tools-5
Advocacy/Partnership
Cross-repository discovery & opportunity for data analysis

  • IP
  • Business case - how to incent content providers?
  • Technical challenges
  • Metadata mapping x-platform
Scholar network
Know community of users
  • Beta groups
  • User groups
For purpose of use case, development of tool QC & testing, requirement gathering
Seamlessness across repositories of primary source materials

W2-Tools-6
CONTENT DISCOVERY METHODS
Format, Domain

  • Broad-scale discovery (breadth)
  • Drill-down discovery (depth)
Tools to support 'discovery' and use at multiple levels
  • Find - discover -> library perspective
  • Make order - use -> scholarly practice
Need case studies/use cases to understand relationship between content repositories and scholarly uses (discovery w/ access)

W2-Tools-7
MILESTONES

  • Develop timetable and deliverable first for working groups
  • Develop a "straw man" for each of 3 scope items (e.g. what does a registry look like? What are the pieces/template for a case study?)
  • Use PB themes to cluster registry items (e.g. annotate resources/tools)
  • Define problem sets (what problems are we trying to solve for scholars? Born digital materials vs. analog originals)
  • Registry should include existing & desired tools, repositories, use cases
    See diagram

W2-Tools-8
DEMONSTRATORS

  • Wiki for registry (domain tagged, activity-theme tagged, clusters that apply, degree of interoperability)
  • Use case/case study template
  • Generic workflow representation for scholarly activity which would be customizable by domain
  • Template for defining problem sets
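The registry demonstrator describes entries tagged by domain, activity/theme, and degree of interoperability. As a minimal sketch, assuming a simple in-memory model, such tagged entries could be stored and filtered as below. The field names, the 0 to 3 interoperability scale, and the "Example ..." entries are all hypothetical illustrations, not part of any Bamboo specification or real registry.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One hypothetical registry entry: either a tool or a content resource."""
    name: str
    kind: str                                           # assumed values: "tool" or "content"
    domains: set[str] = field(default_factory=set)      # domain tags, e.g. {"literature"}
    activities: set[str] = field(default_factory=set)   # activity/theme tags
    interoperability: int = 0                           # assumed scale: 0 (none) to 3 (high)


def find(entries, kind=None, domain=None, activity=None, min_interop=0):
    """Return the entries that match every filter that was supplied."""
    results = []
    for e in entries:
        if kind is not None and e.kind != kind:
            continue
        if domain is not None and domain not in e.domains:
            continue
        if activity is not None and activity not in e.activities:
            continue
        if e.interoperability < min_interop:
            continue
        results.append(e)
    return results


# Hypothetical sample entries, for illustration only.
registry = [
    RegistryEntry("Example Concordancer", "tool", {"literature"}, {"analyze"}, 2),
    RegistryEntry("Example Image Archive", "content", {"art history"}, {"discover"}, 1),
    RegistryEntry("Example Annotator", "tool", {"literature", "art history"}, {"annotate"}, 3),
]

for entry in find(registry, kind="tool", domain="literature", min_interop=2):
    print(entry.name)
```

A wiki-based registry would hold the same information as tagged pages; the point of the sketch is only that domain, activity, and interoperability tags make tools and content discoverable by combined filters.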

W2-Tools-9
COMMUNITY - HUMANISTS
Clusters

  • Visual (static, dynamic - time-based)
  • Textual
  • Performance
  • Creators/analysts
  • Full spectrum of academically engaged students -> senior academics
  • Spectrum of digitally "mature" users
    See diagram

W2-Tools-10
COMMITMENTS

  • Time (release time not available for current 3 mos)
  • Leadership/coordination for effort
  • Institutional buy-in for staff to participate in PB planning activities
  • $ for travel
  • Realistically - the only commitment that can be made for next 3 mos. is to add our comments/notes to PB wiki
  • Articulation of demonstrator projects



Tools and Repositories II

Plenary notes - risks and rewards (Tools and Repositories II)

Title

  • Had a discussion about the title; some people liked "repositories", but others thought it constraining and, for non-IT people, possibly exclusionary
  • Renamed "Tools & Content"

Scope

  • Discussed licensing and IPR, then were told it was beyond scope
  • Everything else we talked about was contingent on licensing being made possible
  • Maybe Bamboo could establish different levels of access that might be given to certain people for certain kinds of content
  • Might design/promote tools to provide different levels of access

Risk

  • In bringing together content from different sources, PB could incur liability if someone reverse-engineered restrictions and did bad things to the content

High reward

  • Scholars want to do this

Priorities

  • The priority was to make it possible to bring together content from different sources and operate on it, assuming the IPR issues had been solved
  • Important to see it support scholars/researchers at small universities
  • Possibility of public access
  • Identify rewards/incentives so content providers would trust tool developers
  • Tool interoperability (but the standards group is doing this) - this is important
  • Don't want to reinvent wheels
  • PB might come up with "core services" that could be helpful to software engineers as they build APIs
  • What are the incentives for institutions to build the tools PB wants, not just what they want?
  • Making something Bamboo-compliant could mean sacrificing exactly what you want to do
  • Assured that needs of arts scholars were considered in workshop 1, but felt unaware of what came out of that
  • Felt like there are a lot of tools out there - make it possible for people to find out what tools are there, and give them access
  • Blend together use of different tools



Plenary notes - plan (Tools and Repositories II)

Registry for discovery of tools/services/content
Need to identify what core services are needed - access to data, common services we can use to access text/multimedia
Identify what demonstrator projects are
Identify rewards and incentives so content providers will trust us
Demonstrator projects:
-Service to get an image that facilitates sharing of content across multiple formats
-Same sort of thing from text; multiple texts from multiple archives and ask a question to analyze across all of them
-Represent discovery of content as well as tool discovery registry
-Entity extraction: extracting people, locations, other things people can specify ontologies for
-Mapping dates to MIT SIMILE Timeline, locations to Google Maps
-Pull annotations across repositories
-Zotero to annotate your collections, then send it to MONK (would have to be text related)
-Scholarly mashup environment - stitch together multiple tasks



Group notes (Tools and Repositories II)

Questions that need to be addressed to shape this direction

  • "Stuff" - should this be renamed "tools and content" - representative of the things we care about, not just technology?
  • Sub-question: commercial partners, license, IP-restricted content - relates to our need to get at stuff
  • Tool interoperability; want to specify not only tools and also environments - how well do they play with each other
  • Incentives for sharing
  • Questions of identity - better managing identities can allow for a variety of research accesses to materials that might otherwise look like threats to those kinds of installations (multiple hits from counting tools on data sets)
  • Might be related to, how might we negotiate with owners of collections of archives that are protective of their sources? How might we create a sense of trust for sharing materials?

Top priorities to address by W3 (Jan 2009)
Possible Demonstrator Projects

get faculty involved...

1. image tool

Share set of images (2000):
Art History faculty member co-teaching with U. of Chicago... NW...
with students

multiple sources
multiple formats
integrated display

2. multiple texts

critical texts... used in a class, some in 19C fiction, etc.
Jane Austen @ Oxford University

deep analytics: data mining, phrase patterns

G. Crane
Martin

3. Discovery Identification

Scholars' OAIster

OAIster is a union catalog of digital resources.

4. Entity Extraction

dates -> SIMILE

SIMILE is a joint project conducted by the MIT Libraries and MIT CSAIL.
Semantic Interoperability of Metadata and Information in unLike Environments
places -> Google Maps

PARTNERS

Naxos
R. Prelinger

Getty

ECAI

MITH

D. Rumsey

SEASR
Fedora/DSpace

DSpace captures your data in any format - in text, video, audio, and data. It distributes it over the web. It indexes your work, so users can search and retrieve your items. It preserves your digital work over the long term.

JSTOR/ARTSTOR

Dan Cohen

TAPOR

TAPoR is the Text Analysis Portal for Research, a collaboration by six Canadian universities to build a centralized gateway to representative texts and sophisticated text analysis tools.
Yahoo
Mozilla
Amazon (S3)
Google

VRE (dance)

J. Unsworth

5. ANNOTATION TOOLS

by media type
persistence

example: publishing a dissertation/thesis

6. Middleware App

Zotero -> MONK
-> SEASR

7. Tool Discovery

Registry of tools

8. Scholarly BPL

Scholarly workflow... levels? Expertise level of scholar
click a button, sequence workflow, workbench level...

BPL = Business Process Language/Linkage?

visual environment where scholar sequences tools

workbench: create flow

Metadata for tools: what kind of parameters

Find 5 tools and put together... stitch together

Put together favorite tools:
Get text
Lexical tool
Visualization tool
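The "stitch together" idea (get text, then a lexical tool, then a visualization tool, with metadata describing each tool's parameters) can be sketched as a simple linear pipeline. This is a minimal sketch under stated assumptions: the three step functions, their parameter names, the repository and item identifiers, and the sample text are hypothetical stand-ins, not existing Bamboo services or the APIs of MONK, SEASR, or Zotero. A visual workbench would let the scholar build the `workflow` list interactively instead of in code.

```python
from collections import Counter


def get_text(repository: str, item_id: str) -> str:
    """Hypothetical stand-in for fetching a text from a repository (no network call)."""
    sample_repository = {("repo-a", "doc1"): "the text and the tool and the scholar"}
    return sample_repository[(repository, item_id)]


def lexical_tool(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Hypothetical lexical tool: the top_n most frequent lower-cased words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(words).most_common(top_n)


def visualization_tool(freqs: list[tuple[str, int]]) -> str:
    """Hypothetical visualization tool: a plain-text bar chart, one row per word."""
    return "\n".join(f"{word:<10} {'#' * count}" for word, count in freqs)


def run_workflow(source, source_params: dict, steps: list) -> object:
    """Call the source tool, then feed each result into the next (tool, params) step."""
    result = source(**source_params)
    for tool, params in steps:
        result = tool(result, **params)
    return result


# The "get text -> lexical tool -> visualization tool" flow, with each step's
# parameters recorded next to the tool (a crude form of tool metadata).
workflow = [(lexical_tool, {"top_n": 2}), (visualization_tool, {})]
print(run_workflow(get_text, {"repository": "repo-a", "item_id": "doc1"}, workflow))
```

Keeping each step as a (tool, parameters) pair is one simple design choice: the saved list doubles as a re-runnable record of the workflow.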



Flipcharts (Tools and Repositories II)


W2-Tools2-1
Tools & Repositories II

  • Stuff
  • Partners, license, IP
  • Tool interoperability
  • Incentives for sharing
  • Questions of identity, trust

W2-Tools2-2

  • Content vs repository
  • Content is more inclusive term
  • Reflects scholarly thoughts
  • Collections? Data?

W2-Tools2-3
Access to "Content", "Data", "Repository"
1) Loss of control
2) Promote scholarship
3) Preservation
4) "Sensitive" content requires limited access
5) License and legal issues

W2-Tools2-4
6) Bamboo -> establish license protocol
7) Bamboo "communicate" repository information
8) License negotiation advocacy for scholarly use
9) Cooperative licensing

W2-Tools2-5
10) Interoperability issue for resources within our control
11) Develop a tool that enables scholars to pull content from multiple resources to view and use

W2-Tools2-6
12) Demonstrator projects to show possibilities of how to use content
13) Set guidelines and frameworks for software developers - have tools "talk" to one another; "Bamboo compliant"

W2-Tools2-7
14) Bamboo can establish core tools
15) Rewards collaboration/scholarship re: budget
16) Performing "arts" - music? Where is the data?
17) Changing trends in scholarship?

W2-Tools2-8
Interoperable tools & content
To be done:

  • Identify core services
  • Identify demos
  • Identify rewards/incentives
  • Scholars at small schools
  • Tools available elsewhere; discover & use/blend

W2-Tools2-9
Potential Demos
1) Image tool
  • Multiple src
  • Multiple formats -> integrated display
2) Multiple texts
See diagram

W2-Tools2-10
3) Scholars' OAIster
Discovery
Identification
4) Entity extraction
dates -> simile
places -> Google maps

W2-Tools2-11
Partners

  • Jane Austen Proj
  • Naxos
  • R. Prelinger
  • Getty
  • ECAI
  • MITH
  • D Rumsey
  • Manyesps
  • SEASR
  • Fedora/dSpace
  • JSTOR/ARTSTOR
  • Yahoo
  • Mozilla
  • Amazon (S3)
  • Google
  • Dan Cohen
  • TAPOR
  • VRE (dance)
  • J. Unsworth
  • G. Crane
  • MARTIN

W2-Tools2-12
5) Annotation
  • By media type
  • Persistence
    ex. - publishing a dissertation/thesis
6) Middleware app
Zotero -> square -> MONK, SEASR (see diagram)

W2-Tools2-13
7) Tool discovery & use
8) Scholarly BPL to stitch tools
Environment (visual) where scholar sequences tools (question)

W2-Tools2-14
Scholarly workflow
ex. get text from repo A (w/ parameter), then send to lexical tool, then send results to visualization tool

W2-Tools2-15
Diagram (connected by an upward arrow): Workbench (create flow); Execute a flow (interactive); Saved result set