This wiki space contains archival documentation of Project Bamboo, April 2008 - March 2013.
How do we engage w/ existing projects
Make it so you don't have to choose between them
Should it also/instead be w/ individual scholars?
We want both.
Evaluating what's already done > peer review
Open access vs. closed in some way
Any sort of limiting is closed
In terms of who's involved, who gets to see what
Hope to reach a critical mass of international collaboration
Sakai model of sustainability - in order to be sustainable, need 800 institutions adopting it
John: "I just found out about these numbers; we have 100 and that means $1 million in revenue; someone must have decided on $8 million."
Sustainability concerns-- how do you organize this?
Part of community design and looking for other institutions is to build in commitment to sustainability from the start
If you're bringing matching resources from institution, as we move down the road
Encouraging to see how many groups want to be involved > tier model
Looking a lot @ open and commercial space: how do grass-roots initiatives (e.g. Wikipedia) sustain themselves? What can we learn from them and adopt within Bamboo community?
Sustainability spread across lots of individuals/institutions
Depends on who's editing vs. who's monitoring
What can we learn?
Social sustainability (table 2)
1) Long-term storage (50 years)
2) Separate abstract view from today's solutions
Not solution as solution to all the problems
Development and use of solutions today is a social thing
Brings researchers together to exchange ideas, social network for addressing research issues
Abuse current technology to foster networking and getting research together
If we put something into the computers, we should be able to get it out (technically AND legally)
Knowledge sealed away from the public is bad, because we work for the public
We're already being paid, so our knowledge should be open to everyone
Observation that if you buy a new computer, then look at what you've got after a year, it's mainly not text
People accumulate images, audio, music, and use devices to stream AV material, watch TV, etc.
Current research predicts how research is changing as a result > real culture change in type of materials that humanities work with
Use of non-textual materials in very large quantities
IT required and being used in humanities research is comparable to science, if not more so
If this is true, there will be a multimedia explosion, and we need to plan ahead for that
Questions re: preservation of data, tools for access, how to search
What if you want all scenes from a certain period of film history with a certain trait?
Thematic search over multimedia
English professor/folklorist at UofC took exception to library being the place where raw material is discovered, used, etc.
World is raw; world as data set
How do you capture/archive/reuse the world?
He was standing in front of hotel, followed sound of bagpipes to fire station, where firefighter was playing bagpipes and didn't know what song
How do we support/sustain that moment in time?
Some libraries do collect those soundscapes; have responsibility to make them available
How do we make it easier for this research to be exposed and shared
JISC funded digitization of all that sound content in the UK; encourages/assures libraries work w/ academic communities to do what researchers want
Concern about notion of community-- social networks, collaborations
Most people in the room are social anthropologists; suspicious
In our methodology, things depend on a context-- so community, open, collaboration, etc are contextual
How do you come to agreement? Too many contexts, or agreement would be an illusion.
Goes to your notion of value-- nature/value of evidence and selection process, may or may not be informed by formal discipline/community
You can capture/carry/organize information, but would you be able to replicate serendipity of the story of the bagpipes?
Scientists are happy to have individual articles; historians of science have to have runs of periodicals; doesn't matter if it's digitized, if it's just articles it cripples research
But you're dealing w/ a small group + copyright issues that can wipe out whole subject of study
Context of use is so different in each case-- main users are scientists, but there's this other group that could collapse
Affects how people need to communicate
Locally, we've talked about it and tried to figure out a way forward
People move on to digitized libraries generally, and this will be a more prominent problem
People are interested in different aspects of the articles
In online journals, you might only have PDFs, may not have cross-search
As a linguist, w/ electronic language resources, want communication between different groups of scholars
If you make them available, why not make them in a format useful for other fields?
Electronic text makes possible things in linguistics; important for lexicographers
Open standards have benefits for everyone, not just target audience
You don't actually know what people will use something for
Might be helpful to look at low-level issues like format, tools for getting stuff out of material that's stored
Just as useful to look at low-level granularity as to try to think about scholarly practices, because these vary a lot
You want to find and do it - what you do is up to you
Needing different paradigms to engage with the amount of data that we have
100,000 articles -- what sorts of questions do you want to ask?
Switch scholarly paradigms, and we don't know what those paradigms will be yet
But we can make investments now to make this data as reusable as possible without having to anticipate how it looks
You haven't got all the people creating digital data that humanities researchers use here
This won't make publishers publish their electronic journals in particular formats
We have press engagement; UofC press, Penn State, California
You realize this two weeks into the project and you're out of space
Why weren't presses there? Why not museums? etc.
Everyone who read the proposal saw themselves in the proposal
Then had 3 other communities they wanted in this
"Best place to hide it is out in the open"
Image of warehouse at the end of Indiana Jones
As we have more and more stuff accumulated, we assume search tools will do it, but will growth in capability of search tools fall short of or exceed growth in data?
As stuff accumulates, things hidden in perfect sight, and we'll never find them again
Google: based on number of links to that information
If looking for specialized resource, slim chances of 2 million links going through that
Search vs. discovery
Informally, people should tag
But tagging presupposes the search method
Different communities have different search strategies
On one end, you have philological concern (entering/tagging data)
Other end: how do you pull things out
Availability of data is less of a problem than handling it
Middle area: humanities catalog of what's there-- like WorldCat
Could work fairly easily, lets people look for different data sources
Renaissance center: turns out there were 4 other digitizations
Could be a way of thinking about this like WorldCat
Standards are community-based.
Depend entirely on context.
Reliable way of doing what's out there-- research topics/concerns
Pub conversation: usually find out at the pub, enhance the pub
Make the pub reliable
As research becomes more interdisciplinary, may have networks in your own discipline, but need new tools to find other disciplines
A network of networks.
Also observation that existing quality is variable; need more, better-quality data
"More" in the sense that there's still lots of gaps
Catalog might help identify gaps
Problem isn't more data, problem is more usable data
If you want a full run of a journal, you have to physically go to the library > these are the only places you can do research
Provision of digital information is useful to humanities scholars > this can be controlled by people distant from humanities scholarship
Parts of the record could be destroyed or made worse for uses people really are doing
Huge changes that occur in libraries could leave out important parts of the record
Data quality is worrying - very contextualized (some is just rubbish)
Google doesn't have a single bibliographer, and it shows
Couldn't get the right combination of keywords to get the right volume of an agricultural journal on Google
20 copies that claimed to be first edition, were all second edition
Really basic level re: provision of good information in key areas
Here's where money creeps in: cataloging is expensive
Google vs. Lycos - Lycos had catalogers
Cheap, massive data didn't care about high-quality
We want high-quality data, but we have to be able to afford it
We run out of steam fast
But books are already cataloged in source
But Google provides a lot of stuff for free; cataloging proved too expensive
How do we create semi-quality data after the digitization by adding a layer?
Should be looked at within information environment
Data protection / privacy issues
Reuse of public sector information, etc.
These need to be dealt with from the beginning, or the structure you build won't work
Best quality is often most protected
Academics have privileged access to protected info-- allowed in ways general public can't
Providers of that information won't just give it to us just because we have a right to it
Build an international collaboration of digital humanities researchers to stand up for our rights
International standard for which scholars can get materials
Greg Jackson: what do you think Congress would do?
Right now, would ignore it completely
No organized, effective opposition to industry position
Campaigns are won by good campaigns
Have to be persuasive, not just right
Assumption that right is enough
Have to spend money and advance an economic argument
Google works when you're not interested in a specific resource-- just information about X
Just some bits of information
Not the right tool for when you're after a specific item or comprehensive discovery
Just find the right tool-- don't just assume you need a screwdriver
There's a range of open/closed positions that are perfectly reasonable
Pushing for everything to be open source could be counterproductive