UC Berkeley CTO Bill Allison will lead a discussion about opportunities and peril in collecting and analyzing the digital streams generated by teaching, learning, and research activities on our campus.
Companies like Facebook and Google make money by storing, analyzing, and monetizing massive troves of data in many different ways. For them, as John Lanchester wrote in the August London Review of Books, "justifications about ‘connection’ and ‘community’ are ex post facto rationalisations. The drive is simpler and more basic. That’s why the impulse to growth has been so fundamental to the company, which is in many respects more like a virus than it is like a business. Grow and multiply and monetise. Why? There is no why. Because." At the University, more courses than ever before use online media and tools like JupyterHub. Systems that students and faculty traverse as they plan their academic careers make it easier and cheaper for Berkeley to track, surveil, log, and subsequently analyze the myriad transactions, and to correlate "digital exhaust" with more traditional records. There is the promise of better learning outcomes – or of pursuing deeper research than was previously possible. There is also the perpetual question of securing the data and preventing its use for purposes we never intended.
My hope is that we can have a vibrant, thoughtful discussion about the choices we are making and will make as an institution about when and how we capture data and metadata, and how we think about the values, benefits, and costs. We also need to think about tradeoffs between our students' and faculty's ability to see and/or opt out of such collection, and the collective benefits to the research mission and the practice of education. That's a lot for a single hour's discussion – but even a more meta-level discussion about the different activities going on in our highly decentralized University should prove useful, as it will at least raise our awareness of how we're working through some of these questions at Berkeley and the UC System.
When: Thursday, 5 October from 12 - 1pm
Prior to the meeting, please review:
Optionally (and perhaps slightly off-topic), Bill suggests "You are the Product" (John Lanchester, London Review of Books, August 2017)
Presenting/Facilitating: Bill Allison, Campus CTO
Amy Neeser, RDM / Research IT & Library
Andrew Wiedlea, LBNL
Camille Crittenden, CITRIS
Cathryn Carson, History/BIDS/DSEP
Chris Hoffman, Research IT
Dale Engle, EDW
Deb McCaffrey, Research IT
Ian Vaino, LBNL
Jason Christopher, Research IT
Jean Cheng, ETS/AIS
Kali Armitage, IST-Doc Mgmt
Kevin Chan, ETS
Lisa Ho, Campus Privacy Officer
Maurice Manning, Research IT
Megan Leavitt, ETS
Michael Cheng, EDW
Miles Lincoln, ETS
Noah Wittman, ETS
Oliver Heyer, ETS
Owen McGrath, ETS
Patrick Schmitz, Research IT
Quinn Dombrowski, Research IT
Rick Jaffe, Research IT
Steve Masover, Research IT
Steven Williams, ETS
Lisa Ho: Educause article [https://er.educause.edu/articles/2017/7/naked-in-the-garden-privacy-and-the-next-generation-digital-learning-environment]; talk tomorrow at the LMS conference/event
Oliver: Interoperability between components of our LMS is actually hard, not seamless, and the information being shared between components is minor. But the data has 'always' been collected; it's just a question of whether/how/with what consent we move the data into a location in which it can be analyzed and acted on. Some data is (or will be) integral to the business of the university, and participating in the university necessitates participation in that aspect of its data-driven business. Other "experiments": opt-in or opt-out.
Chris: DIY solutions are the current practice/standard for researchers in determining where and how to store research data. This is a problem the Research Data Management (RDM) program is grappling with. A white paper on this topic is a WIP. Platforms, tools, and policy are all strands of the discussion.
Patrick: Some domains are out ahead of the regulation. Genomic data is not legally considered PII, but in actual fact it is not anonymizable.
Steve M: Human subjects experiment analogy: once our students (or staff) are in the data lake, they're in an experiment. What kind of meaningful consent can we request/obtain in a world where no one scrolls through EULAs?
Patrick: Can we offer meaningful choice when there's a huge educational gap? People (even we IT people) don't understand, or don't fully understand, the implications of their choice.
Andrew: Consider use of energy generated by nuclear power: if I've used that energy, am I responsible for that number of grams of nuclear waste? Can't educate the way out of that problem. Can't expect everyone to knowledgeably decide. Requires an authority / ombuds role to be provided by the institution.
Oliver: At the end of the day, each one of these privacy and data collection questions is a use case. How granular should we get in seeking permission for a given use case or group of use cases?
Cathryn: Possible to fall into infinite loops in considering this, given the uncertainty of the future. But Berkeley is in a great position to be considering and beginning to act on these questions: the technologists we have, the faculty who are themselves technologists, the DSEP program that is a platform for education (discussions, solutions). The only thing we can do is get to work on this sprawling problem.
Rick: Basic requirements for network-connected devices. Researchers don't like to hear that they "must" do these ten (or however many) things. Why not teach that basic level of having hardware up to snuff in DSEP or another broadly inclusive course context?
Oliver: Data Lake. There's a clear need for a "data lake" because of the kinds of data we're collecting on students, and our first use case has to do with student advising. But doing this in that context doesn't determine an answer for the campus more broadly and in other contexts.
Patrick: data event "Data Dialogs" at the Information School, M 10/23: https://datadialogs.ischool.berkeley.edu/
Maurice: effective frameworks for managing data as it moves between apps: data policy applied to data in transit