Our next Research IT Reading Group will continue our discussions about the evolving storage landscape for campus research data. At this discussion, we will focus on recent developments underway here at UC Berkeley led by the IST Storage & Backup team in partnership with our collaborators at LBNL. John White (Storage Architect for the BRC Savio cluster and a member of LBNL's High Performance Computing Services team) and Jack Shnell (Storage Administrator in IST's Storage & Backup group) will present their experiences developing parallel file system solutions, especially GPFS. They will describe recent developments for the BRC Savio cluster and some experimental work on an archival storage service with the Library. Parallel file systems have significant potential for advancing storage solutions for research, and we look forward to discussing the possibilities.
When: Thursday, February 26 from noon - 1pm
Where: 200C Warren Hall, 2195 Hearst Ave (see building access instructions on parent page).
Event format: The reading group is a brown bag lunch (bring your own) with a short ~20 min talk followed by ~40 min group discussion.
Please review the following in advance of the 2/26 meeting:
==> Enhancing High-Performance Computing Clusters with Parallel File Systems, Dell Power Solutions (2005)
==> Parallel File Systems, slide deck from a guest lecture by Samuel Lang at Argonne National Laboratory (September 2010), especially slides 1-9 (the rest are more technical)
==> storage.berkeley.edu -- storage options currently offered by the IST Storage & Backup team
==> LTFS 3: Linear Tape File System and the Future of Tape Data Storage
==> Library Partitioning: A Perfect Case Study for LTFS
Attending:
Aaron Culich, RIT
Aron Roberts, RIT
Bill Allison, IST
Camille Villa, RIT
Chris Hoffman, RIT
Dav Clark, BIDS/D-Lab
David Greenbaum, RIT
Gary Jung, RIT/LBNL
Greg Kurtzer, RIT/LBNL
Greg Merritt, California PATH/Inst of Transportation Studies
Jack Shnell, IST
James Gao, Jack Gallant Lab
John Lowe, RIT/Linguistics
John White, RIT/LBNL
Kai Song, LBNL
Michael Jennings, RIT/LBNL
Patrick Schmitz, RIT
Perry Willets, CDL
Quinn Dombrowski, RIT
Rick Jaffe, RIT
Ron Sprouse, Linguistics
Steve Masover, RIT
Storm Slivkoff, Jack Gallant Lab
In lieu of our next meeting, Reading Group participants are encouraged to attend a Communication/Collaboration Summit March 12th, 12:30-3pm, 370 Dwinelle
Questions to consider:
What are some of the key storage problems that you are facing with your (research, teaching and learning) data right now?
What kinds of data (and other content) do you have that you wish you could just store for the long term in some trusted place?
Are you facing new challenges moving large files and content around the network we call the internet? Across different kinds of storage? From local drives to computing facilities?
John White, “Parallel File Systems”
(see slide deck, attached to this page as a PDF)
HPC application of file systems:
high client counts, high process counts, high capacity requirements
issues with NFS / CIFS / AFP / NAS:
single point of contact for both data and metadata
file based locking
does not allow parallelism
...but researchers don’t care about this: they want data available everywhere, hate transferring data, don’t want to learn new I/O APIs, and often aren’t aware their code is inefficient
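The access pattern at issue can be sketched in plain Python (a toy illustration only, not GPFS or Lustre; the file name and stripe size are made up). Each worker writes its own disjoint byte range of one shared file at an explicit offset, which is the kind of concurrent I/O that parallel file systems serve well and that a single NFS-style server with whole-file locking would serialize:

```python
# Toy sketch of striped parallel writes -- not a real parallel file
# system; file name and stripe size are invented for illustration.
import os
import threading

STRIPE = 1 << 20          # 1 MiB per worker (made-up stripe size)
NWORKERS = 4
PATH = "striped.dat"      # hypothetical output file

def write_stripe(rank: int) -> None:
    fd = os.open(PATH, os.O_WRONLY)
    try:
        # os.pwrite targets an explicit offset, so workers never
        # contend for a shared file position or a whole-file lock.
        os.pwrite(fd, bytes([rank]) * STRIPE, rank * STRIPE)
    finally:
        os.close(fd)

def main() -> None:
    # Pre-size the file, then let all workers write concurrently.
    with open(PATH, "wb") as f:
        f.truncate(NWORKERS * STRIPE)
    workers = [threading.Thread(target=write_stripe, args=(r,))
               for r in range(NWORKERS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

if __name__ == "__main__":
    main()
```

On a real parallel file system the stripes would additionally land on different storage servers, so aggregate bandwidth grows with the number of servers rather than bottlenecking on one.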
example: a researcher working with historical stock ticker data
5 GB / s aggregate performance
performance does not always scale linearly
Popular parallel file systems:
Lustre: leading file system for supercomputers; open source
GPFS: from IBM; vendor-licensed and supported; wide-area support
Extra features on parallel file systems: tiered storage (storage pools), ILM (information lifecycle management), replication
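Tiering in GPFS is driven by an SQL-like policy language. A rough sketch of what such a policy can look like (pool names, thresholds, and the age cutoff are hypothetical, not from the talk):

```
/* Hypothetical GPFS (Spectrum Scale) ILM policy sketch --
   pool names and thresholds are made up for illustration. */
RULE 'place_new' SET POOL 'fast'            /* new files land on the fast pool */
RULE 'tier_cold' MIGRATE FROM POOL 'fast'
     THRESHOLD(85,70)                       /* start migrating at 85% full, stop at 70% */
     TO POOL 'nearline'
     WHERE CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS
```

Rules like these are what make the LTFS integration discussed below attractive: a tape-backed pool can simply be another migration target.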
GPFS-WAN: making data available at HPC facilities around the country
LTFS is a relatively new technology that presents a traditional tape library as a file system
When you request a file via the interface there is a bit of a lag, but GPFS integration lets you incorporate storage management into the model
Currently working with the Library, looking for lower-cost storage solutions: 10-20% less than the most inexpensive spinning-disk systems
NFS is different: every department has its own setup, which will be a challenge for designing a centralized system
how to implement a true enterprise technology into an infrastructure like our decentralized campus?
currently working with the College of Chemistry
new tape technology, shelf-stable for up to 20 years
working on a drag-and-drop interface; users will be able to use tape storage like any familiar file storage system
Patrick: are any of the other UCs looking at this?
Jack: I think Davis is working on this for their medical center
Chris: do you worry about things like... departments using this for their department file server
Jack: Essentially, we don’t care...we’d like to help people reduce costs. It does depend on what kind of research they’re doing, what they want to do. I wouldn’t recommend folks run a database on this. But if you’re storing backups of the database, then this works. It’s a new tool…
James: How do you see these technologies interacting with upcoming technology in cloud storage?
Jack and John: These aren’t built specifically to address those issues, but they can be “gatewayed” to cloud storage
Rick: Are users with inefficient code a big worry for you?
John: The more efficient we can make our users...we can reduce the amount of pain for other users. This is an inherently shared system.
Chris: what kind of content are you working with at the Library?
Jack: a lot of digitized photographs
Chris: we’ve got this problem with our museums and our directors are worried about how much storage and backup is costing
Jack: how do you deal with data ownership? intellectual property? people leaving the institution?
James: use case - we’re working with neuroscience data over traditional NFS mounts. The data is protected under HIPAA, so it’s encrypted on our end before being sent off to cold storage. We’ve been running our own cluster for a while, but we don’t have resources for backup. We have almost 80 TB for our own lab and that’s growing quickly; we went through the last 40 TB in a year and a half. We’re looking at storage solutions, like Swift, in the cloud to grow storage to 150 TB. We want to shuffle data off, but we don’t know where to put it and whether it’ll be safe.