Sent as e-mail to listserve, 2014-08-07
We're ready to ramp back up into the Fall semester with a reading group facilitated by Erik McCroskey of IST's Network Operation Services group. Erik is building out a Science DMZ for transport of large research data sets. Those who attended the OneIT Summit in June might have met Erik at his Science DMZ poster, which was subtitled "Enabling extreme data for research and education."
Erik asks that prior to the meeting we read the following sections of the ESNet (Energy Sciences Network) Science DMZ web pages:
==> ... and other sections to the degree your interest motivates.
Looking forward to seeing you on Thursday 8/14 at noon, in Rm. 200C Warren Hall.
As usual, those without keycard access to Warren should take the elevator to the 2nd floor; if no one is at the reception window, look for a sign with a phone number to call to be let through the locked doors.
Please feel free to forward this e-mail to interested colleagues, who can sign up for the list (self-service) here.
[Next meeting: Cliff will lead the group on 8/28 in a discussion about CENIC services for research. He'll talk about Software Defined Networking (SDN), at a higher level than the discussion Erik is leading today.]
Erik (with some additions from Steve Miller):
* Installing UCB's Science DMZ now, funded by a CC-NIE grant. Slide showing UCB implementation:
* DTN -- who ought to run this? Higher on the stack than network engineers generally work.
* To be discovered: who needs this, how do we provide it, what else do they need?
* DMZ will extend into the data center, e.g., to connect to HPC cluster(s) hosted there
* A 40 Gbps connection to servers is about the max at present; 10 Gbps is more common; no option now for 100 Gbps network cards for servers. Also, 100 Gbps links are expensive (e.g., $75K for a transponder at the border router)
* "friction free" dedicated network path ... not every researcher will use 100 Gbps, but the idea is to have a pipe that multiple researchers could share, each at a lower rate, e.g., 10 Gbps
* OpenFlow: defines how traffic is forwarded on a network. Normal routing is based on destination, or on reading labels from a table; OpenFlow gives programmatic ability to manipulate the forwarding table.
* Segregation of DMZ isolates research experimentation that might cause network problems from the institution's regular network/business
* DTN: Keeps TCP block in memory, which (somehow) sidesteps the steep cliff of network throttling that occurs when even a small amount of packet loss occurs on a TCP network. Tune TCP to have larger windows and buffering, and run multiple TCP flows at once. Multiple flows allow one flow to drop a packet without reducing the transfer rate in another.
* Want to find researchers who are now using FedEx and hard drives to pass data around. Recognize that this is a practical solution for folks who are not network engineers. Our challenge is to provide an alternative that is effective, less expensive, more convenient.
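The "steep cliff" under packet loss noted in the DTN bullet can be illustrated with a rough model. This is a sketch, not something Erik presented: it assumes the widely cited Mathis et al. approximation for steady-state TCP throughput, rate <= (MSS / RTT) * (C / sqrt(p)) with C = sqrt(3/2), plus the simple window / RTT ceiling that motivates larger windows. The MSS, RTT, loss rates, and function names below are illustrative assumptions, and the parallel-flow estimate assumes flows lose packets independently.

```python
import math

# Constant from the Mathis et al. approximation (an assumed model, see lead-in).
MATHIS_C = math.sqrt(3 / 2)


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate ceiling on one TCP flow's throughput, in bits per second."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


def window_limited_bps(window_bytes, rtt_s):
    """Ceiling imposed by the TCP window alone: one window per round trip."""
    return window_bytes * 8 / rtt_s


def parallel_flows_bps(n_flows, mss_bytes, rtt_s, loss_rate):
    """Rough aggregate for n independent flows (ignores the link capacity)."""
    return n_flows * mathis_throughput_bps(mss_bytes, rtt_s, loss_rate)


if __name__ == "__main__":
    mss = 1460   # bytes, typical Ethernet MSS (assumption)
    rtt = 0.060  # seconds, illustrative cross-country round trip
    for loss in (1e-6, 1e-5, 1e-4, 1e-3):
        mbps = mathis_throughput_bps(mss, rtt, loss) / 1e6
        print(f"loss {loss:.0e}: about {mbps:8.1f} Mbps per flow")
    kib64 = window_limited_bps(64 * 1024, rtt) / 1e6
    print(f"64 KiB window at 60 ms RTT: about {kib64:.1f} Mbps")
```

Under this model, a hundredfold increase in loss rate cuts the per-flow ceiling tenfold regardless of link speed, which is one way to read why a loss-free path and tuned hosts matter more than raw bandwidth.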
Cliff: Is this store and forward? Need for unix login on DTNs at both ends?
Erik: Not sure until it's implemented. Globus requires only a client-end setup, but someone has to do something at each end to get data to/from the DTNs at the endpoints. What we'd like to do is present a simple user interface to researchers, without requiring them to learn anything new. We'll need to get our Science DMZ up and running, and do some experimenting before we can really say much about how it will work.
Patrick: LBNL clients already using Globus, shouldn't be hard to find use cases.
[discussion about Savio filesystem connection to Globus/DTN]
James: SSL has folks who have data they mail around on hard drives. Site in Puerto Rico, for example.
Erik: Let's explore. Could be an issue that there's not fast infrastructure on the source end of the data.
Patrick: Network to labs producing high volume of data?
Erik: Sometimes it works to pipe data over the campus network. But it's case-by-case. I don't think the DMZ will be a cookie-cutter solution. Consult w/ researchers to find out the range of their computing needs, and figure out the right solution to fit their particular needs. Some on campus have a 1 Gbps connection and can't saturate it; giving them 10 Gbps won't solve their problems.
Chris: Science Engagement Methodology @ LBNL? ESNet. Greg Bell directs group.
Erik: Have been in touch with them. CENIC has been doing science engagement as well.
Cliff: Yes, that's right.
Erik: So lots of people to work with, including Research IT.
Patrick: Governance for SDN (which solves an allocation problem). First of all, is it correct that there will need to be governance for this reason?
Erik: A problem I'd love to have. Would like to see fuller pipes. Though it could be that the network can't accept the demand that nodes might be trying to put on the wire. Idea of Science DMZ is to provide a huge enough pipe that such situations won't occur. Another reason we may not be seeing limits is that researchers give up when transfer over network doesn't work (then revert to transporting physical disk drives, etc.).
Patrick: So a solution is to stay ahead of contention.
Erik: That's been the goal in designing the network from the start. Intention is to only get to 30-40% capacity before we talk about adding another fat link.
Patrick: Can there be a reservation model?
Erik: Can be. So in that sort of model, you have a means by which the technology tells you when a slot is open.
Cliff: The GENI consortium has a mechanism for asking for an allocation on Internet2 AL2S. But some of the networks we're discussing (COTN = California OpenFlow Testbed Network; Internet2 AL2S) are explicitly meant to operate at the breakable edge of what's possible in networking bandwidth.
Steven Abrams: CDL recently asked to accept 100TB of data. We're interested in how widely implemented (and where) these kinds of bandwidth capabilities are installed.
Chris: UCB just put together a proposal w/ 6 campuses (including Irvine & San Diego) that have CC-NIE grants. Larry Smarr put this together: what's the effect of using these capabilities to facilitate research that crosses campus borders within the UC system? If it's awarded it will start in January.
Steve Miller: ESNet has a list of Science DMZ implementations.
Cliff: NSF to continue this grant program. Reaching out beyond R1 Universities.
Erik: Good interchange between campuses' network people. Call every 2 weeks or so among folks working on CENIC grants to discuss work, issues, problem solving.
Aaron: Question about connection to the public cloud: how does this intersect with the Science DMZ?
Erik: We will have a dedicated 10 Gbps link to Amazon. It lands in Seattle to reach the Oregon region of Amazon's cloud, which is where most of our users are setting up resources because it's cheaper and Amazon encourages it (better margins for them); a direct connection to the local region wasn't what was needed. Not sure that will give us better performance, as the CENIC connection is quite good. But it will be less expensive, and it will permit connection from private address space to private addressing in the cloud. Looking at the diagram, the Amazon direct link will connect to the Border router.
Erik: But, again, just saying "here's a connection" won't solve researchers' problems. So I'm eager to find use cases.
Chris: Is time frame driven by use cases?
Erik: Not a timeframe. We're just gradually building it out.
Raymond: I have a use case. Workshop on processing Amazon data in the NE. 260TB or so. One way is to do the computation on AWS. But do we have the infrastructure on campus to use the Savio HPC cluster to compute on data in the NE, or must I pull all the data over before computing on it?
Erik: I think either scenario will work. Depends on whether Amazon service is set up for research (fast enough, low enough latency; 60ms or so to round-trip across the country). Cost of transferring out of Amazon is high.
Raymond: Public buckets. No charge to pull that data down.
Erik: So you could do that, store it (even temporarily, and download again when needed).
Patrick: Probably not fast to download public data.
Erik: Would be interesting to experiment and see the speed.
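For a sense of scale in Raymond's use case, here is a back-of-envelope sketch of how long 260TB would take to move at various line rates, and how much data must be "in flight" to fill a path with the roughly 60ms round trip Erik mentioned. It assumes decimal terabytes, an ideal loss-free path, and an optional efficiency factor; the function names and the link rates tried are hypothetical choices for illustration, not measurements.

```python
TB = 10**12  # decimal terabyte, in bytes (assumption)


def transfer_time_hours(data_bytes, link_bps, efficiency=1.0):
    """Hours to move data_bytes over a link running at link_bps.

    efficiency is the assumed fraction of line rate actually achieved (0-1].
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return data_bytes * 8 / (link_bps * efficiency) / 3600


def bdp_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return link_bps * rtt_s / 8


if __name__ == "__main__":
    data = 260 * TB
    rtt = 0.060  # seconds, the cross-country round trip from the discussion
    for gbps in (1, 10, 100):
        rate = gbps * 1e9
        hours = transfer_time_hours(data, rate)
        window_mb = bdp_bytes(rate, rtt) / 1e6
        print(f"{gbps:>3} Gbps: {hours:7.1f} h at line rate, "
              f"needs about {window_mb:.1f} MB in flight")
```

At an ideal 10 Gbps this works out to a bit under 58 hours, so whether to pull the data to campus or compute where it sits depends heavily on the rate actually achieved end to end.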
Steve Miller / Erik: perfSONAR point-to-point tests make it possible for end users to do self-service testing. If they find performance is not adequate to their needs, network engineers can get involved and find where the bottleneck is.
Erik: Network engineers rarely get called in for a problem. Maybe researchers don't know whom to ask, or maybe they don't believe that they'll get help if they do ask. Sometimes we see problems flaring up on Reddit before we get a call through the service desk.
Erik: Would like to see Research IT use Network group as a resource.
Cliff: CENIC would like to back that up too.
Chris: And the other way around. When research needs are discovered by network engineers we'd like to hear about it.
Steven Abrams: One point of discovery of needs is during creation of Data Management Plans. E.g., using CDL's DMP tool.
Patrick: Interesting idea re: having data transfer needs noted in the DMPs.
Aaron: And when researchers are filling out their DMP, can they be made aware of campus resources that might address their needs?
Steven: Yes. And campuses are making use of this capability.