Container technology (like Docker) lets developers easily deploy applications across many systems in a data center or cloud provider’s infrastructure, and has changed the way companies build and deploy software. Once developers start using containers in production, they soon need tools to manage and orchestrate those containers so that they remain reliable and scalable. Kubernetes is an open-source container orchestration and management platform that is emerging as a standard for this (according to analyst firm RedMonk, 71% of the Fortune 100 use containers, and more than 50% of those companies use Kubernetes).
Kubernetes (a.k.a. K8s) was originally developed at Google, drawing on its experience managing the billions of containers it runs in production. It is now an open-source project maintained by the Cloud Native Computing Foundation. It is also the core technology for orchestrating the infrastructure supporting the new Data Science courses on campus.
Research IT’s Maurice Manning will discuss the basic concepts of Kubernetes, and Yuvi Panda (infrastructure lead for the UCB Data Science Education Program) will discuss how they are using Kubernetes to scale JupyterHub for thousands of students.
When: Thursday, 21 September from 12 - 1pm
Where: 200C Warren Hall, 2195 Hearst St (see building access instructions on parent page).
What: Intro to Kubernetes and how it is being used to support Data Science Education
Presenting: Maurice Manning (Research IT) and Yuvi Panda (Data Science Education Program)
Prior to the meeting, please review:
For those who want to dig into a deeper set of materials, the following are recommended:
Presenting: Maurice Manning (Research IT); Yuvi Panda (DSEP)
Aaron Culich, Research IT
Amy Neeser, RDM (Library and Research IT)
Barbara Gilson, SAIT (emeritus)
Chris Paciorek, Statistics & BRC
Deb McCaffrey, Research IT / BRC
Evan Muzzall, DLab
Jason Christopher, Research IT
Jenn Stringer, RTL
John Crossman, ETS
John Felder, ETS
Kelly Armitige, IST-Doc
Kevin Chan, ETS
Krishna Muriki, LBNL / Research IT
Meagan Levitt, ETS
Owen McGrath, ETS
Patrick Schmitz, Research IT
Paul Kerschen, ETS
Quinn Dombrowski, Research IT
Ray Davies, ETS
Rick Jaffe, Research IT
Ron Sprouse, Linguistics
Ryan Lovett, Statistics
Sandeep Jayaprakash, ETS
Scott Peterson, Doe Library
Steve Masover, Research IT
Walter Stokes, IST-DB
* Strong recommendation for the Borg paper (linked above)
* Container orientation / definition / concepts / features / capabilities
* Some examples of YAML that deploys Kubernetes pods
* demonstration of a Data Science 8 notebook: click to have one open in a browser, and you're off into a Jupyter Notebook -- no installation, no need to understand what infrastructure is operating a Jupyter Notebook under the hood
* deployment demonstration using Kubernetes (cf. Zero to JupyterHub)
* Helm -- like apt, but for distributed systems (a package manager for Kubernetes)
* "cattle not pets"
* to follow what's going on re: provisioning for DSEP ---> uc-jupyter.slack.com
Hands-on workshop to follow along with this provisioning exercise at AIS on 9 October: https://ais.berkeley.edu/events/zero-jupyterhub-hands-workshop/2017-10-09
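The YAML examples shown in the session are not reproduced in these notes. As a rough illustration of what a pod manifest contains, the Python sketch below builds a minimal Pod definition as a dictionary and prints it as JSON, a format kubectl accepts alongside YAML. The helper name, pod name, container image, and memory limit are illustrative assumptions, not the values used in the demonstration.

```python
import json


def make_pod_manifest(name, image, memory_limit="1Gi"):
    """Return a minimal Kubernetes Pod manifest as a plain dict.

    All argument values are hypothetical; they are not taken from
    the DSEP deployment described in these notes.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Cap the container's memory use
                    "resources": {"limits": {"memory": memory_limit}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Illustrative pod name and image
    pod = make_pod_manifest("demo-notebook", "jupyter/base-notebook")
    print(json.dumps(pod, indent=2))
```

Saving the printed output to a file and passing it to `kubectl apply -f` is one way such a manifest could be submitted to a cluster.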
Questions / Discussion
Chris P: Are containers other than Docker controllable by Kubernetes?
Yuvi: Mostly Docker or rkt (Rocket). There's a standard interface, so it can be any runtime, and additional container runtimes have implemented that interface for Kubernetes management.
Jenn: What does Kubernetes get you -- what benefit compared to 'manually' deploying these nodes & containers?
Yuvi: Not worth it if you're running on just one node; any more than that -- it's well worth it. Kubernetes costs about 1GB RAM. Less sys admin overhead in the long run, once one understands the tool and its use: Kubernetes needs updating every 6 mos or so; no need to page a sys admin when nodes die; ability to give root privileges to non-staff sys admins (e.g., student Kubernetes cluster admins) -- can give access by namespace in pretty much any way you wish.
Jenn: Account management
Yuvi: JupyterHub is CalNet authenticated.
Aaron: Kubernetes could be authenticated quite easily via Google OAuth
Chris: What happens to storage volumes (PVCs) when a pod dies?
Yuvi: It persists, and is attached to the same user's new pod when it next goes live
Yuvi: Right now we're doing a purge every semester, zipping up content & emailing to student then purging
Yuvi: last semester, overestimated amount of SSD needed by students and our disk space ended up being more expensive than compute (because of its persistence)
Chris: How/when do you scale?
Yuvi: When RAM resources are >80% saturated, spin up more nodes. Not by user usage, as this is limited by configuration.
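The scaling rule Yuvi describes (add nodes once RAM is more than 80% saturated) can be sketched as a small calculation. This is a minimal sketch under stated assumptions, not the DSEP autoscaling code: the function name, its parameters, and the idea of adding just enough equal-sized nodes to get back under the threshold are all hypothetical.

```python
import math


def nodes_to_add(used_ram_gb, total_ram_gb, node_ram_gb, threshold=0.8):
    """Return how many nodes to add so RAM use is at or below the threshold.

    Hypothetical helper: assumes every new node contributes node_ram_gb
    of RAM and that only RAM saturation drives the scaling decision.
    """
    if total_ram_gb <= 0 or node_ram_gb <= 0:
        raise ValueError("RAM capacities must be positive")
    # At or below the threshold: no scaling needed
    if used_ram_gb / total_ram_gb <= threshold:
        return 0
    # Capacity needed to bring utilization down to the threshold
    required_total = used_ram_gb / threshold
    return math.ceil((required_total - total_ram_gb) / node_ram_gb)


if __name__ == "__main__":
    # 90 GB used of 100 GB is 90% saturated, above the 80% threshold
    print(nodes_to_add(used_ram_gb=90, total_ram_gb=100, node_ram_gb=25))
```

Because per-user memory is capped by configuration, as noted above, a rule keyed to cluster-wide RAM saturation rather than per-user usage is the natural trigger in this sketch.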
Deb: Non-Jupyter resources provisionable?
Yuvi: Sure. PyCortex, RStudio, et al. are use cases associated with actual classes