2. Google confidential │ Do not distribute
Kubernetes: a platform for
automating deployment,
scaling, and operations
WSO2Con 2015
Brian Grant
@bgrant0607
4. Google confidential │ Do not distribute
Old way: install applications on host
kernel
libs
app
app app
Application and OS share filesystem
Use OS distribution package manager
Entangled with each other and with host
• Executables
• Configuration
• Shared libraries
• Process and lifecycle management
Immutable VM images provide predictable
rollouts and rollbacks
• but are not portable and heavyweight
app
5. Google confidential │ Do not distribute
New way: deploy containers
libs
app
kernel
libs
app
libs
app
libs
app
OS-level virtualization
Isolated, from each other and from the host
• filesystems
• processes
• resources
Small and fast ⇒ enables 1:1 app to image
• Unlocks benefits of microservices
• Decouple build (Dev) from deployment (Ops)
• Consistency from development to production
• Portable across OS distros and clouds
• Application-centric management
6. Google confidential │ Do not distribute
Need container-centric infrastructure
Scheduling: Decide where my containers should run
Lifecycle and health: Keep my containers running despite failures
Scaling: Make sets of containers bigger or smaller
Naming and discovery: Find where my containers are now
Load balancing: Distribute traffic across a set of containers
Storage volumes: Provide data to containers
Logging and monitoring: Track what’s happening with my containers
Debugging and introspection: Enter or attach to containers
Identity and authorization: Control who can do things to my containers
7. Google confidential │ Do not distribute
Want to automate orchestration for velocity & scale
Diverse workloads and use cases demand still more functionality
• Rolling updates and blue/green deployments
• Application secret and configuration distribution
• Continuous integration and deployment
• Workflows
• Batch processing
• Scheduled execution
• Application-specific orchestration
…
A composable, extensible Platform is needed
8. Google confidential │ Do not distribute
Kubernetes
Greek for “Helmsman”; also the root of the
words “governor” and “cybernetic”
• Infrastructure for containers
• Schedules, runs, and manages containers
on virtual and physical machines
• Platform for automating deployment,
scaling, and operations
• Inspired and informed by Google’s
experiences and internal systems
• 100% Open source, written in Go
9. Google confidential │ Do not distribute
Deployment
$ kubectl run my-nginx --image=nginx
replicationcontroller "my-nginx" created
$ kubectl get po
NAME READY STATUS RESTARTS AGE
my-nginx-wepbv 1/1 Running 0 1m
10. Google confidential │ Do not distribute
Scaling
$ kubectl scale rc my-nginx --replicas=2
replicationcontroller "my-nginx" scaled
$ kubectl get po
NAME READY STATUS RESTARTS AGE
my-nginx-wepbv 1/1 Running 0 1m
my-nginx-yrf3u 1/1 Running 0 20s
11. Google confidential │ Do not distribute
Shutdown
$ kubectl delete rc my-nginx
replicationcontroller "my-nginx" deleted
$ kubectl get po
NAME READY STATUS RESTARTS AGE
my-nginx-wepbv 0/1 Terminating 0 4m
my-nginx-yrf3u 0/1 Terminating 0 3m
$ kubectl get po
$
13. Google confidential │ Do not distribute
users control plane nodes
Kubernetes architecture
CLI
API
UI
kubelet
kubelet
kubelet
apiserver
scheduler
controllers
14. Google confidential │ Do not distribute
Post desired state (aka spec) via API
kubelet
kubelet
kubelet
Run nginx
Replicas = 2
CPU = 2.5
Memory = 1Gi
apiserver
scheduler
controllers
15. Google confidential │ Do not distribute
Placement (aka scheduling)
kubelet
kubelet
kubelet
apiserver
scheduler
controllers
Which nodes
for nginx ?
16. Google confidential │ Do not distribute
Assignment (aka binding)
kubelet
kubelet
kubelet
Run
nginx
apiserver
scheduler
controllers
Run
nginx
17. Google confidential │ Do not distribute
Fetch container image
kubelet
kubelet
kubelet
Registry
Pull
nginx
Pull
nginx
apiserver
scheduler
controllers
18. Google confidential │ Do not distribute
Execution and lifecycle management
kubelet
kubelet
kubelet
Status
nginx
nginx
nginx
apiserver
scheduler
controllers
Status
nginx
19. Google confidential │ Do not distribute
Get current status via API
kubelet
kubelet
kubelet
GET
nginx
apiserver
scheduler
controllers
nginx
nginx
20. Google confidential │ Do not distribute
kubelet
kubelet
kubelet
Status
nginx
apiserver
scheduler
controllers
nginx
nginx
Get current status via API
21. Google confidential │ Do not distribute
Kubernetes uses the same APIs as users
kubelet
kubelet
kubelet
apiserver
scheduler
controllers
22. Google confidential │ Do not distribute
Modularity
Modularity facilitates
• composability
• extensibility
APIs - no shortcuts or back doors
• ensures extensions are on equal footing
Example: Scheduler
Example: Controllers
23. Google confidential │ Do not distribute
Control loops
Drive current state → desired state
Observed state is truth
Act independently
• choreography rather than
orchestration
Recurring pattern in the system
Example: Scheduler
Example: Controllers
observe
diff
act
26. Google confidential │ Do not distribute
Pods
Small group of containers & volumes
Tightly coupled
• the atom of replication & placement
“Logical” host for containers
• each pod gets an IP address
• share data: localhost, volumes, IPC, etc.
Facilitates composite applications
• mix and match components, languages, etc.
• preserves 1:1 app to image
Example: data puller & web server
Consumers
Content
Manager
File
Puller
Web
Server
Volume
Pod
27. Google confidential │ Do not distribute
Volumes
Storage automatically attached to pod
• Local scratch directories created on demand
• Cloud block storage
• GCE Persistent Disk
• AWS Elastic Block Storage
• Cluster storage
• File: NFS, Gluster, Ceph
• Block: iSCSI, Cinder, Ceph
• Special volumes
• Git repository
• Secret
Critical building block for higher-level
automation
28. Google confidential │ Do not distribute
Secrets
How to grant a pod access to a secured
something?
• secrets: credentials, tokens, passwords, ...
• don’t put them in the container image!
12-factor says should come from the
environment
Inject them as “virtual volumes” into Pods
• not baked into images nor pod configs
• kept in memory - never touches disk
• not coupled to non-portable metadata API
Manage secrets via the Kubernetes API
Node
Pod Secret
API
29. Google confidential │ Do not distribute
User-provided key-value attributes
Attached to any API object
Generally represent identity
Queryable by selectors
• think SQL ‘select ... where ...’
The only grouping mechanism
Labels
30. Google confidential │ Do not distribute
app: my-app
track: stable
tier: FE
app: my-app
track: canary
tier: FE
app: my-app
track: stable
tier: BE
app: my-app
track: canary
tier: BE
Selectors
31. Google confidential │ Do not distribute
app = my-app
Selectors
app: my-app
track: stable
tier: FE
app: my-app
track: canary
tier: FE
app: my-app
track: stable
tier: BE
app: my-app
track: canary
tier: BE
32. Google confidential │ Do not distribute
app = my-app, tier = FE
Selectors
app: my-app
track: stable
tier: FE
app: my-app
track: canary
tier: FE
app: my-app
track: stable
tier: BE
app: my-app
track: canary
tier: BE
33. Google confidential │ Do not distribute
app = my-app, tier = BE
Selectors
app: my-app
track: stable
tier: FE
app: my-app
track: canary
tier: FE
app: my-app
track: stable
tier: BE
app: my-app
track: canary
tier: BE
34. Google confidential │ Do not distribute
Selectors
app = my-app, track = stable
app: my-app
track: stable
tier: FE
app: my-app
track: canary
tier: FE
app: my-app
track: stable
tier: BE
app: my-app
track: canary
tier: BE
35. Google confidential │ Do not distribute
app = my-app, track = canary
Selectors
app: my-app
track: stable
tier: FE
app: my-app
track: canary
tier: FE
app: my-app
track: stable
tier: BE
app: my-app
track: canary
tier: BE
37. Google confidential │ Do not distribute
ReplicationControllers
Ensures N copies of a Pod
• if too few, start new ones
• if too many, kill some
• grouped by a label selector
Explicit specification of desired scale
• client doesn’t just create N copies
• enables self-healing
• facilitates auto-scaling
An example of a controller
• calls public APIs
ReplicationController
- selector = {“app”: “my-app”}
- template = { ... }
- replicas = 4
API Server
How
many?
3
Start 1
more
OK
How
many?
4
38. Google confidential │ Do not distribute
Services
A group of pods that work together
• grouped by a label selector
Publishes how to access the service
• DNS name
• DNS SRV records for ports (well known ports work, too)
• Kubernetes Endpoints API
Defines access policy
• Load-balanced: name maps to stable virtual IP
• “Headless”: name maps to set of pod IPs
Hides complexity - ideal for non-native apps
Decoupled from Pods and ReplicationControllers
Virtual IP
Client
39. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 3
- selector:
- app: my-app
- version: v1
Service
- app: my-app
$ kubectl rolling-update
my-app-v1 my-app-v2
--image=image:v2
Live-update an application
40. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 3
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 0
- selector:
- app: my-app
- version: v2
Service
- app: my-app
41. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 3
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 1
- selector:
- app: my-app
- version: v2
Service
- app: my-app
42. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 2
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 1
- selector:
- app: my-app
- version: v2
Service
- app: my-app
43. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 2
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 2
- selector:
- app: my-app
- version: v2
Service
- app: my-app
44. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 1
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 2
- selector:
- app: my-app
- version: v2
Service
- app: my-app
45. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 1
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 3
- selector:
- app: my-app
- version: v2
Service
- app: my-app
46. Google confidential │ Do not distribute
Rolling Updates
ReplicationController
- replicas: 0
- selector:
- app: my-app
- version: v1
ReplicationController
- replicas: 3
- selector:
- app: my-app
- version: v2
Service
- app: my-app
48. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
49. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
50. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
51. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
52. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
53. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
54. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
55. Google confidential │ Do not distribute
Jobs
Manages pods that run to completion
• differentiates number running at any one
time from the total number of completed
runs
Similar to ReplicationController, but for
pods that don’t always restart
• workflow: restart on failure
• build/test: don’t restart on app. failure
Principle: do one thing, don’t overload
Status: BETA in Kubernetes v1.1
Job
- parallelism: 3
- completions: 6
- selector:
- job: my-work
56. Google confidential │ Do not distribute
DaemonSets
Runs a Pod on every node
• or a selected subset of nodes
Not a fixed number of replicas
• created and deleted as nodes come and go
Useful for running cluster-wide services
• logging agents
• storage systems
DaemonSet manager is both a controller
and scheduler
Status: ALPHA in Kubernetes v1.1
57. Google confidential │ Do not distribute
Deployment
Rollouts as a service
• updates to pod template will be
rolled out by controller
• can choose between rolling update
and recreate
Enables declarative updates
• manipulates replication controllers
and pods so clients don’t have to
Status: ALPHA in Kubernetes v1.
1
Deployment
- strategy: {type: RollingUpdate}
- replicas: 3
- selector:
- app: my-app
...
59. Google confidential │ Do not distribute
Take away
• Decoupling applications from infrastructure creates new opportunities
• Kubernetes
• is container-centric infrastructure
• which includes a lot more than just running containers
• facilitates management of containers in production
• provides a foundation for building a workload-management ecosystem
• This has enabled Platform as a Service systems to be built on Kubernetes
• Apache Stratos
• Openshift 3: co-designed and co-developed with Kubernetes
• Deis: Heroku-inspired Docker-based PaaS
• Gondor: Python-aaS
60. Google confidential │ Do not distribute
Kubernetes is Open
- open community
- open design
- open source
- open to ideas
http://kubernetes.io
https://github.com/kubernetes/kubernetes
slack: kubernetes
twitter: @kubernetesio
62. Google confidential │ Do not distribute
Design principle summary
Declarative > imperative: State your desired results, let the system actuate
Control loops: Observe, rectify, repeat
Simple > Complex: Try to do as little as possible
Modularity: Components, interfaces, & plugins
Legacy compatible: Requiring apps to change is a non-starter
Network-centric: IP addresses are cheap
No grouping: Labels are the only groups
Cattle > Pets: Manage your workload in bulk
Open > Closed: Open Source, standards, REST, JSON, etc.