Watch this presentation and learn about Kubernetes Networking:
How to build modern, cloud-friendly applications in an agile fashion, without needing to know subnets and IP addresses.
1. Tea For The Tillerman
Building a Pure L3 Fabric For Kubernetes Networking
Kelsey Hightower & Dinesh G Dutt
19 April 2016
2. Key Takeaways
Modern application design has evolved to ignore
antediluvian ideas for service deployment,
discovery and advertisement
Kubernetes is an easy, scalable solution for
deploying applications in the modern DC
Routing on the host makes Kubernetes
deployments optimal
April 21, 2016 cumulusnetworks.com 2
3. Applications and Servers Are the Last Bastion of Bridging
4. How Bridging Plays A Role in Application Design
Service or node discovery relies on broadcast
Cluster heartbeat uses multicast
Assumptions about being in a single subnet
VM Mobility continued this trend
5. Reasons Why Bridging Is How Compute Folks Think About Networks
In the bad old days, IP routing was a low-performance,
high-cost solution, whereas L2 switching was done in hardware
Vendors still charge extra for L3 licenses on the same box:
BGP costs even more money than OSPF
No good routing protocol stack on the host
L3 considered complex to configure and troubleshoot
compared to (mythical) L2 which was plug-and-play
6. Open Networking
Merchant switching silicon can
perform bridging and IP routing at
the same performance and price
Open Networking solutions such as
Cumulus Linux offer routing at
the same price point as bridging
7. Routing Protocol Suite on Host
Many high-quality open source routing suites are now
available for the host
Cumulus Quagga
BIRD
ExaBGP
Also commercial offerings are coming in:
Windows Server 2012
8. Simplifying Routing
Solutions such as OSPF Unnumbered, BGP
Unnumbered coupled with automation
dramatically simplify routing
9. OK, So How Are Modern Applications Designed If We Have a Pure L3 Network?
11. Google Cloud Platform
Google has been developing
and using containers to
manage our applications for
over 12 years.
Images by Connie Zhou
12. Google Cloud Platform
Everything at Google runs in
containers:
• Gmail, Web Search, Maps, ...
• MapReduce, batch, ...
• GFS, Colossus, ...
• Even Google’s Cloud Platform:
our VMs run in containers!
13. Google Cloud Platform
But it’s all so different!
• Deployment
• Management, monitoring
• Isolation (very complicated!)
• Updates
• Discovery
• Scaling, replication, sets
A fundamentally different way of
managing applications requires
different tooling and abstractions
Images by Connie Zhou
14. Google Cloud Platform
Kubernetes
Greek for “Helmsman”; also the root of the
words “governor” and “cybernetic”
• Manages container clusters
• Inspired and informed by Google’s
experiences and internal systems
• Supports multiple cloud and bare-metal
environments
• Supports multiple container runtimes
• 100% Open source, written in Go
Manage applications, not machines
18. Google Cloud Platform
Goal: Avoid vendor lock-in
Runs in many environments, including
“bare metal” and “your laptop”
The API and the implementation are
100% open
The whole system is modular and
replaceable
Workload Portability
19. Google Cloud Platform
Goal: Write once, run anywhere*
Don’t force apps to know about concepts
that are cloud-provider-specific
Examples of this:
● Network model
● Ingress
● Service load-balancers
● PersistentVolumes
* approximately
Workload Portability
20. Google Cloud Platform
Goal: Avoid coupling
Don’t force apps to know about concepts
that are Kubernetes-specific
Examples of this:
● Namespaces
● Services / DNS
Workload Portability
22. Google Cloud Platform
Pods
Small group of containers & volumes
Tightly coupled
The atom of scheduling & placement
Shared namespace
• share IP address & localhost
• share IPC, etc.
Managed lifecycle
• bound to a node, restart in place
• can die, cannot be reborn with same ID
Example: data puller & web server
[Diagram labels: Consumers, Content Manager, File Puller, Web Server, Volume, Pod]
23. Google Cloud Platform
Volumes
Very similar to Docker’s concept
Pod scoped storage
Support many types of volume plugins
• Empty dir (and tmpfs)
• Host path
• Git repository
• GCE Persistent Disk
• AWS Elastic Block Store
• Azure File Storage
• iSCSI
• Flocker
• NFS
• GlusterFS
• Ceph File and RBD
• Cinder
• FibreChannel
• Secret, ConfigMap,
DownwardAPI
• Flex (exec a binary)
25. Google Cloud Platform
ReplicationControllers
A simple control loop
Runs out-of-process wrt API server
Has 1 job: ensure N copies of a pod
• if too few, start some
• if too many, kill some
• grouped by a selector
Cleanly layered on top of the core
• all access is by public APIs
Replicated pods are fungible
• No implied order or identity
ReplicationController
- name = “my-rc”
- selector = {“App”: “MyApp”}
- podTemplate = { ... }
- replicas = 4
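[Diagram: the ReplicationController asks the API Server "How many?" and hears 3, sends "Start 1 more", gets "OK", then asks "How many?" again and hears 4]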
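The control loop on this slide can be sketched roughly as follows. This is a minimal illustration, not Kubernetes code: `FakeApiServer`, `Pod`, and `reconcile` are hypothetical names, and the in-memory store only stands in for the real API server.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Pod:
    name: str
    labels: dict


@dataclass
class FakeApiServer:
    """Hypothetical in-memory stand-in for the API server."""
    pods: list = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def list_pods(self, selector):
        # A pod matches when it carries every key/value pair in the selector.
        return [p for p in self.pods if selector.items() <= p.labels.items()]

    def create_pod(self, labels):
        self.pods.append(Pod(f"pod-{next(self._ids)}", dict(labels)))

    def delete_pod(self, pod):
        self.pods.remove(pod)


def reconcile(api, selector, replicas):
    """One pass of the loop: compare the observed count with the desired count."""
    matching = api.list_pods(selector)
    diff = replicas - len(matching)
    if diff > 0:
        # Too few: start some.
        for _ in range(diff):
            api.create_pod(selector)
    elif diff < 0:
        # Too many: kill some. Pods are fungible, so any of them will do.
        for pod in matching[:-diff]:
            api.delete_pod(pod)
    return diff
```

With three matching pods and `replicas = 4`, one call to `reconcile` starts one more pod, and the next call finds nothing to do.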
27. Google Cloud Platform
Deployments
Goal: updates-as-a-service
• Rolling update is imperative, client-side
Deployment manages replica changes for you
• stable object name
• updates are configurable, done server-side
• kubectl edit or kubectl apply
Aggregates stats
Can have multiple updates in flight
Status: BETA in Kubernetes v1.2 ...
29. Google Cloud Platform
Namespaces
Problem: I have too much stuff!
• name collisions in the API
• poor isolation between users
• don’t want to expose things like Secrets
Solution: Slice up the cluster
• create new Namespaces as needed
• per-user, per-app, per-department, etc.
• part of the API - NOT private machines
• most API objects are namespaced
• part of the REST URL path
• Namespaces are just another API object
• One-step cleanup - delete the Namespace
• Obvious hook for policy enforcement (e.g. quota)
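To illustrate "part of the REST URL path", here is a small sketch that builds paths in the layout used by the core v1 API. The helper name `resource_path` and the example names `dev` and `web` are assumptions made for illustration.

```python
def resource_path(resource, namespace=None, name=None):
    """Build a core-API path; namespaced objects carry the namespace in the URL."""
    parts = ["/api/v1"]
    if namespace is not None:
        parts += ["namespaces", namespace]
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


# A namespaced object (a pod) versus a cluster-scoped one (a node).
print(resource_path("pods", namespace="dev", name="web"))
print(resource_path("nodes"))
```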
35. Google Cloud Platform
Kubernetes networking
IPs are routable
• vs docker default private IP
Pods can reach each other without NAT
• even across nodes
No brokering of port numbers
• too complex, why bother?
This is a fundamental requirement
• can be L3 routed
• can be underlayed (cloud)
• can be overlayed (SDN)
38. Google Cloud Platform
Network Isolation
Describe the DAG of your app, enforce it in
the network
Restrict Pod-to-Pod traffic or across
Namespaces
Designed by the network SIG
• implementations for Calico, OpenShift, Romana,
OpenContrail (so far)
Status: Alpha in v1.2, expect beta in v1.3
40. Google Cloud Platform
Network Plugins
Introduced in Kubernetes v1.0
• VERY experimental
Uses CNI (CoreOS) in v1.1
• Simple exec interface
• Not using Docker libnetwork
• but can defer to Docker for networking
Cluster admins can customize their installs
• DHCP, MACVLAN, Flannel, custom
[Diagram labels: net, Plugin, Plugin, Plugin]
42. Google Cloud Platform
Services
A group of pods that work together
• grouped by a selector
Defines access policy
• “load balanced” or “headless”
Gets a stable virtual IP and port
• sometimes called the service portal
• also a DNS name
VIP is managed by kube-proxy
• watches all services
• updates iptables when backends change
Hides complexity - ideal for non-native apps
[Diagram labels: Client, Virtual IP]
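A rough sketch of how a selector groups pods behind one virtual IP. The pod IPs and labels are made up, and the round-robin picker is a simplification chosen for readability; it is not a description of how kube-proxy programs iptables.

```python
from itertools import cycle


def select_backends(pods, selector):
    """Return the IPs of pods whose labels contain every selector key/value."""
    return [ip for ip, labels in pods.items()
            if all(labels.get(k) == v for k, v in selector.items())]


# Hypothetical pod IPs and labels.
pods = {
    "10.244.1.5": {"App": "MyApp"},
    "10.244.2.7": {"App": "MyApp", "tier": "web"},
    "10.244.2.9": {"App": "Other"},
}

backends = select_backends(pods, {"App": "MyApp"})
picker = cycle(backends)  # each connection to the VIP goes to the next backend
```

When the set of matching pods changes, the backend list is recomputed; clients keep using the same virtual IP throughout.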
58. Google Cloud Platform
External Services
Service IPs are only available inside the
cluster
Need to receive traffic from “the outside world”
Builtin: Service “type”
• NodePort: expose on a port on every node
• LoadBalancer: provision a cloud load-balancer
DIY load-balancer solutions
• socat (for nodePort remapping)
• haproxy
• nginx
59. Google Cloud Platform
Ingress (L7)
Many apps are HTTP/HTTPS
Services are L3/L4 (IP + port)
Ingress maps incoming traffic to backend
services
• by HTTP host headers
• by HTTP URL paths
HAProxy, NGINX, AWS and GCE
implementations in progress
Now with SSL!
Status: BETA in Kubernetes v1.2
[Diagram labels: Client, URL Map]
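The host-and-path mapping can be sketched as a small URL map. The rule table, hostnames, and service names are hypothetical, and plain string-prefix matching is a simplification of what a real Ingress controller does.

```python
def route(rules, host, path):
    """Return the backend service for the longest matching path prefix on this host."""
    best = None
    for rule_host, prefix, service in rules:
        if rule_host == host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    return best[1] if best else None


# Hypothetical URL map: (HTTP host header, URL path prefix, backend service).
rules = [
    ("shop.example.com", "/", "storefront"),
    ("shop.example.com", "/api", "shop-api"),
    ("blog.example.com", "/", "blog"),
]
```

A request for `shop.example.com/api/items` lands on `shop-api`, while any other path on that host falls through to `storefront`.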
60. Google Cloud Platform
DNS
Run SkyDNS as a pod in the cluster
• kube2sky bridges Kubernetes API -> SkyDNS
• Tell kubelets about it (static service IP)
Strictly optional, but practically required
• LOTS of things depend on it
• Probably will become more integrated
Or plug in your own!
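As a small illustration of what the cluster DNS provides, the sketch below composes the name under which a Service is conventionally published. The helper name is hypothetical, and `cluster.local` is assumed as the cluster domain; a cluster may be configured with a different one.

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Compose the conventional DNS name for a Service in a namespace."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_dns_name("web", "dev"))
```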
61. Google Cloud Platform
Community
Top 0.01% of all GitHub projects
1200+ external projects based on k8s
800+ unique contributors
[Logo panels: Companies Contributing, Companies Using]
63. Tea For The Tillerman
Routing On the Host
64. Completing the Kubernetes Puzzle
How do we announce the routes required by
Kubernetes across pods?
Run a routing protocol on the host
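To make "the routes required by Kubernetes" concrete, here is a sketch under one common assumption: each node owns a slice of a cluster-wide pod range, and that slice is the prefix the node's routing daemon would announce. The range `10.244.0.0/16`, the /24 slice size, and the node names are illustrative only.

```python
import ipaddress


def allocate_pod_subnets(cluster_cidr, nodes, new_prefix=24):
    """Give each node its own slice of the cluster-wide pod range."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=new_prefix)
    return dict(zip(nodes, subnets))


def routes_to_announce(allocation):
    """Each node announces its pod subnet, so the fabric learns where pods live."""
    return [(str(subnet), node) for node, subnet in allocation.items()]


allocation = allocate_pod_subnets("10.244.0.0/16", ["node1", "node2", "node3"])
for prefix, node in routes_to_announce(allocation):
    print(f"{prefix} via {node}")
```

In the configurations shown on the following slides, `redistribute connected` plays this role: the host advertises the subnets attached to it.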
65. What If Host Configuration Could Be As Simple As…
neighbor eth0
redistribute connected
66. What Cumulus Quagga Will Be in 3.0
router bgp 65534
 bgp router-id 10.10.1.1
 neighbor eth0 interface remote-as external
 redistribute connected
67. More Details
Two ways to use BGP on the host:
Using Dynamic Neighbors
Using BGP Unnumbered
Use of ASN:
All servers use the same ASN
68. BGP on Host: Dynamic Neighbors
ToR is configured with subnet from which clients
can connect
Clients initiate connection
Rest of operation is regular BGP
bgp listen range 10.0.0.0/24 peer-group SERVER
bgp listen-limit 8
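The acceptance logic implied by a listen range and a listen limit can be modelled roughly as below. This is a conceptual sketch only, not how any routing suite implements it; the class name `DynamicNeighborListener` is hypothetical.

```python
import ipaddress


class DynamicNeighborListener:
    """Toy model of a ToR that accepts BGP clients from a configured subnet."""

    def __init__(self, listen_range, limit):
        self.network = ipaddress.ip_network(listen_range)
        self.limit = limit
        self.peers = set()

    def accept(self, client_ip):
        addr = ipaddress.ip_address(client_ip)
        if addr in self.peers:
            return True  # already an established dynamic neighbor
        if addr not in self.network or len(self.peers) >= self.limit:
            return False  # outside the listen range, or the limit is reached
        self.peers.add(addr)
        return True


tor = DynamicNeighborListener("10.0.0.0/24", limit=8)
print(tor.accept("10.0.0.5"), tor.accept("10.0.1.5"))
```

The ToR never names its clients individually; any server that connects from inside the range becomes a neighbor until the limit is hit.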
69. BGP on Host: Unnumbered Configuration
Connection to servers is not bridged, but p2p
Pure L3
Interface-based configuration with remote-as
external
70. And for the OSPF Aficionados
interface eth0
 ip ospf area 0.0.0.1
router ospf
 ospf router-id 10.10.1.1
 area 0.0.0.1 stub no-summary
 passive-interface docker0
71. Seat Belts With Routing On The Host
Hosts are always stub networks, never transit
With OSPF, hosts are in a separate area from the rest of the network
Announce only default route to host
Accept only specified prefixes from host
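The two filtering rules on this slide can be sketched as simple predicates. The allowed range `10.244.0.0/16` and the function names are assumptions for illustration; on a real switch these would be expressed as routing policy, not Python.

```python
import ipaddress


def accept_from_host(prefix, allowed):
    """Accept a host-announced prefix only if it falls inside an allowed range."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(ipaddress.ip_network(a)) for a in allowed)


def announce_to_host(routes):
    """Send the host only the default route, keeping it a stub."""
    return [r for r in routes if r == "0.0.0.0/0"]


allowed = ["10.244.0.0/16"]  # hypothetical range reserved for pod subnets
print(accept_from_host("10.244.1.0/24", allowed))
print(accept_from_host("0.0.0.0/0", allowed))
```

A misconfigured host that tries to announce a default route, or someone else's prefix, is rejected at the ToR and cannot attract transit traffic.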
72. Customers Running Cumulus Quagga on the Host
All container-based apps
One mid-size customer is running with OSPF
One small-to-mid-size customer is running with BGP
Unnumbered
One mid-to-large-size customer is running with BGP
300+ OpenStack cluster with VXLAN and Routing To
The Host
Multiple other customers in PoC or pre-production
74. Building Pure L3 Fabrics is real
Networks, Compute and Applications are showing how to do
this
Standards-based, robust, scalable design
Kubernetes provides a framework for deploying
containerized networks
It's what Google pushed out after years of internal
deployment
High-quality open source routing stacks available for
hosts