This document summarizes Zalando's use of Kubernetes on AWS. The context is 200 engineering teams and 30 production clusters; the current status reported in the slides is 8 production and 8 non-production clusters. Key aspects covered include the architecture with isolated AWS accounts per product, cluster provisioning via CloudFormation, deployment via Jenkins pipelines, the ingress controller, integration with AWS services, cluster autoscaling, OAuth integration, operations, monitoring with ZMON, and the open sourcing of tools such as the Kubernetes ingress controller and External DNS. The overall experience has been positive, but missing best practices have been a pain point, and discussions with the community and other users have been very helpful.
4. 4
WE OFFER A SUCCESSFUL AND CURATED ASSORTMENT
~200,000
articles from
>1,500
international brands
17 private
labels
HIGHLY
EXPERIENCED
category management
>350
designers
& stylists
LOCALIZATION
of the assortment
CURATED
SHOPPING
with Zalon
5. 5
ZALANDO
15 markets
6 fulfillment centers
20 million active customers
3.6 billion € net sales 2016
165 million visits per month
12,000 employees in Europe
6. 6
OUR FOOTPRINT AROUND EUROPE
as at Dec 2016
[Map with numbered location markers; the locations are listed below]
BERLIN HEADQUARTERS / TECH HUB
BRIESELANG FULFILLMENT CENTER
ERFURT FULFILLMENT CENTER
MÖNCHENGLADBACH FULFILLMENT CENTER
LAHR FULFILLMENT CENTER
DORTMUND TECH HUB
FRANKFURT OUTLET
DUBLIN TECH HUB
HELSINKI TECH HUB
STRADELLA FULFILLMENT CENTER
KÖLN OUTLET
MOISSY-CRAMAYEL FULFILLMENT CENTER
GRYFINO FULFILLMENT CENTER (start autumn 2017)
8. 8
KUBERNETES ON AWS: CONTEXT
200 engineering teams
30 prod. clusters
AWS
Dockerized apps
No manual operations
Reliability
Autoscaling
Seamless migration
9. 9
CURRENT STATUS
• First service in prod. on Kubernetes since Nov 2016
• 8 production clusters
• 8 non-production clusters
• “Early Access” phase (onboarding of individual teams)
13. 13
ARCHITECTURE DECISIONS
• One prod. cluster per AWS account / “product”
• API server behind SSL ELB, OAuth webhook
• Read-only access to production
• CI/CD for write access
• etcd running separately on EC2
• Multi-AZ clusters
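To illustrate the "OAuth webhook" bullet: Kubernetes webhook token authentication has the API server POST a `TokenReview` object to an external service, which answers whether the bearer token is valid and which user it belongs to. The following is a minimal sketch of that exchange, not Zalando's implementation. The `KNOWN_TOKENS` table, the user name, and the `read-only` group are assumptions standing in for a real OAuth token-info lookup.

```python
import json

# Hypothetical token table; a real webhook would query an OAuth token-info service.
KNOWN_TOKENS = {
    "token-abc": {"username": "jdoe", "groups": ["read-only"]},
}


def review_token(request_body: str) -> dict:
    """Answer a TokenReview request sent by the Kubernetes API server."""
    review = json.loads(request_body)
    token = review.get("spec", {}).get("token", "")
    info = KNOWN_TOKENS.get(token)

    if info is None:
        status = {"authenticated": False}
    else:
        status = {
            "authenticated": True,
            "user": {
                "username": info["username"],
                "uid": info["username"],
                "groups": info["groups"],
            },
        }

    # Echo the API version of the request so the response matches what was asked.
    return {
        "apiVersion": review.get("apiVersion", "authentication.k8s.io/v1beta1"),
        "kind": "TokenReview",
        "status": status,
    }


request = json.dumps({
    "apiVersion": "authentication.k8s.io/v1beta1",
    "kind": "TokenReview",
    "spec": {"token": "token-abc"},
})
print(json.dumps(review_token(request), indent=2))
```

The group returned here could then be bound to a read-only role on the cluster side, which is one way the "read-only access to production" decision might be enforced.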
24. 24
CLUSTER AUTOSCALING
Control # of worker nodes in ASG:
• Satisfy all resource requests
• One spare node per AZ
• No manual config “tweaking”
• Scale down, but not too fast
https://github.com/hjacobs/kube-aws-autoscaler
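The sizing rules on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of kube-aws-autoscaler: it sums resource requests per AZ instead of bin-packing pods onto nodes, it assumes every pod is already associated with an AZ, and the function names and the `max_scale_down_step` parameter are hypothetical.

```python
import math
from collections import defaultdict


def desired_nodes_per_az(pod_requests, node_capacity, spare_nodes=1):
    """Return the node count per AZ that covers all requests plus spare nodes.

    pod_requests: iterable of (az, cpu_cores, memory_gib) tuples.
    node_capacity: (cpu_cores, memory_gib) of a single worker node.
    """
    cpu_cap, mem_cap = node_capacity
    totals = defaultdict(lambda: [0.0, 0.0])
    for az, cpu, mem in pod_requests:
        totals[az][0] += cpu
        totals[az][1] += mem

    result = {}
    for az, (cpu, mem) in totals.items():
        # The scarcer resource decides how many nodes are needed.
        needed = max(math.ceil(cpu / cpu_cap), math.ceil(mem / mem_cap))
        result[az] = needed + spare_nodes
    return result


def next_asg_size(current, desired, max_scale_down_step=1):
    """Scale up immediately, but shrink by at most a fixed step per run."""
    if desired >= current:
        return desired
    return max(desired, current - max_scale_down_step)


pods = [
    ("eu-central-1a", 1.5, 4.0),
    ("eu-central-1a", 1.0, 2.0),
    ("eu-central-1b", 0.5, 1.0),
]
per_az = desired_nodes_per_az(pods, node_capacity=(2.0, 8.0))
total = sum(per_az.values())
print(per_az, next_asg_size(current=8, desired=total))
```

Limiting the scale-down step is one simple way to express "scale down, but not too fast": a short dip in requested resources does not immediately remove most of the capacity.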
26. 26
OAUTH INTEGRATION
• App declares needed credentials via Kubernetes Third Party Resource
• OAuth client/tokens are provisioned as Kubernetes secrets
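The flow described above (declare credentials, receive them as a secret) might look roughly like the sketch below. The declaration fields (`name`, `namespace`, `application`, `tokens`) and the `issue_token` callback are hypothetical and do not reproduce the schema of Zalando's Third Party Resource. Only the shape of the resulting `Secret` manifest, with base64-encoded values under `data`, follows the standard Kubernetes API.

```python
import base64


def build_secret(declaration, issue_token):
    """Turn a credentials declaration into a Kubernetes Secret manifest (as a dict).

    declaration: hypothetical dict naming the application and the tokens it needs.
    issue_token: hypothetical callable (application, token_name) -> token string.
    """
    data = {}
    for token_name in declaration["tokens"]:
        raw = issue_token(declaration["application"], token_name)
        # Secret values must be base64-encoded in the manifest.
        data[token_name] = base64.b64encode(raw.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {
            "name": declaration["name"],
            "namespace": declaration["namespace"],
        },
        "data": data,
    }


declaration = {
    "name": "my-app-credentials",
    "namespace": "default",
    "application": "my-app",
    "tokens": ["read"],
}
# Stand-in token issuer for illustration; a real one would call the OAuth provider.
secret = build_secret(declaration, lambda app, name: f"{app}-{name}-token")
print(secret["metadata"]["name"], sorted(secret["data"]))
```

The application then only mounts or reads the secret; it never talks to the OAuth provider for provisioning itself.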
31. 31
MONITORING
• Each cluster contains ZMON appliance
• K8s resources are available as ZMON entities
• Users can create app checks/alerts via UI
https://zmon.io/
34. 34
OPEN SOURCE
Kube AWS Ingress Controller
https://github.com/zalando-incubator/kube-ingress-aws-controller
External DNS
https://github.com/kubernetes-incubator/external-dns
Zalando Cluster Config & Docs
https://github.com/zalando-incubator/kubernetes-on-aws
more to come...
35. 35
OUR EXPERIENCE SO FAR
• Missing best practices are a big pain point
• Community is great and welcoming
• Talking with other users is essential
• Slack channels are “compensating” for the lack of docs
36. 36
LIST OF KUBERNETES ON AWS USERS
If you are using Kubernetes on AWS,
please fill out the Google form:
https://github.com/hjacobs/kubernetes-on-aws-users