Developer Experience at Zalando
CNCF End User SIG-DX, 2019-04-18
Henning Jacobs (@try_except_)
Developer Experience at Zalando - CNCF End User SIG-DX

636 views

Presentation given on 2019-04-18 in the regular CNCF End User Developer Experience SIG call (Zoom).

Published in: Technology

  1. CNCF END USER SIG-DX, 2019-04-18
     Developer Experience at Zalando
     HENNING JACOBS, @try_except_
  2. EUROPE’S LEADING ONLINE FASHION PLATFORM
  3. ZALANDO AT A GLANCE
     about 5.4 billion EUR revenue (2018)
     over 250 million visits per month
     over 15,000 employees in Europe
     over 79% of visits via mobile devices
     over 26 million active customers
     over 300,000 product choices
     about 2,000 brands
     17 countries
  4. Platform
     over 1100 developers
     over 200 development teams
  5. YOU BUILD IT, YOU RUN IT
     “The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer.”
     - A Conversation with Werner Vogels, ACM Queue, 2006
  6. ON-CALL: YOU OWN IT, YOU RUN IT
     “When things are broken, we want people with the best context trying to fix things.”
     - Blake Scrivener, Netflix SRE Manager
  7. KUBERNETES @ ZALANDO
     114 clusters
     about 1400 nodes
     Since Oct 2016
     Node Autoscaling
     From v1.4 to v1.12
     Default Deployment Target
  8. DEVELOPERS USING KUBERNETES
  9. DEVELOPER JOURNEY
     Consistent story that models all aspects of SW dev
  10. Developer Journey
  11. Developer Journey
      Correctness, Compliance, GDPR, Security, Cost Efficiency, 24x7 On Call, Governance, Resilience, Capacity, ...
  12. DEVELOPER PRODUCTIVITY
      Code, Build, Test, Deploy, Operate, Setup
      Cloud Native Application Runtime
  13. PLAN & SETUP
  14. Plan
      Stories, Rules of Play, Tech Radar
  15. Setup
      Application Bootstrapping
  16. BUILD & TEST
  17. CONTINUOUS DELIVERY PLATFORM: BUILD
      Diagram labels: Git code push, CDP
  18. DEPLOY
  19. Deploy
      Kubernetes
  20. DEPLOYMENT CONFIGURATION
      ├── deploy/apply
      │   ├── deployment.yaml
      │   ├── credentials.yaml   # Zalando IAM
      │   ├── ingress.yaml
      │   └── service.yaml
      └── delivery.yaml          # Zalando CI/CD
  21. INGRESS.YAML
      kind: Ingress
      metadata:
        name: "..."
      spec:
        rules:
          # DNS name your application should be exposed on
          - host: "myapp.foo.example.org"
            http:
              paths:
                - backend:
                    serviceName: "myapp"
                    servicePort: 80
  22. TEMPLATING: MUSTACHE
      kind: Ingress
      metadata:
        name: "..."
      spec:
        rules:
          # DNS name your application should be exposed on
          - host: "{{{APPLICATION}}}.example.org"
            http:
              paths:
                - backend:
                    serviceName: "{{{APPLICATION}}}"
                    servicePort: 80
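
The placeholder substitution on the templating slide can be illustrated with a few lines of Python. This is only a minimal sketch: it handles triple-mustache `{{{NAME}}}` placeholders and nothing else, and the `render` helper and `INGRESS_TEMPLATE` constant are hypothetical names for illustration, not part of Zalando's CDP, which presumably relies on a full Mustache implementation.

```python
import re

# Hypothetical helper, not CDP code: only {{{NAME}}} placeholders are supported.
PLACEHOLDER = re.compile(r"\{\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}\}")


def render(template, variables):
    """Replace every {{{NAME}}} with variables[NAME]; unknown names raise KeyError."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


# The manifest from the slide, with the elided name left as "...".
INGRESS_TEMPLATE = """\
kind: Ingress
metadata:
  name: "..."
spec:
  rules:
    # DNS name your application should be exposed on
    - host: "{{{APPLICATION}}}.example.org"
      http:
        paths:
          - backend:
              serviceName: "{{{APPLICATION}}}"
              servicePort: 80
"""

if __name__ == "__main__":
    print(render(INGRESS_TEMPLATE, {"APPLICATION": "myapp"}))
```

Failing on an unknown placeholder, rather than rendering an empty string, is a deliberate choice in this sketch so that a typo in a variable name does not silently produce a broken host name.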
  23. CONTINUOUS DELIVERY PLATFORM
  24. CDP: DEPLOY
      "glorified kubectl apply"
  25. CDP: OPTIONAL APPROVAL
  26. STACKSET: TRAFFIC SWITCHING
      github.com/zalando-incubator/stackset-controller
  27. STACKSET CRD
      apiVersion: zalando.org/v1
      kind: StackSet
      ...
      spec:
        ingress:
          hosts: ["foo.example.org"]
          backendPort: 8080
        stackLifecycle:
          scaledownTTLSeconds: 1800
          limit: 5
        stackTemplate:
          spec:
            podTemplate:
              ...
      github.com/zalando-incubator/stackset-controller
  28. TRAFFIC SWITCHING STEPS IN CDP
      github.com/zalando-incubator/stackset-controller
  29. EMERGENCY ACCESS SERVICE
      Get emergency access by referencing existing Incident ticket:
        zkubectl cluster-access request --emergency -i INC REASON
      Get privileged production access via 4-eyes:
        zkubectl cluster-access request REASON
        zkubectl cluster-access approve USERNAME
  30. INTEGRATIONS
  31. CLOUD FORMATION VIA CI/CD
      ├── deploy/apply
      │   ├── deployment.yaml    # Kubernetes
      │   ├── cf-iam-role.yaml   # AWS IAM Role
      │   ├── cf-rds.yaml        # AWS RDS Database
      │   ├── kube-ingress.yaml
      │   ├── kube-secret.yaml
      │   └── kube-service.yaml
      └── delivery.yaml          # CI/CD config
      "Infrastructure as Code"
  32. ZALANDO IAM/OAUTH VIA CRD
      apiVersion: zalando.org/v1
      kind: PlatformCredentialsSet
      ..
      spec:
        application: my-app
        tokens:
          read-only:
            privileges:
              - com.zalando::foobar.read
        clients:
          employee:
            grant: authorization-code
            realm: users
            redirectUri: https://example.org/auth/callback
  33. POSTGRES OPERATOR
      Application to manage PostgreSQL clusters on Kubernetes
      over 700 clusters running on Kubernetes
      github.com/zalando/postgres-operator
  34. Elasticsearch in Kubernetes
      Elasticsearch: 2,500 vCPUs, 1 TB RAM
      github.com/zalando-incubator/es-operator/
  35. SUMMARY
      • Application Bootstrapping
      • Git as source of truth and UI
      • 4-eyes principle for master/production
      • Extensible Kubernetes API as primary interface
      • OAuth/IAM credentials
      • PostgreSQL
      • CloudFormation for proprietary AWS services
  36. DELIVERY PERFORMANCE METRICS
      • Lead Time
      • Release Frequency
      • Time to Restore Service
      • Change Fail Rate
      https://srcco.de/posts/accelerate-software-delivery-performance.html
  37. CONTAINERS
      From "Accelerate: The Science of Lean Software and DevOps"
  38. DELIVERY PERFORMANCE METRICS
      • Lead Time ≙ Commit to Prod
      • Release Frequency ≙ Deploys/week/dev
      • Time to Restore Service ≙ MTRS from incidents
      • Change Fail Rate ≙ n/a
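
The mapping on the metrics slide (lead time as commit-to-prod, release frequency as deploys per week per developer, time to restore as MTRS from incidents) can be made concrete with a short calculation. The sketch below is illustrative only: the function names and the input shapes (pairs of timestamps) are assumptions, and the slides do not say how Zalando actually aggregates these numbers (median is used here for lead time as one plausible choice).

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes):
    """Median of (production deploy time - commit time) over (commit, deploy) pairs."""
    return median(deploy - commit for commit, deploy in changes)


def deploys_per_week_per_dev(deploy_count, weeks, developers):
    """Release frequency normalized by observation period and team size."""
    return deploy_count / weeks / developers


def mean_time_to_restore(incidents):
    """Mean of (resolved - started) over (started, resolved) incident pairs."""
    durations = [resolved - started for started, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    changes = [
        (datetime(2019, 4, 1, 9, 0), datetime(2019, 4, 1, 11, 0)),
        (datetime(2019, 4, 2, 9, 0), datetime(2019, 4, 2, 10, 0)),
        (datetime(2019, 4, 3, 9, 0), datetime(2019, 4, 3, 13, 0)),
    ]
    print(median_lead_time(changes))
    print(deploys_per_week_per_dev(deploy_count=120, weeks=4, developers=10))
```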
  39. “.. means establishing empathy with internal consumers (read: developers) and collaborating with them on the design. Platform product managers establish roadmaps and ensure the platform delivers value to the business and enhances the developer experience.”
      - ThoughtWorks Technology Radar
  40. DEVELOPER SATISFACTION
  41. DOCUMENTATION
      "Documentation is hard to find"
      "Documentation is not comprehensive enough"
      "Remove unnecessary complexity and obstacles."
      "Get the documentation up to date and prepare use cases"
      "More and more clear documentation"
      "More detailed docs, example repos with more complicated deployments."
  42. DOCUMENTATION
      • Restructure following https://www.divio.com/en/blog/documentation/
      • Concepts
      • How Tos
      • Tutorials
      • Reference
      • Global Search
      • Weekly Health Check: Support → Documentation
  43. NEWSLETTER
      "You can now.."
      • You can now benefit from the most recent Kubernetes 1.12 features, e.g. ..
      • You can now analyse your Kotlin project with SonarQube and upload your Scala code coverage report to SonarQube
  44. SIGNAL: ISSUE UPVOTES
  45. TESTIMONIALS
      “So, thank you, Team Automata, for listening to our community, taking our upvotes in consideration when developing new solutions and building every day 'the first CI that doesn't suck'.”
      - a user, October 2018
  46. MONITORING
  47. ZMON DASHBOARD
      github.com/zalando/zmon
  48. GRAFANA APPLICATION DASHBOARD
  49. KUBERNETES RESOURCE REPORT
      github.com/hjacobs/kube-resource-report
  50. RESOURCE REPORT: TEAMS
      Sorting teams by Slack Costs
      github.com/hjacobs/kube-resource-report
  51. RESOURCE REPORT: APPLICATIONS
      "Slack"
  52. RESOURCE REPORT: CLUSTERS
      "Slack"
      github.com/hjacobs/kube-resource-report
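
The "Slack" figures on the resource report slides can be read as the gap between what an application requests and what it actually uses, priced per hour. The following is a rough sketch under that assumption; the `AppResources` type, the two price constants, and both function names are made up for illustration and are not taken from kube-resource-report.

```python
from dataclasses import dataclass

# Hypothetical hourly prices, chosen only to make the example concrete.
CPU_COST_PER_CORE_HOUR = 0.03
MEMORY_COST_PER_GIB_HOUR = 0.004


@dataclass
class AppResources:
    team: str
    cpu_requested: float        # cores
    cpu_used: float             # cores
    memory_requested_gib: float
    memory_used_gib: float


def slack_cost_per_hour(app):
    """Cost of requested-but-unused CPU and memory; never negative."""
    cpu_slack = max(app.cpu_requested - app.cpu_used, 0.0)
    memory_slack = max(app.memory_requested_gib - app.memory_used_gib, 0.0)
    return cpu_slack * CPU_COST_PER_CORE_HOUR + memory_slack * MEMORY_COST_PER_GIB_HOUR


def teams_by_slack_cost(apps):
    """Sum slack cost per team and return (team, cost) pairs, most expensive first."""
    totals = {}
    for app in apps:
        totals[app.team] = totals.get(app.team, 0.0) + slack_cost_per_hour(app)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    apps = [
        AppResources("team-a", 4.0, 1.0, 8.0, 8.0),
        AppResources("team-b", 2.0, 2.0, 4.0, 2.0),
    ]
    for team, cost in teams_by_slack_cost(apps):
        print(team, round(cost, 4))
```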
  53. UNDER THE HOOD
  54. ZALANDO: DECISION
      1. Forbid Memory Overcommit
         • Implement mutating admission webhook
         • Set requests = limits
      2. Disable CPU CFS Quota in all clusters
         • --cpu-cfs-quota=false
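
To illustrate the "mutating admission webhook" decision, here is a minimal sketch of the mutation step only: it builds a JSON Patch (RFC 6902) that makes each container's memory request equal to its memory limit. The direction (copying the limit into the request) and both function names are assumptions for illustration; the slide only states "requests = limits", and the actual Zalando webhook is not shown in the deck. The HTTP server and TLS setup a real webhook needs are omitted.

```python
import base64
import json


def memory_overcommit_patch(pod):
    """Return JSON Patch operations that set memory requests equal to memory limits."""
    patch = []
    containers = pod.get("spec", {}).get("containers", [])
    for index, container in enumerate(containers):
        resources = container.get("resources", {})
        limit = resources.get("limits", {}).get("memory")
        if limit is None:
            continue  # no memory limit set; handling that case is out of scope here
        base = f"/spec/containers/{index}/resources/requests"
        requests = resources.get("requests")
        if requests is None:
            # The parent object is missing, so add it together with the memory value.
            patch.append({"op": "add", "path": base, "value": {"memory": limit}})
        elif requests.get("memory") != limit:
            # "add" on an existing object member replaces its value.
            patch.append({"op": "add", "path": f"{base}/memory", "value": limit})
    return patch


def admission_response(uid, patch):
    """Wrap the patch in the response part of an AdmissionReview (base64-encoded JSON Patch)."""
    encoded = base64.b64encode(json.dumps(patch).encode("utf-8")).decode("ascii")
    return {"uid": uid, "allowed": True, "patchType": "JSONPatch", "patch": encoded}


if __name__ == "__main__":
    pod = {
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "resources": {
                        "limits": {"memory": "1Gi"},
                        "requests": {"memory": "512Mi"},
                    },
                }
            ]
        }
    }
    print(json.dumps(memory_overcommit_patch(pod), indent=2))
```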
  55. KUBERNETES CLUSTER SETUP
      Diagram labels: Master, Config, Worker, EC2 Instances, CloudFormation Stacks
      github.com/zalando-incubator/kubernetes-on-aws
  56. CLUSTER PROVISIONING
      CLUSTER LIFECYCLE MANAGER (CLM)
      Diagram labels: ADMIN, create, apply manifests, provision resources, create CF stack, CLUSTER REGISTRY, CLM, API, CloudFormation API
      github.com/zalando-incubator/cluster-lifecycle-manager
      github.com/zalando-incubator/kubernetes-on-aws
  57. INGRESS
      https://github.com/zalando-incubator/kube-ingress-aws-controller
  58. VPA FOR PROMETHEUS
      apiVersion: poc.autoscaling.k8s.io/v1alpha1
      kind: VerticalPodAutoscaler
      metadata:
        name: prometheus-vpa
        namespace: kube-system
      spec:
        selector:
          matchLabels:
            application: prometheus
        updatePolicy:
          updateMode: Auto
      Chart label: CPU/memory
  59. VERTICAL POD AUTOSCALER
      limit/requests adapted by VPA
  60. HORIZONTAL POD AUTOSCALING (CUSTOM METRICS)
      Queue Length, Prometheus Query, Ingress Req/s, ZMON Check
      github.com/zalando-incubator/kube-metrics-adapter
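
Whatever the metric source (queue length, a Prometheus query, ingress requests per second), the Kubernetes Horizontal Pod Autoscaler scales on the ratio of the current metric value to its target. The sketch below shows that core formula with clamping to minimum and maximum replica counts. It deliberately leaves out the tolerance band and stabilization behavior of the real controller, and the function name and defaults are chosen for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """ceil(current_replicas * current_metric / target_metric), clamped to [min, max]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Four pods at 150 requests/s each against a target of 100 requests/s per pod.
    print(desired_replicas(current_replicas=4, current_metric=150, target_metric=100))
```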
  61. DOWNSCALING DURING OFF-HOURS
      github.com/hjacobs/kube-downscaler
      Chart label: Weekend
  62. DOWNSCALING DURING OFF-HOURS
      DEFAULT_UPTIME="Mon-Fri 07:30-20:30 CET"
      annotations:
        downscaler/exclude: "true"
      github.com/hjacobs/kube-downscaler
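
The uptime setting and the exclude annotation from the slide above suggest a simple decision rule, sketched here in Python. This is not kube-downscaler's code: the function names are hypothetical, only a single non-wrapping weekday range such as `Mon-Fri` is handled, and the timestamp is assumed to be already expressed in the time zone named in the spec, so the zone token is parsed but not used for conversion.

```python
import re
from datetime import datetime, time

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
UPTIME_PATTERN = re.compile(
    r"^([A-Za-z]{3})-([A-Za-z]{3}) (\d{2}):(\d{2})-(\d{2}):(\d{2}) (\S+)$"
)


def within_uptime(spec, local_now):
    """True if local_now (already in the spec's time zone) falls inside the uptime window."""
    match = UPTIME_PATTERN.match(spec)
    if not match:
        raise ValueError(f"unsupported uptime spec: {spec!r}")
    first_day = WEEKDAYS.index(match.group(1))
    last_day = WEEKDAYS.index(match.group(2))
    start = time(int(match.group(3)), int(match.group(4)))
    end = time(int(match.group(5)), int(match.group(6)))
    day_ok = first_day <= local_now.weekday() <= last_day
    time_ok = start <= local_now.time() < end
    return day_ok and time_ok


def should_be_scaled_down(annotations, spec, local_now):
    """Excluded workloads are never scaled down; others are, outside the uptime window."""
    if annotations.get("downscaler/exclude") == "true":
        return False
    return not within_uptime(spec, local_now)


if __name__ == "__main__":
    spec = "Mon-Fri 07:30-20:30 CET"
    saturday_noon = datetime(2019, 4, 20, 12, 0)
    print(should_be_scaled_down({}, spec, saturday_noon))
    print(should_be_scaled_down({"downscaler/exclude": "true"}, spec, saturday_noon))
```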
  63. KUBERNETES JANITOR
      ● TTL and expiry date annotations, e.g.
        ○ set time-to-live for your test deployment
      ● Custom rules, e.g.
        ○ delete everything without "app" label after 7 days
      github.com/hjacobs/kube-janitor
  64. JANITOR TTL ANNOTATION
      # let's try out nginx, but only for 1 hour
      kubectl run nginx --image=nginx
      kubectl annotate deploy nginx janitor/ttl=1h
      github.com/hjacobs/kube-janitor
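
A TTL annotation such as `janitor/ttl=1h` implies a check of the form "creation time plus TTL has passed". The sketch below shows one way to express that. The function names and the accepted unit suffixes (`s`, `m`, `h`, `d`, `w`) are assumptions made for this example and are not copied from kube-janitor's source.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes for this sketch.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
TTL_PATTERN = re.compile(r"^(\d+)([smhdw])$")


def parse_ttl(ttl):
    """Turn a string like '1h' or '7d' into a timedelta."""
    match = TTL_PATTERN.match(ttl)
    if not match:
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def is_expired(created, ttl, now):
    """True once the resource has existed for at least its TTL."""
    return now >= created + parse_ttl(ttl)


if __name__ == "__main__":
    created = datetime(2019, 4, 18, 10, 0, tzinfo=timezone.utc)
    now = datetime(2019, 4, 18, 11, 30, tzinfo=timezone.utc)
    print(is_expired(created, "1h", now))
```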
  65. CUSTOM JANITOR RULES
      # require "app" label for new pods starting April 2019
      - id: require-app-label-april-2019
        resources:
          - deployments
          - statefulsets
        jmespath: "!(spec.template.metadata.labels.app) && metadata.creationTimestamp > '2019-04-01'"
        ttl: 7d
      github.com/hjacobs/kube-janitor
  66. EC2 SPOT NODES
      72% savings
  67. SPOT ASG / LAUNCH TEMPLATE
      Not upstream in cluster-autoscaler (yet)
  68. OPEN SOURCE
      Kubernetes on AWS: github.com/zalando-incubator/kubernetes-on-aws
      AWS ALB Ingress controller: github.com/zalando-incubator/kube-ingress-aws-controller
      External DNS: github.com/kubernetes-incubator/external-dns
      Postgres Operator: github.com/zalando/postgres-operator
      Kubernetes Resource Report: github.com/hjacobs/kube-resource-report
      Kubernetes Downscaler: github.com/hjacobs/kube-downscaler
      Kubernetes Janitor: github.com/hjacobs/kube-janitor
  69. MORE INFO
      ● DevOps Gathering 2019: Ensuring Kubernetes Cost Efficiency across (many) Clusters (slides)
      ● DevOpsCon Munich 2018: Running Kubernetes in Production: A Million Ways to Crash Your Cluster
      ● HighLoad++ Moscow 2018: Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latency (slides)
      ● DevOps Lisbon Meetup 2018: Kubernetes at Zalando
      kubernetes-on-aws.readthedocs.io/en/latest/admin-guide/public-presentations.html
  70. QUESTIONS?
      HENNING JACOBS, HEAD OF DEVELOPER PRODUCTIVITY
      henning@zalando.de
      @try_except_
      Illustrations by @01k
