SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Introducing Envoy-based
service mesh at
Booking.com
Ivan Kruglov
KubeCon Europe 2018
02.05.2018
based in Amsterdam
28M listings
1,5M room nights per day
1700 IT staff
Agenda
• what is service mesh?
• why did we start the project?
• our setup
• learnings
• conclusion
What is
service mesh*?
* the way I understand it
application provider
service consumer service provider
application provider
service consumer service provider
communication infra
application provider
service consumer service provider
proxy proxy
control plane
Why did we start
service mesh
project?
monolith
service
service
service service service
service
service
service
service
service
service
Consistency & Visibility
in communications
Our setup
application
provider
proxy
service consumer service provider
data plane
HTTP(S)
HTTP
• graceful restart
• TCP proxy
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
control plane
• in-house (v1/v2 API)
• started in August 2017; Istio is around 0.2
• one complex system at a time
• start with bare-metal support
• minimal abstraction
• yup, just write Envoy config (partially)
• fine with Envoy’s set of features
• self-service for service owners
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
Kubernetes
(service discovery)
ZooKeeper
(configuration)
configuration
• Envoy configuration is quite rich
• power vs. usability:
• cluster specification
• virtual host specification
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
envoy cluster spec envoy virtual host spec
kind: VirtualHostSpec
spec:
name: api.vhost
domains: ["api.service"]
routes:
- match:
prefix: /
route:
cluster: api.prod.cluster
timeout: 10s
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 3s
kind: ClusterSpec
metadata:
annotations:
watcher_name: zookeeper.watcher
paths: ["/pools/api/dc"]
spec:
name: api.prod.cluster
lb_policy: LEAST_REQUEST
connect_timeout: 1s
lb_subset_config:
subset_selectors:
- keys: ["DC"]
- keys: ["IsLocal"]
fallback_policy: DEFAULT_SUBSET
default_subset:
IsLocal: true
envoy cluster spec envoy virtual host spec
bootstrap wizard
control plane
ZooKeeper
VC VC VC
service A
expose to service D
service C
expose to service Dservice B
envoy
I’m service D
here is service A
and service C specs
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
Kubernetes
(service discovery)
ZooKeeper
(configuration)
configuration
changes
Deploying changes
• integration with existent infrastructure
• git-deploy
• vanguard (canary) deployment
• syntax and semantic validation
• fast rollback
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
ZooKeeper
(configuration)
configuration
changes
metrics
Kubernetes
(service discovery)
Monitoring
• approach #1 – graphite
• excellent in-house facilities
• aggregations is too heavy at scale
• approach #2 – prometheus
• statsd support
• statsd_exporter
• still iterating
• standard dashboard
envoy
statsd_exporter
envoy…
application
provider
envoy
service consumer service provider
data plane
HTTP(S)
HTTP
control plane
control plane
• routing rules
• timeout/retry/etc policies
• service discovery
ZooKeeper
(service discovery)
ZooKeeper
(configuration)
configuration
changes
metrics
Kubernetes
(service discovery)
Production
Some numbers
• ~ 6 months in production
• ~ 30 projects
• ~ 10K servers
• hundreds of thousands RPS
• overheads…
p50
p75 p90
p95
p99
HTTP
perceived latency
0ms
5ms
10ms
15ms
+ 1ms
HTTPS
perceived latency
p50
p75 p90
p95
p99
10ms
+ 1ms
15ms
20ms
25ms
30ms
40ms
Some findings
• graceful restart works, but not always!
• (major) Envoy upgrades may not be graceful
• ”x-envoy-retry-on = connect-failure” – too few
“x-envoy-retry-on = 5xx” – too much
• gateway-error – HTTP 502, 503, 504
• absence of TCP keepalive => stale connections
• coming in 1.7
• idle_timeout in TCP proxy in 1.6
• cluster_name (1.5) -> cluster_names (1.6)
Conclusion
service mesh* is
a building block on the path to SOA
technically & organizationally
* the way I understand it
Andrei Vereha
Envoy production stories
Friday, May 4th 14:00
Envoy Deep Dive
Thank you!
Ivan Kruglov
ivan.kruglov@booking.com

Weitere ähnliche Inhalte

Was ist angesagt?

Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...StreamNative
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak PerformanceTodd Palino
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Kaxil Naik
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutSaju Madhavan
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In DeepMydbops
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022HostedbyConfluent
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsBrendan Gregg
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino ProjectMartin Traverso
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaYoungHeon (Roy) Kim
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in SparkDatabricks
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication confluent
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifiAnshuman Ghosh
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistentconfluent
 

Was ist angesagt? (20)

Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
Security and Multi-Tenancy with Apache Pulsar in Yahoo! (Verizon Media) - Pul...
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and FanoutOpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
OpenStack Oslo Messaging RPC API Tutorial Demo Call, Cast and Fanout
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Java Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame GraphsJava Performance Analysis on Linux with Flame Graphs
Java Performance Analysis on Linux with Flame Graphs
 
State of the Trino Project
State of the Trino ProjectState of the Trino Project
State of the Trino Project
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & Grafana
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 

Ähnlich wie Introducing envoy-based service mesh at Booking.com

Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesIftach Schonbaum
 
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...Daniel Bryant
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileIVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileAmazon Web Services Japan
 
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...Amazon Web Services
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017Guy Brown
 
Load Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & KubernetesLoad Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & KubernetesLee Calcote
 
DCEU 18: Docker Container Networking
DCEU 18: Docker Container NetworkingDCEU 18: Docker Container Networking
DCEU 18: Docker Container NetworkingDocker, Inc.
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices💡 Tomasz Kogut
 
Working with PowerVC via its REST APIs
Working with PowerVC via its REST APIsWorking with PowerVC via its REST APIs
Working with PowerVC via its REST APIsJoe Cropper
 
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewOpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewMaría Angélica Bracho
 
Microservice bus tutorial
Microservice bus tutorialMicroservice bus tutorial
Microservice bus tutorialHuabing Zhao
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSAmazon Web Services
 
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & ProvidersDEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & ProvidersCisco DevNet
 
Service Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and KubernetesService Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and KubernetesSreenivas Makam
 
DCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep diveDCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep diveMadhu Venugopal
 
Web services - A Practical Approach
Web services - A Practical ApproachWeb services - A Practical Approach
Web services - A Practical ApproachMadhaiyan Muthu
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...QAware GmbH
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Pavel Chunyayev
 

Ähnlich wie Introducing envoy-based service mesh at Booking.com (20)

Managing Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on KubernetesManaging Microservices With The Istio Service Mesh on Kubernetes
Managing Microservices With The Istio Service Mesh on Kubernetes
 
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
Canadian CNCF: "Emissary-ingress 101: An introduction to the CNCF incubation-...
 
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & MobileIVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
IVS CTO Night And Day 2018 Winter - [re:Cap] Serverless & Mobile
 
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
Integrating Infrastructure as Code into a Continuous Delivery Pipeline | AWS ...
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017
 
Load Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & KubernetesLoad Balancing in the Cloud using Nginx & Kubernetes
Load Balancing in the Cloud using Nginx & Kubernetes
 
DCEU 18: Docker Container Networking
DCEU 18: Docker Container NetworkingDCEU 18: Docker Container Networking
DCEU 18: Docker Container Networking
 
Exploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservicesExploring Twitter's Finagle technology stack for microservices
Exploring Twitter's Finagle technology stack for microservices
 
Working with PowerVC via its REST APIs
Working with PowerVC via its REST APIsWorking with PowerVC via its REST APIs
Working with PowerVC via its REST APIs
 
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless OverviewOpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
OpenShift Meetup - Tokyo - Service Mesh and Serverless Overview
 
REST APIs
REST APIsREST APIs
REST APIs
 
Microservice bus tutorial
Microservice bus tutorialMicroservice bus tutorial
Microservice bus tutorial
 
Monitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECSMonitoring in Motion: Monitoring Containers and Amazon ECS
Monitoring in Motion: Monitoring Containers and Amazon ECS
 
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & ProvidersDEVNET-1128	Cisco Intercloud Fabric NB Api's for Business & Providers
DEVNET-1128 Cisco Intercloud Fabric NB Api's for Business & Providers
 
Service Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and KubernetesService Discovery using etcd, Consul and Kubernetes
Service Discovery using etcd, Consul and Kubernetes
 
DCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep diveDCUS17 : Docker networking deep dive
DCUS17 : Docker networking deep dive
 
Web services - A Practical Approach
Web services - A Practical ApproachWeb services - A Practical Approach
Web services - A Practical Approach
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
Migrating Hundreds of Legacy Applications to Kubernetes - The Good, the Bad, ...
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
 

Mehr von Ivan Kruglov

SRE: Site Reliability Engineering
SRE: Site Reliability EngineeringSRE: Site Reliability Engineering
SRE: Site Reliability EngineeringIvan Kruglov
 
Blue-green & canary deployments
Blue-green & canary deploymentsBlue-green & canary deployments
Blue-green & canary deploymentsIvan Kruglov
 
Обратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектурыОбратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектурыIvan Kruglov
 
Kubernetes в Booking.com
Kubernetes в Booking.comKubernetes в Booking.com
Kubernetes в Booking.comIvan Kruglov
 
Тернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисовТернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисовIvan Kruglov
 
Service mesh для микросервисов
Service mesh для микросервисовService mesh для микросервисов
Service mesh для микросервисовIvan Kruglov
 
SOA: Строим свой service mesh
SOA: Строим свой service meshSOA: Строим свой service mesh
SOA: Строим свой service meshIvan Kruglov
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comIvan Kruglov
 
Sereal: a view from inside
Sereal: a view from insideSereal: a view from inside
Sereal: a view from insideIvan Kruglov
 
SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!Ivan Kruglov
 
Мониторинг, когда не тестируешь
Мониторинг, когда не тестируешьМониторинг, когда не тестируешь
Мониторинг, когда не тестируешьIvan Kruglov
 
Архитектура поиска в Booking.com
Архитектура поиска в Booking.comАрхитектура поиска в Booking.com
Архитектура поиска в Booking.comIvan Kruglov
 
Processing JSON messages in highspeed
Processing JSON messages in highspeedProcessing JSON messages in highspeed
Processing JSON messages in highspeedIvan Kruglov
 
Bringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesBringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesIvan Kruglov
 
Sereal and its tooling
Sereal and its toolingSereal and its tooling
Sereal and its toolingIvan Kruglov
 

Mehr von Ivan Kruglov (16)

SRE: Site Reliability Engineering
SRE: Site Reliability EngineeringSRE: Site Reliability Engineering
SRE: Site Reliability Engineering
 
Blue-green & canary deployments
Blue-green & canary deploymentsBlue-green & canary deployments
Blue-green & canary deployments
 
Обратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектурыОбратная сторона сервис-ориентированной архитектуры
Обратная сторона сервис-ориентированной архитектуры
 
Kubernetes в Booking.com
Kubernetes в Booking.comKubernetes в Booking.com
Kubernetes в Booking.com
 
Тернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисовТернии контейнеризованных приложений и микросервисов
Тернии контейнеризованных приложений и микросервисов
 
Service mesh для микросервисов
Service mesh для микросервисовService mesh для микросервисов
Service mesh для микросервисов
 
SOA: Строим свой service mesh
SOA: Строим свой service meshSOA: Строим свой service mesh
SOA: Строим свой service mesh
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.com
 
Sereal: a view from inside
Sereal: a view from insideSereal: a view from inside
Sereal: a view from inside
 
SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!SOA: послать запрос на сервер? Что может быть проще?!
SOA: послать запрос на сервер? Что может быть проще?!
 
Мониторинг, когда не тестируешь
Мониторинг, когда не тестируешьМониторинг, когда не тестируешь
Мониторинг, когда не тестируешь
 
Архитектура поиска в Booking.com
Архитектура поиска в Booking.comАрхитектура поиска в Booking.com
Архитектура поиска в Booking.com
 
Processing JSON messages in highspeed
Processing JSON messages in highspeedProcessing JSON messages in highspeed
Processing JSON messages in highspeed
 
Bringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searchesBringing code to the data: from MySQL to RocksDB for high volume searches
Bringing code to the data: from MySQL to RocksDB for high volume searches
 
Optimize sereal
Optimize serealOptimize sereal
Optimize sereal
 
Sereal and its tooling
Sereal and its toolingSereal and its tooling
Sereal and its tooling
 

Kürzlich hochgeladen

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Kürzlich hochgeladen (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

Introducing envoy-based service mesh at Booking.com

  • 1. Introducing Envoy-based service mesh at Booking.com Ivan Kruglov KubeCon Europe 2018 02.05.2018
  • 2. based in Amsterdam 28M listings 1,5M room nights per day 1700 IT staff
  • 3. Agenda • what is service mesh? • why did we start the project? • our setup • learnings • conclusion
  • 4. What is service mesh*? * the way I understand it
  • 6. application provider service consumer service provider communication infra
  • 7. application provider service consumer service provider proxy proxy control plane
  • 8. Why did we start service mesh project?
  • 11.
  • 12.
  • 13. Consistency & Visibility in communications
  • 15. application provider proxy service consumer service provider data plane HTTP(S) HTTP
  • 17. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery
  • 18. control plane • in-house (v1/v2 API) • started in August 2017; Istio is around 0.2 • one complex system at a time • start with bare-metal support • minimal abstraction • yup, just write Envoy config (partially) • fine with Envoy’s set of features • self-service for service owners
  • 19. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) Kubernetes (service discovery) ZooKeeper (configuration)
  • 20. configuration • Envoy configuration is quite rich • power vs. usability: • cluster specification • virtual host specification
  • 21. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 22. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 23. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 24. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true
  • 25. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true envoy cluster spec envoy virtual host spec
  • 26. kind: VirtualHostSpec spec: name: api.vhost domains: ["api.service"] routes: - match: prefix: / route: cluster: api.prod.cluster timeout: 10s retry_policy: retry_on: 5xx num_retries: 3 per_try_timeout: 3s kind: ClusterSpec metadata: annotations: watcher_name: zookeeper.watcher paths: ["/pools/api/dc"] spec: name: api.prod.cluster lb_policy: LEAST_REQUEST connect_timeout: 1s lb_subset_config: subset_selectors: - keys: ["DC"] - keys: ["IsLocal"] fallback_policy: DEFAULT_SUBSET default_subset: IsLocal: true envoy cluster spec envoy virtual host spec bootstrap wizard
  • 27. control plane ZooKeeper VC VC VC service A expose to service D service C expose to service Dservice B envoy I’m service D here is service A and service C specs
  • 28. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) Kubernetes (service discovery) ZooKeeper (configuration) configuration changes
  • 29. Deploying changes • integration with existent infrastructure • git-deploy • vanguard (canary) deployment • syntax and semantic validation • fast rollback
  • 30. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) ZooKeeper (configuration) configuration changes metrics Kubernetes (service discovery)
  • 31. Monitoring • approach #1 – graphite • excellent in-house facilities • aggregations is too heavy at scale • approach #2 – prometheus • statsd support • statsd_exporter • still iterating • standard dashboard envoy statsd_exporter envoy…
  • 32.
  • 33. application provider envoy service consumer service provider data plane HTTP(S) HTTP control plane control plane • routing rules • timeout/retry/etc policies • service discovery ZooKeeper (service discovery) ZooKeeper (configuration) configuration changes metrics Kubernetes (service discovery)
  • 35. Some numbers • ~ 6 months in production • ~ 30 projects • ~ 10K servers • hundreds of thousands RPS • overheads…
  • 38. Some findings • graceful restart works, but not always! • (major) Envoy upgrades may not be graceful • ”x-envoy-retry-on = connect-failure” – too few “x-envoy-retry-on = 5xx” – too much • gateway-error – HTTP 502, 503, 504 • absence of TCP keepalive => stale connections • coming in 1.7 • idle_timeout in TCP proxy in 1.6 • cluster_name (1.5) -> cluster_names (1.6)
  • 40. service mesh* is a building block on the path to SOA technically & organizationally * the way I understand it
  • 41. Andrei Vereha Envoy production stories Friday, May 4th 14:00 Envoy Deep Dive