SlideShare ist ein Scribd-Unternehmen logo
1 von 35
SmartStack, Docker and Yocalhost
How Yelp Does Service Discovery
[Demo]
● This works from (almost) any host in Yelp
● This works from Python, Java, command line etc.
● If a service supports HTTP or TCP then it can be made discoverable.
○ This includes third-party services such as MySQL and scribe
● It’s dynamic: for a given service, if new instances are added then they
will automatically become available.
Very Important Things to Note
● SmartStack (nerve and synapse) were written by Airbnb
● We’ve added some features
● The work here has been carried out by many people across Yelp
Credits
Registration
Architecture
hacheck
service_1
service_2
service_3
Service host
ZK
configure_nerve.py
nerve
Nerve registers service instance in ZooKeeper:
/nerve/region:myregion
├── service_1
│ └── server_1_0000013614
├── service_2
│ └── server_1_0000000959
├── service_3
│ ├── server_1_0000002468
│ └── server_2_0000002467
[...]
ZooKeeper data
The data in a znode is all that is required to connect to the corresponding
service instance.
We’ll shortly see how this is used for discovery.
{
"host":"10.0.0.123",
"port":31337,
"name":"server_1",
"weight":10,
}
ZooKeeper data
hacheck
Normally hacheck just acts as a transparent proxy for our healthchecks:
$ curl -s yocalhost:6666/http/service_1/1234/status | jq .
{
"uptime": 5693819.315988064,
"pid": 2595160,
"host": "server_1",
"version": "b6309e09d71da8f1e28213d251f7c3515878caca",
}
hacheck
We can also use it to fail healthchecks before we shut down a service.
This allows us to gracefully shutdown a service.
(Also provides a 1s cache to limit healthcheck rate.)
$ hadown service_1
$ curl -v yocalhost:6666/http/service_1/1234/status
Service service_1 in down state since 1443217910: billings
configure_nerve.py
How do we know what services to advertise? Every service host
periodically runs a script to regenerate the nerve configuration, reading
from the following sources:
● yelpsoa-configs
runs_on:
server_1
server_2
● puppet
nerve_simple::puppet_service {'foo'}
● mesos slave API
Discovery
Architecture
ZK
client
synapse
haproxy
configure_synapse.py
nerve
HAProxy
● By default bind to 0.0.0.0
● Bind only to yocalhost on public servers.
● HAProxy gives us a lot of goodies for all clients:
○ Redispatch on connection failures
○ Zero-downtime restarts (once you know how :)
○ Easy to insert connection logging
● Each host also exposes an HAProxy status page for easy introspection
configure_synapse.py
Every client host periodically runs a script to regenerate the synapse
configuration, reading service definitions from yelpsoa-configs.
For each service reads a smartstack.yaml file.
Restarts synapse if configuration has changed.
smartstack.yaml
main:
proxy_port: 20973
mode: http
healthcheck_uri: /status
timeout_server_ms: 1000
Namespaces
main:
proxy_port: 20001
mode: http
healthcheck_uri: /status
timeout_server_ms: 1000
long_timeout:
proxy_port: 20002
mode: http
healthcheck_uri: /status
timeout_server_ms: 3000
Same service,
different ports
Escape hatch
Some client libraries like to do their own load balancing e.g. cassandra,
memcached. Use synapse to dump the registration information to disk:
$ cat /var/run/synapse/services/devops.demo.json | jq .
[
{
"host":"10.0.0.123",
"port":31337,
"name":"server_1",
"weight":10,
}
]
Docker + Yocalhost
Architecture
haproxy
docker container 1
lo 127.0.0.1
docker container 2
lo 127.0.0.1
eth0 169.254.14.17
eth0 169.254.14.18
docker0 169.254.1.1
eth0 10.0.1.2
lo:0 169.254.255.254
lo 127.0.0.1
yocalhost
● We’d like to run only one nerve / synapse / haproxy per host
● What address should we bind haproxy to?
● 127.0.0.1 won’t work from within a container
● Instead we pick a link-local address 169.254.255.254 (yocalhost)
● This also works on servers without docker
Locality-aware discovery
Overview
We run services in both our own datacenters as well as AWS.
We logically group these environments according to latency.
Service authors get to decide how ‘widely’ their service instances are
advertised.
Everything is controlled via smartstack.yaml files.
Latency hierarchies
habitat
region
superregion
ZooKeepers live
here
Datacenters
or AZs in AWS
Habitats within
1ms round-trip
e.g. ‘us-west-1’
Regions within 5ms
round-trip e.g. ‘pacific
north-west’
main:
proxy_port: 20973
advertise: [habitat]
discover: habitat
advertise / discover
Synapse should look in the
habitat directory in its local
ZooKeeper
Nerve should register this
service in the habitat directory
of its local ZooKeeper
ZooKeeper data, revisited
/nerve
├── region:us-west-1
│ └── service_1
│ └── server_1_0000013614
├── region:us-west-2
│ └── service_2
│ └── server_2_0000000959
[...]
Extra advertisements
“Wouldn’t it be useful if we could make a service running in datacenter A
available in an (arbitrary) datacenter B?”
Why?
● Makes it easier to bring up a new datacenter
● Makes it easier to add more capacity to a datacenter in an emergency
● Makes it easier to keep a datacenter going in an emergency if a service
fails
main:
advertise: [region]
discover: region
extra_advertise:
region:us-west-1: [region:us-west-2]
extra_advertise
Design choices
Unix 4eva
● Lots of little components, each doing doing one thing well
● Very simple interface for clients and services
○ If it speaks TCP or HTTP we can register it
● Easy to independently replace components
○ HAProxy -> NGINX?
● Easy to observe behavior of components
It’s OK if ZooKeeper fails
● Nerve and Synapse keep retrying
● HAProxy keeps running but with no updates
● HAProxy performs its own healthchecks against service instances
○ If a service instance becomes unavailable then it will stop receiving
traffic after a short period
● The website stays up :)
Does it blend scale?
● Used to have scaling issues with internal load balancers, this is not a
problem with SmartStack :)
● Hit some scaling issues at 10s of thousands of ZooKeeper connections
○ Addressed this by using just a single ZooKeeper connection from
each nerve and synapse
● Used to have lots of HAProxy healthchecks hitting services
○ hacheck insulates services from this
○ We limit HAProxy restart rate
What about etcd / consul / …?
● We try to use boring components :)
● We’re already using Zookeeper for Kafka and ElasticSearch so it’s
natural to use it for our service discovery system too.
● etcd would probably also work, and is supported by SmartStack
● Conceptually similar to consul / consul-template
What about DNS?
● What TTL are you going to use?
● Are you clients even going to honor the TTL?
● Does the DNS resolution happen inline with requests?
Conclusions
● We’ve used SmartStack to create a robust service discovery system
● It’s UNIXy: lots of separate components, each doing one thing well
● It’s flexible: locality-aware discovery
● It’s reliable: new devs at Yelp view discovery as a solved problem
● It’s useful: SmartStack is the glue that holds our SOA together

Weitere ähnliche Inhalte

Was ist angesagt?

Service discovery in Docker environments
Service discovery in Docker environmentsService discovery in Docker environments
Service discovery in Docker environmentsalexandru giurgiu
 
Distributed Coordination with Python
Distributed Coordination with PythonDistributed Coordination with Python
Distributed Coordination with PythonOSCON Byrum
 
A Python Petting Zoo
A Python Petting ZooA Python Petting Zoo
A Python Petting Zoodevondjones
 
Monitoring of OpenNebula installations
Monitoring of OpenNebula installationsMonitoring of OpenNebula installations
Monitoring of OpenNebula installationsNETWAYS
 
So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier Hakka Labs
 
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017Codemotion
 
Comparing ZooKeeper and Consul
Comparing ZooKeeper and ConsulComparing ZooKeeper and Consul
Comparing ZooKeeper and ConsulIvan Glushkov
 
Consuming Cinder from Docker
Consuming Cinder from DockerConsuming Cinder from Docker
Consuming Cinder from DockerJohn Griffith
 
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet
 
Openstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability IssueOpenstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability IssueVigneshvar A.S
 
Introduction openstack-meetup-nov-28
Introduction openstack-meetup-nov-28Introduction openstack-meetup-nov-28
Introduction openstack-meetup-nov-28Sadique Puthen
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Codemotion
 
Supercharging Content Delivery with Varnish
Supercharging Content Delivery with VarnishSupercharging Content Delivery with Varnish
Supercharging Content Delivery with VarnishSamantha Quiñones
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Jimmy Lai
 
Automation with Ansible and Containers
Automation with Ansible and ContainersAutomation with Ansible and Containers
Automation with Ansible and ContainersRodolfo Carvalho
 
Openstack study-nova-02
Openstack study-nova-02Openstack study-nova-02
Openstack study-nova-02Jinho Shin
 
Openstack Overview
Openstack OverviewOpenstack Overview
Openstack Overviewrajdeep
 
Consuming Cinder from Docker
Consuming Cinder from DockerConsuming Cinder from Docker
Consuming Cinder from DockerTesora
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with PrometheusShiao-An Yuan
 

Was ist angesagt? (20)

Service discovery in Docker environments
Service discovery in Docker environmentsService discovery in Docker environments
Service discovery in Docker environments
 
Distributed Coordination with Python
Distributed Coordination with PythonDistributed Coordination with Python
Distributed Coordination with Python
 
A Python Petting Zoo
A Python Petting ZooA Python Petting Zoo
A Python Petting Zoo
 
Monitoring of OpenNebula installations
Monitoring of OpenNebula installationsMonitoring of OpenNebula installations
Monitoring of OpenNebula installations
 
So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier So we're running Apache ZooKeeper. Now What? By Camille Fournier
So we're running Apache ZooKeeper. Now What? By Camille Fournier
 
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
 
Comparing ZooKeeper and Consul
Comparing ZooKeeper and ConsulComparing ZooKeeper and Consul
Comparing ZooKeeper and Consul
 
Consuming Cinder from Docker
Consuming Cinder from DockerConsuming Cinder from Docker
Consuming Cinder from Docker
 
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner) Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
Puppet Camp Chicago 2014: Running Multiple Puppet Masters (Beginner)
 
Openstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability IssueOpenstack Scheduler and Scalability Issue
Openstack Scheduler and Scalability Issue
 
Introduction openstack-meetup-nov-28
Introduction openstack-meetup-nov-28Introduction openstack-meetup-nov-28
Introduction openstack-meetup-nov-28
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
 
Supercharging Content Delivery with Varnish
Supercharging Content Delivery with VarnishSupercharging Content Delivery with Varnish
Supercharging Content Delivery with Varnish
 
Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...Distributed system coordination by zookeeper and introduction to kazoo python...
Distributed system coordination by zookeeper and introduction to kazoo python...
 
Automation with Ansible and Containers
Automation with Ansible and ContainersAutomation with Ansible and Containers
Automation with Ansible and Containers
 
London HUG 12/4
London HUG 12/4London HUG 12/4
London HUG 12/4
 
Openstack study-nova-02
Openstack study-nova-02Openstack study-nova-02
Openstack study-nova-02
 
Openstack Overview
Openstack OverviewOpenstack Overview
Openstack Overview
 
Consuming Cinder from Docker
Consuming Cinder from DockerConsuming Cinder from Docker
Consuming Cinder from Docker
 
Monitoring with Prometheus
Monitoring with PrometheusMonitoring with Prometheus
Monitoring with Prometheus
 

Andere mochten auch

"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen..."Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...Yelp Engineering
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsYelp Engineering
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...Yelp Engineering
 
Yelp Academic Dataset
Yelp Academic DatasetYelp Academic Dataset
Yelp Academic DatasetMandaniKeyur
 
Building a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpBuilding a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpdotCloud
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsSteven Francia
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf toolsBrendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 

Andere mochten auch (15)

MySQL At Yelp
MySQL At YelpMySQL At Yelp
MySQL At Yelp
 
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen..."Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presen...
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique Visitors
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
 
Giving Design Critique
Giving Design CritiqueGiving Design Critique
Giving Design Critique
 
Yelp Academic Dataset
Yelp Academic DatasetYelp Academic Dataset
Yelp Academic Dataset
 
Humans by the hundred
Humans by the hundredHumans by the hundred
Humans by the hundred
 
Building a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpBuilding a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from Yelp
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS Applications
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 

Ähnlich wie How Yelp does Service Discovery

Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksRuslan Meshenberg
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesMichael Klishin
 
Open stack HA - Theory to Reality
Open stack HA -  Theory to RealityOpen stack HA -  Theory to Reality
Open stack HA - Theory to RealitySriram Subramanian
 
Comparison between zookeeper, etcd 3 and other distributed coordination systems
Comparison between zookeeper, etcd 3 and other distributed coordination systemsComparison between zookeeper, etcd 3 and other distributed coordination systems
Comparison between zookeeper, etcd 3 and other distributed coordination systemsImesha Sudasingha
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaSShawn Zhu
 
FIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE Platforms
FIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE PlatformsFIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE Platforms
FIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE PlatformsFIWARE
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data AnalyticsAnkur Bansal
 
Docker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsDocker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsFederico Michele Facca
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Monal Daxini
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on KubernetesJoerg Henning
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITOpenStack
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler Peeyush Gupta
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql ClusterAmr Fawzy
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
Developing Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideDeveloping Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideMohanraj Thirumoorthy
 

Ähnlich wie How Yelp does Service Discovery (20)

Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
NetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talksNetflixOSS Open House Lightning talks
NetflixOSS Open House Lightning talks
 
Troubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issuesTroubleshooting common oslo.messaging and RabbitMQ issues
Troubleshooting common oslo.messaging and RabbitMQ issues
 
Open stack HA - Theory to Reality
Open stack HA -  Theory to RealityOpen stack HA -  Theory to Reality
Open stack HA - Theory to Reality
 
Comparison between zookeeper, etcd 3 and other distributed coordination systems
Comparison between zookeeper, etcd 3 and other distributed coordination systemsComparison between zookeeper, etcd 3 and other distributed coordination systems
Comparison between zookeeper, etcd 3 and other distributed coordination systems
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaS
 
FIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE Platforms
FIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE PlatformsFIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE Platforms
FIWARE Tech Summit - Docker Swarm Secrets for Creating Great FIWARE Platforms
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Docker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsDocker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platforms
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on Kubernetes
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql Cluster
 
Kubernetes: My BFF
Kubernetes: My BFFKubernetes: My BFF
Kubernetes: My BFF
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Developing Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's GuideDeveloping Microservices using Spring - Beginner's Guide
Developing Microservices using Spring - Beginner's Guide
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Kürzlich hochgeladen (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

How Yelp does Service Discovery

  • 1. SmartStack, Docker and Yocalhost How Yelp Does Service Discovery
  • 3. ● This works from (almost) any host in Yelp ● This works from Python, Java, command line etc. ● If a service supports HTTP or TCP then it can be made discoverable. ○ This includes third-party services such as MySQL and scribe ● It’s dynamic: for a given service, if new instances are added then they will automatically become available. Very Important Things to Note
  • 4. ● SmartStack (nerve and synapse) were written by Airbnb ● We’ve added some features ● The work here has been carried out by many people across Yelp Credits
  • 7. Nerve registers service instance in ZooKeeper: /nerve/region:myregion ├── service_1 │ └── server_1_0000013614 ├── service_2 │ └── server_1_0000000959 ├── service_3 │ ├── server_1_0000002468 │ └── server_2_0000002467 [...] ZooKeeper data
  • 8. The data in a znode is all that is required to connect to the corresponding service instance. We’ll shortly see how this is used for discovery. { "host":"10.0.0.123", "port":31337, "name":"server_1", "weight":10, } ZooKeeper data
  • 9. hacheck Normally hacheck just acts as a transparent proxy for our healthchecks: $ curl -s yocalhost:6666/http/service_1/1234/status | jq . { "uptime": 5693819.315988064, "pid": 2595160, "host": "server_1", "version": "b6309e09d71da8f1e28213d251f7c3515878caca", }
  • 10. hacheck We can also use it to fail healthchecks before we shut down a service. This allows us to gracefully shutdown a service. (Also provides a 1s cache to limit healthcheck rate.) $ hadown service_1 $ curl -v yocalhost:6666/http/service_1/1234/status Service service_1 in down state since 1443217910: billings
  • 11. configure_nerve.py How do we know what services to advertise? Every service host periodically runs a script to regenerate the nerve configuration, reading from the following sources: ● yelpsoa-configs runs_on: server_1 server_2 ● puppet nerve_simple::puppet_service {'foo'} ● mesos slave API
  • 14. HAProxy ● By default bind to 0.0.0.0 ● Bind only to yocalhost on public servers. ● HAProxy gives us a lot of goodies for all clients: ○ Redispatch on connection failures ○ Zero-downtime restarts (once you know how :) ○ Easy to insert connection logging ● Each host also exposes an HAProxy status page for easy introspection
  • 15. configure_synapse.py Every client host periodically runs a script to regenerate the synapse configuration, reading service definitions from yelpsoa-configs. For each service reads a smartstack.yaml file. Restarts synapse if configuration has changed.
  • 17. Namespaces main: proxy_port: 20001 mode: http healthcheck_uri: /status timeout_server_ms: 1000 long_timeout: proxy_port: 20002 mode: http healthcheck_uri: /status timeout_server_ms: 3000 Same service, different ports
  • 18. Escape hatch Some client libraries like to do their own load balancing e.g. cassandra, memcached. Use synapse to dump the registration information to disk: $ cat /var/run/synapse/services/devops.demo.json | jq . [ { "host":"10.0.0.123", "port":31337, "name":"server_1", "weight":10, } ]
  • 20. Architecture haproxy docker container 1 lo 127.0.0.1 docker container 2 lo 127.0.0.1 eth0 169.254.14.17 eth0 169.254.14.18 docker0 169.254.1.1 eth0 10.0.1.2 lo:0 169.254.255.254 lo 127.0.0.1
  • 21. yocalhost ● We’d like to run only one nerve / synapse / haproxy per host ● What address should we bind haproxy to? ● 127.0.0.1 won’t work from within a container ● Instead we pick a link-local address 169.254.255.254 (yocalhost) ● This also works on servers without docker
  • 23. Overview We run services in both our own datacenters as well as AWS. We logically group these environments according to latency. Service authors get to decide how ‘widely’ their service instances are advertised. Everything is controlled via smartstack.yaml files.
  • 24. Latency hierarchies habitat region superregion ZooKeepers live here Datacenters or AZs in AWS Habitats within 1ms round-trip e.g. ‘us-west-1’ Regions within 5ms round-trip e.g. ‘pacific north-west’
  • 25. main: proxy_port: 20973 advertise: [habitat] discover: habitat advertise / discover Synapse should look in the habitat directory in its local ZooKeeper Nerve should register this service in the habitat directory of its local ZooKeeper
  • 26. ZooKeeper data, revisited /nerve ├── region:us-west-1 │ └── service_1 │ └── server_1_0000013614 ├── region:us-west-2 │ └── service_2 │ └── server_2_0000000959 [...]
  • 27. Extra advertisements “Wouldn’t it be useful if we could make a service running in datacenter A available in an (arbitrary) datacenter B?” Why? ● Makes it easier to bring up a new datacenter ● Makes it easier to add more capacity to a datacenter in an emergency ● Makes it easier to keep a datacenter going in an emergency if a service fails
  • 30. Unix 4eva ● Lots of little components, each doing doing one thing well ● Very simple interface for clients and services ○ If it speaks TCP or HTTP we can register it ● Easy to independently replace components ○ HAProxy -> NGINX? ● Easy to observe behavior of components
  • 31. It’s OK if ZooKeeper fails ● Nerve and Synapse keep retrying ● HAProxy keeps running but with no updates ● HAProxy performs its own healthchecks against service instances ○ If a service instance becomes unavailable then it will stop receiving traffic after a short period ● The website stays up :)
  • 32. Does it blend scale? ● Used to have scaling issues with internal load balancers, this is not a problem with SmartStack :) ● Hit some scaling issues at 10s of thousands of ZooKeeper connections ○ Addressed this by using just a single ZooKeeper connection from each nerve and synapse ● Used to have lots of HAProxy healthchecks hitting services ○ hacheck insulates services from this ○ We limit HAProxy restart rate
  • 33. What about etcd / consul / …? ● We try to use boring components :) ● We’re already using Zookeeper for Kafka and ElasticSearch so it’s natural to use it for our service discovery system too. ● etcd would probably also work, and is supported by SmartStack ● Conceptually similar to consul / consul-template
  • 34. What about DNS? ● What TTL are you going to use? ● Are you clients even going to honor the TTL? ● Does the DNS resolution happen inline with requests?
  • 35. Conclusions ● We’ve used SmartStack to create a robust service discovery system ● It’s UNIXy: lots of separate components, each doing one thing well ● It’s flexible: locality-aware discovery ● It’s reliable: new devs at Yelp view discovery as a solved problem ● It’s useful: SmartStack is the glue that holds our SOA together