Introduction to service discovery and self-organizing cluster orchestration. HashiCorp Serf and Consul
1. Introduction to service discovery and self-organizing cluster orchestration
#pivorak 2016
Andrey Kryachkov & Roman V. Babenko
romanvbabenko@linkedin
romanvbabenko@github
romanvbabenko@twitter
andreykryachkov@linkedin
kryachkov@github
kryachkov@twitter
2. Old School web app infrastructure
Nodes
Hardcoded addresses
Rebuild nodes on env change
No need for service discovery
3. Modern web app infrastructure
Roles, not nodes
No hardcoded addresses
Easy failure recovery
Seamless scaling up and down
5. Serf
Serf is a decentralized solution for cluster membership, failure detection, and
orchestration. Lightweight and highly available.
https://www.serfdom.io/
10. Serf :: USE CASES
Web Servers and Load Balancers
Clustering Memcached or Redis
Triggering Deploys
Updating DNS Records
Simple Observability
A Building Block for Service Discovery
17. demo : production
[Diagram: app and db nodes, each running serf]
server {
    # ...
    location / {
        if (-f $document_root/maintenance.txt) {
            return 503;
        }
    }
    error_page 503 @maintenance;
    location @maintenance {
        rewrite ^(.*)$ /maintenance.html break;
    }
}
18. demo : maintenance on
[Diagram: app and db nodes, each running serf]
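#!/bin/bash
# serf/member-failed.sh
# Serf writes one line per failed member to stdin:
# name, address, role, tags (tab-separated).
while read -r name address role tags; do
    # Switch the site to maintenance mode when a db node fails.
    if [ "$role" = "db" ]; then
        touch /usr/share/nginx/html/maintenance.txt
    fi
done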
19. demo : maintenance off
[Diagram: app and db nodes, each running serf]
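#!/bin/bash
# serf/member-join.sh
# Serf writes one line per joined member to stdin:
# name, address, role, tags (tab-separated).
while read -r name address role tags; do
    # Leave maintenance mode when a db node joins; -f avoids an
    # error if the file is already gone.
    if [ "$role" = "db" ]; then
        rm -f /usr/share/nginx/html/maintenance.txt
    fi
done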
28. Serf vs Consul
Serf
● Decentralized
● Simple health checks
● Node-level membership
● Degraded performance over WAN
● More general-purpose tool
● Will operate until the last node is alive
Consul
● Requires Consul servers (1/3/5)
● Rich health checks
● Service-level membership
● Key/value storage
● Built-in multi-DC support
● Can’t operate w/o central servers quorum
Hello everyone. My name is Andrey and this is Roman. We are going to give you a brief introduction to service discovery and self-organizing cluster orchestration: what service discovery is, what tools can be used to implement it, and a demo of a small web application cluster orchestrated by HashiCorp’s Serf.
A modern application’s infrastructure consists of multiple hosts. Their configs are often static and contain hard-coded IP addresses, which can make scaling and failure recovery difficult. Imagine that one of the application servers in the picture fails and is rebuilt with a different IP. In the worst case you have to change the load balancer’s config and whitelist the new app server’s IP in the workers’ firewall rules. You have a kind of monolithic architecture (a very popular word today?). When the configuration changes infrequently, you don’t need service discovery at all.
As the application grows, you face the problem of scaling: nodes are added to and removed from the cluster in reaction to events or load changes. For example, you run an e-commerce service, it is Black Friday, and you need to double the number of application servers. You also need some kind of automatic failure recovery for the situation on the previous slide. And in this changing environment you need quick answers to questions like “Who is the DB master today?” or “Which IPs do the application nodes have?”. You need a way to detect the devices in the cluster, their health, and the services they provide and consume. This is called service discovery.
We are introducing two tools by HashiCorp that provide service discovery solutions: Serf and Consul.
Serf. Decentralized solution for cluster membership, failure detection and orchestration. Lightweight and highly available.
Serf solves three basic problems of service discovery: membership, failure detection, and event propagation.
Serf cluster membership is based on a fast and reliable gossip protocol. Members periodically exchange messages with recipients chosen at random from the list of known members. This lets events propagate quickly through the cluster and lets nodes react to them almost instantly. Every node can handle member-join and member-leave events in its own way.
Serf quickly detects failed members and notifies all live members of the cluster with a member-failed event. Live members can react to it within seconds by executing handler scripts. Serf then tries to restore the failed nodes’ membership by periodically attempting to reconnect to them.
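As a rough sketch of how one handler script could react to all three membership events: Serf exports the event type in the SERF_EVENT environment variable and writes one line per affected member (name, address, role, tags) to the handler's stdin. The function names handle_members and dispatch are hypothetical and exist only for this illustration; a real handler would replace the echo with its own action.

```shell
#!/bin/bash
# Hypothetical helper: print one line per member received on stdin.
# Each stdin line is expected to be: name, address, role, tags.
handle_members() {
    local action="$1" name address role tags
    while read -r name address role tags; do
        echo "${action}: ${name} (${address}) role=${role:-none}"
    done
}

# Hypothetical dispatcher: choose the reaction from SERF_EVENT,
# which Serf sets before invoking a handler.
dispatch() {
    case "${SERF_EVENT:-}" in
        member-join)   handle_members "joined" ;;
        member-leave)  handle_members "left" ;;
        member-failed) handle_members "failed" ;;
        *)             echo "ignored: ${SERF_EVENT:-unknown}" ;;
    esac
}

# Only dispatch when Serf (or a test) has set SERF_EVENT.
if [ -n "${SERF_EVENT:-}" ]; then
    dispatch
fi
```

Such a script would be registered when starting the agent, for example with serf agent -event-handler=/path/to/handler.sh, where the path is a placeholder.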
In addition to the three events mentioned (join, leave, failure), Serf lets you broadcast custom events and queries. The difference is that events are “fire-and-forget”, while queries are request-response. Custom events can be used to trigger deploys, update configuration, and do other things you don’t expect feedback from. Queries are used to gather information from members, for example load, health status, or any other kind of data. Queries are not delivered to failed nodes, while events eventually reach them.
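The difference between the two can be sketched in a handler. For custom events Serf sets SERF_EVENT to "user" and puts the event name in SERF_USER_EVENT, with the payload on stdin; for queries it sets SERF_EVENT to "query" and the name in SERF_QUERY_NAME, and whatever the handler prints to stdout is sent back as the response. The event name "deploy", the query name "ping", and the function handle_custom are assumptions made up for this example.

```shell
#!/bin/bash
# Hypothetical handler for one custom event ("deploy") and one query ("ping").
handle_custom() {
    local payload=""
    case "${SERF_EVENT:-}" in
        user)
            # Fire-and-forget: the sender never sees this output.
            if [ "${SERF_USER_EVENT:-}" = "deploy" ]; then
                read -r payload || true   # payload may lack a trailing newline
                echo "deploying ${payload}"
            fi
            ;;
        query)
            # Request-response: stdout becomes the reply to the query.
            if [ "${SERF_QUERY_NAME:-}" = "ping" ]; then
                echo "pong from ${HOSTNAME}"
            fi
            ;;
    esac
}

# Only run when Serf (or a test) has set SERF_EVENT.
if [ -n "${SERF_EVENT:-}" ]; then
    handle_custom
fi
```

From any member, the event would be sent with serf event deploy v1.2 and the query with serf query ping; the first returns immediately, the second prints the responses it collects.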
It sounds very cool, but what can we do with all this stuff?
With Serf it is easy to create a cluster of application servers and load balancers. Load balancers listen for changes in the list of application servers and update their configuration on the fly.
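One possible way for a load balancer node to do this is to regenerate an nginx upstream block from the current member list. The sketch below assumes the input looks like the output of serf members, with whitespace-separated columns name, address:port, status, tags, and that application nodes carry a role=app tag. The function name render_upstream, the upstream name app_servers, and the default port 8080 are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read member lines on stdin and print an nginx
# upstream block listing every alive member tagged role=app.
# Assumed columns: name, address:port, status, tags (comma-separated).
render_upstream() {
    local app_port="${1:-8080}" name addr status tags
    echo "upstream app_servers {"
    while read -r name addr status tags; do
        # Skip members that are not currently alive.
        [ "$status" = "alive" ] || continue
        case ",${tags}," in
            *",role=app,"*)
                # Drop Serf's own port and use the application port instead.
                echo "    server ${addr%%:*}:${app_port};"
                ;;
        esac
    done
    echo "}"
}
```

A member-join or member-failed handler on the load balancer could then run something like serf members | render_upstream 8080 > /etc/nginx/conf.d/upstream.conf followed by nginx -s reload; the file path here is only an example.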
The same applies here: when new Redis or Memcached nodes join the cluster, or some nodes fail or leave it, your proxy or application updates its list of available Redis or Memcached servers.
With custom events you can trigger the deploy process on your servers. For example, on a deploy event workers stop execution, application servers start updating code, cache servers begin warming the cache, and so on: everything you need to deploy your application.
As with web servers and Redis clusters, DNS records can be updated within seconds of any change in your cluster.
With queries you can build a simple request-response mechanism to gather health data or any other information you need from the members of your cluster.
While Serf does not provide the full spectrum of service discovery features, it fits easily into the role of a building block for service discovery, providing information about which nodes are alive, what addresses they have, and what services they provide.
We made a simple demo to show how an application server can react to a DB failure. When the DB node fails or goes down, we want our app to show users a fancy maintenance page. We built the solution using Serf, with Nginx as the web server on the application node. The Nginx configuration is close to default: when there is a maintenance.txt file in the document root directory, we return the 503 Service Unavailable HTTP status code and show a maintenance page.
Shutting down the DB node triggers a member-failed event, which is quickly propagated within this small Serf cluster. The event handler script is simple: it creates a maintenance.txt file in the document root, causing Nginx to show the Service Unavailable page.
When the DB node comes back up, the member-join event handler removes that file and Nginx returns to normal operation. We made a short video to show Serf in action.
While Serf is a low-level solution to the service discovery problem, HashiCorp offers a higher-level, more complex, and feature-rich product: Consul.
The problems Consul solves are service discovery, health checking, multi-datacenter clusters, and data storage.
Consul clients can provide a service, such as api or mysql, and other clients can use Consul to discover providers of a given service. Using either DNS or HTTP, applications can easily find the services they depend upon.
Consul clients can provide any number of health checks, either associated with a given service ("is the webserver returning 200 OK"), or with the local node ("is memory utilization below 90%"). Combining Service Discovery information and Health Checks you can automatically route traffic away from failing members.
Consul supports multiple datacenters out of the box. This means users of Consul do not have to worry about building additional layers of abstraction to grow to multiple regions.
Consul has built-in key/value storage which can be used for any number of purposes such as dynamic configuration, feature flagging, coordination, leader election and so on. The simple HTTP API makes it easy to use.
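To give a feel for the DNS and HTTP interfaces mentioned above, here is a minimal sketch of shell wrappers around them. It assumes a local Consul agent listening on its default HTTP port 8500 and DNS port 8600, and that curl and dig are installed. The variable CONSUL_URL and all function names are hypothetical; the /v1/catalog/service and /v1/kv paths are the HTTP API endpoints for the service catalog and the key/value store.

```shell
#!/bin/bash
# Hypothetical base URL of the local Consul agent's HTTP API.
CONSUL_URL="${CONSUL_URL:-http://127.0.0.1:8500}"

# Build the API URLs for a service name or a key.
catalog_url() { echo "${CONSUL_URL}/v1/catalog/service/$1"; }
kv_url()      { echo "${CONSUL_URL}/v1/kv/$1"; }

# Service discovery over HTTP: returns JSON describing the providers.
discover_http() { curl -s "$(catalog_url "$1")"; }

# Service discovery over DNS: SRV records carry both address and port.
discover_dns() { dig @127.0.0.1 -p 8600 "$1.service.consul" SRV +short; }

# Key/value storage: store a value, then read it back unwrapped.
kv_put() { curl -s -X PUT --data "$2" "$(kv_url "$1")"; }
kv_get() { curl -s "$(kv_url "$1")?raw"; }
```

With an agent running, discover_dns web would list the providers of a service named web, and kv_put config/feature_x on followed by kv_get config/feature_x would round-trip a feature flag; both names are examples only.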
This is the layout of a typical cluster using Consul. It includes two datacenters, with 3 Consul servers and some number of clients in each. Clients communicate with each other and with the Consul servers using the gossip protocol. Consul servers provide cluster operation and key/value storage.
Let’s compare Serf and Consul. While Serf is fully decentralized and does not require any central servers, Consul requires at least one server (three or five are better for redundancy) to provide cluster operation and its services. Out of the box Serf provides only a dead-or-alive health check, while Consul has many built-in health checks. Serf (the gossip protocol, actually) is designed to operate within a LAN; it can be used over a WAN, but with degraded performance. Consul has built-in multi-datacenter support. Serf is a more flexible, general-purpose tool, a building block for service discovery solutions.
To sum everything up: nowadays, configurations can be dynamic and can be treated as code.
By the way, HashiCorp maintains a bunch of great tools. Vagrant is a tool for creating and configuring reproducible development environments under VirtualBox, VMware, Docker, and most common cloud providers; we used Vagrant to spin up the virtual machines in the demo videos. Packer is a tool for creating images for platforms like Amazon AWS, OpenStack, and VMware from a single configuration file. Terraform is aimed at infrastructure management. Vault is secure storage for sensitive information like access keys, tokens, and passwords. Nomad was created to manage and schedule applications on given resources, provided for example by Terraform. And the last one is Otto, a tool capable of automatically detecting your application type, creating a development environment for it, and managing the process of its deployment. So, as you can see, the good corporation HashiCorp has many tools to make a developer’s or DevOps engineer’s life much easier.