SlideShare a Scribd company logo
1 of 63
Maciej Lasyk, High Availability Explained
Maciej Lasyk
11. Sesja Linuksowa
Wrocław, 2014-04-05
1/14
High Availability Explained
Maciej Lasyk, High Availability Explained
“Anything that can go wrong, will go wrong”
Murphy's law
2/14
Maciej Lasyk, High Availability Explained
“Anything that can go wrong, will go wrong”
Murphy's law
2/14
Maciej Lasyk, High Availability Explained
An electrical explosion and fire Saturday at a Houston data
center operated by The Planet has taken the entire facility offline.
The company claimed power to the facility was interrupted when a
transformer exploded. Official reports that three walls were blown
down causing a fire.
“Anything that can go wrong, will go wrong”
Murphy's law
2/14
Maciej Lasyk, High Availability Explained
Three walls of the electrical equipment room on the first floor
blew several feet from their original position, and the underground
cabling that powers the first floor of H1 was destroyed.
An electrical explosion and fire Saturday at a Houston data
center operated by The Planet has taken the entire facility offline.
The company claimed power to the facility was interrupted when a
transformer exploded. Official reports that three walls were blown
down causing a fire.
“Anything that can go wrong, will go wrong”
Murphy's law
2/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
3/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
CEO: we don't loose sales
3/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
CEO: we don't loose sales
Sales: we can extend our offer basing on HA level
3/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
CEO: we don't loose sales
Sales: we can extend our offer basing on HA level
Accounts managers: we don't upset our customers (that often)
3/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
CEO: we don't loose sales
Sales: we can extend our offer basing on HA level
Accounts managers: we don't upset our customers (that often)
Developers: we can be proud – our services are working ;)
3/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
CEO: we don't loose sales
Sales: we can extend our offer basing on HA level
Accounts managers: we don't upset our customers (that often)
Developers: we can be proud – our services are working ;)
System engineers: we can sleep well (and fsck, we love to!)
3/14
Maciej Lasyk, High Availability Explained
High Availability is in the eye of the beholder
CEO: we don't loose sales
Sales: we can extend our offer basing on HA level
Accounts managers: we don't upset our customers (that often)
Developers: we can be proud – our services are working ;)
System engineers: we can sleep well (and fsck, we love to!)
Technical support: no calls? Back to WoW then.. ;)
3/14
Maciej Lasyk, High Availability Explained
So how many 9's?
4/14
Maciej Lasyk, High Availability Explained
So how many 9's?
4/14
Maciej Lasyk, High Availability Explained
So how many 9's?
Monthly: 1 hour of outage means 100% - 0.13888 ~= 99.86112 of availability
4/14
Maciej Lasyk, High Availability Explained
So how many 9's?
Monthly: 1 hour of outage means 100% - 0.13888 ~= 99.86112 of availability
Yearly: 1 hour of outage means 100% - 0.01142 ~= 99.98858 of availability
4/14
Maciej Lasyk, High Availability Explained
So how many 9's?
Monthly: 1 hour of outage means 100% - 0.13888 ~= 99.86112 of availability
Yearly: 1 hour of outage means 100% - 0.01142 ~= 99.98858 of availability
Availability Downtime (year) Downtime (month)
90% (“one nine”) 36.5 days 72 hours
95% 18.25 days 36 hours
97% 10.96 days 21.6 hours
98% 7.30 days 14.4 hours
99% (“two nines”) 3.65 days 7.2 hours
99.5% 1.83 days 3.6 hours
99.8% 17.52 hours 86.23 minutes
99.9% (“three nines”) 4.38 hours 21.56 minutes
99.99 (“four nines”) 52.56 minutes 4.32 minutes
99.999 (“five nines”) 5.26 minutes 25.9 seconds
4/14
Maciej Lasyk, High Availability Explained
So how many 9's?
https://jazz.net/wiki/bin/view/Deployment/HighAvailability
4/14
Maciej Lasyk, High Availability Explained
HA terminology
RPO: Recovery Point Objective; how much data can we loose?
5/14
Maciej Lasyk, High Availability Explained
HA terminology
RPO: Recovery Point Objective; how much data can we loose?
RTO: Recovery Time Objective; how long does it take to recover?
5/14
Maciej Lasyk, High Availability Explained
HA terminology
RPO: Recovery Point Objective; how much data can we loose?
RTO: Recovery Time Objective; how long does it take to recover?
MTBF: Mean-Times-Between-Failures; time between failures
(density fnc -> reliability fnc)
https://en.wikipedia.org/wiki/Mean_time_between_failures
5/14
Maciej Lasyk, High Availability Explained
HA terminology
SLA: Service Level Agreement;
formal definitions (customer <-> provider)
5/14
Maciej Lasyk, High Availability Explained
HA terminology
SLA: Service Level Agreement;
formal definitions (customer <-> provider)
OLA: Operational Level Agreement; definitions within organization;
help us keeping provided SLAs
5/14
Maciej Lasyk, High Availability Explained
SLAs..
So what is written in SLAs?
Availability Downtime (year) Downtime (month)
90% 36.5 days 72 hours
95% 18.25 days 36 hours
97% 10.96 days 21.6 hours
98% 7.30 days 14.4 hours
99% 3.65 days 7.2 hours
99.5% 1.83 days 3.6 hours
99.8% 17.52 hours 86.23 minutes
99.9% 4.38 hours 21.56 minutes
99.99 52.56 minutes 4.32 minutes
99.999 5.26 minutes 25.9 seconds
5/14
Maciej Lasyk, High Availability Explained
SLAs..
So what is written in SLAs?
Availability Downtime (year) Downtime (month)
90% 36.5 days 72 hours
95% 18.25 days 36 hours
97% 10.96 days 21.6 hours
98% 7.30 days 14.4 hours
99% 3.65 days 7.2 hours
99.5% (EC2, EBS) 1.83 days 3.6 hours
99.8% 17.52 hours 86.23 minutes
99.9% (SoftLayer, IBM) 4.38 hours 21.56 minutes
99.99 (OVH ded. cloud) 52.56 minutes 4.32 minutes
99.999 5.26 minutes 25.9 seconds
http://aws.amazon.com/ec2/sla/
http://www.softlayer.com/about/service-level-agreement
https://www.ovh.com/us/dedicated-cloud/security-and-sla.xml
5/14
Maciej Lasyk, High Availability Explained
SLAs..
Availability mentioned in SLAs are only goals of service provider
Usually when it's not met than company pays off the fees
5/14
Maciej Lasyk, High Availability Explained
SLAs..
5/14
Hetzner?
“We guarantee an annual average of 99% network availability”
“For indirect damages and loss of profits, we are liable only in
cases of intentional or gross negligence. In this case we are
liable only for the contract-typical predictable damage, a
maximum of 100% of the annually fee.”
http://www.hetzner.de/en/hosting/legal/agb
Maciej Lasyk, High Availability Explained
SLAs..
5/14
Leaseweb?
(yup - megaupload)
- no %s
- best effort
- $$$ for faster response
times
http://www.leaseweb.com/en/support/all-about/sla
Maciej Lasyk, High Availability Explained
How deep is this hole?
app layer (core, db, cache)
data storage
operating system
hardware
networking
location
So we would like to achieve 99,9999% which is about 30s of downtime per year
6/14
Maciej Lasyk, High Availability Explained
app layer (core, db, cache)
data storage
operating system
hardware
networking
location
Even Proof of Concept is very hard to provide: 2.5s of downtime per layer monthly!
6/14
How deep is this hole?
Maciej Lasyk, High Availability Explained
app layer (core, db, cache)
data storage
operating system
hardware
networking
location
AOL has 99,999%!
http://highscalability.com/blog/2014/2/17/how-the-aolcom-architecture-evolved-to-99999-availability-8.html
6/14
How deep is this hole?
Maciej Lasyk, High Availability Explained
Load-balancing and failover
LB:
http://www.netdigix.com/linux-loadbalancing.php
7/14
Maciej Lasyk, High Availability Explained
Load-balancing and failover
Failover:
http://www.simplefailover.com/
7/14
Maciej Lasyk, High Availability Explained
LB – 4th
layer or 7th
?
4th
layer:
- high performance
- just do the LB work!
- reliable
- scalable
7th
layer:
- low cost
- good for quickfixes / patches
- not that scalable
- low performance
- complex codebase
- custom code for protocols
- cookies? what about memcache..
8/14
Maciej Lasyk, High Availability Explained
Disaster Recovery
9/14
Maciej Lasyk, High Availability Explained
Disaster Recovery
http://disasterrecovery.starwindsoftware.com/planning-disaster-recovery-for-virtualized-environments
9/14
Maciej Lasyk, High Availability Explained
Disaster Recovery
http://disasterrecovery.starwindsoftware.com/planning-disaster-recovery-for-virtualized-environments
Hot site: active synchronization, could be serving services. Cost can be high
Warm site: periodical synchronization, DR tests needed. Low costs
Cold site: Nothing here – just echo and some place to spin services; nightmare
9/14
Maciej Lasyk, High Availability Explained
Disaster Recovery
http://disasterrecovery.starwindsoftware.com/planning-disaster-recovery-for-virtualized-environments
Hot site: active synchronization, could be serving services. Cost can be high
So maybe use “hot” as production and global LB? win-win scenario
9/14
Maciej Lasyk, High Availability Explained
Planning for failure
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
Everything starts here - DNS:
- keep TTLs low (300s). Can't make under 60min? That's bad!
- check SLA of DNS servers (dnsmadeeasy.com history)
- what do you know about DNSes?
- zero downtime here is a must!
- this can be achieved with complicated network abracadabra
- remember what 99.9999% means?
- round robin is a load – balancer but without failover!
- GSLB – killed by OS/browser/srvs cache'ing
(GlobalServerLoadBalancing)
- GlobalIP (SoftLayer etc) – workaround for GSLB via routing
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
E-mail servers:
- it's simple as MX records (delivering)
- it's almost simple as complicated system of SMTP servers (sending)
- it's not that simple when IMAP locking over DFS (reading)
5 gmail-smtp-in.l.google.com.
10 alt1.gmail-smtp-in.l.google.com.
20 alt2.gmail-smtp-in.l.google.com.
30 alt3.gmail-smtp-in.l.google.com.
40 alt4.gmail-smtp-in.l.google.com.
When MXing – watch the spam!
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
WEB servers:
- it's simple as some frontend loadbalancer
- did you really stick user session to particular server? Memcache!
- LB balancing algorithm
- how many LBs?
- what if LB goes down?
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
DB servers:
- it's.. not that simple
- replication (master – master? App should be aware..)
- replication ring? Complicated, works, but in case of failure...
- let's talk about MySQL:
- NoSPOF solution: MySQL cluster
- MySQL Galera cluster – synch, active-active multi-master
- master – master – simply works
- MySQL fabric – HA + sharding; use with large farms
- Failover? Matsunobu Yoshinori mysql-master-ha
- MySQL utilities (http://www.clusterdb.com/mysql/mysql-utilities-webinar-qa-replay-now-available/)
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
DB servers:
Matsunobu Yoshinori mysql-master-ha
https://code.google.com/p/mysql-master-ha/
“I have heard a couple of cases where AWS users use MHA
instead of RDS because RDS takes much longer downtime
on slave promotion (more than 5 minutes usually, because
standby database is not running).
I'm surprised AWS users care about a few minutes of downtime..”
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
Caching servers:
- this is cache for God's sake – why would we use HA here?
- just use proper architecture like... redundancy.
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
Caching servers:
- this is cache for God's sake – why would we use HA here?
- just use proper architecture like... redundancy.
Load – balancers:
- remember about failovering IP addresses!
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
Caching servers:
- this is cache for God's sake – why would we use HA here?
- just use proper architecture like... redundancy.
Load – balancers:
- remember about failovering IP addresses!
Storage – DFSes:
- GlusterFS – we'll see it in action in a minute
- NFS? Could be – over some SAN / NAS (high cost solution)
- CephFS – just like GlusterFS – it's great and does the work
- DRBD – lower level, does the work on block – device layer – slow...
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
GlusterFS:
- low cost (could be..)
- distributed volumes
- replicated volumes
- striped volumes
- and...
- distributed – striped volumes
- distributed – replicated volumes
- distributed – striped – replicated volumes
- sound good? :)
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
GlusterFS: replicated volumes vs Geo-replication
- replicated:
- mirrors data
- provides HA
- synch – replication
- Geo-replication:
- mirrors data across geo – distributed clusters
- ensures backing up data for DR
- asynch – replica (periodic checks)
10/14
Maciej Lasyk, High Availability Explained
Planning for failure
HA for virtualization solutions?
- it's really complicated, like...
11/14
Maciej Lasyk, High Availability Explained
Planning for failure
HA for virtualization solutions?
- it's really complicated, like...
11/14
Maciej Lasyk, High Availability Explained
Planning for failure
HA for virtualization solutions?
- but it could be done simpler...
11/14
Maciej Lasyk, High Availability Explained
Planning for failure
HA for virtualization solutions?
- it could be done simpler with containers!
- containers are very light
- deploy time from bare is tiny
- management is very easy
- resources throttling via cgroups
11/14
Maciej Lasyk, High Availability Explained
Tools
The most important tool would be the conclusion from the picture below:
12/14
Maciej Lasyk, High Availability Explained 12/14
Tools
The most important tool would be the conclusion from the picture below:
Maciej Lasyk, High Availability Explained 12/14
Tools
The most important tool would be the conclusion from the picture below:
Maciej Lasyk, High Availability Explained
Tools
- DNS: roundrobin, GSLB, low ttls, globalIP
12/14
Maciej Lasyk, High Availability Explained
Tools
- DNS: roundrobin, GSLB, low ttls, globalIP
- Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx
12/14
Maciej Lasyk, High Availability Explained
Tools
- DNS: roundrobin, GSLB, low ttls, globalIP
- Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx
- Failover (statefull services):
- IP: KeepAlived + sysctl
12/14
Maciej Lasyk, High Availability Explained
Tools
- DNS: roundrobin, GSLB, low ttls, globalIP
- Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx
- Failover (statefull services):
- IP: KeepAlived + sysctl
- Managing: pacemaker (manager) + corosync (message'ing)
12/14
Maciej Lasyk, High Availability Explained
Tools
- DNS: roundrobin, GSLB, low ttls, globalIP
- Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx
- Failover (statefull services):
- IP: KeepAlived + sysctl
- Managing: pacemaker (manager) + corosync (message'ing)
- (almost) All-In-One: Linux Virtual Server
12/14
Maciej Lasyk, High Availability Explained
Turn on HA thinking!
Main goal of HA? Improve user experience!
- keep the app fully functional
- keep the app resistant and tolerant to faults
- provide method for a successful audit
- sleep well (anyone awake?) ;)
13/14
Maciej Lasyk, High Availability Explained
Maciej Lasyk
11. Sesja Linuksowa
2014-04-05, Wrocław
http://maciek.lasyk.info/sysop
maciek@lasyk.info
@docent-net
High Availability Explained
Thank you :)
14/14

More Related Content

Viewers also liked

AWS Webcast - On-Demand Video Streaming using Amazon CloudFront
AWS Webcast - On-Demand Video Streaming using Amazon CloudFront  AWS Webcast - On-Demand Video Streaming using Amazon CloudFront
AWS Webcast - On-Demand Video Streaming using Amazon CloudFront Amazon Web Services
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling PinterestC4Media
 
Cloud Native Cost Optimization
Cloud Native Cost OptimizationCloud Native Cost Optimization
Cloud Native Cost OptimizationAdrian Cockcroft
 
From Push Technology to Real-Time Messaging and WebSockets
From Push Technology to Real-Time Messaging and WebSocketsFrom Push Technology to Real-Time Messaging and WebSockets
From Push Technology to Real-Time Messaging and WebSocketsAlessandro Alinone
 
Business continuity management (case study)
Business continuity management (case study)Business continuity management (case study)
Business continuity management (case study)Wissam Abdel Baki
 
Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...
Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...
Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...HGST Storage
 
High Availability (HA) Explained
High Availability (HA) ExplainedHigh Availability (HA) Explained
High Availability (HA) ExplainedMaciej Lasyk
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
EU-US Privacy Shield - Safe Harbor Replacement
EU-US Privacy Shield - Safe Harbor ReplacementEU-US Privacy Shield - Safe Harbor Replacement
EU-US Privacy Shield - Safe Harbor ReplacementGACC_Midwest
 
Architecting for High Availability
Architecting for High AvailabilityArchitecting for High Availability
Architecting for High AvailabilityAmazon Web Services
 
Webinar Herramientas de Marketing para aumentar tus ingresos en 2016
Webinar Herramientas de Marketing para aumentar tus ingresos en 2016Webinar Herramientas de Marketing para aumentar tus ingresos en 2016
Webinar Herramientas de Marketing para aumentar tus ingresos en 2016FLOWww Gestión y Marketing
 
Disaster Management and Major Disaster in INDIA
Disaster Management and Major Disaster in INDIADisaster Management and Major Disaster in INDIA
Disaster Management and Major Disaster in INDIAranjan lokegaonkar
 

Viewers also liked (14)

AWS Webcast - On-Demand Video Streaming using Amazon CloudFront
AWS Webcast - On-Demand Video Streaming using Amazon CloudFront  AWS Webcast - On-Demand Video Streaming using Amazon CloudFront
AWS Webcast - On-Demand Video Streaming using Amazon CloudFront
 
Scaling Pinterest
Scaling PinterestScaling Pinterest
Scaling Pinterest
 
Cloud Native Cost Optimization
Cloud Native Cost OptimizationCloud Native Cost Optimization
Cloud Native Cost Optimization
 
From Push Technology to Real-Time Messaging and WebSockets
From Push Technology to Real-Time Messaging and WebSocketsFrom Push Technology to Real-Time Messaging and WebSockets
From Push Technology to Real-Time Messaging and WebSockets
 
Bezvadu zemene Ogres CB
Bezvadu zemene Ogres CBBezvadu zemene Ogres CB
Bezvadu zemene Ogres CB
 
Business continuity management (case study)
Business continuity management (case study)Business continuity management (case study)
Business continuity management (case study)
 
Storyboard colocation strategy
Storyboard colocation strategyStoryboard colocation strategy
Storyboard colocation strategy
 
Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...
Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...
Spanning Hot and Cold Data: Enabling Key Storage Technologies for the Data C...
 
High Availability (HA) Explained
High Availability (HA) ExplainedHigh Availability (HA) Explained
High Availability (HA) Explained
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
EU-US Privacy Shield - Safe Harbor Replacement
EU-US Privacy Shield - Safe Harbor ReplacementEU-US Privacy Shield - Safe Harbor Replacement
EU-US Privacy Shield - Safe Harbor Replacement
 
Architecting for High Availability
Architecting for High AvailabilityArchitecting for High Availability
Architecting for High Availability
 
Webinar Herramientas de Marketing para aumentar tus ingresos en 2016
Webinar Herramientas de Marketing para aumentar tus ingresos en 2016Webinar Herramientas de Marketing para aumentar tus ingresos en 2016
Webinar Herramientas de Marketing para aumentar tus ingresos en 2016
 
Disaster Management and Major Disaster in INDIA
Disaster Management and Major Disaster in INDIADisaster Management and Major Disaster in INDIA
Disaster Management and Major Disaster in INDIA
 

Similar to High Availability (HA) Explained - second edition

MySQL InnoDB Cluster - Group Replication
MySQL InnoDB Cluster - Group ReplicationMySQL InnoDB Cluster - Group Replication
MySQL InnoDB Cluster - Group ReplicationFrederic Descamps
 
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...confluent
 
State of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to comeState of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to comeKonrad Malawski
 
How Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM EcosystemHow Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM EcosystemKonrad Malawski
 
Supercharge Your Applications
Supercharge Your ApplicationsSupercharge Your Applications
Supercharge Your ApplicationsSean Boiling
 
#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky
#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky
#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolenskyvdmchallenge
 
Using Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” serviceUsing Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” serviceDoKC
 
Switching SaaS Hosting From dedicated virtual machines to container-based clu...
Switching SaaS Hosting From dedicated virtual machines to container-based clu...Switching SaaS Hosting From dedicated virtual machines to container-based clu...
Switching SaaS Hosting From dedicated virtual machines to container-based clu...AWS Germany
 
Performance culture through the looking-glass - performance.now() 2022
Performance culture through the looking-glass - performance.now() 2022Performance culture through the looking-glass - performance.now() 2022
Performance culture through the looking-glass - performance.now() 2022Dora Militaru
 
Giles Sirett - CloudStack news
Giles Sirett - CloudStack newsGiles Sirett - CloudStack news
Giles Sirett - CloudStack newsShapeBlue
 
VMworld 2011 Review: Preparing for vSphere 5 with Virtualization Manager
VMworld 2011 Review: Preparing for vSphere 5 with Virtualization ManagerVMworld 2011 Review: Preparing for vSphere 5 with Virtualization Manager
VMworld 2011 Review: Preparing for vSphere 5 with Virtualization ManagerSolarWinds
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIn-Memory Computing Summit
 
Microservices 5 things i wish i'd known java with the best 2018
Microservices 5 things i wish i'd known   java with the best 2018Microservices 5 things i wish i'd known   java with the best 2018
Microservices 5 things i wish i'd known java with the best 2018Vincent Kok
 
Microservices 5 Things I Wish I'd Known - JFall 2017
Microservices 5 Things I Wish I'd Known - JFall 2017Microservices 5 Things I Wish I'd Known - JFall 2017
Microservices 5 Things I Wish I'd Known - JFall 2017Vincent Kok
 
Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...
Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...
Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...Codemotion
 
What's new in MySQL Cluster 7.4 webinar charts
What's new in MySQL Cluster 7.4 webinar chartsWhat's new in MySQL Cluster 7.4 webinar charts
What's new in MySQL Cluster 7.4 webinar chartsAndrew Morgan
 
JavaOne 2016 "Java, Microservices, Cloud and Containers"
JavaOne 2016 "Java, Microservices, Cloud and Containers"JavaOne 2016 "Java, Microservices, Cloud and Containers"
JavaOne 2016 "Java, Microservices, Cloud and Containers"Daniel Bryant
 
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...Amazon Web Services
 
Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...
Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...
Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...Cloud Native Day Tel Aviv
 
Chicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveChicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveAWS Chicago
 

Similar to High Availability (HA) Explained - second edition (20)

MySQL InnoDB Cluster - Group Replication
MySQL InnoDB Cluster - Group ReplicationMySQL InnoDB Cluster - Group Replication
MySQL InnoDB Cluster - Group Replication
 
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
 
State of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to comeState of Akka 2017 - The best is yet to come
State of Akka 2017 - The best is yet to come
 
How Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM EcosystemHow Reactive Streams & Akka Streams change the JVM Ecosystem
How Reactive Streams & Akka Streams change the JVM Ecosystem
 
Supercharge Your Applications
Supercharge Your ApplicationsSupercharge Your Applications
Supercharge Your Applications
 
#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky
#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky
#VirtualDesignMaster 3 Challenge 2 - Lubomir Zvolensky
 
Using Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” serviceUsing Kubernetes to deliver a “serverless” service
Using Kubernetes to deliver a “serverless” service
 
Switching SaaS Hosting From dedicated virtual machines to container-based clu...
Switching SaaS Hosting From dedicated virtual machines to container-based clu...Switching SaaS Hosting From dedicated virtual machines to container-based clu...
Switching SaaS Hosting From dedicated virtual machines to container-based clu...
 
Performance culture through the looking-glass - performance.now() 2022
Performance culture through the looking-glass - performance.now() 2022Performance culture through the looking-glass - performance.now() 2022
Performance culture through the looking-glass - performance.now() 2022
 
Giles Sirett - CloudStack news
Giles Sirett - CloudStack newsGiles Sirett - CloudStack news
Giles Sirett - CloudStack news
 
VMworld 2011 Review: Preparing for vSphere 5 with Virtualization Manager
VMworld 2011 Review: Preparing for vSphere 5 with Virtualization ManagerVMworld 2011 Review: Preparing for vSphere 5 with Virtualization Manager
VMworld 2011 Review: Preparing for vSphere 5 with Virtualization Manager
 
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of StatelessnessIMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
IMC Summit 2016 Breakout - Aleksandar Seovic - The Illusion of Statelessness
 
Microservices 5 things i wish i'd known java with the best 2018
Microservices 5 things i wish i'd known   java with the best 2018Microservices 5 things i wish i'd known   java with the best 2018
Microservices 5 things i wish i'd known java with the best 2018
 
Microservices 5 Things I Wish I'd Known - JFall 2017
Microservices 5 Things I Wish I'd Known - JFall 2017Microservices 5 Things I Wish I'd Known - JFall 2017
Microservices 5 Things I Wish I'd Known - JFall 2017
 
Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...
Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...
Fabio Tiriticco - Ádám Sándor - Akka Cluster versus Kubernetes: Clustering...
 
What's new in MySQL Cluster 7.4 webinar charts
What's new in MySQL Cluster 7.4 webinar chartsWhat's new in MySQL Cluster 7.4 webinar charts
What's new in MySQL Cluster 7.4 webinar charts
 
JavaOne 2016 "Java, Microservices, Cloud and Containers"
JavaOne 2016 "Java, Microservices, Cloud and Containers"JavaOne 2016 "Java, Microservices, Cloud and Containers"
JavaOne 2016 "Java, Microservices, Cloud and Containers"
 
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...
NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) ...
 
Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...
Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...
Kafka Mirror Tester: Go and Kubernetes Powered Test Suite for Kafka Replicati...
 
Chicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at CohesiveChicago AWS user group meetup - May 2014 at Cohesive
Chicago AWS user group meetup - May 2014 at Cohesive
 

More from Maciej Lasyk

Programowanie AWSa z CLI, boto, Ansiblem i libcloudem
Programowanie AWSa z CLI, boto, Ansiblem i libcloudemProgramowanie AWSa z CLI, boto, Ansiblem i libcloudem
Programowanie AWSa z CLI, boto, Ansiblem i libcloudemMaciej Lasyk
 
Co powinieneś wiedzieć na temat devops?f
Co powinieneś wiedzieć na temat devops?f Co powinieneś wiedzieć na temat devops?f
Co powinieneś wiedzieć na temat devops?f Maciej Lasyk
 
"Containers do not contain"
"Containers do not contain""Containers do not contain"
"Containers do not contain"Maciej Lasyk
 
Linux containers & Devops
Linux containers & DevopsLinux containers & Devops
Linux containers & DevopsMaciej Lasyk
 
Under the Dome (of failure driven pipeline)
Under the Dome (of failure driven pipeline)Under the Dome (of failure driven pipeline)
Under the Dome (of failure driven pipeline)Maciej Lasyk
 
Continuous Security in DevOps
Continuous Security in DevOpsContinuous Security in DevOps
Continuous Security in DevOpsMaciej Lasyk
 
About cultural change w/Devops
About cultural change w/DevopsAbout cultural change w/Devops
About cultural change w/DevopsMaciej Lasyk
 
Orchestrating docker containers at scale (#DockerKRK edition)
Orchestrating docker containers at scale (#DockerKRK edition)Orchestrating docker containers at scale (#DockerKRK edition)
Orchestrating docker containers at scale (#DockerKRK edition)Maciej Lasyk
 
Orchestrating docker containers at scale (PJUG edition)
Orchestrating docker containers at scale (PJUG edition)Orchestrating docker containers at scale (PJUG edition)
Orchestrating docker containers at scale (PJUG edition)Maciej Lasyk
 
Orchestrating Docker containers at scale
Orchestrating Docker containers at scaleOrchestrating Docker containers at scale
Orchestrating Docker containers at scaleMaciej Lasyk
 
Ghost in the shell
Ghost in the shellGhost in the shell
Ghost in the shellMaciej Lasyk
 
Scaling and securing node.js apps
Scaling and securing node.js appsScaling and securing node.js apps
Scaling and securing node.js appsMaciej Lasyk
 
Monitoring with Nagios and Ganglia
Monitoring with Nagios and GangliaMonitoring with Nagios and Ganglia
Monitoring with Nagios and GangliaMaciej Lasyk
 
Stop disabling SELinux!
Stop disabling SELinux!Stop disabling SELinux!
Stop disabling SELinux!Maciej Lasyk
 
RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)Maciej Lasyk
 
Shall we play a game? PL version
Shall we play a game? PL versionShall we play a game? PL version
Shall we play a game? PL versionMaciej Lasyk
 

More from Maciej Lasyk (20)

Rundeck & Ansible
Rundeck & AnsibleRundeck & Ansible
Rundeck & Ansible
 
Docker 1.11
Docker 1.11Docker 1.11
Docker 1.11
 
Programowanie AWSa z CLI, boto, Ansiblem i libcloudem
Programowanie AWSa z CLI, boto, Ansiblem i libcloudemProgramowanie AWSa z CLI, boto, Ansiblem i libcloudem
Programowanie AWSa z CLI, boto, Ansiblem i libcloudem
 
Co powinieneś wiedzieć na temat devops?f
Co powinieneś wiedzieć na temat devops?f Co powinieneś wiedzieć na temat devops?f
Co powinieneś wiedzieć na temat devops?f
 
"Containers do not contain"
"Containers do not contain""Containers do not contain"
"Containers do not contain"
 
Git Submodules
Git SubmodulesGit Submodules
Git Submodules
 
Linux containers & Devops
Linux containers & DevopsLinux containers & Devops
Linux containers & Devops
 
Under the Dome (of failure driven pipeline)
Under the Dome (of failure driven pipeline)Under the Dome (of failure driven pipeline)
Under the Dome (of failure driven pipeline)
 
Continuous Security in DevOps
Continuous Security in DevOpsContinuous Security in DevOps
Continuous Security in DevOps
 
About cultural change w/Devops
About cultural change w/DevopsAbout cultural change w/Devops
About cultural change w/Devops
 
Orchestrating docker containers at scale (#DockerKRK edition)
Orchestrating docker containers at scale (#DockerKRK edition)Orchestrating docker containers at scale (#DockerKRK edition)
Orchestrating docker containers at scale (#DockerKRK edition)
 
Orchestrating docker containers at scale (PJUG edition)
Orchestrating docker containers at scale (PJUG edition)Orchestrating docker containers at scale (PJUG edition)
Orchestrating docker containers at scale (PJUG edition)
 
Orchestrating Docker containers at scale
Orchestrating Docker containers at scaleOrchestrating Docker containers at scale
Orchestrating Docker containers at scale
 
Ghost in the shell
Ghost in the shellGhost in the shell
Ghost in the shell
 
Scaling and securing node.js apps
Scaling and securing node.js appsScaling and securing node.js apps
Scaling and securing node.js apps
 
Node.js security
Node.js securityNode.js security
Node.js security
 
Monitoring with Nagios and Ganglia
Monitoring with Nagios and GangliaMonitoring with Nagios and Ganglia
Monitoring with Nagios and Ganglia
 
Stop disabling SELinux!
Stop disabling SELinux!Stop disabling SELinux!
Stop disabling SELinux!
 
RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)
 
Shall we play a game? PL version
Shall we play a game? PL versionShall we play a game? PL version
Shall we play a game? PL version
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

High Availability (HA) Explained - second edition

  • 1. Maciej Lasyk, High Availability Explained Maciej Lasyk 11. Sesja Linuksowa Wrocław, 2014-04-05 1/14 High Availability Explained
  • 2. Maciej Lasyk, High Availability Explained “Anything that can go wrong, will go wrong” Murphy's law 2/14
  • 3. Maciej Lasyk, High Availability Explained “Anything that can go wrong, will go wrong” Murphy's law 2/14
  • 4. Maciej Lasyk, High Availability Explained An electrical explosion and fire Saturday at a Houston data center operated by The Planet has taken the entire facility offline. The company claimed power to the facility was interrupted when a transformer exploded. Official reports that three walls were blown down causing a fire. “Anything that can go wrong, will go wrong” Murphy's law 2/14
  • 5. Maciej Lasyk, High Availability Explained Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed. An electrical explosion and fire Saturday at a Houston data center operated by The Planet has taken the entire facility offline. The company claimed power to the facility was interrupted when a transformer exploded. Official reports that three walls were blown down causing a fire. “Anything that can go wrong, will go wrong” Murphy's law 2/14
  • 6. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder 3/14
  • 7. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder CEO: we don't loose sales 3/14
  • 8. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder CEO: we don't loose sales Sales: we can extend our offer basing on HA level 3/14
  • 9. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder CEO: we don't loose sales Sales: we can extend our offer basing on HA level Accounts managers: we don't upset our customers (that often) 3/14
  • 10. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder CEO: we don't loose sales Sales: we can extend our offer basing on HA level Accounts managers: we don't upset our customers (that often) Developers: we can be proud – our services are working ;) 3/14
  • 11. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder CEO: we don't loose sales Sales: we can extend our offer basing on HA level Accounts managers: we don't upset our customers (that often) Developers: we can be proud – our services are working ;) System engineers: we can sleep well (and fsck, we love to!) 3/14
  • 12. Maciej Lasyk, High Availability Explained High Availability is in the eye of the beholder CEO: we don't loose sales Sales: we can extend our offer basing on HA level Accounts managers: we don't upset our customers (that often) Developers: we can be proud – our services are working ;) System engineers: we can sleep well (and fsck, we love to!) Technical support: no calls? Back to WoW then.. ;) 3/14
  • 13. Maciej Lasyk, High Availability Explained So how many 9's? 4/14
  • 14. Maciej Lasyk, High Availability Explained So how many 9's? 4/14
  • 15. Maciej Lasyk, High Availability Explained So how many 9's? Monthly: 1 hour of outage means 100% - 0.13888 ~= 99.86112 of availability 4/14
  • 16. Maciej Lasyk, High Availability Explained So how many 9's? Monthly: 1 hour of outage means 100% - 0.13888 ~= 99.86112 of availability Yearly: 1 hour of outage means 100% - 0.01142 ~= 99.98858 of availability 4/14
  • 17. Maciej Lasyk, High Availability Explained So how many 9's? Monthly: 1 hour of outage means 100% - 0.13888 ~= 99.86112 of availability Yearly: 1 hour of outage means 100% - 0.01142 ~= 99.98858 of availability Availability Downtime (year) Downtime (month) 90% (“one nine”) 36.5 days 72 hours 95% 18.25 days 36 hours 97% 10.96 days 21.6 hours 98% 7.30 days 14.4 hours 99% (“two nines”) 3.65 days 7.2 hours 99.5% 1.83 days 3.6 hours 99.8% 17.52 hours 86.23 minutes 99.9% (“three nines”) 4.38 hours 21.56 minutes 99.99 (“four nines”) 52.56 minutes 4.32 minutes 99.999 (“five nines”) 5.26 minutes 25.9 seconds 4/14
  • 18. Maciej Lasyk, High Availability Explained So how many 9's? https://jazz.net/wiki/bin/view/Deployment/HighAvailability 4/14
  • 19. Maciej Lasyk, High Availability Explained HA terminology RPO: Recovery Point Objective; how much data can we loose? 5/14
  • 20. Maciej Lasyk, High Availability Explained HA terminology RPO: Recovery Point Objective; how much data can we loose? RTO: Recovery Time Objective; how long does it take to recover? 5/14
  • 21. Maciej Lasyk, High Availability Explained HA terminology RPO: Recovery Point Objective; how much data can we loose? RTO: Recovery Time Objective; how long does it take to recover? MTBF: Mean-Times-Between-Failures; time between failures (density fnc -> reliability fnc) https://en.wikipedia.org/wiki/Mean_time_between_failures 5/14
  • 22. Maciej Lasyk, High Availability Explained HA terminology SLA: Service Level Agreement; formal definitions (customer <-> provider) 5/14
  • 23. Maciej Lasyk, High Availability Explained HA terminology SLA: Service Level Agreement; formal definitions (customer <-> provider) OLA: Operational Level Agreement; definitions within organization; help us keeping provided SLAs 5/14
  • 24. Maciej Lasyk, High Availability Explained SLAs.. So what is written in SLAs? Availability Downtime (year) Downtime (month) 90% 36.5 days 72 hours 95% 18.25 days 36 hours 97% 10.96 days 21.6 hours 98% 7.30 days 14.4 hours 99% 3.65 days 7.2 hours 99.5% 1.83 days 3.6 hours 99.8% 17.52 hours 86.23 minutes 99.9% 4.38 hours 21.56 minutes 99.99 52.56 minutes 4.32 minutes 99.999 5.26 minutes 25.9 seconds 5/14
  • 25. Maciej Lasyk, High Availability Explained SLAs.. So what is written in SLAs? Availability Downtime (year) Downtime (month) 90% 36.5 days 72 hours 95% 18.25 days 36 hours 97% 10.96 days 21.6 hours 98% 7.30 days 14.4 hours 99% 3.65 days 7.2 hours 99.5% (EC2, EBS) 1.83 days 3.6 hours 99.8% 17.52 hours 86.23 minutes 99.9% (SoftLayer, IBM) 4.38 hours 21.56 minutes 99.99 (OVH ded. cloud) 52.56 minutes 4.32 minutes 99.999 5.26 minutes 25.9 seconds http://aws.amazon.com/ec2/sla/ http://www.softlayer.com/about/service-level-agreement https://www.ovh.com/us/dedicated-cloud/security-and-sla.xml 5/14
  • 26. Maciej Lasyk, High Availability Explained SLAs.. Availability mentioned in SLAs are only goals of service provider Usually when it's not met than company pays off the fees 5/14
  • 27. Maciej Lasyk, High Availability Explained SLAs.. 5/14 Hetzner? “We guarantee an annual average of 99% network availability” “For indirect damages and loss of profits, we are liable only in cases of intentional or gross negligence. In this case we are liable only for the contract-typical predictable damage, a maximum of 100% of the annually fee.” http://www.hetzner.de/en/hosting/legal/agb
  • 28. Maciej Lasyk, High Availability Explained SLAs.. 5/14 Leaseweb? (yup - megaupload) - no %s - best effort - $$$ for faster response times http://www.leaseweb.com/en/support/all-about/sla
  • 29. Maciej Lasyk, High Availability Explained How deep is this hole? app layer (core, db, cache) data storage operating system hardware networking location So we would like to achieve 99,9999% which is about 30s of downtime per year 6/14
  • 30. Maciej Lasyk, High Availability Explained app layer (core, db, cache) data storage operating system hardware networking location Even Proof of Concept is very hard to provide: 2.5s of downtime per layer monthly! 6/14 How deep is this hole?
  • 31. Maciej Lasyk, High Availability Explained app layer (core, db, cache) data storage operating system hardware networking location AOL has 99,999%! http://highscalability.com/blog/2014/2/17/how-the-aolcom-architecture-evolved-to-99999-availability-8.html 6/14 How deep is this hole?
  • 32. Maciej Lasyk, High Availability Explained Load-balancing and failover LB: http://www.netdigix.com/linux-loadbalancing.php 7/14
  • 33. Maciej Lasyk, High Availability Explained Load-balancing and failover Failover: http://www.simplefailover.com/ 7/14
  • 34. Maciej Lasyk, High Availability Explained LB – 4th layer or 7th ? 4th layer: - high performance - just do the LB work! - reliable - scalable 7th layer: - low cost - good for quickfixes / patches - not that scalable - low performance - complex codebase - custom code for protocols - cookies? what about memcache.. 8/14
  • 35. Maciej Lasyk, High Availability Explained Disaster Recovery 9/14
  • 36. Maciej Lasyk, High Availability Explained Disaster Recovery http://disasterrecovery.starwindsoftware.com/planning-disaster-recovery-for-virtualized-environments 9/14
  • 37. Maciej Lasyk, High Availability Explained Disaster Recovery http://disasterrecovery.starwindsoftware.com/planning-disaster-recovery-for-virtualized-environments Hot site: active synchronization, could be serving services. Cost can be high Warm site: periodical synchronization, DR tests needed. Low costs Cold site: Nothing here – just echo and some place to spin services; nightmare 9/14
  • 38. Maciej Lasyk, High Availability Explained Disaster Recovery http://disasterrecovery.starwindsoftware.com/planning-disaster-recovery-for-virtualized-environments Hot site: active synchronization, could be serving services. Cost can be high So maybe use “hot” as production and global LB? win-win scenario 9/14
  • 39. Maciej Lasyk, High Availability Explained Planning for failure 10/14
  • 40. Maciej Lasyk, High Availability Explained Planning for failure Everything starts here - DNS: - keep TTLs low (300s). Can't make under 60min? That's bad! - check SLA of DNS servers (dnsmadeeasy.com history) - what do you know about DNSes? - zero downtime here is a must! - this can be achieved with complicated network abracadabra - remember what 99.9999% means? - round robin is a load – balancer but without failover! - GSLB – killed by OS/browser/srvs cache'ing (GlobalServerLoadBalancing) - GlobalIP (SoftLayer etc) – workaround for GSLB via routing 10/14
  • 41. Maciej Lasyk, High Availability Explained Planning for failure E-mail servers: - it's simple as MX records (delivering) - it's almost simple as complicated system of SMTP servers (sending) - it's not that simple when IMAP locking over DFS (reading) 5 gmail-smtp-in.l.google.com. 10 alt1.gmail-smtp-in.l.google.com. 20 alt2.gmail-smtp-in.l.google.com. 30 alt3.gmail-smtp-in.l.google.com. 40 alt4.gmail-smtp-in.l.google.com. When MXing – watch the spam! 10/14
  • 42. Maciej Lasyk, High Availability Explained Planning for failure WEB servers: - it's simple as some frontend loadbalancer - did you really stick user session to particular server? Memcache! - LB balancing algorithm - how many LBs? - what if LB goes down? 10/14
  • 43. Maciej Lasyk, High Availability Explained Planning for failure DB servers: - it's.. not that simple - replication (master – master? App should be aware..) - replication ring? Complicated, works, but in case of failure... - let's talk about MySQL: - NoSPOF solution: MySQL cluster - MySQL Galera cluster – synch, active-active multi-master - master – master – simply works - MySQL fabric – HA + sharding; use with large farms - Failover? Matsunobu Yoshinori mysql-master-ha - MySQL utilities (http://www.clusterdb.com/mysql/mysql-utilities-webinar-qa-replay-now-available/) 10/14
  • 44. Maciej Lasyk, High Availability Explained Planning for failure DB servers: Matsunobu Yoshinori mysql-master-ha https://code.google.com/p/mysql-master-ha/ “I have heard a couple of cases where AWS users use MHA instead of RDS because RDS takes much longer downtime on slave promotion (more than 5 minutes usually, because standby database is not running). I'm surprised AWS users care about a few minutes of downtime..” 10/14
  • 45. Maciej Lasyk, High Availability Explained Planning for failure Caching servers: - this is cache for God's sake – why would we use HA here? - just use proper architecture like... redundancy. 10/14
  • 46. Maciej Lasyk, High Availability Explained Planning for failure Caching servers: - this is cache for God's sake – why would we use HA here? - just use proper architecture like... redundancy. Load – balancers: - remember about failovering IP addresses! 10/14
  • 47. Maciej Lasyk, High Availability Explained Planning for failure Caching servers: - this is cache for God's sake – why would we use HA here? - just use proper architecture like... redundancy. Load – balancers: - remember about failovering IP addresses! Storage – DFSes: - GlusterFS – we'll see it in action in a minute - NFS? Could be – over some SAN / NAS (high cost solution) - CephFS – just like GlusterFS – it's great and does the work - DRBD – lower level, does the work on block – device layer – slow... 10/14
  • 48. Maciej Lasyk, High Availability Explained Planning for failure GlusterFS: - low cost (could be..) - distributed volumes - replicated volumes - striped volumes - and... - distributed – striped volumes - distributed – replicated volumes - distributed – striped – replicated volumes - sound good? :) 10/14
  • 49. Maciej Lasyk, High Availability Explained Planning for failure GlusterFS: replicated volumes vs Geo-replication - replicated: - mirrors data - provides HA - synch – replication - Geo-replication: - mirrors data across geo – distributed clusters - ensures backing up data for DR - asynch – replica (periodic checks) 10/14
  • 50. Maciej Lasyk, High Availability Explained Planning for failure HA for virtualization solutions? - it's really complicated, like... 11/14
  • 51. Maciej Lasyk, High Availability Explained Planning for failure HA for virtualization solutions? - it's really complicated, like... 11/14
  • 52. Maciej Lasyk, High Availability Explained Planning for failure HA for virtualization solutions? - but it could be done simpler... 11/14
  • 53. Maciej Lasyk, High Availability Explained Planning for failure HA for virtualization solutions? - it could be done simpler with containers! - containers are very light - deploy time from bare is tiny - management is very easy - resources throttling via cgroups 11/14
  • 54. Maciej Lasyk, High Availability Explained Tools The most important tool would be the conclusion from the picture below: 12/14
  • 55. Maciej Lasyk, High Availability Explained 12/14 Tools The most important tool would be the conclusion from the picture below:
  • 56. Maciej Lasyk, High Availability Explained 12/14 Tools The most important tool would be the conclusion from the picture below:
  • 57. Maciej Lasyk, High Availability Explained Tools - DNS: roundrobin, GSLB, low ttls, globalIP 12/14
  • 58. Maciej Lasyk, High Availability Explained Tools - DNS: roundrobin, GSLB, low ttls, globalIP - Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx 12/14
  • 59. Maciej Lasyk, High Availability Explained Tools - DNS: roundrobin, GSLB, low ttls, globalIP - Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx - Failover (statefull services): - IP: KeepAlived + sysctl 12/14
  • 60. Maciej Lasyk, High Availability Explained Tools - DNS: roundrobin, GSLB, low ttls, globalIP - Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx - Failover (statefull services): - IP: KeepAlived + sysctl - Managing: pacemaker (manager) + corosync (message'ing) 12/14
  • 61. Maciej Lasyk, High Availability Explained Tools - DNS: roundrobin, GSLB, low ttls, globalIP - Load-Balancers (l7, stateless services)): HaProxy, Pound, Nginx - Failover (statefull services): - IP: KeepAlived + sysctl - Managing: pacemaker (manager) + corosync (message'ing) - (almost) All-In-One: Linux Virtual Server 12/14
  • 62. Maciej Lasyk, High Availability Explained Turn on HA thinking! Main goal of HA? Improve user experience! - keep the app fully functional - keep the app resistant and tolerant to faults - provide method for a successful audit - sleep well (anyone awake?) ;) 13/14
  • 63. Maciej Lasyk, High Availability Explained Maciej Lasyk 11. Sesja Linuksowa 2014-04-05, Wrocław http://maciek.lasyk.info/sysop maciek@lasyk.info @docent-net High Availability Explained Thank you :) 14/14