Availability in a Cloud-native World.
Guidelines for mere mortals.
Academy of Technology - PREVAIL 2019 – München 🇩🇪
—
Haytham Elkhoja
Chief Architect & Global Tech Leader
IBM Services - Continuous Availability (a.k.a Always On)
haytham.elkhoja@ibm.com
Relevant links and assets:
https://ibm.biz/alwaysonbook
Availability in a Cloud-native World. Guidelines for mere mortals. PREVAIL 2019 – München 🇩🇪
/WHOIS
@hek
/in/haytham.Elkhoja
March 2017 “Amazon broke the
internet with a typo” cnn.com
Impacted apps:
- Netflix
- HootSuite
- Expedia
- Slack
- Business Insider
- Reddit
June 2019 “Google details
'catastrophic' cloud outage
events: Promises to do better
next time” zdnet.com
Impacted apps:
- Snapchat
- Spotify
- Google Docs
- YouTube
- Pokémon GO
- Gmail
What the
hell is
happening…
On why outages happen.
Planned outages:
- App and DB: 67%
- Batch: 11%
- Hardware: 14%
- Environmental: 8%
Unplanned outages:
- Process: 40%
- Application: 40%
- Hardware: 10%
- OS: 10%
IBM’s Always On Patterns
Keeping your app available during
planned and unplanned outages or
failures requires geographically
distributed, multi-active,
multi-region deployments.
[Diagram: Users, Traffic, Data Replication, Session Replication]
The IBM Always On Pattern starts
at the infrastructure layer,
progresses to the data,
influences application design and
extends to the people and the
culture.
Herbie Pearthree, Distinguished Engineer
hpear3@us.ibm.com
Everything
breaks!
Thinking differently about Availability in a Cloud-native world.
- Portability & Deployment
- Zones, Regions & Swimlanes
- State & Consistency
- Chaos & Validation
Portability & Deployment
Code differently.
Cloud-native apps should be
self-contained, polyglot, loosely
coupled, cattle-scaled, immutable,
idempotent, ephemeral and
protocol-aware.
No two clouds are created equal.
Architect for cloud mobility. Your
app should be cloud, infrastructure
and OS agnostic. The Twelve-Factor
App principles will help you get there.
No strings attached.
Configuration should be
bootstrapped from environment
variables, which is also a requirement
for environment parity and your own
sanity.
# Dockerfile
FROM alpine:3.1
COPY app /app
COPY docker-entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

# Build the image, then inject the configuration at run time
docker build -t app:v2 .
docker run --rm \
  -e "APP_DATADIR=/var/lib/data" \
  -e "APP_HOST=host.com" \
  -e "APP_PORT=3306" \
  -e "APP_USERNAME=user" \
  -e "APP_PASSWORD=password" \
  -e "APP_DATABASE=test" \
  app:v2
2019/10/15 04:44:29 Starting application...
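On the application side, reading those variables could look roughly like the sketch below. The APP_* names come from the slide; the Config class, the load_config function and the two default values are assumptions made for illustration only.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    datadir: str
    host: str
    port: int
    username: str
    password: str
    database: str


def load_config(env=os.environ) -> Config:
    """Build the configuration from environment variables only.

    Required values raise KeyError when missing, so a misconfigured
    container fails at startup instead of at first use.
    """
    return Config(
        datadir=env.get("APP_DATADIR", "/var/lib/data"),  # assumed default
        host=env["APP_HOST"],
        port=int(env.get("APP_PORT", "3306")),  # assumed default
        username=env["APP_USERNAME"],
        password=env["APP_PASSWORD"],
        database=env["APP_DATABASE"],
    )


# Usage: pass a plain dict instead of os.environ to try it out.
example = load_config({
    "APP_HOST": "host.com",
    "APP_PORT": "3307",
    "APP_USERNAME": "user",
    "APP_PASSWORD": "password",
    "APP_DATABASE": "test",
})
```

The same image can then move between environments unchanged, since only the injected values differ.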
Delegate responsibilities.
Whatever as a Service. Somebody,
somewhere has done a much better
job.
Trim down the fat.
Dependency management with multi-
stage builds is an art one must
pursue to keep apps clean and lean.
Got Syslog?
Write timestamped events to
STDOUT and STDERR. Make the
source of each message clear.
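A minimal sketch of that idea with Python's standard logging module: informational events go to STDOUT, warnings and errors go to STDERR, and every line carries a UTC timestamp and the name of the source. The build_logger function and the "checkout-service" name are hypothetical.

```python
import io
import logging
import sys
import time


def build_logger(source, out=sys.stdout, err=sys.stderr):
    """Return a logger that tags each line with a timestamp and its source."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(name)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    formatter.converter = time.gmtime  # UTC, so regions are comparable

    out_handler = logging.StreamHandler(out)
    out_handler.addFilter(lambda record: record.levelno < logging.WARNING)
    out_handler.setFormatter(formatter)

    err_handler = logging.StreamHandler(err)
    err_handler.setLevel(logging.WARNING)
    err_handler.setFormatter(formatter)

    logger = logging.getLogger(source)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers = [out_handler, err_handler]  # no duplicates on re-use
    return logger


# Usage: capture both streams in memory to show the routing.
out, err = io.StringIO(), io.StringIO()
log = build_logger("checkout-service", out, err)
log.info("started")
log.error("db unreachable")
```

The platform, not the app, then decides where the two streams are shipped.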
git’s your bible.
Everything should be versioned,
ephemeral and reproducible using
GitOps methods. This includes
configuration files and
Infrastructure as Code.
Design for failure.
Handle SIGTERM like a champ, and
plan for SIGKILL, which cannot be
caught.
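One possible shape for a graceful shutdown, as a sketch: a SIGTERM handler only sets a flag, and the worker loop stops taking new work once the flag is set. SIGKILL cannot be trapped by any process, so the only defence there is to keep shutdown short and the work idempotent. The function names are hypothetical.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: do as little as possible, just record the request."""
    stop_requested.set()


def install_handlers():
    """Call once from the main thread of a real process at startup."""
    signal.signal(signal.SIGTERM, request_stop)


def run_worker(jobs):
    """Process jobs until a stop is requested; return the jobs completed."""
    completed = []
    for job in jobs:
        if stop_requested.is_set():
            break  # stop taking new work, then fall through to cleanup
        completed.append(job)
    # Cleanup (flush buffers, close connections) would go here.
    return completed


# Usage: the handler is invoked directly to simulate delivery of SIGTERM.
first = run_worker(["a", "b", "c"])   # no signal yet
request_stop(signal.SIGTERM, None)
second = run_worker(["d", "e"])       # stop requested
```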
#$@&%*!
Fail gracefully and inform your
customers what’s up (or down), pun
intended.
Robots > humans.
Actions performed by humans
hundreds of times won’t be
performed the same way each
time, even with the best
intentions. Automate.
Zones, Regions & Swimlanes
Resilient clouds don’t mean
resilient apps.
Multi-active regions help you
scale while staying resilient.
Out-of-region is more than
just an insurance policy.
Stay in your swimlane.
Respect region affinity and
stickiness using geo load
balancers to resolve traffic
to the nearest region and stay
there.
Crossing regions is a no-no.
DNS is your best friend.
Religiously steer clear of
IP addresses. Service
discovery will point you to
the right path.
And if you can’t, use Anycast.
The most boring OS configs are
also the most important ones.
A /etc/resolv.conf ‘search’
entry forces traffic to your
swimlane’s subdomain, helping
you with region affinity.
Share-nothing. Cluster-nothing.
Stretch-nothing.
Control-planes are delicate
creatures, especially if
stretched or shared.
[Diagram: database topologies compared: Share Everything, Share Disks and Networking, Share Nothing]
Bypass failures altogether.
Disaster recovery processes
lead to a mediocre and
sometimes catastrophic
experience.
Are we there yet?
Discover the awesome world of
readiness and liveness probes,
circuit breakers, retries,
rate limiting, bulkheads
and fallbacks.
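To make one of these patterns concrete, here is a small circuit breaker with a fallback, written as a sketch rather than a production implementation. The CircuitBreaker class, its parameters and the "cached" fallback value are all hypothetical.

```python
import time


class CircuitBreaker:
    """Open after max_failures consecutive errors; retry after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: fail fast, leave the dependency alone
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # success closes the circuit again
        return result


# Usage: a dependency that is always down is only called twice.
calls = []


def flaky():
    calls.append(1)
    raise ConnectionError("dependency down")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(flaky, lambda: "cached") for _ in range(5)]
```

After the second failure the breaker opens, so the remaining three requests are answered from the fallback without adding load to the failing service.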
One deployment at a time.
Rolling update strategies for
zero-downtime deployments within a
cluster or availability zone.
- Deploy by adding an instance, then removing an old one
- Deploy by removing an instance, then adding a new one
- Deploy by updating instances as fast as possible
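The difference between the first two strategies is the capacity available while the update is in flight. The toy simulation below (a sketch; rolling_update and the version labels are hypothetical) records the fleet size after every step.

```python
def rolling_update(instances, new_version, surge_first=True):
    """Replace old instances one at a time and record capacity after each step.

    surge_first=True  adds a new instance, then removes an old one.
    surge_first=False removes an old instance, then adds a new one.
    """
    fleet = list(instances)
    capacity = [len(fleet)]
    for old in [v for v in instances if v != new_version]:
        if surge_first:
            fleet.append(new_version)
            capacity.append(len(fleet))
            fleet.remove(old)
        else:
            fleet.remove(old)
            capacity.append(len(fleet))
            fleet.append(new_version)
        capacity.append(len(fleet))
    return fleet, capacity


# Usage: three instances moving from v1 to v2.
fleet_surge, cap_surge = rolling_update(["v1", "v1", "v1"], "v2", surge_first=True)
fleet_drain, cap_drain = rolling_update(["v1", "v1", "v1"], "v2", surge_first=False)
```

Adding first never drops below the original capacity but needs headroom for one extra instance; removing first needs no headroom but runs one instance short during each step.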
One region at a time.
Then do the same across regions.
Your customers will not even
know what’s happening behind
the scenes.
Love thy neighbor.
Configure resource requests and
limits. Throttle API requests.
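Throttling can be sketched with a token bucket: each caller gets a burst allowance that refills at a steady rate. The TokenBucket class and its parameters are hypothetical, and the clock is injectable so the example is deterministic.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject, e.g. with HTTP 429


# Usage with a fake clock: a burst of two, then one more after a second.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]
fake_now[0] += 1.0
after_refill = bucket.allow()
```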
State & Consistency
The network is reliable. Right.
CAP Theorem must be well understood
when choosing data stores. Knowing
that partition tolerance cannot be
sacrificed, pick consistency or
availability.
[Diagram: CAP triangle. P is fixed; pick A or C. Examples shown: Oracle, DB2, MySQL, etc.]
Do you really need Strong
Consistency?
Applications can support weak,
eventual, or strong consistency.
Distributed consistency is already
difficult as it is.
Normally, higher availability means
higher revenue. Think of ATMs.
A trumps C.
Educate your business on eventual
consistency. Strong consistency
should be the last option, unless
you’re the NYSE.
Master! Master!
Write anywhere and everywhere.
Master-Master, Master-less and
Peer to Peer database-level
replication.
If you can’t, shard, partition
or split writes from queries.
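As a sketch of the sharding option, a stable hash of the record key can decide which shard owns it, so every region routes a given key to the same place. The shard_for function and the shard names are hypothetical; note that plain modulo hashing moves most keys when the number of shards changes.

```python
import hashlib


def shard_for(key, shards):
    """Map a key to one shard, identically on every node and in every region."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Usage: hypothetical shard names.
shards = ["db-eu-1", "db-eu-2", "db-us-1"]
home = shard_for("customer-42", shards)
```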
Data Replication. More than
meets the eye.
Data patterns differ. Not all
data is created equal.
[Diagram: data replication patterns; components shown: Messaging, BPM, CEP, APP]
- Active standby or active/query
- Hot standby, or configured active/active for fast switchover
- Multi-master or peer-to-peer (write anywhere)
- Data distribution (filter and push)
- Data warehouse integration and federation
- Data through messaging (filter and push distribution)
Conflict resolution during a
network partition will make
you creative.
Log and notify conflicts.
Last-write-wins, CQRS, write
partitioning are all valid but
subjective (and emotional)
decisions.
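A last-write-wins resolver that also logs what it throws away could be sketched as follows. The Version record, the region names and the tie-break on region name are assumptions; the approach also assumes writer clocks are reasonably synchronized.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("replication")


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # writer's clock
    region: str       # tie-breaker so every replica picks the same winner


def resolve_lww(local, remote):
    """Last-write-wins: keep the newer write, break ties by region name."""
    if local == remote:
        return local
    winner = max(local, remote, key=lambda v: (v.timestamp, v.region))
    loser = remote if winner is local else local
    if loser.value != winner.value:
        # Never drop a conflicting write silently.
        log.warning(
            "conflict: kept %r from %s, discarded %r from %s",
            winner.value, winner.region, loser.value, loser.region,
        )
    return winner


# Usage: two regions accepted different writes during a partition.
a = Version("cart=[book]", 100.0, "eu-de")
b = Version("cart=[book, pen]", 101.5, "us-south")
merged = resolve_lww(a, b)
```

Because the comparison is deterministic, both regions converge on the same value regardless of the order in which they see the two writes.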
NTP is dead. Long live NTP.
Achieve globally distributed,
consensus-respecting,
synchronously replicated
databases with Google TrueTime
and the Amazon Time Sync Service,
if you really need them.
Database is much more than
just a DBA’s job.
Database versioning and
backward-compatible schemas
are not optional, but
compulsory.
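A backward-compatible change can be illustrated with an additive migration: the new column is nullable, so instances still running the old code keep working while the new version rolls out. The sketch uses an in-memory SQLite database; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def insert_v1(name):
    """Version 1 of the app writes only the columns it knows about."""
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))


def insert_v2(name, email):
    """Version 2 of the app also writes the new column."""
    conn.execute(
        "INSERT INTO customers (name, email) VALUES (?, ?)", (name, email)
    )


insert_v1("Ada")

# Expand step: add the column as nullable so version 1 is not broken.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

insert_v1("Grace")                        # old code path, after the migration
insert_v2("Linus", "linus@example.com")   # new code path

rows = conn.execute("SELECT name, email FROM customers ORDER BY id").fetchall()
```

Tightening the schema (for example making the column mandatory) would wait until no version 1 instance is left in any region.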
Why is my shopping cart empty?
Aim for stateless, but
maintain sessions, if you
must.
Chaos & Validation
Design for feedback.
Measure every single detail
via KPIs and SLIs. Capture
metrics and logs. There’s no
such thing as too many logs.
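Two common SLIs, sketched as ratios of good events to all events. The function names, the choice of 5xx as "bad" and the 300 ms threshold are assumptions made for the example.

```python
def availability_sli(status_codes):
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not status_codes:
        return 1.0
    good = sum(1 for code in status_codes if code < 500)
    return good / len(status_codes)


def latency_sli(latencies_ms, threshold_ms):
    """Fraction of requests answered within the latency threshold."""
    if not latencies_ms:
        return 1.0
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return fast / len(latencies_ms)


# Usage: four requests, one server error and one slow response.
availability = availability_sli([200, 200, 503, 200])
latency = latency_sli([120, 80, 950, 200], threshold_ms=300)
```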
Hope is not a strategy.
Reduce uncertainty with game days,
then aim to regularly inject
failure in your production
environment.
Continuous tinkering is healthy.
Use randomness to spoon-feed
yourself with discoveries. You’ll
be surprised what you come across.
You don’t choose Chaos Monkey.
Chaos Monkey chooses you.
When pursuing Chaos
Engineering, start controlled,
small, observe, squash and
learn.
Remember, there is nothing
Chaotic about Chaos
Engineering.
“Chaos Engineering is the discipline
of experimenting on a distributed
system in order to build
confidence in the system's
capability to withstand turbulent
conditions in production.”
https://principlesofchaos.org
Chaos Engineering is a
collection of “What if”s.
What if I add latency? What if
I DDoS a service? What if I
change the hardware clock?
Examples of tests:
• tc qdisc add dev eth0 root netem delay 300ms
• wrk -t12 -c400 -d30s http://host/api/request
• stress-ng --random 50 -t 60 --metrics-brief --times
• iptables -I OUTPUT -p udp -d <DNS server IP> --dport 53 -j DROP
• umount /mnt/blockstorage
• hwclock
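The same "what if I add latency?" question can be asked inside the application instead of on the network interface. The decorator below is a sketch of that: inject_latency, get_price and the 0.3 second delay (mirroring the 300ms netem example) are hypothetical, and the sleep function is injectable so the experiment can be observed without waiting.

```python
import functools
import random
import time


def inject_latency(delay_s, probability, rng=random.random, sleep=time.sleep):
    """Delay the wrapped call by delay_s seconds with the given probability."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rng() < probability:
                sleep(delay_s)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Usage: record the injected delays instead of actually sleeping.
delays = []


@inject_latency(delay_s=0.3, probability=1.0, sleep=delays.append)
def get_price(item):
    return {"book": 12}[item]


price = get_price("book")
```

Starting with a low probability on a single service keeps the blast radius small while the results are observed.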
The rollback button is a lie.
That’s not only true for
application deployments but also
for fault injection, as both face
the same fundamental problem:
State.
Go beyond trivial ICMP and
connection tests.
Automated synthetic monitoring
helps you understand what your
users actually experience,
which typical platform
monitoring cannot show.
Do it from multiple locations.
Love DevOps? Wait till you
meet SRE.
SRE is what happens when you
ask a software engineer to
design an operations team.
Thank you!
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Availability in a cloud native world - Guidelines for mere mortals v2.0

  • 1. Availability in a Cloud-native World. Guidelines for mere mortals. Academy of Technology - PREVAIL 2019 – München 🇩🇪 — Haytham Elkhoja Chief Architect & Global Tech Leader IBM Services - Continuous Availability (a.k.a Always On) haytham.elkhoja@ibm.com Relevant links and assets: https://ibm.biz/alwaysonbook
  • 2. /WHOIS: @hek, /in/haytham.Elkhoja
  • 3. March 2017: “Amazon broke the internet with a typo” (cnn.com). Impacted apps: Netflix, HootSuite, Expedia, Slack, Business Insider, Reddit.
  • 4. June 2019: “Google details 'catastrophic' cloud outage events: Promises to do better next time” (zdnet.com). Impacted apps: Snapchat, Spotify, Google Docs, YouTube, Pokemon Go, Gmail.
  • 5. What the hell is happening…
  • 6. On why outages happen. Planned outages: App and DB 67%, Batch 11%, Hardware 14%, Environmental 8%. Unplanned outages: Process 40%, Application 40%, Hardware 10%, OS 10%.
  • 7. IBM’s Always On Patterns
  • 8. Keeping your app available during planned and unplanned outages or failures requires geographically distributed, multi-active, multi-region deployments. (Diagram labels: Users, Traffic, Data Replication, Session Replication.)
  • 9. The IBM Always On Pattern starts at the infrastructure layer, progresses to the data, influences application design and extends to the people and the culture. (Herbie Pearthree, Distinguished Engineer, hpear3@us.ibm.com)
  • 10. Everything breaks!
  • 11. Thinking differently about Availability in a Cloud-native world: State & Consistency; Chaos & Validation; Zones, Regions & Swimlanes; Portability & Deployment.
  • 12. Portability & Deployment
  • 13. Code differently. Cloud-native apps should be self-contained, polyglot, loosely coupled, cattle-scaled, immutable, idempotent, ephemeral and protocol-aware.
  • 14. No two clouds are created equal. Architect for cloud mobility. Your app should be cloud-, infrastructure- and OS-agnostic. The 12-factor patterns will help you get there.
  • 15. No strings attached. Bootstrap configuration from environment variables; this is also a requirement for environment parity and your own sanity.
      Dockerfile:
        FROM alpine:3.1
        COPY app /app
        COPY docker-entrypoint.sh /entrypoint.sh
        ENTRYPOINT ["/entrypoint.sh"]
      Build and run:
        docker build -t app:v2 .
        docker run --rm -e "APP_DATADIR=/var/lib/data" -e "APP_HOST=host.com" -e "APP_PORT=3306" -e "APP_USERNAME=user" -e "APP_PASSWORD=password" -e "APP_DATABASE=test" app:v2
      Output:
        2019/10/15 04:44:29 Starting application...
  • 16. Delegate responsibilities. Whatever as a Service. Somebody, somewhere has done a much better job.
  • 17. Trim down the fat. Dependency management with multi-stage builds is an art one must pursue to keep apps clean and lean.
  • 18. Got Syslog? Emit log information with timestamps to STDOUT and STDERR, and make the source of each message clear.
  • 19. git’s your bible. Everything should be versioned, ephemeral and reproducible using GitOps methods. This includes configuration files and Infrastructure as Code.
  • 20. Design for failure. Handle SIGTERM like a champ, and survive SIGKILL, which cannot be caught.
  • 21. #$@&%*! Fail gracefully and inform your customers what’s up (or down), pun intended.
  • 22. Robots > humans. Actions performed by humans hundreds of times won’t be performed the same way each time, even with the best intentions. Automate.
  • 23. Zones, Regions & Swimlanes
  • 24. Resilient clouds don’t mean resilient apps. Multi-active regions help you scale while being resilient. Out of Region is more than just an insurance policy.
  • 25. Stay in your swimlane. Respect region affinity and stickiness: use geo load balancers to resolve traffic to the nearest region and keep it there. Crossing regions is a no-no.
  • 26. DNS is your best friend. Religiously steer clear of IP addresses. Service discovery will point you to the right path. And if you can’t, use Anycast.
  • 27. The most boring OS configs are also the most important ones. A ‘search’ entry in /etc/resolv.conf makes short hostnames resolve within your swimlane’s subdomain first, helping you with region affinity.
  • 28. Share-nothing. Cluster-nothing. Stretch-nothing. Control planes are delicate creatures, especially if stretched or shared. (Diagram: three database topologies, labelled Share Everything, Share Disks and Networking, and Share Nothing.)
  • 29. Bypass failures altogether. Disaster recovery processes lead to a mediocre and sometimes catastrophic experience.
  • 30. Are we there yet? Discover the awesome world of readiness and liveness probes, circuit breakers, retries, rate limiting, bulkheading and fallbacks.
  • 31. One deployment at a time. Rolling update strategies for zero-downtime deployments within a cluster or availability zone: deploy by adding an instance, then removing an old one; deploy by removing an instance, then adding a new one; or deploy by updating instances as fast as possible.
  • 32. One region at a time. Then do the same across regions. Your customers will not even know what’s happening behind the scenes.
  • 33. Love thy neighbor. Configure resource requests and limits. Throttle API requests.
  • 34. State & Consistency
  • 35. The network is reliable. Right. The CAP theorem must be well understood when choosing data stores. Knowing that partition tolerance cannot be sacrificed, pick consistency or availability. (Diagram: CAP triangle with P fixed, “Pick A or C”; Oracle, DB2, MySQL etc…)
  • 36. Do you really need Strong Consistency? Applications can support weak, eventual, or strong consistency.
  • 37. Distributed consistency is already difficult as it is. Normally, higher availability means higher revenue. Think of ATMs. A trumps C. Educate your business on eventual consistency. Strong consistency should be the last option, unless you’re the NYSE.
  • 38. Master! Master! Write anywhere and everywhere. Use master-master, master-less or peer-to-peer database-level replication. If you can’t, shard, partition or use Write/Query.
  • 39. Data Replication. More than meets the eye. Data patterns differ. Not all data is created equal. (Diagram labels: Messaging, BPM, CEP, APP; active standby or active/query; hot standby or configured active/active for fast switchover; multi-master or peer-to-peer write anywhere; data distribution, filter and push; data warehouse integration and federation; data through messaging, filter and push distribution.)
  • 40. Conflict resolution during a network partition will make you creative. Log and notify conflicts. Last-write-wins, CQRS and write partitioning are all valid but subjective (and emotional) decisions.
  • 41. NTP is dead. Long live NTP. Achieve globally distributed, consensus-respecting, synchronously replicated databases with Google TrueTime and AWS Time Sync, if you really need it.
  • 42. Database is much more than just a DBA’s job. Database versioning and backward-compatible schemas are not optional, but compulsory.
  • 43. Why is my shopping cart empty? Aim for stateless, but maintain sessions, if you must.
  • 44. Chaos & Validation
  • 45. Design for feedback. Measure every single detail via KPIs and SLIs. Capture metrics and logs. There’s no such thing as too many logs.
  • 46. Hope is not a strategy. Reduce uncertainty with game days, then aim to regularly inject failure into your production environment.
  • 47. Continuous tinkering is healthy. Use randomness to spoon-feed yourself with discoveries. You’ll be surprised what you come across.
  • 48. You don’t choose Chaos Monkey. Chaos Monkey chooses you. When pursuing Chaos Engineering, start controlled and small; observe, squash and learn. Remember, there is nothing chaotic about Chaos Engineering. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system's capability to withstand turbulent conditions in production.” https://principlesofchaos.org
  • 49. Chaos Engineering is a collection of “What if”s. What if I add latency? What if I DDoS a service? What if I change the hardware clock? Examples of tests:
      • tc qdisc add dev eth0 root netem delay 300ms
      • wrk -t12 -c400 -d30s http://host/api/request
      • stress-ng --random 50 -t 60 --metrics-brief --times
      • iptables -I OUTPUT -p udp -d <DNS server> --dport 53 -j DROP
      • umount /mnt/blockstorage
      • hwclock
  • 50. The rollback button is a lie. That’s not only true for application deployments but also for fault injection, as both face the same fundamental problem: State.
  • 51. Go beyond trivial ICMP and connection tests. Synthetic automated monitoring helps you understand what your digital users experience, in ways typical platform monitoring does not. Do it from multiple locations.
  • 52. Love DevOps? Wait till you meet SRE. SRE is what happens when you ask a software engineer to design an operations team.
  • 53. Thank you!
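Slide 15 passes every setting to the container through -e flags. On the application side, reading them could look roughly like the Python sketch below. It assumes the six variable names shown in the docker run example; AppConfig and load_config are hypothetical names chosen for illustration and are not part of the deck.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names come from the `docker run` example on slide 15.
# AppConfig and load_config are hypothetical, for illustration only.
REQUIRED_VARS = ("APP_DATADIR", "APP_HOST", "APP_PORT",
                 "APP_USERNAME", "APP_PASSWORD", "APP_DATABASE")


@dataclass(frozen=True)
class AppConfig:
    datadir: str
    host: str
    port: int
    username: str
    password: str
    database: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the configuration from environment variables, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return AppConfig(
        datadir=env["APP_DATADIR"],
        host=env["APP_HOST"],
        port=int(env["APP_PORT"]),  # raises ValueError if the port is not numeric
        username=env["APP_USERNAME"],
        password=env["APP_PASSWORD"],
        database=env["APP_DATABASE"],
    )
```

Failing at startup when a variable is missing is one possible design choice; it keeps a misconfigured container from ever passing a readiness check.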
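Slide 18 asks for timestamped logs on STDOUT and STDERR with a clear source. One way this might be wired up with Python's standard logging module is sketched below. The function name build_logger, the line format and the rule "warnings and above go to STDERR" are assumptions made for the example, not something the deck prescribes.

```python
import logging
import sys
import time


class MaxLevelFilter(logging.Filter):
    """Let through only records below a given level (keeps errors off stdout)."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < self.max_level


def build_logger(source: str, out=None, err=None) -> logging.Logger:
    """Return a logger writing timestamped, source-tagged lines to stdout and stderr."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    formatter = logging.Formatter(
        fmt="%(asctime)sZ %(name)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    formatter.converter = time.gmtime  # render timestamps in UTC

    logger = logging.getLogger(source)  # the logger name identifies the source
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()

    stdout_handler = logging.StreamHandler(out)
    stdout_handler.addFilter(MaxLevelFilter(logging.WARNING))  # INFO only
    stderr_handler = logging.StreamHandler(err)
    stderr_handler.setLevel(logging.WARNING)  # WARNING and above
    for handler in (stdout_handler, stderr_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

The process never opens a log file itself; the container runtime or platform is left to collect and route the two streams.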
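Slide 20 is about reacting to termination signals. A minimal sketch of a graceful shutdown loop in Python follows, assuming a POSIX system and a single main thread; serve, do_work and drain are hypothetical names. SIGKILL cannot be intercepted by any process, so the only defence against it is keeping state safe at all times.

```python
import os
import signal
import threading

# Set once the process has been asked to stop; the work loop polls it.
shutdown_requested = threading.Event()


def _on_signal(signum, frame):
    """Record the request and return quickly; cleanup happens in the main loop."""
    shutdown_requested.set()


def install_handlers() -> None:
    # SIGTERM and SIGINT can be caught. SIGKILL cannot, so nothing is registered for it.
    signal.signal(signal.SIGTERM, _on_signal)
    signal.signal(signal.SIGINT, _on_signal)


def serve(do_work, drain) -> None:
    """Call do_work() repeatedly until shutdown is requested, then call drain() once."""
    install_handlers()
    while not shutdown_requested.is_set():
        do_work()
    drain()  # e.g. finish in-flight requests, flush buffers, close connections
```

In a container platform, the drain step has to finish within the termination grace period, after which the platform typically sends SIGKILL.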
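Slide 30 lists circuit breakers and fallbacks among the patterns worth discovering. The sketch below shows one simple way a circuit breaker might be written; the class, its thresholds and the injectable clock are assumptions made for illustration, and production code would more likely rely on a service mesh or an established library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback=None):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: do not touch the failing dependency at all.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Passing a fallback, such as a cached response, is what lets the caller fail gracefully instead of surfacing the error to the user.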
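Slide 33 recommends throttling API requests. A token bucket is one common way to do that, though the deck does not name a specific algorithm; the sketch below is an assumption-laden illustration with a hypothetical TokenBucket class and an injectable clock so it can be exercised without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller could answer HTTP 429 or queue the request
```

Keeping one bucket per client or API key would stop a single noisy neighbor from starving everyone else.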
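Slide 40 names last-write-wins as one conflict resolution option and insists on logging conflicts. A minimal sketch of that combination follows. The Version type, the region names in the usage note and the tie-break on region name are assumptions made for the example, and the approach only behaves well if writer clocks are reasonably synchronized.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # write time as reported by the writing region
    region: str       # used only as a deterministic tie-breaker


def resolve_lww(a: Version, b: Version,
                conflicts: List[Tuple[Version, Version]]) -> Version:
    """Return the winning version; record the pair in `conflicts` when values differ."""
    if a.value != b.value:
        conflicts.append((a, b))  # log and notify: the losing write is discarded
    # The later timestamp wins. Equal timestamps fall back to the region name
    # so that every replica independently picks the same winner.
    return max(a, b, key=lambda v: (v.timestamp, v.region))
```

For instance, a cart written in a hypothetical "eu-de" region at t=100.0 loses to one written in "us-south" at t=101.5, and the discarded write remains in the conflicts list for later review.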
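Slide 49 injects faults at the operating-system level with tools such as tc and iptables. The same "what if I add latency?" question can also be asked inside the application. The wrapper below is a small sketch of that idea; with_chaos and its parameters are hypothetical, and the 0.3 second default simply mirrors the 300ms delay on the slide.

```python
import random
import time
from typing import Optional


def with_chaos(func, latency_s: float = 0.3, failure_rate: float = 0.1,
               rng: Optional[random.Random] = None, sleep=time.sleep):
    """Wrap func so every call is delayed and a fraction of calls fail on purpose."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        sleep(latency_s)  # "What if I add latency?"
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")  # "What if the dependency is down?"
        return func(*args, **kwargs)

    return wrapper
```

In line with slide 48, such a wrapper would be enabled for a small, controlled share of traffic first, with the results observed before widening the experiment.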