Real-World Resiliency
in the Face of a Datacenter Disaster
HOSTED BY
Stanislav Komanec
VP of Engineering, Kiwi.com
IN PARTNERSHIP WITH
Presenter
Stanislav Komanec
VP of Engineering Platform
My past:
➔ Backend developer
➔ Technical team lead
➔ Head of Platform
➔ VP of Engineering
Agenda
1. Kiwi.com Intro/Overview
2. OVHcloud Fire
3. Kiwi.com Impact
4. Kiwi.com’s Preparedness
5. Best Practices for Resiliency
6. Choosing/Implementing Resilient Systems
7. Final remarks
8. Q&A
About Kiwi.com
➔ Virtual global supercarrier
➔ Seamless travel experience
➔ Connecting “A” to “B”
➔ Virtual interlining
Kiwi.com History
➔ 2012: Skypicker founded
➔ 2014: Acquisition of whichairline.com
➔ 2016: Rebranded to Kiwi.com
➔ 2019: General Atlantic on board
Kiwi.com: The Team
➔ 500+ people in R&D
➔ Average age: 31
➔ 66+ nationalities
➔ 140+ dogs
Kiwi.com: Business Numbers
➔ 100K+ seats weekly
➔ 50K+ bookings weekly
➔ 5M+ searches weekly
➔ 900+ partners
➔ Our technology unlocks our key features
➔ Best inventory in the world
➔ Great search
➔ Features like multi-city, nomad, good deals...
Kiwi.com Key Features
➔ Cloud native
◆ Infrastructure as code
➔ Micro-services oriented architecture
➔ 600+ microservices, aligned in specific domains
Kiwi.com under the Hood – Architecture
➔ Main database – Scylla
➔ 400K+ reads/s, 200K+ writes/s; we rewrite the whole DB once every 10 days
➔ Infrastructure
◆ OVH main bare-metal provider
◆ Megaport
◆ GCP as the main cloud provider – web services
Kiwi.com under the Hood – Infrastructure
Geographically Distributed Datacenters
(Map: main database locations; distance labels: >500 km, >250 km, >550 km)
OVHcloud Fire
Events of 10 Mar 2021
➔ Strasbourg, France
➔ Wednesday, 10 March 2021
➔ Fire breaks out 00:47 CET
OVHcloud Fire
OVHcloud’s Strasbourg SBG2 datacenter engulfed in flames. (Image: SDIS du Bas Rhin)
OVHcloud’s Strasbourg SBG2 datacenter the next morning. (Image via Twitter)
➔ Strasbourg datacenter impact
■ SBG2 totally consumed
■ SBG1 4 of 12 rooms gutted
■ SBG3 & SBG4 proactively taken offline
➔ Internet impact (as per Netcraft)
■ 3.6 million websites
■ 464,000 domains
■ 1 in 50 sites in all of .fr TLD
Damage Assessment
“Websites that went offline during the fire included online
banks, webmail services, news sites, online shops selling PPE
to protect against coronavirus, and several countries’
government websites.”
— Netcraft
Kiwi.com Impact
Response to the Fire
OVHcloud Fire
(Map; distance labels: >500 km, 65 km, >550 km)
Monitoring the Problem
➔ 10 of 30 servers are suddenly unavailable
➔ Latencies briefly rise until unavailable servers are taken out of the cluster
➔ Requests per second per server; note how some drop towards zero then blip out of existence
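The "drop towards zero" pattern in the requests-per-second graph can be turned into a simple check. The following is a minimal sketch, not Kiwi.com's actual alerting: the function name, the server names, the traffic numbers, and the 10% threshold are all assumptions made for illustration.

```python
from statistics import median

def find_silent_servers(rps_by_server, threshold=0.1):
    """Return servers whose requests/second fall below `threshold` x the cluster median."""
    if not rps_by_server:
        return []
    cutoff = median(rps_by_server.values()) * threshold
    return sorted(name for name, rps in rps_by_server.items() if rps < cutoff)

# Hypothetical snapshot: 30 servers in three groups, one group dropping towards zero.
snapshot = {f"dc-a-{i}": 20_000 for i in range(10)}
snapshot.update({f"dc-b-{i}": 21_000 for i in range(10)})
snapshot.update({f"sbg-{i}": 150 for i in range(10)})

silent = find_silent_servers(snapshot)
print(len(silent), "servers look unavailable")
```

Comparing against the cluster median only works while fewer than half of the servers are affected; a production alert would more likely compare each server against its own recent baseline.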
Timeline of Fire
00:47 CET Fire breaks out in OVHcloud Strasbourg SBG2
01:12 CET Kiwi.com nodes in Strasbourg start falling off the cluster
01:15 CET All 10 Strasbourg nodes offline; traffic diverted to 2 other Kiwi.com datacenters (20 servers remaining)
02:23 CET Production operational; we still need to manually tweak some services around the main database
08:54 CET Tweaks deployed; we are fully operational
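To make the gaps in the timeline explicit, the elapsed time since the fire started can be computed from the timestamps above. This is a small illustrative sketch; the shortened event labels and helper name are assumptions, while the times are taken from the timeline.

```python
from datetime import datetime

FIRE_START = "00:47"
EVENTS = [
    ("01:12", "Strasbourg nodes start falling off the cluster"),
    ("01:15", "All 10 Strasbourg nodes offline; traffic diverted"),
    ("02:23", "Production operational"),
    ("08:54", "Tweaks deployed; fully operational"),
]

def minutes_since_fire(hhmm):
    """Minutes between the start of the fire and a same-day HH:MM timestamp."""
    fmt = "%H:%M"
    delta = datetime.strptime(hhmm, fmt) - datetime.strptime(FIRE_START, fmt)
    return int(delta.total_seconds() // 60)

elapsed = {label: minutes_since_fire(t) for t, label in EVENTS}
for label, minutes in elapsed.items():
    print(f"+{minutes // 60}h{minutes % 60:02d}m  {label}")
```

By this calculation all Strasbourg nodes were out of the cluster 28 minutes after the fire began, production was operational after roughly an hour and a half, and the final tweaks landed about eight hours in.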
➔ Degraded performance on some services
◆ Trying to rebalance load
➔ Moving some affected services to a different location
We were up & running
Kiwi.com Impact
Kiwi.com: Our primary Database...
Kiwi.com Impact (in theory)
What if...
What if... Kiwi.com Customer’s Impact
➔ Customer perspective
■ They could not use the service
■ They could not change bookings
■ We could not process changes in itineraries
● Customers might be at the airports waiting for flights
What if... Kiwi.com Technical Impact
➔ Micro-services – domino effect
➔ Other teams
■ Issues will be cascading: in order to mitigate it, we would need to stop
the services in specific order
➔ Inconsistencies
■ We might end up with a lot of inconsistencies, even for current customers
➔ Customer support overloaded
What if... How to Handle the Situation
➔ Stop services, in the right order
➔ Spin up a new cluster
➔ Let it sync
➔ Run data refreshers
➔ Slowly start web services for customers
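One way to derive "the right order" for stopping services is a topological sort over the service dependency graph: start dependencies first, and stop in the reverse order. The sketch below is only an illustration under assumed inputs; the service names and the dependency map are hypothetical, not Kiwi.com's real topology.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "web-frontend": {"search-api", "booking-api"},
    "search-api": {"main-database"},
    "booking-api": {"main-database"},
    "main-database": set(),
}

def start_order(depends_on):
    """Dependencies come before the services that need them."""
    return list(TopologicalSorter(depends_on).static_order())

def stop_order(depends_on):
    """Reverse of the start order: callers stop before what they call."""
    return list(reversed(start_order(depends_on)))

stop = stop_order(DEPENDS_ON)
print(" -> ".join(stop))
```

With 600+ microservices such a map would have to be generated from real dependency data rather than written by hand, and `TopologicalSorter` raises `CycleError` if the services depend on each other in a loop.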
What if... Estimation
➔ Revenue loss
➔ Reputation loss
■ Customers would buy elsewhere
➔ Inconsistencies
■ A lot of manual work
Kiwi.com Preparedness
Incident response
➔ Choice of technology
■ High availability architecture
■ Data replication for resiliency
➔ Choice of cloud vendor
■ Geographic distribution of datacenters
■ Capability to manage SLAs
➔ Having procedures in place
➔ Right environment
Long Before the Fire Broke Out
➔ Requirements
■ High resiliency – to provide best value to customer
■ Low latency – to enable products like Nomad, Multicity search...
➔ History
■ PostgreSQL databases - consistency issues
■ Cassandra - performance issues
What experience did we have?
➔ Peer-to-peer leaderless architecture
■ No single point-of-failure
➔ User-controllable replication factor (RF)
■ RF=1; We have 3 data centres
➔ Per-operation tunable consistency levels
■ One, Quorum, All, etc.
➔ Automatic multi-datacenter replication
■ Keeps different sites in sync
➔ Rack-aware and datacenter-aware
■ Ensures replicas are physically and
geographically distributed
Scylla’s High Availability Architecture
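The interaction between replication factor and consistency level can be checked with a little arithmetic. This sketch assumes the slide's RF=1 means one replica in each of the 3 datacenters (3 replicas in total), and uses the usual quorum definition of a majority of replicas; the function names are made up for illustration and are not part of any Scylla API.

```python
def quorum(total_replicas):
    """Majority of replicas: floor(n / 2) + 1."""
    return total_replicas // 2 + 1

def replicas_required(level, total_replicas):
    """Replicas that must respond for a given consistency level."""
    levels = {"ONE": 1, "QUORUM": quorum(total_replicas), "ALL": total_replicas}
    return levels[level]

def survives_dc_loss(level, rf_per_dc, datacenters, lost=1):
    """True if enough replicas remain after losing `lost` whole datacenters."""
    total = rf_per_dc * datacenters
    alive = rf_per_dc * (datacenters - lost)
    return alive >= replicas_required(level, total)

# Assumption: RF=1 per datacenter across 3 datacenters.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, survives_dc_loss(level, rf_per_dc=1, datacenters=3))
```

Under that assumption, operations at ONE or QUORUM still succeed with one datacenter gone, while ALL cannot be satisfied until the lost replicas are replaced.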
We Needed a Plan
Beginnings of a Plan
Goals:
3 datacenters
3 cities
geographically
separated
➔ You unlock technology advantages through a great team
➔ The best way is to set up culture & procedures
■ Creates the right environment
Team & Process Plan
➔ Proper monitoring in place
➔ Proper alerting
➔ Incident management system
➔ Postmortems
Incident Management
➔ Learning from each incident
➔ Making our systems more robust
➔ Building the culture
■ Present your mistakes
■ Wheels of misfortune
Blameless Culture
Best Practices for Resiliency
➔ Critical path
■ How to find it?
■ How to measure it?
➔ Implementation phase
➔ Proactive vs Reactive approach
Where to Start – Identification
➔ Engineers’ perspective
■ Love automations
■ Don’t like a lot of manual steps
Where to Start – Proactive vs Reactive
➔ Business perspective
■ Cost efficiency
■ Risk factors
Where to Start – Proactive vs Reactive
➔ Invest where it matters
■ From time to time it’s about overscaling the whole datacenter, not just an instance or two
■ Critical path
Where to Start – Overscale?
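How much overscaling a whole datacenter implies can be estimated from the numbers earlier in the deck (30 servers, 10 lost, 400K+ reads/s and 200K+ writes/s). The sketch below assumes load is spread evenly across servers and treats the throughput figures as round lower bounds; both are simplifications, and the function names are hypothetical.

```python
def load_factor_after_loss(total_servers, lost_servers):
    """How much more load each surviving server carries, assuming even spread."""
    remaining = total_servers - lost_servers
    if remaining <= 0:
        raise ValueError("no servers left")
    return total_servers / remaining

def max_safe_utilization(datacenters, tolerated_losses=1):
    """Highest steady-state utilization that still absorbs the lost datacenters."""
    return (datacenters - tolerated_losses) / datacenters

factor = load_factor_after_loss(30, 10)
per_server_before = (400_000 + 200_000) / 30   # ops/s per server, all 30 up
per_server_after = per_server_before * factor  # ops/s per server, 20 up

print(f"load per server x{factor}: {per_server_before:.0f} -> {per_server_after:.0f} ops/s")
print(f"stay below {max_safe_utilization(3):.0%} utilization to survive one datacenter loss")
```

In other words, with three equally sized datacenters each one needs to run at no more than about two thirds of its capacity in normal operation, which is the cost the business perspective has to weigh against the risk.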
Choosing/Implementing
Resilient Systems
➔ Overscale
■ Along the defined critical path
➔ Fallback solutions (e.g. in networking)
➔ Measure
➔ Run the tests
➔ Example: Chaos monkey approach
Proactive Solutions
➔ Get a great plan
➔ Well tested, keep it up to date
➔ For example: runbooks
Reactive Solution
The Culture
Get the Great Team
➔ It’s important to build the right
environment
➔ Thank you to all members of the
team
Get the Greatest Team
➔ Find partners who consider your problems their own
■ OVH
■ GCP
■ Scylla
● Initial setup, great support over the years
■ Megaport
■ Cloudflare…
Get the Great Partners
Final remarks
A Good Year (2006)
Uncle Henry:
“It's inevitable to lose now and
again. The trick is not to make a
habit of it.”
Takeaways
➔ Outages are inevitable. It's just up to us to be prepared
➔ Plan for the worst, hope for the best
➔ Get the right balance between proactivity and reactivity
➔ Get a great team & cultivate a blameless culture
■ Drives the innovation most effectively
Lessons Learned
Q&A
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 


Real-World Resiliency: Surviving Datacenter Disaster

  • 1. Real-World Resiliency in the Face of a Datacenter Disaster HOSTED BY Stanislav Komanec VP of Engineering, Kiwi.com IN PARTNERSHIP WITH
  • 2. Presenter Stanislav Komanec VP of Engineering Platform My past: ➔ Backend developer ➔ Technical team lead ➔ Head of Platform ➔ VP of Engineering
  • 3. Agenda 1. Kiwi.com Intro/Overview 2. OVHcloud Fire 3. Kiwi.com Impact 4. Kiwi.com’s Preparedness 5. Best Practices for Resiliency 6. Choosing/Implementing Resilient Systems 7. Final remarks 8. Q&A
  • 4. About Kiwi.com ➔ Virtual global supercarrier ➔ Seamless travel experience ➔ Connecting “A” to “B” ➔ Virtual interlining
  • 5. Kiwi.com History ➔ 2012: Skypicker founded ➔ 2014: Acquisition of whichairline.com ➔ 2016: Rebranded to Kiwi.com ➔ 2019: General Atlantic on board
  • 6. Kiwi.com: The Team ➔ 500+ people in R&D ➔ Average age 31 ➔ 66+ nationalities ➔ 140+ dogs
  • 7. Kiwi.com: Business Numbers ➔ 100K+ seats weekly ➔ 50K+ bookings weekly ➔ 5M+ searches weekly ➔ 900+ partners
  • 8. ➔ Our technology unlocks our key features ➔ Best inventory in the world ➔ Great search ➔ Features like multi-city, nomad, good deals... Kiwi.com Key Features
  • 9. ➔ Cloud native ◆ Infrastructure as code ➔ Micro-services oriented architecture ➔ 600+ microservices, aligned in specific domains Kiwi.com under the Hood – Architecture
  • 10. ➔ Main database – Scylla ➔ 400K+ reads/s, 200K+ writes/s; we rewrite the whole DB once every 10 days ➔ Infrastructure ◆ OVH as the main bare-metal provider ◆ Megaport ◆ GCP as the main cloud provider – web services Kiwi.com under the Hood – Infrastructure
  • 13. OVHcloud Fire Events of 10 Mar 2021
  • 14. ➔ Strasbourg, France ➔ Wednesday, 10 March 2021 ➔ Fire breaks out 00:47 CET OVHcloud Fire OVHcloud’s Strasbourg SBG2 Datacenter engulfed in flames. (Image: SDIS du Bas Rhin)
  • 15. OVHcloud’s Strasbourg SBG2 Datacenter the next morning. (Image via Twitter) ➔ Strasbourg datacenter impact ■ SBG2 totally consumed ■ SBG1 4 of 12 rooms gutted ■ SBG3 & SBG4 proactively taken offline ➔ Internet impact (as per Netcraft) ■ 3.6 million websites ■ 464,000 domains ■ 1 in 50 sites in all of .fr TLD Damage Assessment
  • 16. “Websites that went offline during the fire included online banks, webmail services, news sites, online shops selling PPE to protect against coronavirus, and several countries’ government websites.” — Netcraft
  • 20. Monitoring the Problem Latencies briefly rise until unavailable servers are taken out of cluster 10 of 30 servers are suddenly unavailable Requests per second per server; note how some drop towards zero then blip out of existence
  • 21. Timeline of Fire ➔ 00:47 CET: Fire breaks out in OVHcloud Strasbourg SBG2 ➔ 01:12 CET: Kiwi.com nodes in Strasbourg start falling off the cluster ➔ 01:15 CET: All 10 Strasbourg nodes offline; traffic diverted to the 2 other Kiwi.com datacenters (20 servers remaining) ➔ 02:23 CET: Production operational; we need to manually tweak some services around the main database ➔ 08:54 CET: Tweaks deployed, we are fully operational
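The intervals between the timeline events can be worked out directly. The following is a minimal sketch that uses only the times listed on the slide; the event labels are shortened paraphrases, and the function name is illustrative.

```python
from datetime import datetime

# Times (CET, 10 March 2021) exactly as listed in the timeline slide.
EVENTS = [
    ("00:47", "Fire breaks out in SBG2"),
    ("01:12", "Strasbourg nodes start falling off the cluster"),
    ("01:15", "All 10 Strasbourg nodes offline, traffic diverted"),
    ("02:23", "Production operational"),
    ("08:54", "Tweaks deployed, fully operational"),
]


def minutes_since_start(events):
    """Return (minutes elapsed since the first event, label) pairs."""
    start = datetime.strptime(events[0][0], "%H:%M")
    result = []
    for time_text, label in events:
        delta = datetime.strptime(time_text, "%H:%M") - start
        result.append((int(delta.total_seconds() // 60), label))
    return result


for elapsed, label in minutes_since_start(EVENTS):
    print(f"+{elapsed // 60}h{elapsed % 60:02d}m  {label}")
```

By this arithmetic, the Strasbourg nodes were all offline 28 minutes after the fire started, production was operational after 1 hour 36 minutes, and the final tweaks landed 8 hours 7 minutes in.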
  • 22. ➔ Degraded performance on some services ◆ Trying to rebalance load ➔ Moving some affected service to different place We were up & running Kiwi.com Impact
  • 23. Kiwi.com: Our primary Database...
  • 24. Kiwi.com Impact (in theory) What if...
  • 25. What if... Kiwi.com Customer Impact ➔ Customer perspective ■ They could not use the service ■ They could not change bookings ■ We could not process changes in itineraries ● Customers might be at the airports waiting for flights
  • 26. What if... Kiwi.com Technical Impact ➔ Micro-services – domino effect ➔ Other teams ■ Issues would cascade: to mitigate them, we would need to stop the services in a specific order ➔ Inconsistencies ■ We might end up with a lot of inconsistencies, even for current customers ➔ Customer support overloaded
  • 27. What if... How to Handle the Situation ➔ Stop services, in the right order ➔ Spin up a new cluster ➔ Let it sync ➔ Run data refreshers ➔ Slowly start web services for customers
  • 28. What if... Estimation ➔ Revenue loss ➔ Reputation loss ■ Customers would buy elsewhere ➔ Inconsistencies ■ A lot of manual work
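Stopping services "in the right order" amounts to ordering a dependency graph. The sketch below is one possible way to derive that order with Python's standard `graphlib` module (Python 3.9+). The service names and the dependency map are hypothetical and are not taken from Kiwi.com's actual architecture.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists the services it calls.
DEPENDS_ON = {
    "web-frontend": {"booking-api", "search-api"},
    "booking-api": {"main-database"},
    "search-api": {"main-database"},
    "main-database": set(),
}


def startup_order(depends_on):
    """Dependencies first: a service starts only after everything it calls."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Callers first: a service is stopped before the things it depends on."""
    return list(reversed(startup_order(depends_on)))


print("stop: ", shutdown_order(DEPENDS_ON))
print("start:", startup_order(DEPENDS_ON))
```

In this toy graph the customer-facing web service is stopped first and started last, which matches the slide's final step of slowly bringing web services back for customers.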
  • 30. ➔ Choice of technology ■ High availability architecture ■ Data replication for resiliency ➔ Choice of cloud vendor ■ Geographic distribution of datacenters ■ Capability to manage SLAs ➔ Having procedures in place ➔ Right environment Long Before the Fire Broke Out
  • 31. ➔ Requirements ■ High resiliency – to provide best value to customer ■ Low latency – to enable products like Nomad, Multicity search... ➔ History ■ PostgreSQL databases - consistency issues ■ Cassandra - performance issues What experience did we have?
  • 32. ➔ Peer-to-peer leaderless architecture ■ No single point-of-failure ➔ User-controllable replication factor (RF) ■ RF=1; We have 3 data centres ➔ Per-operation tunable consistency levels ■ One, Quorum, All, etc. ➔ Automatic multi-datacenter replication ■ Keeps different sites in sync ➔ Rack-aware and datacenter-aware ■ Ensures replicas are physically and geographically distributed Scylla’s High Availability Architecture
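How the replication factor and the consistency level interact when a datacenter is lost can be shown with simple counting. This sketch assumes the slide's "RF=1; We have 3 data centres" means one replica in each of three datacenters (an interpretation, not something the slide confirms), and uses the standard majority rule for QUORUM. The datacenter names are hypothetical, and no database driver is involved.

```python
def quorum(total_replicas: int) -> int:
    """Replicas that must respond for QUORUM: a strict majority."""
    return total_replicas // 2 + 1


def survives_dc_loss(rf_per_dc: dict, lost_dc: str) -> dict:
    """For each consistency level, can requests still succeed without lost_dc?"""
    total = sum(rf_per_dc.values())
    alive = total - rf_per_dc[lost_dc]
    return {
        "ONE": alive >= 1,
        "QUORUM": alive >= quorum(total),
        "ALL": alive >= total,
    }


# Assumption: one replica per datacenter, three datacenters (names are made up).
RF = {"dc_a": 1, "dc_b": 1, "dc_c": 1}
print(survives_dc_loss(RF, "dc_a"))
```

Under these assumptions, operations at ONE and QUORUM still have enough live replicas after a whole datacenter disappears, while operations at ALL do not.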
  • 33. We Needed a Plan
  • 34. Beginnings of a Plan Goals: 3 datacenters 3 cities geographically separated
  • 35. ➔ You need a great team to unlock the technology's advantages ➔ The best way is to set up culture & procedures ■ Creates the right environment Team & Process Plan
  • 36. ➔ Proper monitoring in place ➔ Proper alerting ➔ Incident management system ➔ Postmortems Incident Management
  • 37. ➔ Learning from each incident ➔ Making our systems more robust ➔ Building the culture ■ Present your mistakes ■ Wheels of misfortune Blameless Culture
  • 38. Best Practices for Resiliency
  • 39. ➔ Critical path ■ How to find it? ■ How to measure it? ➔ Implementation phase ➔ Proactive vs Reactive approach Where to Start – Identification
  • 40. ➔ Engineers' perspective ■ Love automation ■ Don’t like a lot of manual steps Where to Start – Proactive vs Reactive
  • 41. ➔ Business perspective ■ Cost efficiency ■ Risk factors Where to Start – Proactive vs Reactive
  • 42. ➔ Invest where it matters ■ From time to time it’s about overscaling the whole datacenter, not just an instance or two ■ Critical path Where to Start – Overscale?
  • 44. ➔ Overscale ■ Along the defined critical path ➔ Fallback solutions (e.g. in networking) ➔ Measure ➔ Run the tests ➔ Example: Chaos Monkey approach Proactive Solutions
  • 45. ➔ Get a great plan ➔ Keep it well tested and up to date ➔ For example: runbooks Reactive Solution
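Overscaling along the critical path can be put into numbers. The sketch below takes the figures from the incident timeline (10 of 30 servers lost) and assumes, as a simplification, that load spreads evenly across the remaining servers. The function names and the `ceiling` parameter are illustrative.

```python
def load_factor_after_loss(total_servers: int, lost_servers: int) -> float:
    """How much each surviving server's load grows if traffic spreads evenly."""
    remaining = total_servers - lost_servers
    if remaining <= 0:
        raise ValueError("no servers left to carry the load")
    return total_servers / remaining


def max_safe_utilization(total_servers: int, lost_servers: int,
                         ceiling: float = 1.0) -> float:
    """Highest steady-state utilization that stays under `ceiling` after the loss."""
    return ceiling / load_factor_after_loss(total_servers, lost_servers)


factor = load_factor_after_loss(30, 10)
print(f"per-server load grows by x{factor:.2f}")
print(f"keep normal utilization below {max_safe_utilization(30, 10):.0%}")
```

With three equally sized datacenters, each surviving server takes on 1.5 times its normal load, so the cluster needs to run at no more than about two thirds of capacity in normal operation to absorb the loss of one site.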
  • 48. ➔ It’s important to build the right environment ➔ Thank you to all members of the team Get the Greatest Team
  • 49. ➔ Find partners who consider your problems their own ■ OVH ■ GCP ■ Scylla ● Initial setup, great support over the years ■ Megaport ■ Cloudflare… Get the Great Partners
  • 51. A Good Year (2006) Uncle Henry: “It's inevitable to lose now and again. The trick is not to make a habit of it.” Takeaways
  • 52. ➔ Outages are inevitable. It's up to us to be prepared ➔ Plan for the worst, hope for the best ➔ Get the right balance between proactivity and reactivity ➔ Get a great team & cultivate a blameless culture ■ Drives innovation most effectively Lessons Learned