Spotify’s outage of 8 March 2022, explained
Kat Liu
Senior Software Engineer
Spotify
About Me
● 8+ years of experience as a backend software engineer
● Originally from NJ, lived in CA for 1 year, in Berlin for 6+ years
● Work stuff I like: distributed systems, and incidents!
● At Spotify since October 2021, in the User Platform tribe
The outage
2022-03-08 Tue 19:00 CET
● International Women’s Day!
● Day off in Berlin
Around the web
Discord
downdetector.com
Tech Stack @ Spotify
Service Discovery @ Spotify
● Nameless, developed in-house
● Built on top of the DNS protocol, serves SRV records
● DNS propagation is naturally slow
● Client-heavy logic that does load balancing
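To make "client-heavy logic that does load balancing" concrete, here is a minimal sketch of how a client might choose one endpoint from a set of SRV records. It is a simplified weighted pick in the spirit of RFC 2782 (lowest priority value first, then weight-proportional choice), not Nameless's actual client code. The class names, hostnames, and ports are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not taken from Nameless.
final class SrvPicker {
    static final class SrvRecord {
        final String target;
        final int port;
        final int priority; // lower value is preferred
        final int weight;   // relative share within one priority level

        SrvRecord(String target, int port, int priority, int weight) {
            this.target = target;
            this.port = port;
            this.priority = priority;
            this.weight = weight;
        }
    }

    // Picks one record from the lowest-priority group, proportionally to weight.
    // Simplification: a zero-weight record is only chosen if all weights are zero.
    static SrvRecord pick(List<SrvRecord> records, Random random) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("no SRV records to choose from");
        }
        int best = Integer.MAX_VALUE;
        for (SrvRecord r : records) {
            best = Math.min(best, r.priority);
        }
        List<SrvRecord> candidates = new ArrayList<>();
        int totalWeight = 0;
        for (SrvRecord r : records) {
            if (r.priority == best) {
                candidates.add(r);
                totalWeight += r.weight;
            }
        }
        if (totalWeight == 0) {
            // All weights are zero: fall back to a uniform choice.
            return candidates.get(random.nextInt(candidates.size()));
        }
        int ticket = random.nextInt(totalWeight); // 0 <= ticket < totalWeight
        for (SrvRecord r : candidates) {
            ticket -= r.weight;
            if (ticket < 0) {
                return r;
            }
        }
        throw new IllegalStateException("unreachable: weights sum to totalWeight");
    }

    public static void main(String[] args) {
        List<SrvRecord> records = List.of(
            new SrvRecord("host-a.example.internal", 8080, 10, 3),
            new SrvRecord("host-b.example.internal", 8080, 10, 1),
            new SrvRecord("host-c.example.internal", 8080, 20, 1));
        SrvRecord chosen = pick(records, new Random());
        System.out.println(chosen.target + ":" + chosen.port);
    }
}
```

With the sample records, host-c is never chosen while any priority-10 record exists, and host-a is chosen roughly three times as often as host-b.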
Traffic Director @ Spotify
● Traffic control plane for service mesh
● Fully managed by Google
● Smarter load balancing
● Built-in service discovery
● Uses the open-source xDS APIs from Envoy for gRPC
The outage
Mar 08, 2022 6:30:44 PM
io.grpc.internal.ManagedChannelImpl$NameResolverListener
handleErrorInSyncContext
WARNING: [Channel<1>: (xds:///service2)] Failed to resolve name.
status=Status{code=NOT_FOUND, description=Requested entity was not
found., cause=null}
service2 was not reachable because Traffic Director failed to resolve its name
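The channel target in the log, `xds:///service2`, is a URI. In gRPC, the scheme of the target (here `xds`) selects the name resolver, and the path carries the name to resolve. The sketch below only takes such a target apart with the standard `java.net.URI` class to show the two pieces. It does not call grpc-java, and the `TargetInspector` class is a hypothetical helper.

```java
import java.net.URI;

// Hypothetical helper for illustration; it does not use grpc-java.
final class TargetInspector {
    // Returns the URI scheme of a target string, e.g. "xds" or "dns".
    static String schemeOf(String target) {
        return URI.create(target).getScheme();
    }

    // Returns the name carried in the path, without the leading slash.
    // Yields an empty string when the target has no path component.
    static String serviceOf(String target) {
        String path = URI.create(target).getPath();
        if (path == null) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String target = "xds:///service2"; // target string from the log
        System.out.println(schemeOf(target) + " -> " + serviceOf(target));
        // prints "xds -> service2"
    }
}
```

Read this way, the NOT_FOUND status in the log says that the xDS-based resolver was asked for `service2` and the control plane had no entry for it.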
The fix
● Revert all services back to using Nameless
● Service mostly restored by 19:40 CET
But why were users logged out?
The aftermath
● grpc-java issue: https://github.com/grpc/grpc-java/issues/8950
● ~50 million login sessions disrupted
● 3 million new duplicate accounts created in the following days and weeks
Lessons Learned
● Sometimes you are at the mercy of third-party SLAs
○ Login service displayed correct behavior on NOT_FOUND
○ Keep a fallback to Nameless? Lots of issues with that
○ Fewer synchronous calls on critical paths
● SSO login vs. email login usually confuses users
● Spotify is full of smart, proactive, supportive engineers who even take the time to have fun during an incident
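The "keep a fallback" idea from the list above can be sketched as a resolver that asks a primary source first and consults a secondary one only when the primary has no entry. This is an illustration of the idea, not a recommendation: the slide itself notes that a fallback comes with lots of issues. The `Resolver` interface, the class name, and the sample address are hypothetical and do not describe Spotify's code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; names and shapes are assumptions for illustration.
final class FallbackResolver {
    // An empty Optional stands for "name not found".
    interface Resolver {
        Optional<List<String>> resolve(String service);
    }

    private final Resolver primary;
    private final Resolver fallback;

    FallbackResolver(Resolver primary, Resolver fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    // Returns the primary's endpoints when it has any, else the fallback's.
    List<String> resolve(String service) {
        Optional<List<String>> endpoints = primary.resolve(service);
        if (endpoints.isPresent() && !endpoints.get().isEmpty()) {
            return endpoints.get();
        }
        return fallback.resolve(service)
            .filter(list -> !list.isEmpty())
            .orElseThrow(() -> new IllegalStateException("no endpoints for " + service));
    }

    public static void main(String[] args) {
        Resolver controlPlane = service -> Optional.empty(); // simulates NOT_FOUND
        Resolver dnsBased = service -> Optional.of(List.of("10.0.0.7:8080"));
        FallbackResolver resolver = new FallbackResolver(controlPlane, dnsBased);
        System.out.println(resolver.resolve("service2"));
    }
}
```

The sketch leaves out everything that makes a real fallback hard, such as keeping the secondary data fresh and deciding when to switch back.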
Acknowledgements
● All 100+ colleagues online throughout the incident
● My own team for coming online without hesitation
● Infrastructure team for quickly spotting the bug and contacting Google
Thanks for listening!
Questions?

stackconf 2022: Spotify’s outage of 8.3.2022, explained