The Top Outages of 2022: Analysis and Takeaways
ThousandEyes
19 Jan 2023
Presented by Brian Tobia and Chris Villemez
1.
© 1992–2023 Cisco Systems, Inc. All rights reserved.
2.
Featured Speakers
• Chris Villemez, Technical Marketing Engineer
• Brian Tobia, Technical Marketing Engineer
3.
Before We Begin...
• If you have any questions, please type them in the Questions window.
• If you have any audio problems, please message us in the chat for help.
• A recording of this presentation will be sent to you in a few days.
4.
Agenda
• About ThousandEyes
• Noteworthy Outages of 2022
• Primer: Digital Service Building Blocks
• Top Ten Outage Countdown
• Lessons & Takeaways
• Q&A
5.
Actionable Insight for Internet, Cloud, and SaaS
• Correlated Insights: Quickly isolate issues to app, network, or service
• Network Visibility: Overlay, hop-by-hop underlay, ISP performance, and BGP routing
• App Experience: SaaS, API, and internal app performance and user experience
6.
2022 Noteworthy Outages
Legend: Major • Significant • Shadow
• British Airways (2/25)
• Twitter prefixes hijacked (3/28)
• Atlassian services unavailable (4/5)
• Rogers routing failure (7/8)
• AWS AZ Failure (8/9)
• Zoom Outage (9/15)
• Zscaler Internet Access Failure (10/25)
• WhatsApp Outage (10/25)
• AWS packet loss (12/5)
7.
The Building Blocks of Today's Digital Services
[Diagram: DNS, BGP, CDN, Cloud, SaaS]
8.
Many Options, Complex Dependencies
[Diagram: Users, ISP, DNS, BGP, CDN, Security, Your App]
9.
Many Options, Complex Dependencies
[Diagram: Users, ISP, DNS, BGP, CDN, Security, Your App, Cloud APIs, Data Center, Cloud IaaS]
10.
Step 1: DNS – Where Are We Going?
[Diagram: Users, ISP, BGP, DNS, CDN, Your App; DNS lookup via Root Server, TLD Server, and Authoritative Server]
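The DNS lookup on this slide walks down the hierarchy: the resolver asks a root server, is referred to a TLD server, and is then referred to the authoritative server that holds the answer. The minimal sketch below models that referral chain with in-memory tables. The server names, zone data, and the `query`/`resolve` helpers are hypothetical, and a real resolver also handles caching, CNAMEs, timeouts, and retries.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "servers": each maps a name suffix to either a
# referral ("NS", next_server) or a final answer ("A", ip_address).
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com.": ("NS", "tld-com")},
    "tld-com": {"example.com.": ("NS", "auth-example")},
    "auth-example": {"www.example.com.": ("A", "192.0.2.10")},
}


def query(server: str, name: str) -> Tuple[str, str]:
    """Return the most specific record this server holds for `name`."""
    zone = SERVERS[server]
    # Longest matching suffix mimics a referral to the closest zone cut.
    matches = [suffix for suffix in zone if name.endswith(suffix)]
    if not matches:
        raise LookupError(f"{server} has no data for {name}")
    return zone[max(matches, key=len)]


def resolve(name: str) -> Tuple[str, List[str]]:
    """Follow referrals from the root until an A record is returned."""
    server = "root"
    path = [server]
    while True:
        rtype, value = query(server, name)
        if rtype == "A":
            return value, path
        server = value  # NS referral: ask the next server down the tree
        path.append(server)


ip, path = resolve("www.example.com.")
print(ip, path)  # 192.0.2.10 ['root', 'tld-com', 'auth-example']
```

If any server in the chain is unreachable or misconfigured, the lookup fails before a single packet reaches the application, which is why DNS appears as its own building block.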
11.
Step 2: How Do We Get There?
[Diagram: Users, ISP, BGP, DNS, CDN, Your App]
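Once DNS returns an address, BGP decides how traffic gets there: each network chooses one path to the destination prefix from the routes its neighbors advertise. The sketch below implements only two real BGP tie-breakers, higher LOCAL_PREF first and then shorter AS_PATH, plus a first-hop ASN tie-break added purely so the result is deterministic. The full decision process has more steps (origin, MED, eBGP vs. iBGP, IGP cost, router ID, and vendor-specific ones such as Cisco weight). The `Route` type is hypothetical and the ASNs and prefix come from documentation ranges.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical route record: one candidate path to a prefix."""
    prefix: str
    as_path: Tuple[int, ...]  # ASNs in traversal order; origin AS is last
    local_pref: int = 100     # 100 is a common default LOCAL_PREF


def best_path(routes: Sequence[Route]) -> Route:
    """Pick a route using a simplified subset of the BGP decision process."""
    if not routes:
        raise ValueError("no candidate routes")
    # Higher LOCAL_PREF wins (hence the negation), then shorter AS_PATH.
    # The first-hop ASN is only a deterministic tie-breaker for this sketch.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.as_path[:1]))


candidates = [
    Route("198.51.100.0/24", (64496, 64500, 64510)),
    Route("198.51.100.0/24", (64497, 64510)),
    Route("198.51.100.0/24", (64498, 64499, 64501, 64510), local_pref=200),
]
print(best_path(candidates).as_path)  # (64498, 64499, 64501, 64510)
```

Operator policy (LOCAL_PREF) overrides path length here, so the longest path wins. This is also why a bogus announcement that looks attractive to a neighbor can pull traffic away from the legitimate origin.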
12.
Step 3: CDNs – Do We Have to Travel So Far?
[Diagram: Users, CDN, Your App, BGP, ISP, DNS]
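CDNs shorten the trip by serving users from a nearby edge location. Real CDNs steer users with DNS-based or anycast routing and also weigh load and health, not just distance. The sketch below shows only the core idea: pick the responsive edge with the lowest measured round-trip time. The edge names and RTT values are made up.

```python
from typing import Dict, Optional


def pick_edge(rtt_ms: Dict[str, Optional[float]]) -> str:
    """Return the edge with the lowest RTT, skipping edges that did not respond (None)."""
    healthy = {edge: rtt for edge, rtt in rtt_ms.items() if rtt is not None}
    if not healthy:
        raise RuntimeError("no reachable edge; fall back to the origin")
    return min(healthy, key=healthy.get)


# Hypothetical probe results from one user location, in milliseconds.
measurements = {"edge-fra": 12.4, "edge-lhr": 18.9, "edge-iad": 88.0, "edge-ams": None}
print(pick_edge(measurements))  # edge-fra
```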
13.
Step 4: Rinse and Repeat for Services & API Calls
[Diagram: Your App, SaaS Apps, Cloud APIs, Data Center, Backend Services]
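Each backend call repeats the DNS, routing, and CDN steps, so an app inherits the failure modes of everything it calls. Here is a minimal sketch that walks a dependency graph to report which failing components an app depends on, directly or transitively. The graph, the component names, and the set of "down" components are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: component -> components it calls.
DEPS: Dict[str, List[str]] = {
    "your-app": ["auth-saas", "payments-api", "db"],
    "payments-api": ["cloud-queue"],
    "auth-saas": [],
    "db": [],
    "cloud-queue": [],
}


def failing_dependencies(component: str, down: Set[str]) -> Set[str]:
    """Return the components in `down` reachable from `component` (including itself)."""
    seen: Set[str] = set()
    stack = [component]
    while stack:  # iterative depth-first traversal
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(DEPS.get(node, []))
    return seen & down


print(failing_dependencies("your-app", {"cloud-queue"}))  # {'cloud-queue'}
```

A failure two hops away (the queue behind the payments API) still surfaces as an app problem, which is the point of monitoring dependencies and not only front ends.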
14.
Top Ten Countdown
15.
#10–#6: Atlassian (Apr 5, 2022), Rogers (Jul 8, 2022), Zscaler Internet Access (Oct 25, 2022), WhatsApp (Oct 25, 2022), AWS (Dec 5, 2022)
Duration and impact: ~24 hours, app + routing issues • ~2.5 days, service unavailable/data loss • ~30 minutes, network traffic loss • ~2 hours, failure to send/receive messages • ~1 hour, network traffic/packet loss
• Rogers: Rogers withdrew its prefixes due to an internal routing issue, leaving its network unreachable across the Internet for nearly 24 hours. Lesson: No provider is immune to outages. Plan for a backup network provider that can limit the duration and scope of an outage.
• Zscaler Internet Access: Customers using Zscaler Internet Access (ZIA) experienced connectivity failures or high latency when reaching Zscaler proxies. Lesson: Having network-agnostic data for complex scenarios like this enables quicker attribution and remediation.
• AWS: Significant packet loss occurred between two global locations and AWS' us-east-2 region. Lesson: It's important to monitor not just the applications but also the cloud infrastructure components and any dependent cloud software services.
• WhatsApp: The two-hour outage left WhatsApp users unable to send or receive messages. Lesson: A thriving SaaS business relies on continuous improvement, which requires an immediate feedback loop so that mistakes can be rectified quickly.
• Atlassian: Due to a maintenance script error, Atlassian services experienced a days-long outage. Lesson: Status pages alone are not enough to communicate about an outage. Customers can be left wondering how serious it is and when it will be fixed.
Outage Blog • Outage Blog
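Two of these lessons, watching for packet loss and having a backup provider, fit together: measure loss on each provider's path and stay on the primary only while it is healthy. The sketch below computes loss from probe results and picks the first provider in preference order that is under a threshold. The provider names, probe data, and 5% threshold are hypothetical. In practice failover is usually done by routing (for example BGP with multiple upstreams, or SD-WAN policy) rather than by application code like this.

```python
from typing import Dict, List


def loss_percent(replies: List[bool]) -> float:
    """Percentage of probes that received no reply."""
    if not replies:
        return 100.0  # no data: treat the path as failed
    return 100.0 * replies.count(False) / len(replies)


def choose_provider(probes: Dict[str, List[bool]], order: List[str],
                    max_loss: float = 5.0) -> str:
    """Return the first provider in preference order whose loss is within the threshold."""
    for provider in order:
        if loss_percent(probes.get(provider, [])) <= max_loss:
            return provider
    raise RuntimeError("all providers exceed the loss threshold")


probes = {
    "primary-isp": [True] * 6 + [False] * 4,  # 4 of 10 probes lost
    "backup-isp": [True] * 20,                # no loss
}
print(loss_percent(probes["primary-isp"]))                     # 40.0
print(choose_provider(probes, ["primary-isp", "backup-isp"]))  # backup-isp
```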
16.
Zoom, September 15, 2022 (#5)
• Service unavailable for ~20 minutes
• Users were unable to log in or join meetings
• Most of the HTTP errors seen were 503 Service Unavailable responses, indicative of potential CDN issues
• The service appeared to be available when tested by IP reachability alone, but HTTP results and service status told a different story
Lesson: The app itself, rather than the network, may be causing the issue. Knowing which one it is prevents confusion and finger-pointing during root cause analysis.
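The Zoom case shows why a reachability test alone can mislead: the network path worked while the application returned errors. The sketch below attributes a single test result to the first layer that broke, checking DNS, then the network, then the HTTP response. The `TestResult` fields and the category labels are hypothetical and are not ThousandEyes output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TestResult:
    dns_ok: bool
    network_reachable: bool     # e.g. a TCP connect or ICMP probe succeeded
    http_status: Optional[int]  # None if no HTTP response arrived (timeout)


def classify(r: TestResult) -> str:
    """Attribute a failure to the first layer that broke."""
    if not r.dns_ok:
        return "dns"
    if not r.network_reachable:
        return "network"
    if r.http_status is None:
        return "application: timeout"
    if 500 <= r.http_status <= 599:
        return f"application: HTTP {r.http_status}"
    return "ok"


print(classify(TestResult(True, True, 503)))   # application: HTTP 503
print(classify(TestResult(True, True, None)))  # application: timeout
```

The second case matches the British Airways pattern as well: paths were reachable, but responses timed out.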
17.
British Airways, February 25, 2022 (#4)
• Service unavailable for ~20 minutes
• The outage caused hundreds of flight cancellations and disrupted the airline's operations
• Network paths to the airline's online services (and servers) were reachable, but server and site responses timed out
Lesson: Architecting backends without single points of failure reduces the likelihood that one fault sets off a chain of failures.
18.
Google, August 9, 2022 (#3)
• Service unavailable for ~60 minutes
• The outage affected Google Search and Maps
• During this time, Google web servers responded with HTTP 500 Internal Server Error messages, 502 Bad Gateway errors, and timeouts
Lesson: Monitor not just your application front ends but also the performance-critical dependencies that power your app.
Outage Blog
19.
AWS AZ Failure, July 28, 2022 (#2)
• Service unavailable for ~20 minutes; ~3 hours for customers to recover
• Caused by a power failure in an Availability Zone (AZ)
• Impacted applications such as Webex, Okta, and Splunk
• Affected EC2 instances and EBS volumes as well as traffic routing
Lesson: Build redundancy across multiple AZs. Multi-AZ architectures are typically active/active, which removes the need to execute a backup plan when one zone fails.
Outage Blog
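The benefit of spreading an active/active deployment across zones can be estimated with basic probability. The sketch assumes, as a strong simplification, that zones fail independently and each is available a fraction a of the time. The service is then down only when all n zones are down at once, so its availability is 1 - (1 - a)^n. The function name and the 99.9% figure are illustrative only.

```python
def multi_az_availability(per_zone_availability: float, zones: int) -> float:
    """Availability of an active/active service that stays up while any zone is up.

    Assumes zone failures are independent, which shared control planes
    or region-wide events can violate in practice.
    """
    if not 0.0 <= per_zone_availability <= 1.0 or zones < 1:
        raise ValueError("availability must be in [0, 1] and zones >= 1")
    p_down = 1.0 - per_zone_availability  # chance a single zone is down
    return 1.0 - p_down ** zones          # down only if every zone is down


for n in (1, 2, 3):
    print(n, multi_az_availability(0.999, n))
```

With 99.9% per-zone availability this gives roughly 99.9%, 99.9999%, and 99.9999999% for one, two, and three zones. Correlated failures erode those numbers quickly, so the figures are an upper bound rather than a promise.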
20.
Twitter, March 28, 2022 (#1)
• Service unavailable for ~45 minutes
• Twitter was unreachable for some users when JSC RTComm.RU (AS 8342) announced one of Twitter's prefixes and then blackholed the traffic
• Because Twitter's service is not hosted in RTComm's network, any Twitter traffic routed to RTComm would have been dropped
Lesson: Even if your company has implemented RPKI to fend off BGP threats, your telco might not have. Consider this when selecting ISPs.
Outage Blog
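RPKI route origin validation (RFC 6811) is what lets a network reject an announcement like RTComm's. A route is "valid" if some ROA covers the prefix with a matching origin AS and a max length at least as long as the announced prefix. It is "invalid" if ROAs cover the prefix but none match, and "not found" if no ROA covers it. The sketch below implements that rule with Python's standard `ipaddress` module. The `ROA` type is hypothetical, and the data uses documentation prefixes and ASNs rather than real Twitter or RTComm records.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ROA:
    prefix: str      # authorized prefix, e.g. "203.0.113.0/24"
    max_length: int  # longest prefix length the origin may announce
    asn: int         # authorized origin AS


def validate(prefix: str, origin_asn: int, roas: List[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid', or 'not-found' (RFC 6811 rules)."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of the other IP version or that don't contain the prefix.
        if announced.version != roa_net.version or not announced.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [ROA("203.0.113.0/24", 24, 64500)]  # hypothetical: AS64500 is the legitimate origin
print(validate("203.0.113.0/24", 64500, roas))   # valid
print(validate("203.0.113.0/24", 64511, roas))   # invalid (wrong origin, like a hijack)
print(validate("198.51.100.0/24", 64511, roas))  # not-found
```

Validation only protects traffic if the networks along the path drop "invalid" routes. That is the slide's point about checking whether your telco enforces RPKI.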
21.
Lessons and Takeaways
• BGP powers the Internet, but it can also be misused and abused. Visibility and planning are needed to protect your network.
• Public cloud is ubiquitous and reliable, but make sure you are monitoring all cloud dependencies.
• Avoid single points of failure. Your apps are only as resilient as your architecture.
• Security is essential, but it can add significant complexity that requires continuous end-to-end visibility.
• Whenever infrastructure is touched, failures can occur. Visibility before and after each network change is critical to avoid impact.
22.
Next Steps
• Subscribe to the blog: https://blog.thousandeyes.com
• Get a real-time view of the health of the Internet: https://thousandeyes.com/outages
• Sign up for a free trial: https://www.thousandeyes.com/signup
• Request a demo: https://www.thousandeyes.com/request-demo
Copyright ©2023 ThousandEyes
23.
Q&A