The Top Outages of 2022: Analysis and Takeaways
Presented by Mike Hicks
1.
© 1992–2023 Cisco Systems, Inc. All rights reserved.
2.
Featured Speaker: Mike Hicks, Principal Solutions Analyst
3.
Before We Begin...
• If you have any questions, please type them in the Questions window.
• If you have any audio problems, please message us in the chat for help.
• A recording of this presentation will be sent to you in a few days.
@ThousandEyes
4.
Agenda
• About ThousandEyes
• Noteworthy Outages of 2022
• Primer: Digital Service Building Blocks
• Top Ten Outage Countdown
• Lessons & Takeaways
• Q&A
5.
Actionable Insight for Internet, Cloud, and SaaS
• Correlated Insights: Quickly isolate issues to app, network, or service
• Network Visibility: Overlay, hop-by-hop underlay, ISP performance, and BGP routing
• App Experience: SaaS, API, and internal app performance and user experience
6.
2022 Noteworthy Outages
Categories: Major, Significant, Shadow
• British Airways (2/25)
• Twitter prefixes hijacked (3/28)
• Atlassian services unavailable (4/5)
• Rogers routing failure (7/8)
• AWS AZ Failure (7/28)
• Zoom Outage (9/15)
• Zscaler Internet Access Failure (10/25)
• WhatsApp Outage (10/25)
• AWS packet loss (12/5)
7.
The Building Blocks of Today’s Digital Services
Diagram labels: DNS, BGP, CDN, Cloud, SaaS
8.
Many Options, Complex Dependencies
Diagram labels: Users, ISP, DNS, BGP, CDN, Security, Your App
9.
Many Options, Complex Dependencies
Diagram labels: Users, ISP, DNS, BGP, CDN, Security, Your App, Cloud APIs, Data Center, Cloud IaaS
10.
Step 1: DNS – Where are We Going?
Diagram labels: Users, ISP, BGP, DNS (Root Server, TLD Server, Authoritative Server), CDN, Your App
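The root, TLD, and authoritative servers named on this slide form a chain of referrals. The sketch below is a toy, offline model of that chain and not a real resolver: the zone data, the server names (`tld-com`, `auth-example`), and the address `192.0.2.10` are all invented for illustration.

```python
# Toy model of iterative DNS resolution: root -> TLD -> authoritative.
# All zone data is hypothetical; no network traffic is sent.

ROOT = {"com": "tld-com"}                                   # root refers to a TLD server
TLD = {"tld-com": {"example.com": "auth-example"}}          # TLD refers to an authoritative server
AUTH = {"auth-example": {"www.example.com": "192.0.2.10"}}  # authoritative server holds the answer


def resolve(hostname):
    """Return (ip, steps), following referrals the way a recursive resolver would."""
    name = hostname.rstrip(".")
    labels = name.split(".")
    steps = []

    # Step 1: ask the root which server handles the top-level domain.
    tld_server = ROOT.get(labels[-1])
    steps.append(("root", tld_server))
    if tld_server is None:
        return None, steps

    # Step 2: ask the TLD server which server is authoritative for the domain.
    domain = ".".join(labels[-2:])
    auth_server = TLD[tld_server].get(domain)
    steps.append(("tld", auth_server))
    if auth_server is None:
        return None, steps

    # Step 3: ask the authoritative server for the address record.
    ip = AUTH[auth_server].get(name)
    steps.append(("authoritative", ip))
    return ip, steps
```

A failure at any of the three steps leaves the user with no address to connect to, which is why a DNS problem can look like a full service outage even when the application is healthy.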
11.
Step 2: How do We Get There?
Diagram labels: Users, ISP, BGP, DNS, CDN, Your App
12.
Step 3: CDNs - Do We Have to Travel So Far?
Diagram labels: Users, ISP, BGP, DNS, CDN, Your App
13.
Step 4: Rinse and Repeat For Services & API Calls
Diagram labels: Your App, SaaS Apps, Cloud APIs, Data Center, Backend Services
14.
Top Ten Countdown
15.
#10 to #6

Atlassian, Apr 5, 2022
• ~2.5 days; service unavailable/data loss
• Due to a maintenance script error, Atlassian services experienced a days-long outage.
• Lesson: One cannot rely on status pages alone to communicate about outages. Customers can be left worrying, with no answer as to how serious an outage is and when it will be fixed.

Zscaler Internet Access, Oct 25, 2022
• ~30 minutes; network traffic loss
• Customers using Zscaler Internet Access (ZIA) experienced connectivity failures or high latency in reaching Zscaler proxies.
• Lesson: Having network-agnostic data for complex scenarios like this can enable quicker attribution and remediation.

WhatsApp, Oct 25, 2022
• ~2 hours; failure to send/receive messages
• The two-hour outage left WhatsApp users unable to send or receive messages.
• Lesson: A thriving SaaS business relies on continuous improvement, which is why an immediate feedback loop, whereby mistakes can be rectified quickly, is necessary.

AWS, Dec 5, 2022
• ~1 hour; network traffic/packet loss
• Significant packet loss between two global locations and AWS' us-east-2 region.
• Lesson: It’s important to monitor not just the applications, but also the cloud infrastructure components and any dependent cloud software services.

Rogers, Jul 8, 2022
• ~24 hours; app + routing issues
• Rogers withdrew its prefixes due to an internal routing issue, rendering it unreachable across the Internet for nearly 24 hours.
• Lesson: No provider is immune to outages. Plan for a backup network provider that can reduce the length and scope of an outage.
16.
Zoom, September 15th, 2022 #5
• Service unavailable ~20 minutes
• Users were unable to log in or join meetings
• Most of the HTTP errors seen were 502 Bad Gateway responses, indicative of potential CDN issues
• The service appears available when tested only at the IP level, but HTTP results and service status tell a different story
Lesson: The app itself, rather than the network, may be causing the issue. Visibility into which one is at fault can prevent confusion and finger-pointing during root cause analysis.
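The gap between IP-level reachability and HTTP results can be made concrete with a small classifier. This is a minimal sketch under stated assumptions, not how ThousandEyes attributes faults: `ProbeResult`, `classify`, and the category strings are hypothetical names, and the probe values would come from whatever network and HTTP tests you already run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    network_reachable: bool      # e.g. a TCP connect to the service IP succeeded
    http_status: Optional[int]   # None means the HTTP request timed out or never completed


def classify(result: ProbeResult) -> str:
    """Attribute a failed check to the network layer or the application layer."""
    if not result.network_reachable:
        return "network"
    if result.http_status is None:
        return "application (timeout)"
    if result.http_status >= 500:
        return "application (server error)"
    if result.http_status >= 400:
        return "client error (check the test itself)"
    return "healthy"
```

With this split, a reachable IP combined with a 502 response is attributed to the application side, which is the pattern described on this slide.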
17.
British Airways, February 25, 2022 #4
• Service unavailable ~20 minutes
• Outage caused hundreds of flight cancellations and disruptions in the airline's operations
• Network paths to the airline’s online services (and servers) were reachable, but server and site responses were timing out
Lesson: Architecting backends that avoid single points of failure can reduce the likelihood of a cascading chain of failures.
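One simple way to act on this lesson is to audit a dependency inventory for functions with no independent alternative. The sketch below assumes a hypothetical inventory format (function name mapped to a list of provider names); the function name `single_points_of_failure` and all provider names are invented for illustration.

```python
def single_points_of_failure(dependencies):
    """Return the sorted names of functions backed by fewer than two distinct providers.

    `dependencies` maps each function the app needs (DNS, ISP, CDN, ...) to the
    list of providers that can serve it. Duplicate entries count once, since
    two instances from the same provider can fail together.
    """
    return sorted(
        name
        for name, providers in dependencies.items()
        if len(set(providers)) < 2
    )


# Hypothetical inventory for illustration only.
inventory = {
    "dns": ["provider-a", "provider-b"],
    "isp": ["isp-1"],
    "cdn": ["cdn-1", "cdn-1"],
    "auth-backend": [],
}
at_risk = single_points_of_failure(inventory)
```

Counting distinct providers is a deliberately coarse test; it does not catch shared upstream dependencies, such as two providers hosted in the same cloud region.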
18.
Google, August 9, 2022 #3
• Service unavailable for ~60 minutes
• Outage affected Google Search and Maps
• During this time, Google web servers responded with HTTP 500 Internal Server Error messages, 502 Bad Gateway errors, and timeouts
Lesson: It is important to monitor not just your application front ends but also the performance-critical dependencies that power your app.
19.
AWS AZ Failure, July 28th, 2022 #2
• Service unavailable ~20 minutes, ~3 hours for customers to recover
• Caused by an Availability Zone power failure
• Impacted applications such as Webex, Okta, and Splunk
• Affected EC2 instances and EBS volumes as well as traffic routing
Lesson: Use a redundant multi-AZ architecture. Such designs are typically active/active, which removes the need to execute a separate backup plan.
20.
Twitter, March 28th, 2022 #1
• Service unavailable ~45 minutes
• Twitter was rendered unreachable for some users when JSC RTComm.RU (AS 8342) announced one of Twitter’s prefixes and subsequently blackholed traffic
• Since Twitter’s service is not located within RTComm’s network, any Twitter traffic routed to RTComm would have failed
Lesson: Though your company might have RPKI implemented to fend off BGP threats, your telco might not. This is something to consider when selecting ISPs.
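The RPKI check mentioned in the lesson can be sketched as route origin validation, in which an announcement is compared against Route Origin Authorizations (ROAs). This is a simplified illustration of the valid/invalid/not-found outcomes defined in RFC 6811, not a production validator. The prefix `192.0.2.0/24` and AS 64500 are documentation values standing in for the legitimate owner; only AS 8342 comes from the slide.

```python
import ipaddress
from typing import NamedTuple


class ROA(NamedTuple):
    prefix: str      # prefix the holder has authorized
    max_length: int  # longest prefix length the ROA permits to be announced
    asn: int         # AS authorized to originate the prefix


def validate_origin(announced_prefix: str, origin_asn: int, roas: list) -> str:
    """Return 'valid', 'invalid', or 'not-found' for a BGP announcement."""
    announced = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if announced.version != roa_net.version:
            continue  # IPv4 and IPv6 prefixes never cover each other
        if not announced.subnet_of(roa_net):
            continue  # this ROA says nothing about the announced prefix
        covered = True
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: treat as a likely hijack or leak.
    return "invalid" if covered else "not-found"


# Hypothetical ROA for illustration: AS 64500 may originate 192.0.2.0/24.
roas = [ROA("192.0.2.0/24", 24, 64500)]
hijack_state = validate_origin("192.0.2.0/24", 8342, roas)
```

A network that drops "invalid" routes would discard the hijacked announcement in this example, while a network that does not validate would accept it. That difference is the reason the lesson asks whether your ISPs enforce RPKI.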
21.
Lessons and Takeaways
• BGP powers the Internet, but it can also be misused and abused. Visibility and planning are needed to protect your network.
• Public cloud is ubiquitous and reliable, but ensure that you are monitoring all cloud dependencies.
• Avoid single points of failure. Your apps are only as resilient as your architecture.
• Security is essential, but it can add great complexity that requires continuous end-to-end visibility.
• Whenever the infrastructure is touched, failures can occur. Visibility is critical before and after each network change to avoid impacts.
22.
Next Steps
@ThousandEyes
Learn more
• Subscribe! https://blog.thousandeyes.com
• Get a real-time view of the health of the Internet: https://thousandeyes.com/outages
Free Trial / Demo
• Sign up for a Free Trial: https://www.thousandeyes.com/signup
• Request a demo: https://www.thousandeyes.com/request-demo
23.
Q&A