- ThousandEyes delivers network intelligence into every network, using cloud, enterprise, and endpoint agents to provide visibility.
- It tackles the challenges of hybrid network environments and provides end-to-end visibility through these agent types deployed throughout the network.
- It also detects Internet outages by analyzing aggregated, anonymized traffic and routing data from across its global customer base to identify outage events, their scope, and their likely root causes.
About ThousandEyes
ThousandEyes delivers network intelligence into every network.
• Founded by network experts; strong investor backing
• Relied on for critical operations by leading enterprises
• Recognized as an innovative new approach
• Customers include 30 of the Fortune 500, 5 of the top 5 SaaS companies, and 4 of the top 6 US banks
Legacy Environments
• On-premises apps
• Users in branch offices over wired connections
• MPLS backbone
[Diagram: NY and HK branches connected to the datacenter over MPLS]
Internet-Centric Environment
• Adoption of cloud applications
• Split tunneling from branch offices
• Direct Internet connectivity between branch offices
• Wireless becoming the primary connectivity at branch offices
• Remote users accessing cloud applications directly
[Diagram: NY and HK branches, the datacenter, and Office 365 (O365) connected over the Internet]
Product Design Principles
• Intuitive & effective UI
– Powerful visualizations to model complex data
– Reusable, scalable UI design
– Seamless support and help
• Harness the power of SaaS
– Minimal deployment effort
– Auto-updates
– Centralized configuration
– Cross-customer data correlation and analysis
– Easy data sharing between different customers
• Innovative data collection & analytics
– Measure black-box environments using active probing
– Measure with minimal instrumentation
Rest of the Day
• Tackling Hybrid Network Environments with Enterprise Agents
– Nick Kephart
• End-to-End Visibility with Endpoint Agent
– Scott Cressman, Martin Dam
• Internet Outage Detection
– Ricardo Oliveira
Enterprise Agent: Internal Vantage Point
Key Use Cases
• Internet connectivity of ISP ingress and egress
• WAN visibility between branches and data centers
• Performance of web, voice, and FTP application traffic
[Diagram: NY and HK branches, the datacenter, and Office 365 (O365) connected over the Internet]
Deploying Enterprise Agents
• Form factors: Virtual Appliance, Linux Package, Docker Container, Intel NUC Installer, Cisco IOS Virtual Container
• New: Docker Container, for locations with containerized monitoring and operations tools
• New: Intel NUC Installer, for remote branches and stores with limited IT infrastructure
• New: Cisco IOS Virtual Container, for branch and WAN routers (IOS XE 3.17+ on ASR 1000 and ISR 4000)
• Easily deployable across the enterprise WAN and data center
Visualizing the Entire Network Path
Highlights
• Forward and reverse path (helpful for asymmetric routing)
• Measure and locate changes in loss, latency, and QoS in each direction
• Test UDP as well as TCP
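Showing both the forward and the reverse path matters because the two directions can cross different networks. The minimal sketch below detects this asymmetry. It assumes each direction has already been traced and that every hop has been normalized to a router name. Traceroute reports per-interface addresses, so the same router usually shows up with a different IP in each direction. The function names and hop labels are hypothetical and are not ThousandEyes APIs.

```python
from typing import List, Set, Tuple


def is_asymmetric(forward_hops: List[str], reverse_hops: List[str]) -> bool:
    """True if the reverse path is not simply the forward path traversed backwards."""
    return forward_hops != list(reversed(reverse_hops))


def one_way_hops(forward_hops: List[str], reverse_hops: List[str]) -> Tuple[Set[str], Set[str]]:
    """Hops seen only on the forward path, and hops seen only on the reverse path."""
    fwd, rev = set(forward_hops), set(reverse_hops)
    return fwd - rev, rev - fwd


# Hypothetical normalized router names between agent A and agent B.
forward = ["agent-a", "isp1-nyc", "isp2-chi", "agent-b"]
reverse = ["agent-b", "isp3-dal", "isp1-nyc", "agent-a"]
print(is_asymmetric(forward, reverse))   # True
print(one_way_hops(forward, reverse))    # ({'isp2-chi'}, {'isp3-dal'})
```

The hops that appear in only one direction are where loss or latency measured in that direction would need to be located separately.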
End User Visibility Challenges
• Remote and traveling workers
• SaaS deployments
• LAN and WAN issues in satellite offices
[Diagram: NY and HK branches, the datacenter, and Office 365 (O365) connected over the Internet]
Enter ThousandEyes Endpoint Agent
You can’t get this from any other monitoring solution, period.
• Extends visibility to the end user: in the office, at home, on the go
• Troubleshoot individual user sessions with live performance data
• Analyze trends across user populations, applications, and geographies
How Endpoint Agent Works
• Lightweight client software: Windows 7+, Mac OS X 10.9+
• Negligible resource consumption: typically <1% CPU, <40 MB memory, <50 MB disk
• Easy deployment via standard tools: MSI and PKG installers with auto-registration
• End-user and background components: browser plugin (Chrome and IE) and system service
• Always up to date: updates automatically and runs in the background
Data collected from live user sessions:
• Web/application: completion, availability, response time, page load waterfall
• Network: loss, latency, jitter, failures, path visualization, wireless topology, VPN, proxy, Wi-Fi quality
• Covers browser-based web applications, and only collects data for domains you choose to monitor
• Data is streamed instantly to the ThousandEyes service
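The slide says the Endpoint Agent only collects data for domains you choose to monitor. The slide does not say how that matching works. The minimal sketch below shows one plausible filter, assuming a monitored domain also covers its subdomains. `should_collect` and the domain list are hypothetical names used only for illustration.

```python
from typing import Iterable


def should_collect(hostname: str, monitored_domains: Iterable[str]) -> bool:
    """True if hostname equals a monitored domain or is a subdomain of one."""
    host = hostname.lower().rstrip(".")  # DNS names are case-insensitive; drop trailing root dot
    for domain in monitored_domains:
        d = domain.lower().rstrip(".")
        # Match the domain itself or any label-aligned subdomain ("a.example.com"),
        # but not look-alikes such as "notexample.com".
        if host == d or host.endswith("." + d):
            return True
    return False


monitored = ["example.com", "app.example.org"]  # hypothetical monitored-domain list
print(should_collect("mail.example.com", monitored))  # True
print(should_collect("notexample.com", monitored))    # False
```

The match is aligned on labels so that unrelated domains sharing a suffix are not collected.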
The Problem Landscape
• Lack of visibility into apps relying on the Internet (UCaaS, SaaS, IaaS, PaaS)
• Lack of visibility into wireless, remote, and mobile users
• Traditional NPM solutions are designed for static clients and on-prem apps
– Packet capture
– SNMP polling
[Diagram: NY and HK branches, the datacenter, and Office 365 (O365) connected over the Internet]
Drive for Internet Outage Detection
• The Internet is a shared network: the same event impacts multiple customers
• Harness data from multiple customers for more accurate inference of the problem
• Drive more value to customers with knowledge of the depth and breadth of the problem
Overview: Internet Outage Detection
• Traffic Outage Detection: detect outages in ISPs and understand their impact, both globally and as it relates to a specific customer
• Routing Outage Detection: see the global and account scope, as well as the likely root cause, of BGP reachability outages
How Traffic Outage Detection Works
1. Anonymized (HTTP) traffic data is aggregated from all tests across the entire user base
2. Algorithms then look for patterns in path traces terminating in the same ISP
3. Exclude: noisy interfaces and networks not belonging to ISPs
[Diagram: tests for Customer 1 and Customer 2, run from Cloud Agents in New York and Los Angeles and an Enterprise Agent in Boston toward Salesforce, Google, and NY Times, with path traces terminating in Level 3 in San Jose and Cogent in Denver]
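The three steps above can be sketched as a single grouping pass over anonymized path traces. This is a minimal illustration, not the production algorithm. It rests on several assumptions:
- Each record is a trace that failed to reach its target.
- The network that owns the last responding interface is already known.
- A hypothetical `min_tests` threshold decides when enough distinct tests terminate in one ISP.

The class, function, and field names, the IP addresses (taken from documentation ranges), and the threshold are all illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class PathTrace:
    test_id: str           # anonymized test identifier (no customer data kept)
    terminal_iface: str    # last responding interface on a trace that failed
    terminal_network: str  # network that owns that interface
    network_is_isp: bool   # False for enterprise, hosting, or other non-ISP networks


def detect_traffic_outages(traces: Iterable[PathTrace],
                           noisy_ifaces: Set[str],
                           min_tests: int = 3) -> Dict[str, Set[str]]:
    """Map each ISP to its terminal interfaces when at least `min_tests`
    distinct tests have failed traces terminating in that ISP."""
    tests_by_isp: Dict[str, Set[str]] = defaultdict(set)
    ifaces_by_isp: Dict[str, Set[str]] = defaultdict(set)
    for t in traces:  # step 1: traces are already aggregated across all tests
        # Step 3: exclude noisy interfaces and networks that are not ISPs.
        if t.terminal_iface in noisy_ifaces or not t.network_is_isp:
            continue
        # Step 2: group by the ISP where the traces terminate.
        tests_by_isp[t.terminal_network].add(t.test_id)
        ifaces_by_isp[t.terminal_network].add(t.terminal_iface)
    return {isp: ifaces_by_isp[isp]
            for isp, tests in tests_by_isp.items()
            if len(tests) >= min_tests}


traces = [
    PathTrace("t1", "192.0.2.1", "Level 3", True),
    PathTrace("t2", "192.0.2.2", "Level 3", True),
    PathTrace("t3", "192.0.2.1", "Level 3", True),
    PathTrace("t4", "198.51.100.9", "Cogent", True),
    PathTrace("t5", "203.0.113.5", "Customer network", False),
]
print(detect_traffic_outages(traces, noisy_ifaces=set()))
```

Counting distinct tests rather than raw traces keeps a single chatty test from looking like a widespread outage.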
Routing Outages All the Time
• ~1.6k prefixes affected per hour
Recent Major Outages Detected
Dates on the timeline: April 23, May 3, May 20, June 6, June 24, July 10, July 17
• Hurricane Electric route leak affecting AWS
• Trans-Atlantic issues in Level 3
– https://blog.thousandeyes.com/trans-atlantic-issues-level-3-network/
• Tata and TISparkle issues with a submarine cable
– https://blog.thousandeyes.com/smw-4-cable-fault-ripple-effects-across-networks/
• Hurricane Electric removed more than 500 prefixes
• Tata cable cut in Singapore affecting Dropbox
• Level 3 and NTT routing issues affecting JIRA
– https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
• Widespread issues in Telia’s network in Ashburn
– https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/
1. Network Layer Issues in Telia in Ashburn
• The detected outage coincides with packet loss spikes
• Ashburn, VA is “ground zero” for this outage
https://fvqmu.share.thousandeyes.com/
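The claim that "the detected outage coincides with packet loss spikes" boils down to a time-overlap check. The minimal sketch below assumes loss samples arrive as (Unix timestamp, loss %) pairs and uses a hypothetical 10% spike threshold. The function names and the sample data are illustrative.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (Unix timestamp in seconds, packet loss in percent)


def loss_spikes(samples: List[Sample], threshold_pct: float = 10.0) -> List[int]:
    """Timestamps whose loss meets or exceeds the spike threshold."""
    return [ts for ts, loss in samples if loss >= threshold_pct]


def outage_coincides(outage_start: int, outage_end: int,
                     samples: List[Sample], threshold_pct: float = 10.0) -> bool:
    """True if any loss spike falls inside the outage window (inclusive)."""
    return any(outage_start <= ts <= outage_end
               for ts in loss_spikes(samples, threshold_pct))


samples = [(0, 0.0), (300, 0.5), (600, 45.0), (900, 60.0), (1200, 1.0)]
print(loss_spikes(samples))                   # [600, 900]
print(outage_coincides(500, 1000, samples))   # True
print(outage_coincides(1000, 1500, samples))  # False
```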
Specific Failure Points in Telia
• High severity and wide scope (outages affecting at least 20 tests for an NA/EU interface are likely to be wide in scope)
• Terminal nodes in Telia
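The rule of thumb above can be written as a small predicate. This sketch encodes only what the slide states: at least 20 affected tests on an NA or EU interface suggests wide scope. For other regions the slide gives no threshold, so the function returns `None` instead of guessing. The function name and the region codes are hypothetical.

```python
from typing import Optional

WIDE_SCOPE_REGIONS = {"NA", "EU"}  # regions the rule of thumb covers
WIDE_SCOPE_MIN_TESTS = 20          # threshold stated on the slide


def likely_wide_scope(affected_tests: int, interface_region: str) -> Optional[bool]:
    """Apply the slide's rule of thumb to one interface.

    Returns None for regions the rule does not cover, rather than guessing.
    """
    if interface_region not in WIDE_SCOPE_REGIONS:
        return None
    return affected_tests >= WIDE_SCOPE_MIN_TESTS


print(likely_wide_scope(25, "NA"))    # True
print(likely_wide_scope(19, "EU"))    # False
print(likely_wide_scope(50, "APAC"))  # None
```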
2. Hurricane Electric Route Flap
• The detected outage coincides with a spike in AS path changes
• Root cause analysis points to Hurricane Electric and Telx
https://njjgkif.share.thousandeyes.com/
Route Flap by Hurricane Electric
• Routes flap from using HE to NTT, then back to HE
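A flap like HE → NTT → HE shows up in two ways: as a burst of AS path changes, and as a return to a path seen earlier. The minimal sketch below counts both over a sequence of observed AS paths. AS6939 (Hurricane Electric) and AS2914 (NTT) are real AS numbers. AS64500 is a documentation ASN standing in for the affected origin. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

AsPath = Tuple[int, ...]  # AS numbers from the observer toward the origin


def distinct_runs(paths: Sequence[AsPath]) -> List[AsPath]:
    """Collapse consecutive identical observations into one entry per path change."""
    runs: List[AsPath] = []
    for p in paths:
        if not runs or runs[-1] != p:
            runs.append(p)
    return runs


def count_path_changes(paths: Sequence[AsPath]) -> int:
    """Number of times the observed AS path changed."""
    return max(len(distinct_runs(paths)) - 1, 0)


def is_flap(paths: Sequence[AsPath]) -> bool:
    """True if the route switches away from a path and later returns to it."""
    runs = distinct_runs(paths)
    return len(runs) != len(set(runs))


# AS6939 = Hurricane Electric, AS2914 = NTT, AS64500 = documentation ASN (hypothetical origin)
observed = [(6939, 64500), (6939, 64500), (2914, 64500), (6939, 64500)]
print(count_path_changes(observed))  # 2
print(is_flap(observed))             # True
```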
3. NTT and Level 3 Routing Issues Affect JIRA
• JIRA saw 0% availability and 100% packet loss
• Most affected interfaces are in Ashburn, VA
https://ncigwwph.share.thousandeyes.com/
Traffic Terminating in NTT
• Traffic paths originally traversed Level 3 and NTT
• Traffic paths then changed to traverse only NTT, terminating there
JIRA’s /24 Prefix Becomes Unreachable
• As the primary upstream ISP, Level 3 is associated with the most affected routes
• Routes through upstream ISPs NTT and Level 3 are all withdrawn
Routers Begin Using Misconfigured /16 Prefix
The backup /16 prefix points to NTT, not to JIRA’s network. This is why the traffic path changed to traverse only NTT and terminated there: JIRA’s IP could not be found in NTT’s network.
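The fallback from the withdrawn /24 to the /16 follows from longest-prefix matching. A router forwards on the most specific prefix that contains the destination, and when that prefix is withdrawn, the next most specific one takes over. The minimal sketch below uses Python's standard `ipaddress` module and scans a small table linearly for clarity; real routers use specialized lookup structures. The prefixes (from documentation ranges) and next-hop labels are hypothetical stand-ins, not JIRA's actual routes.

```python
import ipaddress
from typing import Dict, Optional


def longest_prefix_match(dest: str, table: Dict[str, str]) -> Optional[str]:
    """Return the next hop of the most specific prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best_len, best_hop = -1, None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


# Hypothetical routes standing in for JIRA's /24 and the misconfigured backup /16.
rib = {
    "198.51.100.0/24": "upstream toward JIRA",  # the specific /24
    "198.51.0.0/16": "NTT",                     # the backup /16
}
print(longest_prefix_match("198.51.100.7", rib))  # upstream toward JIRA
del rib["198.51.100.0/24"]                        # the /24 is withdrawn
print(longest_prefix_match("198.51.100.7", rib))  # NTT
```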
What’s Next
• Traffic outages in the cloud
– IaaS/PaaS (CDNs, hosting, DNS providers)
– SaaS (+ app context)
• Routing outages
– Leaks and hijacks
• Outage event stream
– Outage geo and topology maps
– Alerts based on outage impact, location, type, etc.
Tips for Diagnosing Internet Outages
• Look for purple indicators and the ‘Outage Detected’ dropdown when investigating issues: these indicate detected outages!
• Use quick links or select specific nodes/ASes to see how paths have changed over time
• Correlate data from the web, network, and routing layers to analyze root cause
• See our blogs and Knowledge Base articles for more info:
– Blog on Traffic Outage Detection: https://blog.thousandeyes.com/analyzing-internet-issues-traffic-outage-detection/
– Blog on Routing Outage Detection: https://blog.thousandeyes.com/identifying-root-cause-routing-outage-detection/
– Knowledge Base: https://support.thousandeyes.com/entries/110214366