Disaster Porn
... and the importance of being a generalist
About Me

       Scott Sanders
Senior Systems Administrator
      RideCharge, Inc.

     @scott_sanders
 ssanders@taximagic.com
Surge Conference 2011
● Ben Fried's (Google CIO) keynote speech
  talks about the importance of being a generalist
● I think specializing is fine (and normal as
  your career advances), but it's VITAL to
  keep a generalist perspective
● Disaster porn!
● I have no affiliation with OmniTI or Surge,
  but I highly recommend you attend the
  conference in Baltimore on Sept. 27th - 28th
Background
● Taxi Magic
  ○ Mobile applications to book/track/pay for taxis
  ○ Web booking integration for taxi fleets
  ○ In-car payment hardware (PIM)
● What's a PIM?
  ○ Passenger Information Monitor
  ○ 7" HD touchscreen
  ○ Credit card swipe
  ○ Wired into cab hardware and dispatch system
  ○ Uses cellular communication to talk to TM
  ○ Regular GPS events over UDP
  ○ Payment transactions over HTTPS
The problems begin... (June 5th)
● A handful of cab drivers in Los Angeles
  begin reporting failures when swiping CCs
● Embedded hardware team recalls a few
  cabs and investigates local log files
● They report problems during the SSL handshake
  with RideCharge servers
● Tech Ops team remaps httpd to the same
  libcrypto.so and libssl.so version as the PIM
  using libmap.conf(5)
● Problem vanishes! HOORAY!!! Beer!
Fast forward to June 12th...
● SHTF
● Widespread reports of failing CC swipes
  across the entire SoCal region
● Hardware team pulls more vehicles and
  notices the same SSL handshake problem
● Tech Ops team is unable to correlate this to
  a drop in traffic
● Furthermore, Tech Ops is still seeing regular
  GPS updates from ALL active cabs!
WTF?
Diving in...
● Our cellular ISP insists they aren't having
  any problems
● (Sound familiar to anyone?)
● I start running the standard toolkit looking for
  patterns
   ○ tcpdump
   ○ traceroute
   ○ NMAP
● NMAP is giving me some inconsistent
  results
Understanding how TCP/IP works
● How do you establish a TCP connection?
  ○ SYN         (Hey, you there?)
  ○ SYN/ACK     (Yeah, what's up?)
  ○ ACK         (Cool, let's talk!)
● What happens if you connect to a port that
  doesn't have a service bound to it?
  ○ SYN   (Hey, you there?)
  ○ RST   (leave me alone!)
● So why am I only getting a RST every now
  and then? Why do I see timeouts instead?
● This is starting to smell like a routing
  problem
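The three outcomes above (handshake completes, RST comes back, nothing comes back) can be told apart without nmap. The following is a minimal sketch, not part of the original deck: it assumes bash with /dev/tcp support and the coreutils timeout(1) command, and the function names probe_port and classify_exit are made up for illustration. The labels conn-refused and no-response mirror the ones in the results later on; "open" is my own label.

```shell
#!/usr/bin/env bash

# Map the exit status of the connect attempt to a short label.
classify_exit () {
  case "$1" in
    0)   echo "open" ;;          # handshake completed: something is listening
    124) echo "no-response" ;;   # timeout(1) gave up: neither SYN/ACK nor RST arrived
    *)   echo "conn-refused" ;;  # connect failed quickly; usually an RST, but this
                                 # catch-all also covers errors such as no route
  esac
}

# probe_port HOST PORT [SECONDS]
# Attempt one TCP connect and print open, conn-refused, or no-response.
probe_port () {
  local host=$1 port=$2 secs=${3:-5}
  # In the child shell, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
  classify_exit "$?"
}
```

Usage would look like `probe_port 184.251.233.91 22 5`. A healthy path to a closed port should answer conn-refused almost immediately; a steady stream of no-response for the same kind of host hints that packets are being lost somewhere in between.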
Proving the problem exists
● Since I am receiving GPS updates over UDP
  from all the cabs, I can use them to identify the
  IP of a cab and its location at a point in time
● We know the expected behavior when
  attempting a connection to a closed port
● Let's run some tests and gather some data
comm_test.sh
#!/usr/bin/env bash

test_connection () {
  # fork a subshell so a slow probe doesn't block reading the next update
  (
      # nmap's reason column is either no-response or conn-refused
      result=$(nmap -P0 -T1 -sT -p22 --reason -q "$4" | awk '/^22/{print $4}')
      # date, time, IP, probe result, latitude, longitude
      echo "$1 $2 $3 $4 $result $8 $9" >> results.txt
  ) &
}

# connect to the gps receiver host and monitor real-time UDP gps updates
ssh -t gps001.iad1.prod.rws 'tail -F gps_updates.csv' | while read -r line ; do
  # line format:
  # Jun 16 15:14:45, 184.251.233.91, 0, 20, 2577, 33.9822566666667, -118.4593
  line=$(echo "$line" | tr -d ',\r')
  # deliberately unquoted: split the line into positional fields $1..$9
  test_connection $line
done
Results
% comm_test.sh
Jun 16 15:28:00   102.122.93.194 conn-refused 33.8221321105957 -116.548851013184
Jun 16 15:27:57   176.135.73.0 conn-refused 32.8885866666667 -97.0376933333333
Jun 16 15:27:59   181.251.163.200 conn-refused 33.9004183333333 -118.387591666667
Jun 16 15:27:53   178.156.201.182 conn-refused 44.9484977722168 -93.2568588256836
Jun 16 15:27:28   180.229.138.141 no-response 39.766675 -104.940496666667
Jun 16 15:27:28   187.231.74.250 no-response 33.80945 -118.206921666667
Jun 16 15:28:00   181.255.84.59 conn-refused 34.0593466666667 -118.24536
Jun 16 15:27:55   78.6.67.236 conn-refused 34.0581833333333 -118.415878333333
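A quick tally makes the split between refused and silent probes easier to see. This is a minimal sketch, not part of the original deck: it assumes input lines in the format shown above (date, time, IP, reason, latitude, longitude), and the function name tally_reasons is made up for illustration.

```shell
#!/usr/bin/env bash

# tally_reasons: read result lines on stdin and print "<reason> <count>" per reason.
# Lines that do not have exactly seven whitespace-separated fields are skipped.
tally_reasons () {
  awk 'NF == 7 { n[$5]++ } END { for (r in n) print r, n[r] }' | sort
}
```

Run it as `tally_reasons < results.txt`. On the eight sample lines above it would report six conn-refused and two no-response.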
Visualize the problem
Awesome way to get non-techies on your
 side and impress some management :-)
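One way to get the probe data onto a map is to convert it to CSV, which many mapping tools can import. This is a sketch under assumptions, not the method used for the original slides: it expects lines in the results.txt format (date, time, IP, reason, latitude, longitude), and the function name results_to_csv and the column names are made up for illustration.

```shell
#!/usr/bin/env bash

# results_to_csv: read result lines on stdin, write CSV with a header on stdout.
# Lines that do not have exactly seven whitespace-separated fields are skipped.
results_to_csv () {
  echo "timestamp,ip,reason,lat,lon"
  awk 'NF == 7 { printf "%s %s %s,%s,%s,%s,%s\n", $1, $2, $3, $4, $5, $6, $7 }'
}
```

Run it as `results_to_csv < results.txt > results.csv`, then color the points by the reason column so the no-response cabs stand out geographically.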
Beating up your ISP (figuratively)
● After more than a dozen calls to the ISP and
  as many "escalations", we landed on a
  conference call with some lead network
  engineers
● After 6 hours on this conference call
  reiterating the problem and showing the data,
  one engineer asks us to "hold tight"
● Things get very quiet...
● Like magic all of my tests start succeeding!
WTF!?!
The backstory
● On June 5th, the ISP migrated the SoCal
  region to a new datacenter in Anaheim. This
  was an epic failure and they rolled back
● On June 12th, the ISP migrated again to
  Anaheim "successfully"
● Cell traffic is pooled by connection, and one
  of the pools was routing asymmetrically
● Asymmetric routing + stateful firewalls =
  BAD
● Updating the routing tables fixed everything
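Why asymmetric routing and stateful firewalls clash can be shown with a toy model. This is an illustration only, not the ISP's actual setup: the firewall names, the flow label, and both function names are invented, and it needs bash 4 or later for associative arrays. A stateful firewall passes a reply only if it saw the packet that opened the flow, so a reply returning through a different firewall gets dropped.

```shell
#!/usr/bin/env bash
# Toy model of per-firewall connection state. Not a real firewall configuration.

declare -A fw_state   # key "<firewall>|<flow>" is set once that firewall saw the flow start

# fw_saw_outbound FIREWALL FLOW: record that the first packet of FLOW crossed FIREWALL.
fw_saw_outbound () {
  local key="$1|$2"
  fw_state[$key]=1
}

# fw_check_reply FIREWALL FLOW: print pass if FIREWALL knows FLOW, otherwise drop.
fw_check_reply () {
  local key="$1|$2"
  if [[ -n "${fw_state[$key]:-}" ]]; then
    echo "pass"
  else
    echo "drop"
  fi
}
```

For example, after `fw_saw_outbound fw-a "cab>tm:443"`, the call `fw_check_reply fw-a "cab>tm:443"` prints pass (symmetric path), while `fw_check_reply fw-b "cab>tm:443"` prints drop (the reply came back a different way). That matches the symptoms: TCP handshakes stalled and RSTs went missing, while one-way UDP GPS updates, which need no reply, kept arriving.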
Being a generalist
● A DevOps culture requires generalists
● Understanding the full stack means being
  able to troubleshoot problems at all layers
● Fluid communication between sysadmins,
  developers, hardware engineers, and
  network engineers requires generalists
● Fewer people in the war room results in
  faster problem solving
● This saves time and money and makes your
  team more valuable to the business
Thank you!
 We're hiring!
