Sagi Brody, CTO
@webairsagi
Troubleshooting:
A High Value Asset for The Service Provider
Discipline
Background
• 17+ years of experience as a service provider.
• MSP - We own your infrastructure stack.
• Managed Public, Private, and Hybrid Cloud.
• High Touch - Monitoring, Managing, Securing
all layers (app/db/fw/nw/cdn)
• Mix of Open Source, Commercial, and
Proprietary Software.
• Context of an MSP but applicable to all IT.
The Art of Troubleshooting
• Troubleshooting is a standalone skill - Must be
Taught, Trained, Documented, Reviewed.
• Technical skills are important but soft skills
cannot be forgotten.
• Big buzz around hyped technologies in
DevOps.
• Other important soft skills - Resourcefulness,
Communication, Technical Documentation,
Mentoring.
Why Care?
• We're already doing it, internally or customer facing.
• We're trusted to do it well.
• We're judged on how we act during a crisis.
• Technology-agnostic skill. It never becomes obsolete and
only becomes more important as systems become more
complex.
• Applicable to infrastructure & software development.
• Allows us to scale - It's how we manage many complex
environments with small teams.
• Reduce downtime & Save lives!!
Brief History
• Hardware & Software
Systems traditionally
monolithic in nature.
• Flat topology without
abstraction.
• Mostly physical
infrastructure.
• Perpetual
configurations.
• No automation.
• Easier Troubleshooting.
Today
• Distributed, virtualized, and
abstracted infrastructure -
Network, Storage, Compute.
• Self-healing & Autoscaling
• Microservices based architecture
• Increased complexity with many
benefits (scaling, CI, time).
• Decreased visibility, control, and
ease of Troubleshooting.
• Troubleshooting == An Art.
Where to Start?
• Fast resolution starts with absolute
understanding of the issue.
• Is this the cause or a symptom?
• Was it brought to us pre-diagnosed?
• What is the context?
• Why is this a problem? Why is it important?
• No tunnel vision. Look at the big picture.
• Examples: 'Email Issue', 'Wifi/Speed', 'DDoS &
API'
Observe
• Can you have a clear understanding of the issue
without seeing it for yourself?
• Before attempting resolution, be sure you can
reproduce.
• Gain perspective using tools - remote logins,
screen shares, screen shots.
• Understand expected behavior vs fault.
• Software development & Debugging tools.
• Supplement lack of perspective with solid
communication.
Localize
• Drill Down - Peel back the layers of the onion.
• Process of elimination to localize the issue.
• Examples: static content vs dynamic content, local
storage vs shared/network storage, directly test
and bypass layers, rule out network
• Follow the flow of data and test each step.
• Major Outages - localize to commonality - What do
the affected services have in common?
• Use monitoring tools or SPOTs to localize.
• Don't assume anything - NIH (Not invented here).
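One way to picture "follow the flow of data and test each step" is a loop that probes each layer in order and stops at the first failure. The sketch below is only an illustration: the layer names and the lambda checks are hypothetical stand-ins for real probes (DNS lookup, load balancer health, application request, database query).

```python
from typing import Callable, List, Optional, Tuple

# A check returns True when its layer behaves as expected.
Check = Callable[[], bool]


def first_failing_layer(layers: List[Tuple[str, Check]]) -> Optional[str]:
    """Test each layer in data-flow order and return the first one that fails."""
    for name, check in layers:
        try:
            ok = check()
        except Exception:
            ok = False  # a probe that raises is treated as a failed layer
        if not ok:
            return name
    return None  # every layer passed


if __name__ == "__main__":
    # Hypothetical layers, ordered the way a request flows through them.
    layers = [
        ("dns", lambda: True),
        ("load_balancer", lambda: True),
        ("app", lambda: False),  # simulated fault
        ("database", lambda: True),
    ]
    print(first_failing_layer(layers))
```

Ordering the checks along the data path means everything before the reported layer has already been ruled out.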
Resolve and Test
• Scientific Method: Question -> Hypothesis ->
Experiment -> Observation -> Analysis ->
Conclusion
• Make small incremental changes.
• Use sandboxes where possible.
• Document each test, result, and change.
• Use your ability to reproduce and observe the issue to
confirm the resolution.
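Documenting each test, result, and change can be as light as an append-only log kept during the incident. The following is a minimal sketch, not a prescribed tool: the class and field names are invented for illustration and simply mirror the hypothesis, experiment, and observation steps of the scientific method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    hypothesis: str   # what we think is wrong
    change: str       # the single small change we made
    observation: str  # what happened when we re-ran the reproduction
    confirmed: bool   # did the observation support the hypothesis?


@dataclass
class TroubleshootingLog:
    question: str
    experiments: List[Experiment] = field(default_factory=list)

    def record(self, hypothesis: str, change: str,
               observation: str, confirmed: bool) -> None:
        """Append one experiment; earlier entries are never rewritten."""
        self.experiments.append(
            Experiment(hypothesis, change, observation, confirmed)
        )

    def conclusion(self) -> str:
        """Return the first confirmed hypothesis, or 'unresolved'."""
        for exp in self.experiments:
            if exp.confirmed:
                return exp.hypothesis
        return "unresolved"
```

Recording the rejected hypotheses matters as much as the confirmed one, since they show the next engineer what has already been ruled out.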
Monitor for Success
• Fast resolution starts with a solid monitoring strategy.
• Empower DevOps to create layered alerts.
• Create simple interfaces, scripts, APIs to allow for easy additions of
monitoring to standardized systems.
• Make it part of development & build process.
• Monitor the Big and the Small:
• Big - The finished product. Look for expected result via end user
interface to ensure all services are properly functioning.
• Small - Every layer! OK/FAIL monitoring for multi-dimensional
services and 3rd party providers.
• If set up properly, big and small alerts should trigger simultaneously,
allowing instant localization.
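The interplay of "big" and "small" checks can be sketched as a function that combines one end-to-end result with the per-layer OK/FAIL results. This is an assumption-laden illustration: the layer names and the wording of the coverage-gap message are made up, and a real system would read these values from its monitoring platform.

```python
from typing import Dict, List


def localize(big_ok: bool, small: Dict[str, bool]) -> List[str]:
    """Combine the end-to-end ("big") result with per-layer ("small") results.

    Returns the layers to look at first; an empty list means all is well.
    """
    failing = sorted(name for name, ok in small.items() if not ok)
    if failing:
        # A small alert fired: it points straight at the layer, whether or
        # not the end user is affected yet.
        return failing
    if not big_ok:
        # The product is broken but no layer check noticed: the fault sits
        # somewhere that is not monitored.
        return ["unmonitored layer (coverage gap)"]
    return []
```

For example, `localize(False, {"app": True, "db": False, "cdn": True})` returns `["db"]`, while a failing end-to-end check with every layer reporting OK signals that a small check is missing.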
High level Context
3rd Party Monitoring
“In God we trust. Everyone else we monitor.”
Remove Roadblocks
Build as you go..
Automated Troubleshooting
• Trigger action based on logging events (syslog,
Splunk, Logstash).
• Scan configuration files for dangerous
conditions.
• Pipe events to software designed to diagnose
and take action.
• Actions: disable interfaces, line cards,
services, servers, notify operations.
• Platform that allows easy addition of new
tests based on experience.
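A platform that "allows easy addition of new tests" could look like a small rule registry: each rule pairs a log pattern with an action, and adding a rule is one decorated function. The sketch below assumes invented log formats and only returns a description of the action instead of touching any device; none of the names come from a real product.

```python
import re
from typing import Callable, List

# Registered (compiled pattern, action) pairs; actions return a description
# of what would be done rather than performing it.
_RULES: list = []


def rule(pattern: str) -> Callable:
    """Decorator that registers an action for log lines matching `pattern`."""
    def register(func: Callable) -> Callable:
        _RULES.append((re.compile(pattern), func))
        return func
    return register


# Hypothetical log formats, invented for this example.
@rule(r"link flapping on (?P<iface>\S+)")
def disable_interface(match) -> str:
    return f"disable interface {match.group('iface')}"


@rule(r"filesystem (?P<dev>\S+) remounted read-only")
def notify_operations(match) -> str:
    return f"notify operations: {match.group('dev')} is read-only"


def diagnose(line: str) -> List[str]:
    """Return the actions triggered by a single log line."""
    actions = []
    for pattern, action in _RULES:
        match = pattern.search(line)
        if match:
            actions.append(action(match))
    return actions
```

Each lesson learned during an incident then becomes one more `@rule` function, which is how experience accumulates in the platform.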
Tools & Integration
• Use a Single Point of Truth (SPOT), e.g. Collins.
• Integrate 3rd party best in breed to SPOT (Nagios,
MRTG, ManageEngine, VMware, Xen, ScienceLogic)
• Combine communication, monitoring, and culture
• We've connected: Nagios, Phone calls, LiveChats,
Ticket updates, WO updates, Ansible deployments,
Network alerts, Network capacity alerts, Physical
data center alerts, DDoS attacks, Confluence.
• Make documentation & diagram management easier.
"If it's not documented, it didn't happen"
SPOT ON!
High Level Context
Make it part of your culture.
Historical Data a Must
• Establish baselines.
• Am I looking at a
preexisting condition?
• What changed and
when?
• Data Aggregation
• 3rd party: Datadog,
ScienceLogic,
GroundWork,
New Relic
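Answering "am I looking at a preexisting condition?" needs a baseline to compare against. A minimal sketch, assuming the history is a plain list of past samples for one metric and that a z-score threshold of 3 is an acceptable (arbitrary) cut-off:

```python
from statistics import mean, stdev
from typing import Sequence


def deviates_from_baseline(history: Sequence[float], current: float,
                           threshold: float = 3.0) -> bool:
    """Return True when `current` is more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any different value is a change.
        return current != mu
    return abs(current - mu) / sigma > threshold
```

With a CPU history of `[50, 52, 48, 51, 49]`, a reading of 90 is flagged while a reading of 51 is not. Real metrics are often seasonal, so comparing against the same hour on previous days may be a better baseline than a flat mean.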
Aggregated Statistics
[Charts: Cluster Wide - CPU, Network, Memory, Disk]
Aggregated Statistics
[Charts: Network by Host - Host 1, Host 2, Host 3, Host 4]
Aggregated Statistics
[Charts: Network by VM (Host 1) - VM 1, VM 2, VM 3, VM 4]
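The cluster, host, and VM views above describe a drill-down: start from the aggregate, pick the busiest host, then the busiest VM on that host. A rough sketch of that narrowing step, assuming network usage is already available as a nested mapping of host to VM to throughput (the numbers in the usage note are invented):

```python
from typing import Dict, Tuple


def drill_down(traffic: Dict[str, Dict[str, float]]) -> Tuple[str, str]:
    """Return the (host, vm) pair contributing the most traffic.

    `traffic` maps host name -> {vm name -> throughput}.
    """
    if not traffic:
        raise ValueError("no hosts to inspect")
    # Step 1: aggregate per host, as in the "Network by Host" view.
    host_totals = {host: sum(vms.values()) for host, vms in traffic.items()}
    top_host = max(host_totals, key=host_totals.get)
    # Step 2: break the busiest host down per VM, as in "Network by VM".
    vms = traffic[top_host]
    top_vm = max(vms, key=vms.get)
    return top_host, top_vm
```

Given `{"Host 1": {"VM 1": 10, "VM 2": 900}, "Host 2": {"VM 1": 20}}`, the function returns `("Host 1", "VM 2")`.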
People Factor
• Why is it not taught alongside technical skill?
• Becomes part of the interview process.
• Adopt the approach used by the medical field - bedside
clinics. Use your superstars.
• Make it part of your culture and develop a reward
system around it.
• Collapse silos, broaden context for everyone.
Thank You!
sagi@webair.com
@webairsagi
