Theo Schlossnagle, Founder @Circonus, @postwait on Twitter
Operational Software Design
Ignorance
Buys
Pain
https://www.flickr.com/photos/hoyvinmayvin/4906678960
Tenet #1
Designing for the unknown
should never burden today
Compromise when forced.
Design with tomorrow’s expected volume,
but today’s functional requirements.
https://www.flickr.com/photos/alexander/20090860
Tenet #2
Never be left curious as to
what your software is doing
Observability is the only fast-path to success.
https://www.flickr.com/photos/jox1989/4764186425
Tenet #3
Understand what your
software was doing
This is one of the trickiest tenets.
You can’t log every instruction performed, every packet
sent & received, every context switch, every I/O.
Log things that are infrequent,
measure things that are frequent.
What’s frequent? I said this was tricky.
https://www.flickr.com/photos/guest_family/6175062186
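One way to read "log the infrequent, measure the frequent" is sketched below. This is only an illustration, not code from the talk: the `Telemetry` class, the `opsdemo` logger name, and the request dictionaries are all hypothetical. Every request bumps an in-memory counter, while only the rare failure produces a full log line with context.

```python
import logging
from collections import Counter

logger = logging.getLogger("opsdemo")  # hypothetical logger name


class Telemetry:
    """Counts frequent events in memory and logs infrequent ones."""

    def __init__(self):
        self.counters = Counter()

    def measure(self, name, n=1):
        # Frequent event: a cheap increment, no per-event log line.
        self.counters[name] += n

    def log_rare(self, message, *args):
        # Infrequent event: worth a full log line with context.
        logger.warning(message, *args)
        self.counters["rare_events_logged"] += 1

    def snapshot(self):
        # Copy of the counters, e.g. for periodic export to a metrics system.
        return dict(self.counters)


def handle_requests(telemetry, requests):
    for req in requests:
        telemetry.measure("requests")
        if req.get("error"):
            telemetry.measure("errors")
            telemetry.log_rare("request %s failed: %s", req["id"], req["error"])


telemetry = Telemetry()
handle_requests(telemetry, [{"id": 1}, {"id": 2, "error": "timeout"}, {"id": 3}])
print(telemetry.snapshot())
```

Where the line between "frequent" and "infrequent" falls is still a judgment call per system; the sketch only shows the two code paths.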
Tenet #4
Understand internal failures
The only thing worse than a malfunction in production
is one without sufficient actionable data.
Core dump, stack trace analysis, etc.
These can wake people up.
https://www.flickr.com/photos/skipthefiller/2451481428
Tenet #5
Operator remediation of an
external failure is a bug
A system failing, a disk failing, a network partition, etc.
Once the external failure is remediated, no human being
should be involved in returning the (internal) system to
correct operation.
https://www.flickr.com/photos/timzim/2308099322
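A minimal sketch of what Tenet #5 can look like in code, assuming the external failure surfaces as a `ConnectionError`. The names `run_with_recovery` and `FlakyStore` are hypothetical and not from the talk. The caller keeps retrying with capped exponential backoff, so once the dependency is back the system resumes without an operator stepping in.

```python
import time


def run_with_recovery(operation, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Retry `operation` until it succeeds, backing off while the dependency is down.

    Returns (result, attempts). Only ConnectionError is treated as an
    external failure here; anything else propagates as an internal bug.
    """
    delay = base_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation(), attempts
        except ConnectionError:
            # External failure: wait, then try again without human help.
            sleep(delay)
            delay = min(delay * 2, max_delay)


class FlakyStore:
    """Hypothetical dependency that fails a fixed number of times, then recovers."""

    def __init__(self, failures):
        self.failures = failures

    def write(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("disk unavailable")
        return "ok"


store = FlakyStore(failures=2)
delays = []  # record the backoff delays instead of actually sleeping
result, attempts = run_with_recovery(store.write, sleep=delays.append)
print(result, attempts, delays)
```

Passing `sleep` as a parameter is only a convenience for demonstration; in a real service the default `time.sleep` (or an async equivalent) would be used.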
Tenet #6 & #7
Tight coupling reduces resiliency
Loose coupling reduces debug-ability
Tenet #8
Avoid difficult problems if possible
One does not “just add” replication, consensus and fault
tolerance into systems.
Never solve a harder problem than you are presented.
The best solution to a problem is to remove the problem.
https://www.flickr.com/photos/shakestercody/2124972276
A tale of two systems
Queueing
Storage
https://www.flickr.com/photos/skynoir/8300610952
From RabbitMQ to Fq
RabbitMQ: Oh, the outages we’ve had.
violates #1, #2, #3, #5, #8
Fq:
(#1) build only semantics we need
(#2/#3) DTrace probes
(#4) core dumps and backtrace.io
(#5) handled by simplicity of design and use
(#6/#7) push decoupling out, gain debug-ability
(#8) punt clustering downstream to clients
https://www.flickr.com/photos/fotologic/2165675515
ns level deep routing
From Postgres to Snowth
Postgres 8… hacked in a column store using arrays
Tenet #3: build pg_statsd
Tenet #5: … build Snowth
Snowth: ring topology time-series store
(#1/#8) only commutative operations,
(#2) DTrace probes
(#3) Circonus itself (statsd, histograms, etc.)
(#4) backtrace.io
(#7) implemented Zipkin (Dapper-like) tracing
quartile bands
“mvalue” best-guess modes
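Why restricting a store to commutative operations avoids a harder problem can be shown with a small sketch. This is not Snowth's implementation; the `merge` function and the bucket names are assumptions made for illustration. If every update is an additive merge of bucket counts, replicas can apply the same updates in any order and still converge, so no ordering or consensus protocol is needed for them.

```python
from itertools import permutations


def merge(a, b):
    """Merge two bucket-count maps by adding counts; argument order is irrelevant."""
    out = dict(a)
    for bucket, count in b.items():
        out[bucket] = out.get(bucket, 0) + count
    return out


# Three hypothetical updates arriving at different replicas in different orders.
updates = [{"10ms": 2}, {"10ms": 1, "100ms": 4}, {"1s": 1}]

results = []
for order in permutations(updates):
    state = {}
    for update in order:
        state = merge(state, update)
    results.append(state)

# Every ordering produces the same final state.
converged = all(r == results[0] for r in results)
print(converged, results[0])
```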

presentation ICT roal in 21st century education
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

Operational Software Design

  • 1. Theo Schlossnagle, Founder @Circonus, @postwait on Twitter Operational Software Ignorance Buys Pain https://www.flickr.com/photos/hoyvinmayvin/4906678960
  • 2. Tenet #1 Designing for the unknown should never burden today Compromise when forced. Design with tomorrow’s expected volume, but today’s functional requirements. https://www.flickr.com/photos/alexander/20090860
  • 3. Tenet #2 Never be left curious as to what your software is doing Observability is the only fast-path to success. https://www.flickr.com/photos/jox1989/4764186425
  • 4. Tenet #3 Understand what your software was doing This is one of the trickiest tenets. You can’t log every instruction performed, every packet sent & received, every context switch, every I/O. Log things that are infrequent, measure things that are frequent. What’s frequent? I said this was tricky. https://www.flickr.com/photos/guest_family/6175062186
  • 5. Tenet #4 Understand internal failures The only thing worse than a malfunction in production, is one without sufficient actionable data. Core dump, stack trace analysis, etc. These can wake people up. https://www.flickr.com/photos/skipthefiller/2451481428
  • 6. Tenet #5 Operator remediation of an external failure is a bug A system failing, a disk failing, a network partition, etc. Once the external failure is remediated, no human being should be involved in returning the (internal) system to correct operation. https://www.flickr.com/photos/timzim/2308099322
  • 7. Tenet #6 & #7 Tight coupling reduces resiliency. Loose coupling reduces debug-ability.
  • 8. Tenet #8 Avoid difficult problems if possible. One does not “just add” replication, consensus and fault tolerance into systems. Never solve a harder problem than you are presented. The best solution to a problem is to remove the problem. https://www.flickr.com/photos/shakestercody/2124972276
  • 9. A tail of two systems Queueing Storage https://www.flickr.com/photos/skynoir/8300610952
  • 10. From RabbitMQ to Fq RabbitMQ: Oh, the outages we’ve had. Violates #1, #2, #3, #5, #8. Fq: (#1) build only semantics we need; (#2/#3) DTrace probes; (#4) core dumps and backtrace.io; (#5) handled by simplicity of design and use; (#6/#7) push decoupling out, gain debug-ability; (#8) punt clustering downstream to clients. https://www.flickr.com/photos/fotologic/2165675515
  • 11. ns level deep routing
  • 12. From Postgres to Snowth Postgres 8… hacked in a column store using arrays Tenet #3: build pg_statsd Tenet #5: … build Snowth Snowth: ring topology time-series store (#1/#8) only commutative operations, (#2) DTrace probes (#3) Circonus itself (statsd, histograms, etc.) (#4) backtrace.io (#7) implemented Zipkin (Dapper-like) tracing
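
Tenet #3 ("Log things that are infrequent, measure things that are frequent") can be illustrated with a minimal sketch. It is not taken from the talk: the `Metrics` class, the `handle_message` function, the message fields and the metric names are all hypothetical, and what counts as "frequent" is assumed here to be every received message.

```python
import logging
from collections import Counter

log = logging.getLogger("opsketch")  # hypothetical logger name


class Metrics:
    """Hypothetical in-process counters standing in for a real metrics client."""

    def __init__(self):
        self.counts = Counter()

    def incr(self, name, n=1):
        self.counts[name] += n


metrics = Metrics()


def handle_message(msg, metrics=metrics):
    """Process one message; return True if it could be routed."""
    # Frequent event: only bump a counter, never write a log line per message.
    metrics.incr("messages.received")

    if msg.get("route") is None:
        # Infrequent event: count it, and also log it with enough context
        # to act on later.
        metrics.incr("messages.unroutable")
        log.warning("unroutable message id=%s", msg.get("id"))
        return False

    return True
```

The counter keeps the hot path cheap, while the rare failure path carries the detail an operator would need. Where exactly to draw the line between the two remains the tricky part the slide points out.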