Using Machine Learning to Optimize
DevOps Practices
Building Learning into Monitoring and Feedback
Peter Varhol
About me
• International speaker and writer
• Degrees in Math, CS, Psychology
• Technology communicator
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
Agenda
• What is machine learning?
• How is machine learning applied to DevOps?
• Challenges in training these systems
• What constitutes an issue?
• Summary and conclusions
What is Machine Learning?
• Layered algorithms that change parameters based on feedback
from known data
• Can be linear or nonlinear
• Algorithms can be fixed in production or adaptive
• Fixed – algorithms do not adjust once deployed
• Adaptive – algorithms continually adjust to new data
• Usually part of a larger system
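The fixed/adaptive distinction can be illustrated with a minimal sketch. The `BaselineModel` class, its `learning_rate`, and the response-time values are hypothetical and chosen only for illustration; a real system would have far more than one parameter.

```python
class BaselineModel:
    """Learns a mean response time; optionally keeps learning after deployment."""

    def __init__(self, adaptive: bool, learning_rate: float = 0.2):
        self.adaptive = adaptive
        self.learning_rate = learning_rate  # assumed value, not from the slides
        self.baseline = 0.0

    def train(self, samples):
        # Training phase: fit the single parameter to known data.
        self.baseline = sum(samples) / len(samples)

    def observe(self, value: float) -> float:
        # Production phase: report the deviation from the learned baseline.
        deviation = value - self.baseline
        if self.adaptive:
            # Adaptive: nudge the parameter toward the new observation.
            self.baseline += self.learning_rate * deviation
        return deviation


training = [100.0, 110.0, 90.0, 100.0]  # hypothetical response times in ms
production = [150.0, 150.0, 150.0]      # the traffic pattern has shifted

fixed = BaselineModel(adaptive=False)
adaptive = BaselineModel(adaptive=True)
for model in (fixed, adaptive):
    model.train(training)
    for value in production:
        model.observe(value)

print(fixed.baseline)     # unchanged since training
print(adaptive.baseline)  # has moved toward the new data
```

The fixed model keeps reporting the same deviation forever; the adaptive one gradually treats the new level as normal, which is useful or harmful depending on the goal.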
Adaptive Systems
• Airline pricing
• Ticket prices change three times a day based on demand
• It can cost less to go farther
• It can cost less later
• Ecommerce systems
• Recommendations try to discern what else you might want
• Can I incentivize you to fill up the plane?
Why Use Adaptive?
• The “right” result will vary over time
• Trying to optimize a particular result
• Revenue
• The problem domain is not static
Confidential, Dynatrace LLC
How Are Fixed Systems Used?
• Transportation
• Self-driving cars
• Aircraft/Drones
• Ecommerce
• Recommendation engines
• Medical
• Diagnosis systems
Why Use Fixed Machine Learning Systems?
• The problem domain is static
• The expectations remain constant
• The right answer is known under most conditions
• The original algorithms remain valid over a long period of time
DevOps Practices Generate Data
• During development
• Agile metrics, JIRA issues, test case metrics
• During continuous integration
• System test metrics
• During continuous deployment
• Quality metrics for deployments
• After deployment and into production
• Application availability and performance
• Usage log files
Focus on Monitoring
• Ongoing data on availability and performance
• RUM
• Synthetic tests
• Application monitoring
• Monitoring tackles the back end of DevOps
• Identifies unhealthy trends
• Diagnoses failures and poor performance
• Recommends action
• Fixed or adaptive depends on your goals
Where Do Predictive Analytics Come In?
• Big data makes it possible to predict future events
• Are we going to fail?
• How will we perform with traffic surges?
• As well as past events
• What went wrong and how do we fix it?
• We can rely on past data
• Adaptive systems may not perform as well
• Clear goals needed
What Technologies Are Involved?
• Neural networks
• Genetic algorithms
• Rules engines
Neural Networks
• Set of layered algorithms whose variables can be
adjusted via a learning process
• The learning process involves training with
known inputs and outputs
• The algorithms adjust coefficients to converge on
the correct answer (or not)
• You freeze the algorithms and coefficients, and
deploy
• Or you optimize on a particular set of characteristics
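The train-then-freeze cycle described above can be sketched with a single artificial neuron (a perceptron), which is the smallest building block rather than a full layered network. The function names, the learning rate, and the toy AND-gate training set are assumptions made for illustration.

```python
def train_perceptron(samples, epochs=20, learning_rate=0.1):
    """Adjust coefficients from known inputs and outputs, then return them frozen."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # feedback from the known output
            # Move each coefficient in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    # Returning a tuple "freezes" the coefficients for deployment.
    return tuple(weights), bias


def predict(frozen_model, inputs):
    weights, bias = frozen_model
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Known inputs and outputs: a logical AND, chosen because it is easy to verify.
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
frozen = train_perceptron(training_set)

for inputs, expected in training_set:
    print(inputs, predict(frozen, inputs), expected)
```

Because the AND data is linearly separable, the coefficients converge; on ambiguous data they may not, which is the "(or not)" case.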
A Sample Neural Network
Genetic Algorithms
• Use the principle of natural selection
• Create a range of possible solutions
• Try out each of them
• Choose and combine two of the better
alternatives
• Rinse and repeat as necessary
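The steps above can be sketched as a small loop. The bit-string candidates, the `fitness` function that simply counts 1 bits, and the single-bit mutation step are assumptions for illustration; mutation is not mentioned on the slide but is commonly added so the search does not stall.

```python
import random


def fitness(candidate):
    # Hypothetical objective: count of 1 bits, a stand-in for any measurable score.
    return sum(candidate)


def evolve(population, generations, rng):
    for _ in range(generations):
        # Try out each candidate and rank them, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        # Choose two of the better alternatives.
        parent_a, parent_b = ranked[0], ranked[1]
        children = []
        while len(children) < len(population) - 2:
            # Combine them: take the head of one parent and the tail of the other.
            cut = rng.randrange(1, len(parent_a))
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation (an added assumption): flip one random bit.
            flip = rng.randrange(len(child))
            child[flip] = 1 - child[flip]
            children.append(child)
        # Keep both parents so the best score never gets worse.
        population = [parent_a, parent_b] + children
    return max(population, key=fitness)


rng = random.Random(7)  # seeded so runs are repeatable
population = [[rng.randint(0, 1) for _ in range(12)] for _ in range(8)]
initial_best = max(fitness(c) for c in population)

best = evolve(population, generations=30, rng=rng)
print(initial_best, fitness(best))
```

In a DevOps setting the candidate might instead encode configuration values and the fitness function a measured response time, but that mapping is not specified in the talk.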
Bringing in DevOps
• DevOps has data that can be used to train neural networks
• Health of the application
• Trends in application traffic and responsiveness
• Application failure
Machine Learning Helps DevOps
• Decisions are complex
• Why is the CPU maxed?
• What is causing disk thrashing?
• Why did the network slow?
• Why did the application fail?
• Data is massive
• Potentially thousands of data points a day
How Good Are Decisions?
• Expert versus machine
• Given the same data
• In many domains they tie
• With additional data, the human can be better
• But machine learning will get better
• But only as good as the data
We Want to Do Two Things
• Identify trends that may indicate future problems
• Increasing response times
• More page errors
• Diagnose faults once they have happened
• Why did the application fail?
• How can we fix it as quickly as possible?
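One simple way to spot "increasing response times" is to fit a straight line to recent samples and look at its slope. This is a plain least-squares sketch, not a machine learning model; the sample values and the 5 ms-per-sample threshold are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical response times in ms, one per monitoring interval.
response_times_ms = [200, 205, 198, 210, 220, 232, 241, 255]
TREND_THRESHOLD = 5.0  # assumed: ms of growth per interval that warrants attention

trend = slope(response_times_ms)
if trend > TREND_THRESHOLD:
    print(f"Warning: response time rising by about {trend:.1f} ms per interval")
```

A trained model would learn what counts as an unhealthy slope from history rather than use a hand-picked threshold.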
Fixed Algorithms Work for Some Problems
• Immediate performance and failure identification
• Diagnosis of failures and performance issues
• These are readily identifiable from known data
Adaptive Systems Supplement These Tools
• Predictions of future events
• Performance
• Availability
• The target is moving
• So we need current data to adjust the algorithms
The Machine Helps the DevOps Expert
• The machine learning app provides:
• Early warning on possible performance issues and failures
• Immediate notification of failure or impending failure
• Trend analysis of data to predict unhealthy outcomes
• The machine learning is an assistant
• It can’t fix anything
• It can’t necessarily identify the root cause
What is the Goal?
• We have many ways of monitoring
• Many of them are represented at this conference
• Each measures something a little different
• Latency, response time, availability, network, DNS . . .
• Too much data can be no better than no data at all
• Machine learning can correlate across
measurements
• Focus to eliminate false positives
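As a rough illustration of correlating across measurements, the sketch below computes a Pearson correlation between a latency series and two other metrics. The metric names and values are invented for the example; a strong correlation points to where to look and says nothing by itself about cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical measurements taken over the same six intervals (all in ms).
latency = [120, 125, 130, 180, 240, 300]
database_time = [40, 42, 45, 90, 150, 205]
dns_lookup = [20, 21, 19, 20, 22, 20]

print(round(pearson(latency, database_time), 3))  # close to 1: moves with latency
print(round(pearson(latency, dns_lookup), 3))     # weak: probably unrelated
```

Alerting only on the metrics that move together with the symptom is one way to cut down on false positives.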
Intelligent Systems Are Sometimes Wrong
• The problem domain is ambiguous
• There is no single “right” answer
• “Close enough” is good
• We don’t know quite why the software
responds as it does
• We can’t easily trace code paths
Testing Machine Learning Systems
• Have objective acceptance criteria
• Test with new data
• Don’t count on all results being accurate
• Understand the architecture of the network as a part of
the testing process
• Communicate the level of confidence you have in the
results to management and users
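An objective acceptance criterion can be as simple as a minimum accuracy on data the model has not seen. In this sketch the "model" is a hypothetical threshold rule, and the labelled samples and the 75% bar are assumptions for illustration.

```python
def accuracy(model, labelled_samples):
    """Fraction of samples where the model's answer matches the known label."""
    correct = sum(1 for features, label in labelled_samples
                  if model(features) == label)
    return correct / len(labelled_samples)


def meets_acceptance(model, new_samples, minimum_accuracy):
    """Return (passed, score) so the score can be reported alongside the verdict."""
    score = accuracy(model, new_samples)
    return score >= minimum_accuracy, score


def model(response_time_ms):
    # Hypothetical stand-in for a trained model: flags a problem above 500 ms.
    return response_time_ms > 500


# New data that was not used in training: (response time in ms, was it a real problem?)
new_samples = [(120, False), (640, True), (480, True), (900, True), (300, False)]

passed, score = meets_acceptance(model, new_samples, minimum_accuracy=0.75)
print(passed, score)
```

Reporting the score together with the pass/fail result gives management and users the level of confidence, not just a verdict.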
A Cautionary Tale
• Not all events are created equal
• AI systems treat events equally
• A failure of a system during busy season is the same as any other
• DevOps pros know otherwise
• And can exert additional effort in response
• And actually fix the problem
• We can’t automate what we don’t understand
• You need the human in the loop
Conclusions
• DevOps is a natural environment for machine learning
systems
• Any activity that generates data and requires a decision is fair game
• Monitoring is low-hanging fruit
• Fixed systems for failure and diagnosis, adaptive for trend
analysis
References
• https://qz.com/989137/when-a-robot-ai-doctor-misdiagnoses-you-whos-to-blame/
• https://pvarhol.wordpress.com/2017/07/22/what-brought-about-our-ai-revolution/
• https://pvarhol.wordpress.com/2017/06/21/analytics-dont-apply-in-the-clutch/
Thank You
Peter Varhol
peter@petervarhol.com

Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Using Machine Learning to Optimize DevOps Practices

  • 1. Using Machine Learning to Optimize DevOps Practices Building Learning into Monitoring and Feedback Peter Varhol
  • 2. About me • International speaker and writer • Degrees in Math, CS, Psychology • Technology communicator • Former university professor, tech journalist • Cat owner and distance runner • peter@petervarhol.com
  • 3. Agenda • What is machine learning? • How is machine learning applied to DevOps? • Challenges in training these systems • What constitutes an issue? • Summary and conclusions
  • 4. What is Machine Learning? • Layered algorithms that change parameters based on feedback from known data • Can be linear or nonlinear • Algorithms can be fixed in production or adaptive • Fixed – algorithms do not adjust once deployed • Adaptive – algorithms continually adjust to new data • Usually part of a larger system
  • 5. Adaptive Systems • Airline pricing • Ticket prices change three times a day based on demand • It can cost less to go farther • It can cost less later • Ecommerce systems • Recommendations try to discern what else you might want • Can I incentivize you to fill up the plane?
  • 6. Why Use Adaptive? • The “right” result will vary over time • Trying to optimize a particular result • Revenue • The problem domain is not static Confidential, Dynatrace LLC
  • 7. How Are Fixed Systems Used? • Transportation • Self-driving cars • Aircraft/Drones • Ecommerce • Recommendation engines • Medical • Diagnosis systems
  • 8. Why Use Fixed Machine Learning Systems • The problem domain is static • The expectations remain constant • The right answer is known under most conditions • The original algorithms remain valid over a long period of time
  • 9. DevOps Practices Generate Data • During development • Agile metrics, JIRA issues, test case metrics • During continuous integration • System test metrics • During continuous deployment • Quality metrics for deployments • After deployment and into production • Application availability and performance • Usage log files
  • 10. Focus on Monitoring • Ongoing data on availability and performance • RUM • Synthetic tests • Application monitoring • Monitoring tackles the back end of DevOps • Identifying unhealthy trends • Diagnoses failures and poor performance • Recommends action • Fixed or adaptive depends on your goals
  • 11. Where Do Predictive Analytics Come In? • Big data makes possible predictions of future events • Are we going to fail? • How will we perform with traffic surges? • As well as past events • What went wrong and how do we fix it • We can rely on past data • Adaptive systems may not perform as well • Clear goals needed
  • 12. What Technologies Are Involved? • Neural networks • Genetic algorithms • Rules engines
  • 13. Neural Networks • Set of layered algorithms whose variables can be adjusted via a learning process • The learning process involves training with known inputs and outputs • The algorithms adjust coefficients to converge on the correct answer (or not) • You freeze the algorithms and coefficients, and deploy • Or you optimize on a particular set of characteristics
  • 14. A Sample Neural Network
  • 15. Genetic Algorithms • Use the principle of natural selection • Create a range of possible solutions • Try out each of them • Choose and combine two of the better alternatives • Rinse and repeat as necessary
  • 16. Bringing in DevOps • DevOps has data that can be used to train neural networks • Health of the application • Trends in application traffic and responsiveness • Application failure
  • 17. Machine Learning Helps DevOps • Decisions are complex • Why is the CPU maxed? • What is causing disk thrashing? • Why did the network slow? • Why did the application fail? • Data is massive • Potentially thousands of data points a day
  • 18. How Good Are Decisions? • Expert versus machine • Given the same data • In many domains they tie • With additional data, the human can be better • But machine learning will get better • But only as good as the data
  • 19. We Want to Do Two Things • Identify trends that may indicate future problems • Increasing response times • More page errors • Diagnose faults once they have happened • Why did the application fail? • How can we fix it as quickly as possible?
  • 20. Fixed Algorithms Work for Some Problems • Immediate performance and failure identification • Diagnosis of failures and performance issues • These are readily identifiable from known data
  • 21. Adaptive Systems Supplement These Tools • Predictions of future events • Performance • Availability • The target is moving • So we need current data to adjust the algorithms
  • 22. The Machine Helps the DevOps Expert • The machine learning app provides: • Early warning on possible performance issues and failures • Immediate notification of failure or impending failure • Trend analysis of data to predict unhealthy outcomes • The machine learning is an assistant • It can’t fix anything • It can’t necessarily identify the root cause
  • 23. What is the Goal? • We have many ways of monitoring • Many of them are represented at this conference • Each measures something a little different • Latency, response time, availability, network, DNS . . . • Too much data can be no better than no data at all • Machine learning can correlate across measurements • Focus to eliminate false positives
  • 24. Intelligent Systems Are Sometimes Wrong • The problem domain is ambiguous • There is no single “right” answer • “Close enough” is good • We don’t know quite why the software responds as it does • We can’t easily trace code paths
  • 25. Testing Machine Learning Systems • Have objective acceptance criteria • Test with new data • Don’t count on all results being accurate • Understand the architecture of the network as a part of the testing process • Communicate the level of confidence you have in the results to management and users
  • 26. A Cautionary Tale • All events are not created equal • AI systems treat events equally • A failure of a system during busy season is the same as any other • DevOps pros know otherwise • And can exert additional effort in response • And actually fix the problem • We can’t automate what we don’t understand • You need the human in the loop
  • 27. Conclusions • DevOps is a natural environment for machine learning systems • Any activity that generates data and requires a decision is fair game • Monitoring is low-hanging fruit • Fixed systems for failure and diagnosis, adaptive for trend analysis
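
The fixed-versus-adaptive distinction in slides 4, 20, 21, and 27 can be illustrated with a minimal sketch. It is not taken from the talk: the class names, the 145 ms limit, the smoothing weight, and the response-time samples are all hypothetical, and a simple moving baseline stands in for a real learning system. The fixed detector applies a limit chosen once before deployment, while the adaptive detector keeps adjusting its baseline to new data.

```python
from dataclasses import dataclass


@dataclass
class FixedDetector:
    """Flags any response time above a limit chosen once, before deployment."""
    limit_ms: float

    def is_anomalous(self, response_ms: float) -> bool:
        return response_ms > self.limit_ms


class AdaptiveDetector:
    """Tracks a moving baseline and flags samples far above it."""

    def __init__(self, alpha: float = 0.2, tolerance: float = 1.5) -> None:
        self.alpha = alpha          # weight given to the newest sample
        self.tolerance = tolerance  # flag samples above tolerance * baseline
        self.baseline = None        # learned from the data itself

    def is_anomalous(self, response_ms: float) -> bool:
        if self.baseline is None:
            # First sample only seeds the baseline.
            self.baseline = response_ms
            return False
        flagged = response_ms > self.tolerance * self.baseline
        # Keep learning after deployment: this is what makes it adaptive.
        self.baseline = self.alpha * response_ms + (1 - self.alpha) * self.baseline
        return flagged


# Hypothetical response times (ms): a slow upward drift, then a spike.
samples = [100, 110, 120, 130, 140, 150, 160, 400]

fixed = FixedDetector(limit_ms=145)
adaptive = AdaptiveDetector()

fixed_flags = [fixed.is_anomalous(s) for s in samples]
adaptive_flags = [adaptive.is_anomalous(s) for s in samples]

print("fixed:   ", fixed_flags)
print("adaptive:", adaptive_flags)
```

On this made-up data the fixed detector starts flagging as soon as the drift crosses its limit, while the adaptive detector follows the drift and flags only the spike. That is also the risk slide 11 hints at: an adaptive baseline can absorb a slow degradation, so the goal has to be clear before choosing one over the other.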

Editor's Notes

  1. These types of software are becoming increasingly common, in areas such as ecommerce, public transportation, automotive, finance, and computer networks. They have the potential to make decisions given sufficiently well-defined inputs and goals. In some instances, they are characterized as artificial intelligence, in that they seemingly make decisions that were once the purview of a human user or operator.
  2. Most machine learning systems are based on neural networks. A neural network is a set of layered algorithms whose variables can be adjusted via a learning process. The learning process involves using known data inputs to create outputs that are then compared with known results. When the algorithms reflect the known results with the desired degree of accuracy, the algebraic coefficients are frozen and production code is generated. Today, this comprises much of what we understand as artificial intelligence.
  3. But there is a type of software where having a defined output is no longer the case. Actually, two types. One is machine learning systems. The second is predictive analytics, or adaptive systems.
  4. Have objective acceptance criteria. Know the amount of error you and your users are willing to accept. Test with new data. Once you’ve trained the network and frozen the architecture and coefficients, use fresh inputs and outputs to verify its accuracy. Don’t count on all results being accurate. That’s just the nature of the beast. And you may have to recommend throwing out the entire network architecture and starting over. Understand the architecture of the network as a part of the testing process. Few if any will be able to actually follow a set of inputs through the network of algorithms, but understanding how the network is constructed will help testers determine if another architecture might produce better results. Communicate the level of confidence you have in the results to management and users. Machine learning systems offer you the unique opportunity to describe confidence in statistical terms, so use them. One important thing to note is that the training data itself could well contain inaccuracies. In this case, because of measurement error, the recorded wind speed and direction could be off or ambiguous. In other cases, the cooling of the filament likely has some error in its measurement.
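
The testing advice in note 4 can be sketched in a few lines. This is an illustration under stated assumptions, not the author's procedure: `frozen_model` is a hypothetical stand-in for a trained network whose coefficients have been frozen, the held-out samples are invented, and the 0.75 acceptance level is an arbitrary example of a criterion agreed on in advance. The interval uses the normal approximation, which is crude for a sample this small.

```python
import math


def frozen_model(cpu_percent: float) -> bool:
    """Hypothetical frozen model: predicts failure when CPU usage is high."""
    return cpu_percent > 85.0


def accuracy_with_interval(predictions, actuals, z=1.96):
    """Return accuracy and a normal-approximation 95% confidence interval."""
    n = len(actuals)
    correct = sum(p == a for p, a in zip(predictions, actuals))
    acc = correct / n
    margin = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - margin), min(1.0, acc + margin)


# Invented held-out data the model never saw: (CPU %, did the app fail?)
fresh = [(95, True), (90, True), (88, False), (70, False), (60, False),
         (92, True), (40, False), (86, True), (80, True), (30, False)]

predictions = [frozen_model(cpu) for cpu, _ in fresh]
actuals = [failed for _, failed in fresh]
accuracy, low, high = accuracy_with_interval(predictions, actuals)

ACCEPTABLE_ACCURACY = 0.75  # example criterion, agreed on before testing
passed = accuracy >= ACCEPTABLE_ACCURACY

print(f"accuracy={accuracy:.2f}, 95% CI=({low:.2f}, {high:.2f}), passed={passed}")
```

Reporting the interval alongside the accuracy is one way to communicate confidence in statistical terms: with only ten fresh samples the interval is wide, which is itself worth telling management and users.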