A quick introduction to AIOps: the business reasons why the CI/CD pipeline needs to constantly improve, and how this can be accomplished with data that's already available, using machine learning and other algorithms.
How to apply machine learning into your CI/CD pipeline
1. How to Apply Machine
Learning into Your
CI/CD Pipeline
Alon Weiss / Sealights
2. ● Complexity
○ Architecture
○ Deployment & infrastructure
○ Technologies
○ Product → Service
○ Visibility
● Load
○ Automation → more tests, longer CI cycles
○ Shift left → more tests and build steps
○ Support more devices & platforms → More tests
○ Lack of resources = bottleneck
● Human resources
○ Hiring
○ Lack of time and expertise to research, plan & execute strategic engineering tasks
The Digital Transformation and DevOps
3. AI-Ops for CI/CD | Business Impact
“We see continued evidence that software speed, stability, and availability contribute to organizational
performance (including profitability, productivity, and customer satisfaction). Our highest performers
are twice as likely to meet or exceed their organizational performance goals.” - DORA 2019
5. Alon Weiss
Chief Architect @SeaLights
alonweiss
alonw@sealights.io | www.sealights.io
The world’s #1 Software Quality Intelligence Platform, which
speeds up execution without sacrificing quality.
6. Research & Inspiration
Our daily needs - Releasing Faster and with Higher Quality
Continuous Delivery: Reliable Software Releases Through Build, Test, and
Deployment Automation / Jez Humble, David Farley
Accelerate / Nicole Forsgren, Jez Humble and Gene Kim
“Market Guide for AIOps Platforms”, “Artificial Intelligence for IT
Operations Delivers Improved Business Outcomes” by Gartner
“Take The Mystery Out Of AI for IT Operations (AIOps)” by Forrester
7. AI-Ops | Definition / Gartner
AIOps platforms combine big data and machine
learning functionality to support all primary IT
operations functions through the scalable ingestion and
analysis of the ever-increasing volume, variety and
velocity of data generated by IT.
The goal of the analytics effort is the discovery of
patterns — novel elements used to look forward in time
to predict possible incidents and emerging usage
profiles — and to look backward in time to determine
the root causes of current system behaviors.
8. AI-Ops | Definition / Forrester
Software that applies AI/ML or other advanced
analytics to business and operations data to make
correlations and provide prescriptive and
predictive answers in real time. These insights
produce real-time business performance KPIs,
allow teams to resolve incidents faster, and help
avoid incidents altogether.
9. AI-Ops | Use cases
● IT groups
○ Monitoring (IT Infrastructure, SREs)
■ excessive data usage
■ communication patterns
■ intrusion detection
○ Security
○ Release pipeline (the majority of this talk)
■ Release faster and with greater confidence in quality
● Non-IT
○ Demand / Order processing / Customer Satisfaction
○ Business Health
○ Marketing
10. AI-Ops | Usage Patterns
● Noise reduction (e.g. Alert Consolidation)
● Root Cause Analysis (e.g. during/after
incidents)
● Incident prevention (extrapolate future
events to prevent breakdowns)
● Anomaly detection beyond thresholds and
rule-based systems
● Initiating action using automation or
escalation
11. AI-Ops | Existing Tools
AI-Ops platforms
BigPanda
“Intelligent Automation for IT
Incident Management”
Moogsoft
“Purpose-Built AIOps Platform
for IT. Less Noise. Faster Fixes.
Shorter Outages.”
APMs
NewRelic AI - NRAI launched
last week
Appdynamics - “Central
Nervous System”
Dynatrace - “Davis”
Splunk
Trends
“Current tools and processes aren’t
up to the task of monitoring today’s
apps and their underpinnings”
- Forrester
“AIOps tools show a “right-shift”
across the four stages of
monitoring — data acquisition,
aggregation, analysis and action —
with their core capabilities at data
aggregation and analysis. As the
technology matures further, users
will be able to leverage proactive
advice from the platform, enabling
the action stage.”
- Gartner
13. AI-Ops for CI/CD | Data Sources
● GitHub / GitLab / Bitbucket / Azure Devops
● JIRA / ServiceNow
● Jenkins / *Pipelines / others
● Test Stages - coverage per test, timing, pass/fail
● Static scanners - code quality, dependencies, automated code review
● APMs
● Logs - ELK, Splunk
● Calendars / IM status
● Provisioning - Terraform, Ansible, Puppet, Chef
● Salesforce
14. Release Pipeline Components (pre-production)
● Build Queues
○ Pain: important jobs' wait time
○ Solution: prioritize or parallelize
● Build + Package
○ Pain: time
○ Solution: smaller components, parallelize
● Tests (unit, integration, Selenium, e2e)
○ Pain: time, time-to-failure, test failure RCA
○ Solution: Test Impact Analysis and test prioritization; pinpoint root cause to developers
● Infrastructure & Provisioning
○ Pain: limited resources, cold starts
○ Solution: provision ahead of time
● Risk management
○ Pain: manual and mostly gut-based
○ Solution: AI-assisted (anomaly detection)
● Monitoring after deployment
○ Pain: engineers are rarely involved
○ Solution: notify stakeholders and facilitate RCA
15. AI-Ops for CI/CD | Optimized Build Queues
● Goal:
○ Some jobs are more important than others. Prioritize the queues.
● Data Sources:
○ JIRA - issue types, relations, priority, severity, custom fields
○ VCS - commit history (also available on Jenkins) and change area/scope
○ Jenkins - Historic build graph and timing
○ Salesforce - customer account importance
● Machine Learning algorithm family:
○ Graph Neural Network to determine priority; Regression to estimate build length
● Usage:
○ A CI plugin determines and assigns the priority, then sorts the queue
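The prioritization idea above can be sketched with a toy scoring model. All field names and weights below are hypothetical, and the mean-of-history duration estimate merely stands in for a real regression model trained on Jenkins build history:

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Job:
    name: str
    issue_priority: int      # e.g. from JIRA: 1 (blocker) .. 5 (trivial)
    customer_weight: float   # e.g. derived from Salesforce account importance
    past_durations: list = field(default_factory=list)  # minutes, from Jenkins

def estimated_duration(job: Job) -> float:
    # Stand-in for a regression model: predict length from history.
    return mean(job.past_durations) if job.past_durations else 30.0

def score(job: Job) -> float:
    # Higher score = run first. Short, urgent jobs jump the queue.
    urgency = (6 - job.issue_priority) * job.customer_weight
    return urgency / estimated_duration(job)

def prioritize(queue: list) -> list:
    # Sort the build queue by descending score.
    return sorted(queue, key=score, reverse=True)

queue = [
    Job("nightly-full", 4, 1.0, [120, 110]),
    Job("hotfix-build", 1, 3.0, [10, 12]),
    Job("feature-ci", 3, 1.5, [25, 35]),
]
print([j.name for j in prioritize(queue)])
```

In a real plugin, the score would feed the CI server's queue sorter rather than reordering a Python list.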
16. AI-Ops for CI/CD | Smart Testing
● Goal:
○ Use Test Impact Analysis to run the minimal set of tests that are necessary
○ Fail fast
○ Eliminate Overlapping tests
● Data Sources:
○ Git/Build tool - build content and changes
○ Deep Coverage tools - per-test coverage
● Machine Learning algorithm family:
○ Classification
○ Statistical models
● Usage:
○ Find impacted tests by cross-referencing the changes and past test history
○ Deep integration with test runners so they run only those that are needed
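A minimal sketch of the cross-referencing step, assuming a per-test coverage map was already collected by a deep-coverage tool during a previous full run (the file and test names are made up):

```python
# Per-test coverage map: which source files each test touches,
# as recorded by a deep-coverage tool on an earlier full run.
coverage = {
    "test_login":    {"auth.py", "session.py"},
    "test_checkout": {"cart.py", "payment.py"},
    "test_profile":  {"auth.py", "profile.py"},
}

def impacted_tests(changed_files, coverage):
    # A test is impacted if it covers any file in the change set.
    return sorted(t for t, files in coverage.items() if files & set(changed_files))

print(impacted_tests(["auth.py"], coverage))
```

Only the impacted subset is then handed to the test runner, which is where the "deep integration" comes in.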
17. AI-Ops for CI/CD | Flaky test detection
● Goal:
○ Isolate and weed out flaky tests
● Data Sources:
○ VCS - commit history (also available on Jenkins) and change area/scope
○ Deep Coverage tools - per-test coverage
○ Jenkins / Test Runners - Test results and history
● Machine Learning algorithm family:
○ Regression
○ Statistical models
● Usage:
○ If a test flips between passing/failing status without a detected change to explain it, it
may be flaky
○ Automatically quarantine tests, notify author
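The flip-without-change heuristic can be sketched in a few lines; the commit SHAs and the two-flip threshold below are purely illustrative:

```python
def looks_flaky(results, min_flips=2):
    """results: list of (commit_sha, passed) in chronological order.
    A test that flips outcome on the SAME commit, with no code change
    to explain it, is a flakiness suspect."""
    flips = 0
    for (sha_a, ok_a), (sha_b, ok_b) in zip(results, results[1:]):
        if sha_a == sha_b and ok_a != ok_b:
            flips += 1
    return flips >= min_flips

history = [("abc1", True), ("abc1", False), ("abc1", True), ("def2", True)]
print(looks_flaky(history))
```

A flagged test would then be quarantined automatically and its author notified, as the slide suggests.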
18. AI-Ops for CI/CD | Infrastructure provisioning
● Goal:
○ Prevent resource contention in CI/CD
○ Minimize wait time for resource provisioning
● Data Sources:
○ Jenkins / *Pipelines - job history and graph
○ Infrastructure - historic demand & usage, real-time capacity
○ IM / Calendar - engineers' availability
● Machine Learning algorithm family:
○ Predictive analytics (Regression)
○ Statistical models
● Usage:
○ Update autoscaler targets continuously based on real-time and historic capacity
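A toy version of the prediction step, assuming demand history has already been bucketed by hour of day from Jenkins job history; the 1.5-sigma safety buffer is an arbitrary choice for illustration, not a recommendation:

```python
from statistics import mean, stdev

# Historic CI agent demand sampled at the same hour of day over past
# weeks (e.g. from Jenkins job history plus infrastructure metrics).
demand_at_9am = [8, 10, 9, 12, 11, 10, 9]

def autoscaler_target(history, buffer_sigmas=1.5):
    """Provision ahead of time: mean demand plus a safety buffer,
    so agents are warm before the build rush (no cold starts)."""
    return round(mean(history) + buffer_sigmas * stdev(history))

print(autoscaler_target(demand_at_9am))
```

The same calculation, run per hour, also scales the pool down during off-peak hours, which is where the infrastructure-cost saving comes from.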
19. AI-Ops for CI/CD | Smart Risk Management
● Goal:
○ Formally introduce the concept of Risk Management to the semi-automatic review
process
○ Find common risks
■ Untested code and configuration changes
■ Anomalies
● Test time
● Code paths
● Network usage pattern
● Git: Big changes & unusual churn, New contributors,
Self-merging PRs, Long-running PR
● Data Sources:
○ VCS
○ APMs, NPMDs tools
20. AI-Ops for CI/CD | Smart Risk Management
● Machine Learning algorithm family: Anomaly Detection
● Usage:
○ Evaluate risk using Anomaly Detection, 3rd party tools (e.g. GitPrime)
○ Put smart quality gates in place
○ Require manual approval only when risks are too high
○ Determine APM thresholds and rollout configuration according to risk
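A minimal z-score sketch of the anomaly-detection step; the metrics and the 2-sigma threshold are illustrative, and real platforms use far richer models:

```python
from statistics import mean, stdev

def risk_flags(change, history, threshold=2.0):
    """Flag metrics of a new change that deviate more than `threshold`
    standard deviations from the team's historic baseline
    (simple z-score anomaly detection)."""
    flags = []
    for metric, value in change.items():
        baseline = [h[metric] for h in history]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(value - mu) / sigma > threshold:
            flags.append(metric)
    return flags

history = [
    {"lines_changed": 120, "files_touched": 4},
    {"lines_changed": 90,  "files_touched": 3},
    {"lines_changed": 150, "files_touched": 5},
    {"lines_changed": 110, "files_touched": 4},
]
change = {"lines_changed": 900, "files_touched": 4}  # unusual churn
print(risk_flags(change, history))
```

A quality gate could then require manual approval only when the flag list is non-empty, matching the "approve only when risk is high" usage above.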
21. AI-Ops for CI/CD | Proactive Root Cause Analysis
● Goal:
○ Facilitate root cause analysis for production and test failures
● Data Sources:
○ VCS - commit history (also available on Jenkins) and change area/scope
○ ALMs - incidents, stack frames
○ Log collectors - capture messages, function names, stack frames
● Machine Learning algorithm family:
○ None! Good old text indexing
● Usage:
○ Cross reference the suspected code areas and logs with the commit history and
escalate to the contributors
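Since no ML is needed here, the cross-referencing can be as simple as set intersection over indexed tokens; the commit data and stack frames below are made up:

```python
# "Good old text indexing": map tokens from recent commits (changed
# function names, file names) to commit metadata, then match tokens
# extracted from a failure's stack trace against that index.
commits = [
    {"sha": "a1b2", "author": "dana", "tokens": {"parse_config", "config.py"}},
    {"sha": "c3d4", "author": "lee",  "tokens": {"render_page", "views.py"}},
]

def suspects(stack_frames, commits):
    # Return commits whose changed symbols appear in the failing stack.
    frames = set(stack_frames)
    return [c["sha"] for c in commits if c["tokens"] & frames]

stack = ["main", "load_settings", "parse_config"]  # from the log collector
print(suspects(stack, commits))
```

The matching commits' authors are then the natural people to escalate the failure to.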
22. AI-Ops Market | Market Direction and Forecast
Devops adoption is accelerating:
“The proportion of our highest performers has almost tripled, now comprising 20% of all teams.
This shows that excellence is possible - those that execute on key capabilities see the benefits.”
AIOps adoption is increasing, and platforms are the next big thing:
“By 2020, approximately 50% of enterprises will actively use AIOps technologies ... up from 10%
today” - Gartner
“Over the next 5 years, wide-scope AIOps platforms will become the de facto form-factor for the
delivery of AIOps functionality as opposed to AIOps functionality embedded in a monitoring tool
like APM” - Gartner
Architecture - Microservices, Serverless technologies, FaaS/Lambda
Deployment - Monoliths → Microservices, K8s value comes with its cost
Technologies - Teams often choose their own mix. More common to see a lot of technologies (java, node, python, .NET, etc.)
Product - a product is now a service, needs an ecosystem of monitoring and support
Visibility - multiple tools (APMs, network monitoring, logs)
Load - MORE services, pipelines
IT-generated data grows by 2-3x per annum
AIOps described as extending the monitoring from Application, through Infrastructure to Business and Integrations
It’s tough to keep up with the demands from business stakeholders
I&O teams have become too siloed by discipline
1st gen - observation, 2nd gen - diagnosis. The problem of visibility is getting worse and will continue to do so
IT - monitoring too many systems and too much data, using rules and thresholds instead of anomaly-detection algorithms
NR AI - noise reduction, improved correlations, augmented intelligence
The central concept: The Change. Get it to production as soon as possible, with minimal human involvement and without sacrificing quality or security.
Shifting left = “CI before the CI”, so we find and fix problems at the right time, before they affect others.
Key principles of CI:
Build quality in - automation & fail fast
Work in small batches
Computers perform repetitive tasks; people solve problems
Continuous Improvement
Everyone is responsible
Note: In a cloud-native, unconstrained datacenter, build queues are replaced with scheduling algorithms (e.g. Kubernetes-native solutions like Jenkins X, Tekton, GitLab)
Fail fast - Shift left using Pull Requests
Run as many test stages as possible before they are merged
Can be further correlated with ALM and Log collectors to find the difference between the “passing” and “failing” profiles
This also minimizes Infrastructure cost by scaling down during off-peak hours
Every change imposes a risk, but proper risk management is not a common practice
Change-advisory boards are manual and error-prone
Shoutout to GitPrime and their “20 patterns to watch for in your engineering team”
Risk management should be an optional final step before accepting the change
The same data can flow backwards - give hint to developers when they are touching code that’s known to be sensitive / erroneous.
This “risky code” can be shown during code review to ensure the future modifications are well tested