DevOps teams working with big data face unique challenges due to the complexity and diversity of the components that make up the big data stack. At the same time, AIOps is maturing to the point of creating real efficiencies for these teams as they contend with the diversity, complexity, dynamic behavior, and rate of change across the entire stack.
Doing DevOps for Big Data? What You Need to Know About AIOps
1. What DevOps Need to Know About AIOps
Bala Venkatrao
VP, Products at Unravel Data
January 17, 2019
2. Polling questions
(1) What are your most common big data application challenges?
(select all that apply)
- Performance tuning for slow applications
- Root cause analysis for application failures
- Establishing meaningful SLAs
- Detecting runaway queries
(2) What are your most common operational challenges for big data clusters?
(select all that apply)
- Chargeback/Showback for multi-tenant clusters
- Visibility into resource utilization
- Job and workload management
- Cluster tuning
- Capacity Planning
(3) What tools do you use today for troubleshooting and tuning your big data applications?
(select all that apply)
- Cloudera Manager/Apache Ambari
- YARN/Spark WebUI
- Dr. Elephant or other open source tools
- Log management tools: Splunk, ELK, etc.
3. Challenge: Operationalizing Modern Data Applications
What does it mean for new data-driven applications and analytics to be enterprise grade?
4. Typical Big Data Architecture
Examples of Big Data Apps: ETL, Analytics, Machine Learning, IoT, etc.
Modern Data Pipelines are Varied and Complex
[Figure: data pipeline diagram]
Data Sources: RDBMS, Social, Sensor, Machine, ERP, Mobile
Stages: Data Collection, Real-time / Batch Process, Result Store, Data Consumer
Components shown: ETL data pipeline, stream processing, raw data, computed data, messaging, result store, query data, reports, ML apps, IoT apps, analytics, B.I., alerts, services
5. Impact of poor performance and failures in the data pipeline
• Low Productivity
• Sub-Optimized Resources
• Lack of Reliability
6. Cascading Problems Impact Applications and Operations
Applications and Operations:
• Lack of a single, correlated view
• Out-of-control costs, poorly utilized infrastructure
• Reactive and slow to fix problems
8. DevOps and AIOps
As big data adoption grows, the ability to manually intervene for
hundreds of jobs running on thousands of nodes becomes problematic
… Need for AI-Powered Application Performance Management (APM) for Big Data
9. Essential Elements of an AIOps Solution for Big Data APM
• Data Collection and Correlation
    • Observe and collect all relevant data
• Operational Data Model
    • AI-assisted monitoring, troubleshooting, tuning, and managing requires a data model
• Analytics
    • Statistical analysis – correlate, classify, extrapolate from operational data
    • Predictive/Prescriptive analytics – forecasting and recommendations for capacity
    • Pattern and anomaly detection, root-cause analysis
    • Context, topology and coded expertise
• Automation
    • Auto-tuning of applications and resources
    • Cluster load balancing and job scheduling
    • Autonomous response to alerts and failures
[Figure: Modern Data Apps and Stack → Data Collection and Correlation → Data Model → Analytics (Statistical, Predictive/Prescriptive, Anomaly Detection, Context/Topology) → Automation (Auto-tuning, Cluster Operations, Resource Management, Autonomous Remediation)]
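To make the "pattern and anomaly detection" element concrete, here is a minimal sketch of one common statistical approach: flagging job runs whose duration is far from the median, scored with the median absolute deviation (MAD). This is an illustration only, not Unravel's method; the function name, the sample durations, and the 3.5 threshold are assumptions chosen for the example.

```python
from statistics import median


def flag_anomalies(durations, threshold=3.5):
    """Return indices of runs whose duration is a robust outlier.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    The threshold of 3.5 is a conventional default, assumed here.
    """
    med = median(durations)
    deviations = [abs(d - med) for d in durations]
    mad = median(deviations)
    if mad == 0:
        # All runs (or most of them) are identical; nothing to score against.
        return []
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Hypothetical durations (seconds) of seven runs of the same nightly job.
runs = [312, 298, 305, 320, 301, 1490, 310]
print(flag_anomalies(runs))  # -> [5], the 1490-second run
```

A median-based score is used rather than mean and standard deviation because a single runaway job would otherwise inflate the baseline it is being compared against.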
10. Without AI, Big Data APM is a manual, logistical challenge
Big Data APM Without AI: Multiple tools, no complete view, no intelligence.
AI-Powered Big Data APM: One complete correlated view with built-in AI and ML.
11. Unravel: First AIOps Solution for Big Data APM
Full-stack, Intelligent, Autonomous
12. AIOps Use Cases for Unravel
Automated Cloud Cost Management
• Optimize cost by right-sizing cloud images
• Optimize cost by choosing the optimal price plan
Automated Workload Management
• Eliminate CPU, Memory, Network I/O and Disk I/O contention
• Correctly size VMs and Cloud Images
• Place VMs in the best Hosts and Clusters
Automated Event Management
• De-duplicate events
• Support a collaborative (DevOps) problem resolution process
Automated Performance Optimization and Remediation
• Automatically learn the performance characteristics of apps and the supporting stack
• Automatically optimize for a chosen KPI (performance, efficiency)
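As an illustration of the "de-duplicate events" use case, the sketch below collapses repeated alerts from the same source within a time window. It is a generic example under stated assumptions, not a description of how Unravel implements event management: the `Event` fields, the `deduplicate` function, and the 300-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some reference time
    source: str       # hypothetical field: host or application id
    kind: str         # hypothetical field: alert type, e.g. "disk_full"


def deduplicate(events, window=300.0):
    """Keep an event only if no event with the same (source, kind)
    was seen within the previous `window` seconds.

    The last-seen time is refreshed on every occurrence, so a steady
    stream of repeats stays suppressed until it pauses for `window`.
    """
    last_seen = {}
    kept = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.kind)
        prev = last_seen.get(key)
        if prev is None or ev.timestamp - prev > window:
            kept.append(ev)
        last_seen[key] = ev.timestamp
    return kept


alerts = [
    Event(0, "node1", "disk_full"),
    Event(60, "node1", "disk_full"),    # repeat within the window, dropped
    Event(90, "node2", "disk_full"),    # different source, kept
    Event(1000, "node1", "disk_full"),  # well after the window, kept
]
print([e.timestamp for e in deduplicate(alerts)])  # -> [0, 90, 1000]
```

Keying on source and alert type is the simplest choice; a real system would likely also correlate events across related components before presenting them to operators.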
13. Unravel Applies Machine Learning (ML) at Various Levels
• Error Views & Analysis
• Tuning Recommendation
• Application Management
• Automated Tuning
15. Unravel AIOps Demo
• A single pane of glass for application & operations management
• Anomaly detection to rapidly detect & diagnose unpredictable behavior
• Proactive alerting & remediation of cluster/SLA problems caused by applications
• Automatic root cause analysis of workflows that missed their SLA
• Intelligent tuning to make YARN (Spark, Hive) applications faster & more resource efficient
16. Before Unravel: Global 200 Financial Services Company
Customer Case Study
Challenges:
• Complex Infrastructure Landscape
• Debugging Performance Problems is a Challenge
• Out of Control Costs
• Sub-optimal Capacity Management
• Missing Insights on Data Operations
• Ineffective Alerting and Automatic Actions
Metrics:
• 100+ projects
• 5,000+ jobs/day
• 600+ users globally
• 3PB+ of data
• >$1m spent on unutilized storage
• >5 different interfaces for job monitoring
• >10 different logs for debugging a single workflow
• 1-2 weeks to determine root cause for performance issues
• 80% of the datasets can be candidates for lower cost storage
• 99% of all the current alerting cannot be correlated with performance issues
17. After Unravel: Global 200 Financial Services Company
Customer Case Study
Challenges:
• Complex Infrastructure Landscape
• Debugging Performance Problems is a Challenge
• Out of Control Costs
• Sub-optimal Capacity Management
• Missing Insights on Data Operations
• Ineffective Alerting and Automatic Actions
Results:
• Scale to unlimited # of users, apps, data, projects
• 1 interface for job or workflow monitoring
• Reduce troubleshooting time by 98%
• Maximize resource utilization
• Save 60% on resource cost
• 70% reduction in support tickets
20. Live Q&A questions
1. Does Unravel support big data workloads in the cloud?
2. I am planning to migrate from an on-premises installation to the cloud. Can Unravel help with that?
3. Does Unravel do more than monitoring?
21. Thank You
Free Full Feature Trial on Amazon EMR, Microsoft Azure
https://unraveldata.com/free-trial/
https://unraveldata.com/