Copyright ©2019 Yahoo Japan Corporation. All Rights Reserved.
1) Yahoo Japan Corporation, 2) Japan Advanced Institute of Science and Technology (JAIST)
3) School of Computing, Tokyo Institute of Technology
Approximate QoS Rule Derivation
Based on Root Cause Analysis
for Cloud Computing
PRDC 2019
December 1-3, 2019, Kyoto, Japan
Satoshi Konno 1) 2) and Xavier Defago 3)
Database Platforms in Yahoo! JAPAN
• 300+ Systems
• 100+ Services
Major Services of Yahoo! JAPAN
[Figure: service map comparing US and JP services. Categories: Media, Search, Video, Answer, Mail, Membership, C2C Payment, C2C EC, B2C EC, Local. Service names shown: Search, Knowledge search, Mail, News, Yahoo Auction, Premium, Loco]
Demand on OSS Database Platforms
Yahoo Japan: 300+ Systems
• RDB Team: 200+ Systems (MySQL, 2000+ DBs)
• NoSQL Team: 100+ Systems (Cassandra)
• 4000+ Nodes
[Chart values without recoverable labels: 30, 70, 60, 40]
• Demand on developing autonomous recovery systems
• The number of nodes is increasing year by year.
• The human resources are limited.
X : Autonomous Recovery
Table of Contents
• Background and Related Work
• Proposed Autonomous Recovery Methods (μQoS and Shape-Root)
• Evaluation Results
• Conclusion and Future Plans
Background
Monitoring Studies for Cloud Computing
Tech Giant | Published | System | OSS | Type | Capacity | Period | Legacy
Microsoft | 2010 | DataGarage | × | Distributed | 4,000 nodes | - | TableStore
Netflix | 2014 | Atlas + Winston | △ | Centralized | 2 billion records | 6 h | Epic
Facebook | 2015 | Gorilla | △ | Centralized | 10 billion records | 26 h | HBase
Google | 2016 | Borgmon | × | Distributed + Hierarchical | 10,000 nodes | 12 h |
• Traditional (storage) monitoring systems: O : Aggregation, X : Analysis
• In-memory monitoring systems (replacing the traditional ones): O : Analysis, X : Root Cause
QoS Studies for Cloud Computing
[1] Abdelzahir Abdelmaboud, Dayang NA Jawawi, Imran Ghani, Abubakar Elsafi, and Barbara Kitchenham. Quality of service approaches in cloud computing: A systematic mapping study. Journal of Systems and Software, 101:159–179, 2015.
[Figure: Map of focus areas in research on QoS approaches in cloud computing [1]]
[Figure: Distribution of primary studies by contribution type [1]]
• Models: Discusses concepts, makes comparisons, explores relationships, identifies challenges, or makes classifications.
• Tools: Supports various aspects of QoS approaches in cloud computing.
• Methods: Presents a model, algorithm, or approach describing the rules.
QoS Studies for Cloud Computing
[1] S Anithakumari and K Chandrasekaran. Monitoring and management of service level agreements in cloud computing. In Cloud and Autonomic Computing (ICCAC), 2015 International Conference on, pages 204–207. IEEE, 2015.
• X : Resource expansion based on system failure without finding the root cause
• O : Resource extension based on increased demand
Methods
(μQoS and Shape-Root)
Overall Autonomous Recovery Sequence
μQoS: Reasoning Framework for Guaranteeing QoS
• STEP1: QoS Monitoring
• STEP2: Root Cause Analysis
• STEP3: QoS Rule Derivation
• STEP4: Expanding QoS Monitoring and Action Rules without Resource Expansion
Internal In-Memory Time-Series Monitoring System (Foreman)
QoS Monitoring Rules and Actions
STEP1 : Separation of Monitoring Rules and Actions
Internal In-Memory Time-Series Monitoring System (Foreman)
STEP3 : μQoS Concept with Root Cause
[Figure: Consumer and Provider; Service QoS Rule (with No Action); Operation QoS Rule (with Recovery Action); Unsatisfied Metrics; Root Cause Metrics; STEP1 to STEP4, including "Generating a μQoS Rule" and "Execute the μQoS Rule"]
Mandatory Requirements for Fast Autonomous Recovery
• Reliable Root Cause
• Fast Root Cause Analysis
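The relationship between a service QoS rule without an action and a derived rule with a recovery action can be sketched roughly as follows. This is only an illustration under stated assumptions: the `QoSRule` class, `derive_micro_qos_rule`, `evaluate`, the metric names, and the thresholds are all hypothetical and are not taken from the slides or from Foreman.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QoSRule:
    """Hypothetical rule: the metric must stay at or below the threshold."""
    name: str
    metric: str
    threshold: float
    # None models a service QoS rule "with no action".
    action: Optional[Callable[[], str]] = None

    def is_satisfied(self, samples: Dict[str, float]) -> bool:
        return samples[self.metric] <= self.threshold


def derive_micro_qos_rule(unsatisfied: QoSRule, root_cause_metric: str,
                          threshold: float,
                          action: Callable[[], str]) -> QoSRule:
    """Build a new rule on the root cause metric, carrying a recovery action.

    Assumption: the root cause metric and threshold come from a separate
    root cause analysis step that is not shown here.
    """
    return QoSRule(name=f"{unsatisfied.name}-micro", metric=root_cause_metric,
                   threshold=threshold, action=action)


def evaluate(rules: List[QoSRule], samples: Dict[str, float]) -> List[str]:
    """Run the action of every unsatisfied rule that has one."""
    return [rule.action() for rule in rules
            if not rule.is_satisfied(samples) and rule.action is not None]


# Hypothetical usage: a latency rule is unsatisfied, and a derived rule on a
# tombstone metric triggers a recovery action.
service_rule = QoSRule("Rule41", "read_latency_ms", 100.0)
samples = {"read_latency_ms": 250.0, "tombstone_count": 9000.0}
micro_rule = derive_micro_qos_rule(service_rule, "tombstone_count", 1000.0,
                                   lambda: "compact")
executed = evaluate([service_rule, micro_rule], samples)
```

Keeping the action optional mirrors the separation of monitoring rules from actions: the service rule only detects the violation, while the derived rule is the one that acts.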
Root Cause Analysis
(Shape-Root)
Root Cause Analysis Methods for Time-Series Data
• Correlation Analysis: Traditional parametric statistics method
• Clustering Analysis: Grouping similar metrics into the same group
• Recent Studies: BigRoots, Gorilla, etc.
Root Cause : Correlation Analysis
PPMCC
[1] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.
[2] Charles Loboz, Slawek Smyl, and Suman Nath. Datagarage: Warehousing massive performance data on commodity servers. Proceedings of the VLDB Endowment, 3(1-2):1447–1458, 2010.
[3] Abdullah Mueen, Suman Nath, and Jie Liu. Fast approximate correlation for massive time-series data. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 171–182. ACM, 2010.
[4] Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment, 8(12):1816–1827, 2015.
• Pearson product-moment correlation coefficient
• Traditional parametric statistics method
• Some monitoring studies on cloud computing [2][3][4] mention using a general correlation algorithm, but they do not explain in detail how to find root causes using PPMCC.
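As a point of reference, the Pearson coefficient itself is straightforward to compute. The sketch below ranks candidate metrics by the absolute value of their correlation with a target metric; this ranking step is an assumption made for illustration and is not the procedure of [2][3][4]. The function names `pearson` and `rank_by_correlation` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined for a constant series; 0 is used here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Sort candidate metrics by |r| against the target, strongest first."""
    scored = [(name, pearson(target, series))
              for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

Note that a high |r| only indicates a linear relationship at aligned time points; it says nothing about which metric moved first.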
Root Cause : Clustering Analysis
k-Shape [1]
[1] John Paparrizos and Luis Gravano. k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM, 2015.
• Scalable shape-based clustering algorithm for time-series data based on k-means clustering
• Normalized version of the cross-correlation is used for measuring distances between metrics
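The distance based on normalized cross-correlation can be illustrated with a small sketch: one minus the maximum, over all shifts, of the cross-correlation divided by the geometric mean of the two autocorrelations at lag zero. This is a plain O(n²) version written for readability, and the function names are hypothetical; normalizing the input series beforehand is left to the caller.

```python
import math


def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both exist."""
    n = len(x)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += x[j] * y[i]
    return total


def shape_based_distance(x, y):
    """1 - max over shifts of the coefficient-normalized cross-correlation.

    The result lies between 0 (same shape up to a shift) and 2.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        return 1.0  # assumption: an all-zero series matches nothing
    n = len(x)
    best = max(cross_correlation(x, y, s) for s in range(-(n - 1), n))
    return 1.0 - best / norm
```

Because the maximum is taken over all shifts, two metrics with the same shape but a time offset still end up close to each other.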
Root Cause : Recent Studies
BigRoots [1]
[1] Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, and Wei Li. Bigroots: An effective approach for root-cause analysis of stragglers in big data system. IEEE Access, 6:41966–41977, 2018.
• Root-cause analysis of the underlying reasons for stragglers.
• Analyzing the stragglers using general metrics such as shuffle read/write bytes, JVM garbage collection time, CPU, I/O, and network.
Root Cause : Shape-Root for μQoS
Shape-Root
• Developed with an emphasis on precision and analysis speed, to identify reliable root causes dynamically for μQoS
• Based on a shape-based algorithm, Dynamic Time Warping (DTW), to measure the correlation distance between metrics
• Unlike BigRoots, performs root-cause analysis over all time-series metrics for the unsatisfied QoS metrics
• Unlike PPMCC, excludes descendant metrics and confounding metrics based on their timestamps
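To make the two ideas concrete, the sketch below pairs a textbook DTW distance with a simple timestamp filter. It is not the Shape-Root implementation. The filtering rule (drop any metric whose change started after the unsatisfied metric's change), the `onset` values, the `max_distance` cut-off, and all function names are assumptions chosen for illustration.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def candidate_root_causes(target, target_onset, metrics, max_distance):
    """Return (name, distance) pairs sorted by DTW distance to the target.

    metrics maps a name to (series, onset), where onset is the timestamp at
    which the metric started to deviate (how it is obtained is not shown).
    """
    results = []
    for name, (series, onset) in metrics.items():
        if onset > target_onset:
            # Assumed rule: a metric that changed after the target is treated
            # as a descendant and cannot be its root cause.
            continue
        distance = dtw_distance(target, series)
        if distance <= max_distance:
            results.append((name, distance))
    return sorted(results, key=lambda item: item[1])
```

A filter based only on timestamps is deliberately simple; it cannot tell apart two metrics that are both driven by a third, unobserved one.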
Evaluation
Purpose
• Q1: Effectiveness of μQoS and Shape-Root in detecting candidate root causes
• Q2: Correlation between analysis span and precision for the time-series data
• Q3: Effectiveness of μQoS and Shape-Root for autonomous recovery in real services
Evaluation Environment
• Apache Cassandra v3.11.4: Distributed NoSQL Database
• Yahoo! Cloud Serving Benchmark (YCSB) v0.15.0: Benchmark Program for Distributed Databases
• Foreman v0.8.9: Internal Distributed Monitoring System in Yahoo! JAPAN (11,754 metric types in a 5-minute cycle)
[Figure: five nodes, each labeled Cassandra]
Root Cause Methods for μQoS
• Shape-Root: Our proposed root cause method for μQoS
• PPMCC [1]: Standard correlation analysis
• k-Shape [2]: General time-series clustering analysis algorithm
• BigRoots [3]: Root cause algorithm for stragglers
[1] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.
[2] John Paparrizos and Luis Gravano. k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM, 2015.
[3] Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, and Wei Li. Bigroots: An effective approach for root-cause analysis of stragglers in big data system. IEEE Access, 6:41966–41977, 2018.
Evaluation Metrics
• Evaluating the following metrics based on leave-one-out cross-validation
• TP: Number of correct potential root causes
• FP: Number of wrong potential root causes
• FN: Number of undetected root causes
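The three counts can be turned into the usual precision and recall figures. The slide lists only TP, FP, and FN, so the formulas below are the standard definitions rather than something stated in the deck; the function name and the metric names in the usage lines are hypothetical.

```python
def precision_recall(detected, actual):
    """Compute TP, FP, FN, precision, and recall for detected root causes."""
    detected, actual = set(detected), set(actual)
    tp = len(detected & actual)   # correct potential root causes
    fp = len(detected - actual)   # wrong potential root causes
    fn = len(actual - detected)   # root causes that were not detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical usage: two detected candidates, one of which is correct,
# and one true root cause that was missed.
scores = precision_recall(detected=["cpu_load", "gc_time"],
                          actual=["cpu_load", "disk_io"])
```

With a small number of true root causes per fault, a single false positive moves precision sharply, which is worth keeping in mind when comparing methods.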
Experiment 1: Detecting root causes for injected faults
• Workload: YCSB Read Heavy Workload with CPU Stress (30min cycle)
• Consumer QoS Rules: Rule11 was unsatisfied with CPU load (30min cycle)
[Figure: result panels; recoverable labels: Best, Good, Bad ?]
Experiment 2: Detecting root causes in a real system
• Workload: YCSB Anti-Pattern Workload
• Consumer QoS Rules
[Figure: result panels with x marks; recoverable labels: SSTables of Tombstone, Good, Bad ?, Bad]
Experiment 3: Comparing effective analysis period in a real system
• Workload: YCSB Anti-Pattern Workload
• Consumer QoS Rules
[Figure: result panels with x marks; recoverable labels: Good, Good, Bad, Bad, Slow, Good]
Experiment 4: Autonomous Recovery Effectiveness for a real system
• Workload: YCSB Anti-Pattern Workload
• Initial QoS Rules: Consumer (Rule41), Provider (Rule42)
1. Rule41 is Unsatisfied
2. Root Cause Analysis for Rule41
3. Derive Rule43
4. Add Rule43 and Execute Rule43
5. Execute Rule43
Conclusion
Summary and Future Plans
μQoS : Event-driven monitoring rule derivation method based on case-based reasoning and root cause analysis for autonomous recovery
• Good: μQoS has demonstrated its effectiveness in a real system, using a high-precision, real-time root cause algorithm called Shape-Root.
• Bad: The acausal root cause exclusion algorithm is oversimplified, because it is based only on the metric timestamps. In complex real systems that have many QoS rules, acausal μQoS rules may be executed.
The study focused only on root cause analysis for past failures. As the next step, we plan to extend the μQoS framework to future failures, using model-based reasoning or anomaly detection to prevent potential failures.
Thank you

Weitere ähnliche Inhalte

Was ist angesagt?

Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
2016 04-19 machine learning
2016 04-19 machine learning2016 04-19 machine learning
2016 04-19 machine learningMark Reynolds
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsGeoffrey Fox
 
2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheonMark Reynolds
 
An Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringAn Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringIJECEIAES
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...Geoffrey Fox
 
2016 06-07 data driven production
2016 06-07 data driven production2016 06-07 data driven production
2016 06-07 data driven productionMark Reynolds
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Geoffrey Fox
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliersaimsnist
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and BlockchainKan Yuenyong
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Geoffrey Fox
 
The MGI and AI
The MGI and AIThe MGI and AI
The MGI and AIaimsnist
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS DatasetKan Yuenyong
 
I V I F2 F July 2005 Talk
I V I  F2 F  July 2005  TalkI V I  F2 F  July 2005  Talk
I V I F2 F July 2005 Talkbattagline
 

Was ist angesagt? (20)

Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
2016 04-19 machine learning
2016 04-19 machine learning2016 04-19 machine learning
2016 04-19 machine learning
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon2016 03-16 digital energy luncheon
2016 03-16 digital energy luncheon
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
An Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream ClusteringAn Improved Differential Evolution Algorithm for Data Stream Clustering
An Improved Differential Evolution Algorithm for Data Stream Clustering
 
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...
What is the "Big Data" version of the Linpack Benchmark? ; What is “Big Data...
 
2016 06-07 data driven production
2016 06-07 data driven production2016 06-07 data driven production
2016 06-07 data driven production
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...Comparing Big Data and Simulation Applications and Implications for Software ...
Comparing Big Data and Simulation Applications and Implications for Software ...
 
Big Data
Big Data Big Data
Big Data
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Cri big data
Cri big dataCri big data
Cri big data
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain
 
Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
The MGI and AI
The MGI and AIThe MGI and AI
The MGI and AI
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS Dataset
 
I V I F2 F July 2005 Talk
I V I  F2 F  July 2005  TalkI V I  F2 F  July 2005  Talk
I V I F2 F July 2005 Talk
 

Ähnlich wie Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing | PRDC 2019

kambatla2014.pdf
kambatla2014.pdfkambatla2014.pdf
kambatla2014.pdfAkuhuruf
 
A tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big dataA tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big dataredpel dot com
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsIRJET Journal
 
Data stream mining techniques: a review
Data stream mining techniques: a reviewData stream mining techniques: a review
Data stream mining techniques: a reviewTELKOMNIKA JOURNAL
 
Introduction to Grid Computing
Introduction to Grid ComputingIntroduction to Grid Computing
Introduction to Grid Computingabhijeetnawal
 
IRJET- Recommendation System based on Graph Database Techniques
IRJET- Recommendation System based on Graph Database TechniquesIRJET- Recommendation System based on Graph Database Techniques
IRJET- Recommendation System based on Graph Database TechniquesIRJET Journal
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Larry Smarr
 
An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...
An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...
An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...Rachel Doty
 
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET Journal
 
Optimizing Your Supply Chain with Neo4j
Optimizing Your Supply Chain with Neo4jOptimizing Your Supply Chain with Neo4j
Optimizing Your Supply Chain with Neo4jNeo4j
 
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAMINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAcscpconf
 
Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data csandit
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things PayamBarnaghi
 
IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...
IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...
IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...IRJET Journal
 
Association Rule Mining using RHadoop
Association Rule Mining using RHadoopAssociation Rule Mining using RHadoop
Association Rule Mining using RHadoopIRJET Journal
 
Ijgca july 11
Ijgca   july 11Ijgca   july 11
Ijgca july 11ijgca
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET Journal
 
A Study of Protocols for Grid Computing Environment
A Study of Protocols for Grid Computing EnvironmentA Study of Protocols for Grid Computing Environment
A Study of Protocols for Grid Computing EnvironmentCSCJournals
 

Ähnlich wie Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing | PRDC 2019 (20)

kambatla2014.pdf
kambatla2014.pdfkambatla2014.pdf
kambatla2014.pdf
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
A tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big dataA tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big data
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
10probs.ppt
10probs.ppt10probs.ppt
10probs.ppt
 
Data stream mining techniques: a review
Data stream mining techniques: a reviewData stream mining techniques: a review
Data stream mining techniques: a review
 
Introduction to Grid Computing
Introduction to Grid ComputingIntroduction to Grid Computing
Introduction to Grid Computing
 
IRJET- Recommendation System based on Graph Database Techniques
IRJET- Recommendation System based on Graph Database TechniquesIRJET- Recommendation System based on Graph Database Techniques
IRJET- Recommendation System based on Graph Database Techniques
 
Panel: NRP Science Impacts​
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
 
An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...
An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...
An Analytical Framework Of A Deployment Strategy For Cloud Computing Services...
 
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
IRJET- Classification of Pattern Storage System and Analysis of Online Shoppi...
 
Optimizing Your Supply Chain with Neo4j
Optimizing Your Supply Chain with Neo4jOptimizing Your Supply Chain with Neo4j
Optimizing Your Supply Chain with Neo4j
 
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAMINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
 
Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data
 
Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things
 
IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...
IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...
IRJET-Open Curltm Cloud Computing Test Structure:Confederate Data Centers for...
 
Association Rule Mining using RHadoop
Association Rule Mining using RHadoopAssociation Rule Mining using RHadoop
Association Rule Mining using RHadoop
 
Ijgca july 11
Ijgca   july 11Ijgca   july 11
Ijgca july 11
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
 
A Study of Protocols for Grid Computing Environment
A Study of Protocols for Grid Computing EnvironmentA Study of Protocols for Grid Computing Environment
A Study of Protocols for Grid Computing Environment
 

Kürzlich hochgeladen

『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书rnrncn29
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxmibuzondetrabajo
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119APNIC
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书rnrncn29
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxAndrieCagasanAkio
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxNIMMANAGANTI RAMAKRISHNA
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxMario
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 

Kürzlich hochgeladen (11)

『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119
 
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
『澳洲文凭』买拉筹伯大学毕业证书成绩单办理澳洲LTU文凭学位证书
 
TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptx
 
ETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptxETHICAL HACKING dddddddddddddddfnandni.pptx
ETHICAL HACKING dddddddddddddddfnandni.pptx
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptx
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 

Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing | PRDC 2019

  • 1. Copyright ©2019 Yahoo Japan Corporation. All Rights Reserved. 1) Yahoo Japan Corporation, 2) Japan Advanced Institute of Science and Technology (JAIST) 3) School of Computing, Tokyo Institute of Technology Approximate QoS Rule Derivation Based on Root Cause Analysis for Cloud Computing PRDC 2019 December 1-3, 2019, Kyoto, Japan Satoshi Konno 1) 2) and Xavier Defago 3)
  • 2. Copyright ©2019 Yahoo Japan Corporation. All Rights Reserved. Database Platforms in Yahoo! JAPAN 2 300+ Systems 100+ Services
  • 3. Copyright ©2019 Yahoo Japan Corporation. All Rights Reserved. Major Services of Yahoo! JAPAN 3 3 Media US Search Video Answer Mail JP US JP Membership C2C Payment C2C EC B2C EC Local Search Knowledge search MailNews Yahoo AuctionPremium Loco
  • 4. Copyright ©2019 Yahoo Japan Corporation. All Rights Reserved. Demand on OSS Database Platforms 4 300+ Systems 200+ Systems MySQL 2000+ DBs 100+ Systems Cassandra 30 70 60 40 Yahoo Japan NoSQL Team RDB Team • Demand on developing autonomous recovery systems • The number of nodes is increasing year by year. • The human resources are limited. 4000+ Nodes X : Autonomous Recovery X : Autonomous Recovery
  • 5. Table of Contents
      • Background and Related Work
      • Proposed Autonomous Recovery Methods (μQoS and Shape-Root)
      • Evaluation Results
      • Conclusion and Future Plans
  • 6. Background
  • 7. Monitoring Studies for Cloud Computing. [Figure labels: Traditional (Storage) Monitoring Systems; In-memory Monitoring Systems; "X : Analysis, O : Aggregation"; "X : Root Cause, O : Analysis"; Replacing.]
      Public | System | OSS | Type | Capacity | Period | Legacy
      2010 | DataGarage | × | Distributed | 4,000 nodes | - | TableStore
      2014 | Atlas + Winston | △ | Centralized | 2 billion records | 6 h | Epic
      2015 | Gorilla | △ | Centralized | 10 billion records | 26 h | HBase
      2016 | Borgmon | × | Distributed + Hierarchical | 10,000 nodes | 12 h |
  • 8. QoS Studies for Cloud Computing. [Figures: map of focus areas in research on QoS approaches in cloud computing [1]; distribution of primary studies by contribution type [1].]
      • Models: Discusses concepts, makes comparisons, explores relationships, identifies challenges, or makes classifications.
      • Tools: Supports various aspects of QoS approaches in cloud computing.
      • Methods: Presents a model, algorithm or approaches describing the rules.
      [1] Abdelzahir Abdelmaboud, Dayang NA Jawawi, Imran Ghani, Abubakar Elsafi, and Barbara Kitchenham. Quality of service approaches in cloud computing: A systematic mapping study. Journal of Systems and Software, 101:159–179, 2015.
  • 9. QoS Studies for Cloud Computing (SLA monitoring and management [1])
      • X : Resource expansion based on system failure without finding the root cause
      • O : Resource extension based on increased demand
      [1] S Anithakumari and K Chandrasekaran. Monitoring and management of service level agreements in cloud computing. In Cloud and Autonomic Computing (ICCAC), 2015 International Conference on, pages 204–207. IEEE, 2015.
  • 10. Methods (μQoS and Shape-Root)
  • 11. μQoS: Reasoning Framework for Guaranteeing QoS. [Figure: overall autonomous recovery sequence, STEP1 to STEP4, with stages labeled Root Cause Analysis, QoS Rule Derivation, and QoS Monitoring, running on the internal in-memory time-series monitoring system (Foreman).] Expanding QoS monitoring and action rules without resource expansion.
  • 12. STEP1 : Separation of Monitoring Rules and Actions. [Figure: QoS monitoring rules and actions in the internal in-memory time-series monitoring system (Foreman).]
  • 13. STEP3 : μQoS Concept with Root Cause. [Figure labels: Consumer; Provider; Service QoS Rule (with No Action); Operation QoS Rule (with Recovery Action); Unsatisfied Metrics; Root Cause Metrics; STEP1 to STEP4; Generating a μQoS Rule; Execute the μQoS Rule.]
      Mandatory requirements for fast autonomous recovery:
      • Reliable Root Cause
      • Fast Root Cause Analysis
  • 14. Root Cause Analysis (Shape-Root)
  • 15. Root Cause Analysis Methods for Time-Series Data
      • Correlation Analysis: Traditional parametric statistics method
      • Clustering Analysis: Grouping similar metrics into the same cluster
      • Recent Studies: BigRoots, Gorilla, etc.
  • 16. Root Cause : Correlation Analysis (PPMCC [1])
      • Pearson product-moment correlation coefficient
      • Traditional parametric statistics method
      • Some monitoring studies on cloud computing [2][3][4] mention using a general correlation algorithm, but they do not describe in detail how to find root causes with PPMCC.
      [1] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.
      [2] Charles Loboz, Slawek Smyl, and Suman Nath. Datagarage: Warehousing massive performance data on commodity servers. Proceedings of the VLDB Endowment, 3(1-2):1447–1458, 2010.
      [3] Abdullah Mueen, Suman Nath, and Jie Liu. Fast approximate correlation for massive time-series data. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 171–182. ACM, 2010.
      [4] Tuomas Pelkonen, Scott Franklin, Justin Teller, Paul Cavallaro, Qi Huang, Justin Meza, and Kaushik Veeraraghavan. Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment, 8(12):1816–1827, 2015.
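For reference, PPMCC between two metrics sampled at the same timestamps can be computed as in the minimal Python sketch below. The metric names and sample values are hypothetical and do not come from the slides, and returning 0.0 for a constant series is an assumption made here to avoid a division by zero.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)

# Hypothetical metrics (illustrative values only).
latency = [10.0, 12.0, 15.0, 30.0, 45.0]
cpu_load = [0.20, 0.25, 0.30, 0.60, 0.90]
disk_free = [80.0, 80.0, 80.0, 80.0, 80.0]

print(pearson(latency, cpu_load))   # close to 1: the two series rise together
print(pearson(latency, disk_free))  # 0.0: constant series, by the assumption above
```

A high coefficient only says that two metrics move together at the same timestamps. It does not say which one changed first, which is why correlation alone does not identify a root cause.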
  • 17. Root Cause : Clustering Analysis (k-Shape [1])
      • Scalable shape-based clustering algorithm for time-series data based on k-means clustering
      • A normalized version of cross-correlation is used to measure distances between metrics
      [1] John Paparrizos and Luis Gravano. k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM, 2015.
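The distance measure mentioned above can be illustrated with a small Python sketch of a shape-based distance: one minus the maximum normalized cross-correlation over all shifts, computed on z-normalized series. This is a direct quadratic-time version written for readability, not the FFT-based computation of the k-Shape paper, and it omits the clustering step entirely. The sample series are hypothetical.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        return [0.0] * n  # assumption: a constant series maps to all zeros
    return [(v - mean) / std for v in series]

def cross_correlation(x, y, shift):
    """Sum of x[i + shift] * y[i] over the indices where both are defined."""
    n = len(x)
    return sum(x[i + shift] * y[i] for i in range(n) if 0 <= i + shift < n)

def shape_based_distance(x, y):
    """1 - max over shifts of the normalized cross-correlation (range 0 to 2)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        return 1.0  # assumption: an all-zero series is treated as unrelated
    best = max(cross_correlation(x, y, s) for s in range(-(n - 1), n))
    return 1.0 - best / norm

# Hypothetical metrics: b is a delayed copy of a, c is a mirrored in value.
a = z_normalize([0, 1, 2, 1, 0, 0, 0])
b = z_normalize([0, 0, 0, 1, 2, 1, 0])
c = z_normalize([2, 1, 0, 1, 2, 2, 2])

print(shape_based_distance(a, b))  # small: same shape, shifted in time
print(shape_based_distance(a, c))  # larger: inverted shape
```

Because the maximum is taken over all shifts, two metrics with the same shape but a time lag end up close to each other, which plain PPMCC at aligned timestamps would miss.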
  • 18. Root Cause : Recent Studies (BigRoots [1])
      • Root-cause analysis of the underlying reasons for stragglers.
      • Analyzes stragglers using general metrics such as shuffle read/write bytes, JVM garbage collection time, CPU, I/O, and network.
      [1] Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, and Wei Li. Bigroots: An effective approach for root-cause analysis of stragglers in big data system. IEEE Access, 6:41966–41977, 2018.
  • 19. Root Cause : Shape-Root for μQoS
      • Developed with an emphasis on precision and analysis speed, to identify reliable root causes dynamically for μQoS
      • Based on a shape-based algorithm, Dynamic Time Warping (DTW), to measure the correlation distance between metrics
      • Runs root-cause analysis over all time-series metrics for the unsatisfied QoS metrics, unlike BigRoots
      • Excludes descendant metrics and confounding metrics based on their timestamps, unlike PPMCC
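The slides name DTW as the distance underlying Shape-Root but give no implementation details. The Python sketch below shows only the textbook DTW recurrence and a naive ranking of candidate metrics by distance to an unsatisfied QoS metric. The function `rank_candidates`, the metric names, and the sample values are hypothetical, and the timestamp-based exclusion of descendant and confounding metrics is not sketched.

```python
def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("series must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat y[j-1]
                                    cost[i][j - 1],      # repeat x[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def rank_candidates(target, candidates):
    """Hypothetical helper: candidate metric names, nearest shape first."""
    return sorted(candidates, key=lambda name: dtw_distance(target, candidates[name]))

# Hypothetical data: cpu_load rises one sample before the QoS metric does.
unsatisfied_metric = [0, 0, 1, 5, 9, 9]
candidates = {
    "heap_used": [5, 5, 5, 5, 5, 5],
    "cpu_load": [0, 1, 5, 9, 9, 9],
}
print(rank_candidates(unsatisfied_metric, candidates))
```

DTW tolerates local stretching along the time axis, so a candidate that leads the unsatisfied metric by a sample still scores as a close match, whereas a flat metric does not.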
  • 20. Evaluation
  • 21. Purpose
      • Q1: Effectiveness of μQoS and Shape-Root in detecting candidate root causes
      • Q2: Correlation between analysis span and precision for the time-series data
      • Q3: Effectiveness of μQoS and Shape-Root for autonomous recovery in real services
  • 22. Evaluation Environment. [Figure: five Cassandra nodes.]
      • Apache Cassandra v3.11.4: Distributed NoSQL database
      • Yahoo! Cloud Serving Benchmark (YCSB) v0.15.0: Benchmark program for distributed databases
      • Foreman v0.8.9: Internal distributed monitoring system at Yahoo! JAPAN (11,754 types of metrics in a 5-minute cycle)
  • 23. Root Cause Methods for μQoS
      • Shape-Root: Our proposed root cause method for μQoS
      • PPMCC [1]: Standard correlation analysis
      • k-Shape [2]: General time-series clustering analysis algorithm
      • BigRoots [3]: Root cause algorithm for stragglers
      [1] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.
      [2] John Paparrizos and Luis Gravano. k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM, 2015.
      [3] Honggang Zhou, Yunchun Li, Hailong Yang, Jie Jia, and Wei Li. Bigroots: An effective approach for root-cause analysis of stragglers in big data system. IEEE Access, 6:41966–41977, 2018.
  • 24. Evaluation Metrics. The following metrics are evaluated based on leave-one-out cross-validation:
      • TP: Number of correct potential root causes
      • FP: Number of wrong potential root causes
      • FN: Number of undetected root causes
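The counts above combine into the usual precision and recall scores. The Python sketch below uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The slide lists only the three counts, so which derived scores the evaluation reports is an assumption here, the metric names are hypothetical, and the leave-one-out procedure is not sketched.

```python
def score_root_causes(predicted, actual):
    """Count TP/FP/FN for one analysis run and derive precision and recall."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # correct potential root causes
    fp = len(predicted - actual)   # wrong potential root causes
    fn = len(actual - predicted)   # root causes that were not detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}

# Hypothetical run: three metrics reported, three metrics truly at fault.
result = score_root_causes(
    predicted=["cpu_load", "gc_time", "net_rx"],
    actual=["cpu_load", "gc_time", "disk_io"],
)
print(result)
```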
  • 25. Experiment 1: Detecting root causes for injected faults. Setup: YCSB read-heavy workload with CPU stress injected in a 30-minute cycle. Rule11 was unsatisfied under CPU load (30-minute cycle). [Figure labels: Consumer; QoS Rules; result marks Best, Good, Bad, ?.]
  • 26. Experiment 2: Detecting root causes in a real system. Setup: YCSB anti-pattern workload. [Figure labels: Consumer; QoS Rules; SSTables of Tombstone; result marks Good, Bad, ?, Bad.]
  • 27. Experiment 3: Comparing effective analysis period in a real system. Setup: YCSB anti-pattern workload. [Figure labels: Consumer; QoS Rules; result marks Good, Good, Bad, Bad, Slow, Good.]
  • 28. Experiment 4: Autonomous Recovery Effectiveness for a real system. Setup: YCSB anti-pattern workload with initial QoS rules for the Consumer (Rule41) and the Provider (Rule42). Sequence: (1) Rule41 is unsatisfied. (2) Root cause analysis for Rule41. (3) Derive Rule43. (4) Add Rule43 and execute Rule43. (5) Execute Rule43.
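The recovery sequence of Experiment 4 can be pictured with the Python sketch below. It is illustrative only: the `Rule` class, the threshold semantics, the metric names, the `run_compaction` action, and the choice to reuse Rule42's action for the derived rule are all assumptions. The slides do not specify how Foreman represents rules or how Rule43's condition and action are chosen, and the root cause analysis is stubbed out with a fixed answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Rule:
    name: str
    metric: str
    threshold: float  # assumption: satisfied while the metric is <= threshold
    action: Optional[Callable[[], str]] = None  # None: monitoring only

    def satisfied(self, metrics: Dict[str, float]) -> bool:
        return metrics[self.metric] <= self.threshold

def recovery_sequence(rules: List[Rule], metrics, analyze_root_cause, derive):
    """Derive and run a recovery rule for each unsatisfied action-less rule."""
    executed = []
    for rule in list(rules):
        if rule.action is not None or rule.satisfied(metrics):
            continue                                   # (1) only unsatisfied service rules
        root_metric = analyze_root_cause(rule.metric)  # (2) root cause analysis
        derived = derive(rule, root_metric)            # (3) derive a new rule
        rules.append(derived)                          # (4) add it to the rule set
        if derived.action is not None and not derived.satisfied(metrics):
            executed.append(derived.action())          # (5) execute its action
    return executed

def run_compaction() -> str:
    return "compaction triggered"  # hypothetical recovery action

# Hypothetical metric snapshot and initial rules.
metrics = {"read_latency_ms": 250.0, "pending_compactions": 3.0, "tombstone_scans": 900.0}
rule41 = Rule("Rule41", "read_latency_ms", 100.0)                     # consumer, no action
rule42 = Rule("Rule42", "pending_compactions", 10.0, run_compaction)  # provider, with action
rules = [rule41, rule42]

executed = recovery_sequence(
    rules,
    metrics,
    analyze_root_cause=lambda metric: "tombstone_scans",  # stand-in for Shape-Root
    derive=lambda rule, root: Rule("Rule43", root, 500.0, rule42.action),
)
print(executed, [r.name for r in rules])
```

Keeping the condition (metric and threshold) separate from the action is what lets a derived rule pair a newly found root-cause metric with an existing recovery action.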
  • 29. Conclusion
  • 30. Summary and Future Plans. μQoS: an event-driven monitoring rule derivation method based on case-based reasoning and root cause analysis for autonomous recovery.
      • Good: μQoS has demonstrated its effectiveness in a real system, using the high-precision, real-time root cause algorithm Shape-Root.
      • Bad: The acausal root cause exclusion algorithm is oversimplified because it relies only on metric timestamps. In complex real systems with many QoS rules, an acausal μQoS rule may be executed.
      • Future plans: The study focused only on root cause analysis for past failures. As the next step, we plan to extend the μQoS framework to future failures, using model-based reasoning or anomaly detection to prevent potential failures.
  • 31. Thank you