NICTA Copyright 2012 From imagination to impact
Cloud API Issues: an Empirical Study and Impact
Qinghua Lu, Liming Zhu, Len Bass, Xiwei Xu,
Zhanwen Li, Hiroshi Wada
Software Systems Research Group, NICTA
QoSA13, Vancouver
Slides at: http://www.slideshare.net/LimingZhu/
Motivation
• Cloud applications fail due to operation issues
– Gartner reports: 80% of outages are caused by operations
• People/Process: replication/failover, auto-scaling, upgrade…
– Lessons from our own cloud DR product: Yuruware.com
– DevOps movement
• Operational causes of failures
– Infrastructure and processes, but
– most things are done through the infrastructure API
• Highly dependable cloud applications require
– Architecting for not just the software but also its operation (through the API)
– Architecting for indirect control (through the API)
– Better understanding of cloud API issues
• reliability, performance, nature of failures and faults
Main Contributions
• Empirical study of cloud infrastructure API issues
– 922 failure/fault cases from Amazon EC2 forums (2010 to 2012)
• Covering five of the most used API calls
• Fault analysis supplemented by other sources
– Classified the API failures and faults (causes of failures)
• Using the classic dependable computing taxonomy (Avizienis et al., 2004)
• Failures: content, late timing, halt, erratic
• Faults: development, physical, interaction
• Impact analysis through an initial proposal for tolerating
cloud API failures/faults
– Suggestions for tolerating content failures
– 11 patterns for tolerating timing failures
Some Empirical Findings
• Majority (60%) of the cases of API failures are related to stuck API
calls or unresponsive API calls
• 19% of the cases are related to the output issues of API calls
– Error messages, missing/wrong/unexpected contents
• 12% of the cases are about slow-responding API calls
• 9% of the cases are related to API calls that
– were pending for a certain time and then returned to the original state
without informing the caller properly
– were reported to be successful first but failed later
Methodology
• Amazon: EC2 forums and outage reports
• Netflix: technical blogs and GitHub OSS projects
• Yuruware.com: disaster recovery product which heavily relies on cloud
infrastructure APIs
Data Collected from Amazon EC2 Forum
Searched keywords and number of returned records

API of interest        Records, inception to 2012   Records, 2010 to 2012
describe instance      283                          150
start instance         227                          204
stop instance          349                          348
detach volume          235                          203
associate elastic IP   264                          204
Total                  1358                         1109

Case type, number, and percentage of the cases found (2010 to 2012)

Case type          Number of cases   Percentage of all cases
API failures       922               83%
Enquiries          125               11%
API enhancements   62                6%
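The percentages in the case-type table can be re-derived from the counts. The short Python sketch below does that arithmetic; the variable and function names are illustrative and not part of the study.

```python
# Case counts copied from the case-type table (2010 to 2012).
case_counts = {"API failures": 922, "Enquiries": 125, "API enhancements": 62}


def percentages(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total_cases = sum(case_counts.values())
shares = percentages(case_counts)
print(total_cases)  # matches the 2010 to 2012 total of the keyword table
print(shares)
```

The three case types add up to the 1109 records returned for 2010 to 2012, so the case-type table partitions the search results.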
Classification of API Failures
Fault -> Error -> Failure
Failure: deviation from correct service (externally visible)
Error: internal erroneous state
Fault: adjudged or hypothesized cause of a failure
[13] A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and
taxonomy of dependable and secure computing," Dependable and Secure
Computing, IEEE Transactions on, vol. 1, pp. 11-33, 2004.
Classification of API Failures
• Content failures (19%)
– With error messages; missing/wrong/unexpected content
• 61% of the time, users understood the causes/solutions from the error message
• 39% of the time, users could not pinpoint the causes from the error message
Failed call where the error message is unclear.
Posted on Jan 10, 2012 5:42 AM
Symptom: When a user tried to start an instance, the operation failed with an unclear error message.
Error message: State Transition Reason - Server.InternalError: Internal error on launch
Root cause: Unknown.
Solution: AWS engineers advised detaching the EBS volume from the instance and attaching it to another running instance.
Failed call where the error message is clear.
Posted on Jun 14, 2012 9:57 PM
Symptom: API calls failed with a "Request limit exceeded" error message.
Error message: Client.RequestLimitExceeded: Request limit exceeded
Root cause: API calls exceeded limit.
Solution: N/A. There is no official information on the limit, the time span over which the limit is calculated, or a suggested wait time.
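For the Client.RequestLimitExceeded case, where no official limit or wait time is known, one common defensive option is to retry with capped exponential backoff and jitter. The Python sketch below illustrates that idea only; it is not a technique taken from the slides. `RequestLimitExceeded` and `call_with_backoff` are hypothetical names, the delay values are assumptions, and no real AWS SDK call is used.

```python
import random
import time


class RequestLimitExceeded(Exception):
    """Hypothetical stand-in for a Client.RequestLimitExceeded error."""


def call_with_backoff(api_call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Retry api_call on RequestLimitExceeded, waiting longer after each failure.

    The delay bounds are assumptions; the forum thread gives no official values.
    """
    for attempt in range(max_attempts):
        try:
            return api_call()
        except RequestLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: wait a random time up to the capped exponential bound.
            bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * bound)
```

The `sleep` and `rng` parameters exist only so the waiting behaviour can be exercised in tests without real delays.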
Classification of API Failures
• Late timing failures (12%)
– The delivered information arrives later than expected, but it does eventually arrive
A late timing failure example.
Posted on Aug 27, 2012 11:57 AM
Symptom: It took 16 minutes for an instance to stop.
Root cause: n/a.
Solution: The AWS engineer advised trying “force stop” twice if this happens again.
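The advice in this example (wait, then try "force stop" up to twice) can be expressed as a deadline-driven escalation. The Python sketch below assumes two hypothetical callables supplied by the caller: `stop(force)` issues the stop request and `get_state()` returns the instance state as a string. The timeout and polling values are assumptions, not figures from the study.

```python
import time


def stop_with_escalation(stop, get_state, timeout_s=300.0, poll_s=5.0,
                         max_force_attempts=2,
                         clock=time.monotonic, sleep=time.sleep):
    """Request a normal stop; if it is late, escalate to forced stops."""

    def wait_until_stopped():
        # Poll until the instance reports "stopped" or the deadline passes.
        deadline = clock() + timeout_s
        while clock() < deadline:
            if get_state() == "stopped":
                return True
            sleep(poll_s)
        return get_state() == "stopped"

    stop(False)
    if wait_until_stopped():
        return "stopped"
    for _ in range(max_force_attempts):
        stop(True)  # forced stop, repeated at most max_force_attempts times
        if wait_until_stopped():
            return "force-stopped"
    return "stuck"  # the caller must decide what to do, e.g. open a support case
```

Returning a distinct "stuck" outcome keeps the late timing case separate from the halt case, where forcing does not help either.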
Classification of API Failures
• Halt failures (60%)
– The external state becomes constant.
– Most frequent failures!
A general halt failure example.
Posted on Jun 27, 2012 12:04 AM
Symptom: A user reported that the instance is stuck at stopping and “force stop” would not help.
Root cause: n/a.
Solution: The AWS engineer stopped the instance for the user on the AWS side (with some side
effect).
A silent failure example.
Posted on Oct 23, 2012 7:45 AM
Symptom: An instance was not accessible and the user could not stop/start it or create a snapshot.
Root cause: AWS outage.
Solution: The AWS engineer advised that the user must launch a replacement instance from a pre-existing backup (EBS AMI). Attempts to stop an inaccessible instance will likely result in the instance becoming stuck in the stopping state. Customers that do not have a known good backup must wait for the issue to be resolved for their instance connectivity to be restored.
Classification of API Failures
• Erratic failures (9%)
– The delivered service is unpredictable. Two subtypes:
• the call is pending for a certain time and then returns to the original state
• the call is reported as successful at first but fails later
Two erratic failure examples.
Posted on Feb 1, 2012 8:15 AM
Symptom: A user associated an elastic IP with an instance and could SSH into the instance with the
elastic IP. After a few minutes, the elastic IP was silently disassociated from the instance.
Root cause: An issue with the underlying host.
Solution: The AWS engineer advised that the quickest fix was to stop and then start the instance to
relocate to a different host.
Posted on Jan 14, 2011 1:43 PM
Symptom: A user tried to start the instance several times. Each time, the status showed pending and then went back to stopped.
Root cause: n/a.
Solution: The AWS engineer returned the user’s EBS volume to the available state and believed this
would resolve the user’s problem.
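Both erratic examples share one trait: a single successful response or state check is not enough evidence that the operation really took effect. One defensive option, sketched below in Python, is to keep re-checking the desired condition for an observation window before treating the call as done. `confirm_stable` and its `check` callable are hypothetical, and the window length is an assumption rather than a value from the study.

```python
import time


def confirm_stable(check, observe_s=300.0, poll_s=15.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Return True only if check() holds for the whole observation window.

    check() is a caller-supplied probe, e.g. "is the elastic IP still
    associated with the instance?".
    """
    deadline = clock() + observe_s
    while True:
        if not check():
            return False  # the state silently reverted: treat the call as failed
        if clock() >= deadline:
            return True
        sleep(poll_s)
```

A False result would then feed whatever recovery the application has available, such as the stop and start sequence suggested in the first example.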
Classification of Faults (Causes of Failures)
• Development faults – software bugs
– User workarounds exist but may break once the bug is fixed
• Physical faults
– Stopping and starting an instance moves it to a new physical machine, but stopping itself can be problematic
– Future work: classifying using virtual resource characteristics
• Interaction faults
– Misconfiguration faults account for 30%
• Accidental & purposeful misconfiguration
– Purposeful misconfiguration
• lack of knowledge (subjective uncertainty vs. stochastic uncertainty)
• Configuration and operation impact on availability (refs. 1, 2)
1. X. Xu, Q. Lu, L. Zhu, et al., "Availability Analysis of In-Cloud Applications," in ISARCS13 (11:30 tomorrow)
2. Q. Lu, X. Xu, L. Zhu, L. Bass, et al., "Incorporating Uncertainty into in-Cloud Application Deployment Decisions for Availability," in IEEE Cloud 2013
Tolerating API Failures/Faults
• Perspective
– cloud consumer and application oriented
– limited visibility: e.g. may not know the root cause
– indirect control: e.g. solutions go through APIs as well
• Different failures/faults require different approaches
– The approach depends on the failure/fault classification
– Suggestions, patterns and ad-hoc use of failure/fault characteristics:
• Content failure: alternative sources for content, defensive programming…
• Late timing failures: API call life cycle driven
API Call Life Cycle Driven Patterns
Pattern Examples
• Faster forced fail/complete
– force-fail-r or force-fail-s
• Netflix Hystrix: fail fast based on the 95th to 99th percentile delay
– force-complete-r
• Yuruware: ignore some “describe” API calls
• Hedged requests or more sophisticated retry
– continue-request
• Common: send the same request to two places and cancel the slower one
– reallocate or reallocate-s
• Yuruware: attach the to-be-moved volume to different mover instances
after early mover failures
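Two of the patterns above, failing fast on a percentile-based deadline and hedging a slow request, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the Hystrix or Yuruware implementation: `percentile`, `call_fail_fast`, and `hedged_call` are hypothetical helper names, and latency samples are assumed to be collected by the caller.

```python
import concurrent.futures as cf
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of observed latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def call_fail_fast(fn, timeout_s):
    """Stop waiting after timeout_s (raises concurrent.futures.TimeoutError).

    The worker thread is not killed; the caller simply gives up on it.
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)


def hedged_call(primary, backup, hedge_after_s):
    """Start backup only if primary has not answered within hedge_after_s."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        futures = [pool.submit(primary)]
        done, _ = cf.wait(futures, timeout=hedge_after_s)
        if not done:
            futures.append(pool.submit(backup))
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        winner = next(iter(done))
        for future in futures:
            future.cancel()  # best effort: a call that is already running keeps running
        return winner.result()  # re-raises if the first finisher failed
    finally:
        pool.shutdown(wait=False)
```

A caller might set `timeout_s = percentile(recent_latencies, 99)` so that only the slowest calls are cut off. Because Python threads cannot be interrupted, "cancel the slower one" here only means ignoring its result; a real client would need a cancellable transport.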
Conclusion and Future Work
• Empirical study of cloud infrastructure API issues
– Analysed & classified 922 failures/faults from Amazon EC2 forums
• Inform better architecting for operations (i.e. operator as a stakeholder)
– Future work (completed)
• Expanded to more cases from other sources (2087 issues)
• Proposed a new scheme for classifying faults
• Tolerating cloud API failures/faults
– Patterns for tolerating different types of API failures/faults
– Future work (ongoing)
• More actionable mechanisms/patterns and their implementation
• Use the characteristics of the faults and failures
– for smarter recovery and error diagnosis during operation
• What we need: more real world operation logs and collaborators
{Liming.Zhu, Len.Bass}@nicta.com.au
Slides available at http://www.slideshare.net/LimingZhu/

Weitere ähnliche Inhalte

Was ist angesagt?

IBM webinar Profesia su Requirements Quality assistant
IBM webinar Profesia su Requirements Quality assistantIBM webinar Profesia su Requirements Quality assistant
IBM webinar Profesia su Requirements Quality assistantProfesia Srl, Lynx Group
 
Software Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous DeliverySoftware Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous DeliveryEberhard Wolff
 
Development Has Moved On: Test data needs to catch up with containers
Development Has Moved On: Test data needs to catch up with containersDevelopment Has Moved On: Test data needs to catch up with containers
Development Has Moved On: Test data needs to catch up with containersCuriosity Software Ireland
 
Riverbed Software Defined IT Survey
Riverbed Software Defined IT SurveyRiverbed Software Defined IT Survey
Riverbed Software Defined IT SurveyRiverbed Technology
 
Detect and Fix Performance Problems Faster
Detect and Fix Performance Problems FasterDetect and Fix Performance Problems Faster
Detect and Fix Performance Problems FasterRiverbed Technology
 
Automated Testing Using Selenium
Automated Testing Using SeleniumAutomated Testing Using Selenium
Automated Testing Using SeleniumTechWell
 
2012 11-30 deep sec insecurity over time
2012 11-30 deep sec insecurity over time2012 11-30 deep sec insecurity over time
2012 11-30 deep sec insecurity over timeAlexey Kachalin
 
Static Application Security Testing Strategies for Automation and Continuous ...
Static Application Security Testing Strategies for Automation and Continuous ...Static Application Security Testing Strategies for Automation and Continuous ...
Static Application Security Testing Strategies for Automation and Continuous ...Kevin Fealey
 
Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?
Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?
Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?Denim Group
 
Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016Matt Tesauro
 
Building Your Application Security Data Hub - OWASP AppSecUSA
Building Your Application Security Data Hub - OWASP AppSecUSABuilding Your Application Security Data Hub - OWASP AppSecUSA
Building Your Application Security Data Hub - OWASP AppSecUSADenim Group
 
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Deborah Schalm
 
Kevin Slade - CV
Kevin Slade - CVKevin Slade - CV
Kevin Slade - CVKevin Slade
 
Continuous delivery mobile application development
Continuous delivery mobile application developmentContinuous delivery mobile application development
Continuous delivery mobile application developmentThoughtworks
 
Diving Deeper into DevOps Deployments
Diving Deeper into DevOps DeploymentsDiving Deeper into DevOps Deployments
Diving Deeper into DevOps DeploymentsJules Pierre-Louis
 
PLM World Conference 2007
PLM World Conference 2007PLM World Conference 2007
PLM World Conference 2007Matt Tremmel
 
Is BDD Worth It? Considerations for Advanced Test Automation
Is BDD Worth It? Considerations for Advanced Test AutomationIs BDD Worth It? Considerations for Advanced Test Automation
Is BDD Worth It? Considerations for Advanced Test AutomationPerfecto by Perforce
 
we45 DEFCON Workshop - Building AppSec Automation with Python
we45 DEFCON Workshop - Building AppSec Automation with Pythonwe45 DEFCON Workshop - Building AppSec Automation with Python
we45 DEFCON Workshop - Building AppSec Automation with PythonAbhay Bhargav
 
SONY - Process as Code: Continuous Delivery of a CD Pipeline
SONY - Process as Code: Continuous Delivery of a CD PipelineSONY - Process as Code: Continuous Delivery of a CD Pipeline
SONY - Process as Code: Continuous Delivery of a CD PipelineDevOps Enterprise Summit
 

Was ist angesagt? (20)

CV-Gunawan Pujasandi
CV-Gunawan PujasandiCV-Gunawan Pujasandi
CV-Gunawan Pujasandi
 
IBM webinar Profesia su Requirements Quality assistant
IBM webinar Profesia su Requirements Quality assistantIBM webinar Profesia su Requirements Quality assistant
IBM webinar Profesia su Requirements Quality assistant
 
Software Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous DeliverySoftware Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous Delivery
 
Development Has Moved On: Test data needs to catch up with containers
Development Has Moved On: Test data needs to catch up with containersDevelopment Has Moved On: Test data needs to catch up with containers
Development Has Moved On: Test data needs to catch up with containers
 
Riverbed Software Defined IT Survey
Riverbed Software Defined IT SurveyRiverbed Software Defined IT Survey
Riverbed Software Defined IT Survey
 
Detect and Fix Performance Problems Faster
Detect and Fix Performance Problems FasterDetect and Fix Performance Problems Faster
Detect and Fix Performance Problems Faster
 
Automated Testing Using Selenium
Automated Testing Using SeleniumAutomated Testing Using Selenium
Automated Testing Using Selenium
 
2012 11-30 deep sec insecurity over time
2012 11-30 deep sec insecurity over time2012 11-30 deep sec insecurity over time
2012 11-30 deep sec insecurity over time
 
Static Application Security Testing Strategies for Automation and Continuous ...
Static Application Security Testing Strategies for Automation and Continuous ...Static Application Security Testing Strategies for Automation and Continuous ...
Static Application Security Testing Strategies for Automation and Continuous ...
 
Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?
Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?
Remediation Statistics: What Does Fixing Application Vulnerabilities Cost?
 
Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016Taking AppSec to 11 - BSides Austin 2016
Taking AppSec to 11 - BSides Austin 2016
 
Building Your Application Security Data Hub - OWASP AppSecUSA
Building Your Application Security Data Hub - OWASP AppSecUSABuilding Your Application Security Data Hub - OWASP AppSecUSA
Building Your Application Security Data Hub - OWASP AppSecUSA
 
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...
 
Kevin Slade - CV
Kevin Slade - CVKevin Slade - CV
Kevin Slade - CV
 
Continuous delivery mobile application development
Continuous delivery mobile application developmentContinuous delivery mobile application development
Continuous delivery mobile application development
 
Diving Deeper into DevOps Deployments
Diving Deeper into DevOps DeploymentsDiving Deeper into DevOps Deployments
Diving Deeper into DevOps Deployments
 
PLM World Conference 2007
PLM World Conference 2007PLM World Conference 2007
PLM World Conference 2007
 
Is BDD Worth It? Considerations for Advanced Test Automation
Is BDD Worth It? Considerations for Advanced Test AutomationIs BDD Worth It? Considerations for Advanced Test Automation
Is BDD Worth It? Considerations for Advanced Test Automation
 
we45 DEFCON Workshop - Building AppSec Automation with Python
we45 DEFCON Workshop - Building AppSec Automation with Pythonwe45 DEFCON Workshop - Building AppSec Automation with Python
we45 DEFCON Workshop - Building AppSec Automation with Python
 
SONY - Process as Code: Continuous Delivery of a CD Pipeline
SONY - Process as Code: Continuous Delivery of a CD PipelineSONY - Process as Code: Continuous Delivery of a CD Pipeline
SONY - Process as Code: Continuous Delivery of a CD Pipeline
 

Andere mochten auch

Modelling and Analysing Operation Processes for Dependability
Modelling and Analysing Operation Processes for Dependability Modelling and Analysing Operation Processes for Dependability
Modelling and Analysing Operation Processes for Dependability Liming Zhu
 
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...Liming Zhu
 
A importancia do insta marketing para negócios - Instaby
A importancia do insta marketing para negócios - InstabyA importancia do insta marketing para negócios - Instaby
A importancia do insta marketing para negócios - InstabyFellipe Guimarães
 
Seo Omega Review
Seo Omega ReviewSeo Omega Review
Seo Omega ReviewSEOSugar
 
Bridging the Engagement Gap for Distance Students Through Telerobotics
Bridging the Engagement Gap for Distance Students Through TeleroboticsBridging the Engagement Gap for Distance Students Through Telerobotics
Bridging the Engagement Gap for Distance Students Through TeleroboticsMichael Griffith
 
мариинский театр
мариинский театрмариинский театр
мариинский театр'Helena Grigorjeva
 
Dependable Operations
Dependable OperationsDependable Operations
Dependable OperationsLiming Zhu
 
Challenges in Practicing High Frequency Releases in Cloud Environments
Challenges in Practicing High Frequency Releases in Cloud Environments Challenges in Practicing High Frequency Releases in Cloud Environments
Challenges in Practicing High Frequency Releases in Cloud Environments Liming Zhu
 
Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...Liming Zhu
 
Facebook vs instagram - Fellipe Guimarães - Instaby
Facebook vs instagram - Fellipe Guimarães - InstabyFacebook vs instagram - Fellipe Guimarães - Instaby
Facebook vs instagram - Fellipe Guimarães - InstabyFellipe Guimarães
 
Hopitality accounting
Hopitality accountingHopitality accounting
Hopitality accountingDanilo Tan
 

Andere mochten auch (16)

Modelling and Analysing Operation Processes for Dependability
Modelling and Analysing Operation Processes for Dependability Modelling and Analysing Operation Processes for Dependability
Modelling and Analysing Operation Processes for Dependability
 
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
POD-Diagnosis: Error Detection and Diagnosis of Sporadic Operations on Cloud ...
 
Presentación Perú
Presentación PerúPresentación Perú
Presentación Perú
 
A importancia do insta marketing para negócios - Instaby
A importancia do insta marketing para negócios - InstabyA importancia do insta marketing para negócios - Instaby
A importancia do insta marketing para negócios - Instaby
 
Collaborate plan workshop
Collaborate plan workshopCollaborate plan workshop
Collaborate plan workshop
 
Ppt
PptPpt
Ppt
 
Seo Omega Review
Seo Omega ReviewSeo Omega Review
Seo Omega Review
 
Bridging the Engagement Gap for Distance Students Through Telerobotics
Bridging the Engagement Gap for Distance Students Through TeleroboticsBridging the Engagement Gap for Distance Students Through Telerobotics
Bridging the Engagement Gap for Distance Students Through Telerobotics
 
мариинский театр
мариинский театрмариинский театр
мариинский театр
 
Dependable Operations
Dependable OperationsDependable Operations
Dependable Operations
 
Challenges in Practicing High Frequency Releases in Cloud Environments
Challenges in Practicing High Frequency Releases in Cloud Environments Challenges in Practicing High Frequency Releases in Cloud Environments
Challenges in Practicing High Frequency Releases in Cloud Environments
 
Eat your street
Eat your streetEat your street
Eat your street
 
Ppt
PptPpt
Ppt
 
Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...Dependable Operation - Performance Management and Capacity Planning Under Con...
Dependable Operation - Performance Management and Capacity Planning Under Con...
 
Facebook vs instagram - Fellipe Guimarães - Instaby
Facebook vs instagram - Fellipe Guimarães - InstabyFacebook vs instagram - Fellipe Guimarães - Instaby
Facebook vs instagram - Fellipe Guimarães - Instaby
 
Hopitality accounting
Hopitality accountingHopitality accounting
Hopitality accounting
 

Ähnlich wie Cloud API Issues: an Empirical Study and Impact

Real World Problem Solving Using Application Performance Management 10
Real World Problem Solving Using Application Performance Management 10Real World Problem Solving Using Application Performance Management 10
Real World Problem Solving Using Application Performance Management 10CA Technologies
 
Automatic Undo for Cloud Management via AI Planning
Automatic Undo for Cloud Management via AI PlanningAutomatic Undo for Cloud Management via AI Planning
Automatic Undo for Cloud Management via AI PlanningHiroshi Wada
 
Presentation application server diagnostics
Presentation   application server diagnosticsPresentation   application server diagnostics
Presentation application server diagnosticsxKinAnx
 
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld
 
FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...
FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...
FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...SonjaChevre
 
Twelve Factor - Designing for Change
Twelve Factor - Designing for ChangeTwelve Factor - Designing for Change
Twelve Factor - Designing for ChangeEric Wyles
 
Nonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinNonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinTechWell
 
Case Study: Verizon Wireless: Chasing the Yellow Before They Turn Red
Case Study: Verizon Wireless: Chasing the Yellow Before They Turn RedCase Study: Verizon Wireless: Chasing the Yellow Before They Turn Red
Case Study: Verizon Wireless: Chasing the Yellow Before They Turn RedCA Technologies
 
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012Amazon Web Services
 
AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...
AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...
AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...AppDynamics
 
Eric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler
 
Error in hadoop
Error in hadoopError in hadoop
Error in hadoopLen Bass
 
Metrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your PipelineMetrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your PipelineAndreas Grabner
 
Ask The Architect: RightScale & AWS Dive Deep into Hybrid IT
Ask The Architect: RightScale & AWS Dive Deep into Hybrid ITAsk The Architect: RightScale & AWS Dive Deep into Hybrid IT
Ask The Architect: RightScale & AWS Dive Deep into Hybrid ITRightScale
 
Exposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance ProblemsExposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance ProblemsRiverbed Technology
 
Performance Engineering Case Study V1.0
Performance Engineering Case Study    V1.0Performance Engineering Case Study    V1.0
Performance Engineering Case Study V1.0sambitgarnaik
 
Scalability using Node.js
Scalability using Node.jsScalability using Node.js
Scalability using Node.jsratankadam
 

Ähnlich wie Cloud API Issues: an Empirical Study and Impact (20)

Real World Problem Solving Using Application Performance Management 10
Real World Problem Solving Using Application Performance Management 10Real World Problem Solving Using Application Performance Management 10
Real World Problem Solving Using Application Performance Management 10
 
Automatic Undo for Cloud Management via AI Planning
Automatic Undo for Cloud Management via AI PlanningAutomatic Undo for Cloud Management via AI Planning
Automatic Undo for Cloud Management via AI Planning
 
ADF Performance Monitor
ADF Performance MonitorADF Performance Monitor
ADF Performance Monitor
 
Presentation application server diagnostics
Presentation   application server diagnosticsPresentation   application server diagnostics
Presentation application server diagnostics
 
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
VMworld 2013: Moving Enterprise Application Dev/Test to VMware’s Internal Pri...
 
Db trends final
Db trends   finalDb trends   final
Db trends final
 
FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...
FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...
FOSDEM 2024 - Deploy Fast, Without Breaking Things: Level Up APIOps With Open...
 
Twelve Factor - Designing for Change
Twelve Factor - Designing for ChangeTwelve Factor - Designing for Change
Twelve Factor - Designing for Change
 
Nonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinNonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the Coin
 
Case Study: Verizon Wireless: Chasing the Yellow Before They Turn Red
Case Study: Verizon Wireless: Chasing the Yellow Before They Turn RedCase Study: Verizon Wireless: Chasing the Yellow Before They Turn Red
Case Study: Verizon Wireless: Chasing the Yellow Before They Turn Red
 
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
 
AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...
AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...
AppSphere 15 - How AppDynamics is Shaking up the Synthetic Monitoring Product...
 
Eric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New ContextsEric Proegler Oredev Performance Testing in New Contexts
Eric Proegler Oredev Performance Testing in New Contexts
 
Error in hadoop
Error in hadoopError in hadoop
Error in hadoop
 
Metrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your PipelineMetrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
Metrics Driven DevOps - Automate Scalability and Performance Into your Pipeline
 
Mini-Track: Lessons from Public Cloud
Mini-Track: Lessons from Public CloudMini-Track: Lessons from Public Cloud
Mini-Track: Lessons from Public Cloud
 
Ask The Architect: RightScale & AWS Dive Deep into Hybrid IT
Ask The Architect: RightScale & AWS Dive Deep into Hybrid ITAsk The Architect: RightScale & AWS Dive Deep into Hybrid IT
Ask The Architect: RightScale & AWS Dive Deep into Hybrid IT
 
Exposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance ProblemsExposing and Fixing Common App Performance Problems
Exposing and Fixing Common App Performance Problems
 
Performance Engineering Case Study V1.0
Performance Engineering Case Study    V1.0Performance Engineering Case Study    V1.0
Performance Engineering Case Study V1.0
 
Scalability using Node.js
Scalability using Node.jsScalability using Node.js
Scalability using Node.js
 

Cloud API Issues: an Empirical Study and Impact

  • 1. NICTA Copyright 2012 From imagination to impact
      Cloud API Issues: an Empirical Study and Impact
      Qinghua Lu, Liming Zhu, Len Bass, Xiwei Xu, Zhanwen Li, Hiroshi Wada
      Software Systems Research Group, NICTA
      QoSA13, Vancouver
      Slides at: http://www.slideshare.net/LimingZhu/
  • 2. Motivation
      • Cloud applications fail due to operation issues
          – Gartner reports: 80% of outages are caused by operations
              • People/Process: replication/failover, auto-scaling, upgrade…
          – Lessons from our own cloud DR product: Yuruware.com
          – DevOps movement
      • Operational causes of failures
          – Infrastructure and processes, but
          – Most things are done through the infrastructure API
      • Highly dependable cloud applications require
          – Architecting for not just the software but also its operation (through the API)
          – Architecting for indirect control (through the API)
          – Better understanding of cloud API issues
              • reliability, performance, nature of failures and faults
  • 3. Main Contributions
      • Empirical study of cloud infrastructure API issues
          – 922 failure/fault cases from Amazon EC2 forums (2010 to 2012)
              • Around five of the most used API calls
              • Fault analysis supplemented by other sources
          – Classified the API failures and faults (causes of failures)
              • Using the classic dependable computing taxonomy (Avizienis, 04)
              • Failures: content, late timing, halt, erratic
              • Faults: development, physical, interaction
      • Impact analysis through an initial proposal for tolerating cloud API failures/faults
          – Suggestions for tolerating content failures
          – 11 patterns for tolerating timing failures
  • 4. Some Empirical Findings
      • The majority (60%) of the API failure cases are related to stuck or unresponsive API calls
      • 19% of the cases are related to output issues of API calls
          – Error messages, missing/wrong/unexpected contents
      • 12% of the cases are about slow-responding API calls
      • 9% of the cases are related to API calls that
          – were pending for a certain time and then returned to the original state without informing the caller properly
          – were reported to be successful first but failed later
  • 5. Methodology
      • Amazon: EC2 forums and outage reports
      • Netflix: technical blogs and GitHub OSS projects
      • Yuruware.com: disaster recovery product which heavily relies on cloud infrastructure APIs
  • 6. Data Collected from Amazon EC2 Forum

      Searched keywords and number of returned records

      API of interest        Records, inception to 2012    Records, 2010 to 2012
      describe instance      283                           150
      start instance         227                           204
      stop instance          349                           348
      detach volume          235                           203
      associate elastic IP   264                           204
      Total                  1358                          1109

      Case type, number and percentage in the found cases

      Case type              Case number    Percentage of all cases from 2010-2012
      API failures           922            83%
      Enquiries              125            11%
      API enhancements       62             6%
  • 7. Classification of API Failures
      Fault -> Error -> Failure
      Failure: deviation from correct service (externally visible)
      Error: internal erroneous state
      Fault: adjudicated or hypothesized causes of a failure
      [13] A. Avizienis, J. C. Laprie, B. Randell, and C. Landwehr, "Basic concepts and taxonomy of dependable and secure computing," Dependable and Secure Computing, IEEE Transactions on, vol. 1, pp. 11-33, 2004.
  • 8. Classification of API Failures
      • Content failures (19%)
          – With error messages; missing/wrong/unexpected content
              • 61% of the time users understood the causes/solutions from the error message
              • 39% of the time users could not pinpoint the causes from the error message

      Failed call where the error message is unclear.
          Posted on Jan 10, 2012 5:42 AM
          Symptom: When a user tried to start an instance, the operation failed with an unclear error message.
          Error message: State Transition Reason - Server.InternalError: Internal error on launch
          Root cause: Unknown.
          Solution: AWS engineers advised detaching the EBS volume from the instance and attaching it to another running instance.

      Failed call where the error message is clear.
          Posted on Jun 14, 2012 9:57 PM
          Symptom: Failed API calls and receiving a "Request limit exceeded" error message.
          Error message: Client.RequestLimitExceeded: Request limit exceeded
          Root cause: API calls exceeded limit.
          Solution: N/A. There is no official information on the limit, the time span over which the limit is calculated, or a suggested wait time.
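Because the forum case above reports no official limit or suggested wait time for `Client.RequestLimitExceeded`, one common defensive option is to retry with exponential backoff and jitter. The following is a minimal sketch of that idea, not something prescribed by the slides: the `RequestLimitExceeded` exception class, the `call_with_backoff` helper, and all delay values are hypothetical placeholders for whatever the real client library raises and whatever waits suit the workload.

```python
import random
import time


class RequestLimitExceeded(Exception):
    """Hypothetical stand-in for a Client.RequestLimitExceeded API error."""


def call_with_backoff(api_call, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call api_call(), retrying on RequestLimitExceeded with backoff.

    The delay before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), so concurrent
    callers do not retry in lockstep. All values are assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return api_call()
        except RequestLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            upper = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting; in normal use the default `time.sleep` applies.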
  • 9. Classification of API Failures
      • Late timing failures (12%)
          – the arrival time of the delivered information deviates from the expected time, but the information does eventually arrive

      A late timing failure example.
          Posted on Aug 27, 2012 11:57 AM
          Symptom: It took 16 minutes for an instance to stop.
          Root cause: n/a.
          Solution: The AWS engineer advised trying "force stop" twice if this happens next time.
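The advice in the example above (wait, then try "force stop" up to twice) can be sketched as a polling loop with a deadline. This is only an illustration under stated assumptions: `stop` and `get_state` are hypothetical callables wrapping the real stop and describe API calls, the state name `"stopped"` is assumed, and the deadline and polling interval are arbitrary.

```python
import time


def stop_with_deadline(stop, get_state, deadline_s=300.0, poll_s=5.0,
                       max_force=2, clock=time.monotonic, sleep=time.sleep):
    """Request a stop, then escalate to a forced stop if it runs late.

    stop(force) issues the stop request (hypothetical wrapper);
    get_state() returns the current instance state as a string.
    Returns the number of forced stops that were needed.
    """
    stop(False)
    forced = 0
    started = clock()
    while True:
        if get_state() == "stopped":
            return forced
        if clock() - started >= deadline_s:
            if forced == max_force:
                raise TimeoutError("instance did not stop after forced stops")
            stop(True)          # escalate: forced stop
            forced += 1
            started = clock()   # give the forced stop its own deadline
        sleep(poll_s)
```

If the forced stops also run past the deadline, the caller gets a `TimeoutError` and can fall back to another measure, such as launching a replacement instance.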
  • 10. Classification of API Failures
      • Halt failures (60%)
          – The external state becomes constant.
          – Most frequent failures!

      A general halt failure example.
          Posted on Jun 27, 2012 12:04 AM
          Symptom: A user reported that the instance was stuck at stopping and "force stop" would not help.
          Root cause: n/a.
          Solution: The AWS engineer stopped the instance for the user on the AWS side (with some side effect).

      A silent failure example.
          Posted on Oct 23, 2012 7:45 AM
          Symptom: An instance was not accessible and the user could not stop/start it or create a snapshot.
          Root cause: AWS outage.
          Solution: The AWS engineer advised that the user must launch a replacement instance from a pre-existing backup (EBS AMI). Attempts to stop an inaccessible instance will likely result in the instance becoming stuck in the stopping state. Customers that do not have a known good backup must wait for the issue to be resolved for their instance connectivity to be restored.
  • 11. Classification of API Failures
      • Erratic failures (9%)
          – The delivered service is unpredictable. Two subtypes:
              • the call is pending for a certain time and then returns to the original state
              • the call is successfully executed first but fails eventually

      Two erratic failure examples.
          Posted on Feb 1, 2012 8:15 AM
          Symptom: A user associated an elastic IP with an instance and could SSH into the instance with the elastic IP. After a few minutes, the elastic IP was silently disassociated from the instance.
          Root cause: An issue with the underlying host.
          Solution: The AWS engineer advised that the quickest fix was to stop and then start the instance to relocate it to a different host.

          Posted on Jan 14, 2011 1:43 PM
          Symptom: A user tried to start the instance several times. The status showed as pending and then went back to stopped.
          Root cause: n/a.
          Solution: The AWS engineer returned the user's EBS volume to the available state and believed this would resolve the user's problem.
  • 12. Classification of Faults (Causes of Failures)
      • Development faults
          – software bugs
          – User workarounds exist but may break after bug fixing
      • Physical faults
          – Stopping/starting to move to a new physical machine, but stopping itself is problematic
          – Future work: classifying using virtual resource characteristics
      • Interaction faults
          – Misconfiguration faults account for 30%
              • Accidental & purposeful misconfiguration
          – Purposeful misconfiguration
              • lack of knowledge (subjective uncertainty vs. stochastic uncertainty)
              • Configuration and operation impact on availability [1, 2]

      1. X. Xu, Q. Lu, L. Zhu, et al., "Availability Analysis of In-Cloud Applications," in ISARCS13 (11:30 tomorrow)
      2. Q. Lu, X. Xu, L. Zhu, L. Bass, et al., "Incorporating Uncertainty into in-Cloud Application Deployment Decisions for Availability," in IEEE Cloud 2013
  • 13. Tolerating API Failures/Faults
      • Perspective
          – cloud consumer and application oriented
          – limited visibility: e.g. may not know the root cause
          – indirect control: e.g. solutions are through APIs as well
      • Different failures/faults require different approaches
          – Failure/fault classification dependent
          – Suggestions, patterns and ad-hoc use of failure/fault characteristics:
              • Content failures: alternative sources for content, defensive programming…
              • Late timing failures: API call life cycle driven
  • 14. API Call Life Cycle Driven Patterns
  • 15. Pattern Examples
      • Faster forced fail/complete
          – force-fail-r or force-fail-s
              • Netflix Hystrix: fail fast based on 95-99 percentile delay
          – force-complete-r
              • Yuruware: ignore some "describe" API calls
      • Hedged requests or more sophisticated retry
          – continue-request
              • Common: send the same request to 2 places and cancel the slow one
          – reallocate or reallocate-s
              • Yuruware: attach the to-be-moved volume to different mover instances after early mover failures
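The "send the same request to 2 places and cancel the slow one" idea, combined with a forced-fail timeout, can be sketched with Python's `asyncio`. This is an illustrative sketch rather than the implementation used by Netflix or Yuruware: `hedged_call`, `make_request`, and the endpoint names are hypothetical, and the timeout would in practice be derived from observed latency percentiles.

```python
import asyncio


async def hedged_call(make_request, endpoints, timeout_s):
    """Send the same request to every endpoint and keep the first result.

    make_request(endpoint) is a hypothetical coroutine function that
    performs one API call. Slower requests are cancelled once one
    finishes; if none finishes within timeout_s, the call is forced
    to fail with TimeoutError.
    """
    tasks = [asyncio.create_task(make_request(e)) for e in endpoints]
    try:
        done, _pending = await asyncio.wait(
            tasks, timeout=timeout_s, return_when=asyncio.FIRST_COMPLETED)
        if not done:
            raise TimeoutError("no endpoint answered in time")
        # result() re-raises if the first finished request itself failed
        return next(iter(done)).result()
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()
        # wait for cancellations to settle without raising
        await asyncio.gather(*tasks, return_exceptions=True)
```

A caller would run it as `asyncio.run(hedged_call(fetch, ["primary", "secondary"], timeout_s=2.0))`, where `fetch` is the application's own coroutine. Note that hedging doubles the request volume, which matters when the API enforces a request limit.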
  • 16. Conclusion and Future Work
      • Empirical study of cloud infrastructure API issues
          – Analysed & classified 922 failures/faults from Amazon EC2 forums
              • Inform better architecting for operations (i.e. operator as a stakeholder)
          – Future work (completed)
              • Expanded to more cases from other sources (2087 issues)
              • Proposed a new scheme for classifying faults
      • Tolerating cloud API failures/faults
          – Patterns for tolerating different types of API failures/faults
          – Future work (ongoing)
              • More actionable mechanisms/patterns and their implementation
              • Use the characteristics of the faults and failures
                  – for smarter recovery and error diagnosis during operation
      • What we need: more real world operation logs and collaborators
      {Liming.Zhu, Len.Bass}@nicta.com.au
      Slides available at http://www.slideshare.net/LimingZhu/