JavaOne - Performance Focused DevOps to Improve Cont Delivery


These are the slides of my JavaOne presentation. The abstract goes like this: How do companies developing business-critical Java enterprise Web applications increase releases from 40 to 300 per year and still remain confident about a spike of 1,800 percent in traffic during key events such as Super Bowl Sunday or Cyber Monday? It takes a fundamental change in culture. Although DevOps is often seen as a mechanism for taming the chaos, adopting an agile methodology across all teams is only the first step. This session explores best practices for continuous delivery with higher quality, for improving collaboration between teams by consolidating tools, and for reducing the overhead of fixing issues. It shows how to build a performance-focused culture with tools such as Hudson, Jenkins, Chef, Puppet, Selenium, and Compuware APM/dynaTrace.


1
Andreas Grabner
http://apmblog.compuware.com
@grabnerandi

5
Testing is Important – and gives Confidence

6
But are we ready for “The Real” world?

7
Measure Performance during the game
Ball Possession: 40 : 60
Fouls: 0 : 0
Score: 0 : 0
Minute 1 - 5

8
Measure Performance during the game
Minute 6 - 35
Ball Possession: 80 : 20
Fouls: 2 : 12
Score: 0 : 0

11
Not always a happy ending
Minute 90
Ball Possession: 80 : 20
Fouls: 4 : 25
Score: 3 : 0

14
From Deploy to …
Timeline: Deploy, Promotion/Event, Problems, Ops Playbook, War Room

15
The “War Room” – back then
“Houston, we have a problem”
NASA Mission Control Center, Apollo 13, 1970

16
The “War Room” – NOW
Facebook – December 2012

17
Problem: Unclear End User Problem Descriptions

18
Status Quo: Ops Runbook – System Unresponsive

19
Problem: Unclear Ops Problem Descriptions

20
Status Quo: Ops Runbook – High Resource Usage

24
What are the real questions?
Individual Users? ALL users?
Is it the APP? Or Delivery Chain?
Code problem? Infrastructure?
One transaction? ALL transactions?
In AppServer? In Virtual Machine?

25
Problem: What Devs would like to have

26
Problem: What Devs would like to have
Page Rendering is the main component
Top Contributor is related to String handling
99% of that time comes from RegEx Pattern Matching
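A common reason for regex matching to dominate string-handling time in Java is that `String.replaceAll` and `String.matches` compile the pattern on every call. The slide does not say that this was the cause in the case shown, so the following is only a minimal sketch of that general pattern. The class and method names are hypothetical and not from the talk.

```java
import java.util.regex.Pattern;

/** Hypothetical example; class and method names are not from the talk. */
final class WhitespaceNormalizer {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    /** Recompiles the regex on every call, which adds up in hot rendering paths. */
    static String normalizeSlow(String input) {
        return input.trim().replaceAll("\\s+", " ");
    }

    /** Produces the same result but reuses the precompiled pattern. */
    static String normalizeFast(String input) {
        return WHITESPACE_RUN.matcher(input.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        String text = "  Page   rendering \t is  the main   component ";
        System.out.println(normalizeSlow(text));
        System.out.println(normalizeFast(text));
    }
}
```

Both methods return the same string; only the per-call compilation cost differs, which is the kind of detail a transaction-level trace makes visible.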

27
It's like getting this …

29
RECAP Status Quo: We don’t like “War Rooms”

30
Problem: Attitudes like this don’t help either
Image taken from https://www.scriptrock.com/blog/devops-whats-hype-about/
Shopzilla CIO (in 2010): “… when they get in the war room - the developers and ops teams describe the problem as the enemy, not each other”

31
Problem: Very “expensive” to work on these issues
YES we know this:
80% Dev Time in Bug Fixing
$60B Defect Costs
BUT
~80% of problems caused by ~20% patterns

32
TOP PROBLEM PATTERNS
• Focus on Web and Java

33
Top Problem Patterns: Resource Pools

34
Top Problem Patterns: Resource Pools

35
Deployment Mistakes lead to internal Exceptions

36
Deployment Mistakes lead to high logging overhead

37
Production Deployment leads to Log SYNC Issues

38
Long running SQL with Production Data

40
Reading and processing too much data in App

41
Memory Leaks in Cache Layer with Production Data
Still crashes
Problem fixed!
Fixed Version Deployed
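Cache-layer leaks of this kind often come from a map that only ever grows once it sees production-sized data. As a minimal sketch, assuming a simple in-process cache, a size bound with least-recently-used eviction can be built on `LinkedHashMap`. The class name and the limit of two entries are illustrative assumptions, not the fix used in the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache; the name and size limit are illustrative assumptions. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("user:1", "Alice");
        cache.put("user:2", "Bob");
        cache.get("user:1");          // touch user:1 so user:2 becomes the eldest entry
        cache.put("user:3", "Carol"); // exceeds the limit and evicts user:2
        System.out.println(cache.keySet());
    }
}
```

This sketch is not thread-safe; a shared cache would need external synchronization or a dedicated caching library.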

42
Synchronization Issues under real load

43
BLOATED Web Sites
17! JS Files – 1.7MB in Size
Useless Information!
Might even be a security risk!

44
Missing or incorrectly configured browser caches
62! Resources not cached
49! Resources with short expiration
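Uncached resources and short expirations usually trace back to the `Cache-Control` response header. The sketch below shows one way a server-side helper might choose that header per request path. The file extensions, the 30-day lifetime and the class name are assumptions for illustration; a long lifetime is only safe when asset file names change with their content.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical helper; the extensions and lifetimes are illustrative assumptions. */
final class CacheHeaders {

    // Long lifetime for static assets, assuming their file names change with their content.
    private static final Duration STATIC_ASSET_LIFETIME = Duration.ofDays(30);

    /** Returns the Cache-Control header value to send for the given request path. */
    static String cacheControlFor(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".js") || lower.endsWith(".css") || lower.endsWith(".png")) {
            return "public, max-age=" + STATIC_ASSET_LIFETIME.getSeconds();
        }
        // Dynamic pages: the browser may store them but must revalidate before reuse.
        return "no-cache";
    }

    public static void main(String[] args) {
        System.out.println(cacheControlFor("/js/app.v42.js"));
        System.out.println(cacheControlFor("/index.html"));
    }
}
```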

46
Want MORE of these and more details?
http://apmblog.compuware.com

47
Lots of Problems that could have been avoided
• BUT WHY are they still making it to Production?

50
Disconnected Teams despite “Shared Responsibility”

52
How to make the Enterprise Crew happy?

54
Solution: DevOps + Performance Focus
Culture: “Shared Responsibility”
• Agile Process for ALL Teams
• Performance as Key Requirement
• X-Team Collaboration and Education
Automation: Measurement, Collaboration and Deployment
• Automate Performance and Architectural Problem Detection
Measurement: “Visible” KPIs for each Team
• Focus on Performance, Architectural and Deployment Measures
Sharing: Expertise, Tool and Data Sharing
• “Easy” sharing of Performance, Deployment and Production Data
http://www.opscode.com/blog/2010/07/16/what-devops-means-to-me/

55
Culture: EXTEND Requirements with …
Performance Scalability
Testability
Deployability

56
Sharing: DON’T EXCLUDE anyone from Agile Process
Stand-Ups Sharing Tools
Feedback

57
Measurement: Define KPIs accepted by all teams
# of SQL Executions
# of Log Lines
MBs / Uses
Time for Deployment
Time for Rollback
Response Times
Perf Test Code Coverage

58
AUTOMATION, AUTOMATION, AUTOMATION
Performance Scalability
Shared Tools Automatic Feedback

59
DevOps Collaboration – TODO LIST FOR YOU!!
Access to Production Data
Shared Reporting and Task Management
Diagnostic Tools
Shared Performance KPIs and Tooling
Know-How Exchange

60
Recap – Problem – Root Cause – Solution - Result
DevOps + Performance Culture
Automation
Measurement
Collaboration

62
Performance Focus in Test Automation

Test Framework Results           | Architectural Data
Build #   Test Case     Status   | # SQL  # Excep  CPU
Build 17  testPurchase  OK       | 12     0        120ms
          testSearch    OK       | 3      1        68ms
Build 18  testPurchase  FAILED   | 12     5        60ms
          testSearch    OK       | 3      1        68ms
Build 19  testPurchase  OK       | 75     0        230ms
          testSearch    OK       | 3      1        68ms
Build 20  testPurchase  OK       | 12     0        120ms
          testSearch    OK       | 3      1        68ms

We identified a regression
Problem solved
Let's look behind the scenes
Exceptions probably reason for failed tests
Problem fixed but now we have an architectural regression
Now we have the functional and architectural confidence
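The idea on this slide, tracking architectural metrics such as SQL and exception counts per test and per build, can be sketched as a small check that a build pipeline might run after the functional tests. This is not the tooling shown in the talk: the class names, the 20% tolerance for SQL counts and the comparison against a single baseline build are all assumptions. The input uses the testPurchase values from the slide (12 versus 75 SQL executions).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical build check; names and the tolerance value are illustrative assumptions. */
final class ArchitecturalRegressionCheck {

    /** Architectural metrics captured for one test case in one build. */
    static final class TestMetrics {
        final String testCase;
        final int sqlCount;
        final int exceptionCount;

        TestMetrics(String testCase, int sqlCount, int exceptionCount) {
            this.testCase = testCase;
            this.sqlCount = sqlCount;
            this.exceptionCount = exceptionCount;
        }
    }

    /** Lists every metric that got worse than the baseline allows. */
    static List<String> findRegressions(TestMetrics baseline, TestMetrics current, double sqlTolerance) {
        List<String> findings = new ArrayList<>();
        if (current.sqlCount > baseline.sqlCount * (1.0 + sqlTolerance)) {
            findings.add(current.testCase + ": # SQL " + baseline.sqlCount + " -> " + current.sqlCount);
        }
        if (current.exceptionCount > baseline.exceptionCount) {
            findings.add(current.testCase + ": # exceptions "
                    + baseline.exceptionCount + " -> " + current.exceptionCount);
        }
        return findings;
    }

    public static void main(String[] args) {
        TestMetrics baseline = new TestMetrics("testPurchase", 12, 0);
        TestMetrics candidate = new TestMetrics("testPurchase", 75, 0);
        // The functional test passes, yet the SQL count jumped from 12 to 75.
        findRegressions(baseline, candidate, 0.2).forEach(System.out::println);
    }
}
```

A pipeline could fail the build whenever the returned list is not empty, so a functionally green build with an architectural regression does not slip through.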

63
Performance Focus in Test Automation
Analyzing All Unit / Performance Tests
Analyzing Metrics such as DB Exec Count
Jump in DB Calls from one Build to the next

64
Performance Focus in Test Automation
Cross Impact of KPIs

65
Performance Focus in Test Automation
Embed your Architectural Results in Jenkins

66
Performance Focus in Test Automation
Here is the difference!
Compare Build that shows BAD Behavior! With Build that shows GOOD Behavior!

67
Performance Focus in Test Automation
CalculateUserStats is the new Plugin that causes problems

68
Remember – DevOps requires Cultural Change
Share Integrate
Collaborate Performance

69
Elevate our DevOps Investment - REDUCE
80% Dev Time for Bug Fixing
$60B Costs by Defects

70
© 2011 Compuware Corporation. All Rights Reserved
Participate in Compuware APM Discussion Forums: apmcommunity.compuware.com
Like us on Facebook: facebook.com/CompuwareAPM
Join our LinkedIn group: Compuware APM User Group
Follow us on Twitter: twitter.com/CompuwareAPM
Read our Blog: About:Performance
Watch our Videos & product Demos: youtube.com/Compuware
www.compuware.com/APM
Thank You

71
Simply Smarter


JavaOne - Performance Focused DevOps to Improve Cont Delivery

1. Andreas Grabner http://apmblog.compuware.com @grabnerandi

2.

3.

4.

5. Testing is Important – and gives Confidence

6. But are we ready for “The Real” world?

7. Measure Performance during the game. Minute 1 - 5: Ball Possession 40 : 60, Fouls 0 : 0, Score 0 : 0

8. Measure Performance during the game. Minute 6 - 35: Ball Possession 80 : 20, Fouls 2 : 12, Score 0 : 0

9. Deep Dive Analysis

10. Options “To Fix” the situation

11. Not always a happy ending. Minute 90: Ball Possession 80 : 20, Fouls 4 : 25, Score 3 : 0

12. FRUSTRATED FANS!!

13. How does that relate to Software?

14. From Deploy to … Timeline: Deploy, Promotion/Event, Problems, Ops Playbook, War Room

15. The “War Room” – back then: “Houston, we have a problem” (NASA Mission Control Center, Apollo 13, 1970)

16. The “War Room” – NOW: Facebook – December 2012

17. Problem: Unclear End User Problem Descriptions

18. Status Quo: Ops Runbook – System Unresponsive

19. Problem: Unclear Ops Problem Descriptions

20. Status Quo: Ops Runbook – High Resource Usage

21. Lack of data?

22.

23. Answers to the right questions

24. What are the real questions? Individual Users? ALL users? Is it the APP? Or Delivery Chain? Code problem? Infrastructure? One transaction? ALL transactions? In AppServer? In Virtual Machine?

25. Problem: What Devs would like to have

26. Problem: What Devs would like to have. Page Rendering is the main component. Top Contributor is related to String handling. 99% of that time comes from RegEx Pattern Matching.

27. It's like getting this …

28. … when you need to see this!

29. RECAP Status Quo: We don’t like “War Rooms”

30. Problem: Attitudes like this don’t help either. Image taken from https://www.scriptrock.com/blog/devops-whats-hype-about/ Shopzilla CIO (in 2010): “… when they get in the war room - the developers and ops teams describe the problem as the enemy, not each other”

31. Problem: Very “expensive” to work on these issues. YES we know this: 80% Dev Time in Bug Fixing, $60B Defect Costs. BUT ~80% of problems caused by ~20% patterns

32. TOP PROBLEM PATTERNS • Focus on Web and Java

33. Top Problem Patterns: Resource Pools

34. Top Problem Patterns: Resource Pools

35. Deployment Mistakes lead to internal Exceptions

36. Deployment Mistakes lead to high logging overhead

37. Production Deployment leads to Log SYNC Issues

38. Long running SQL with Production Data

39. N+1 Query Problem

40. Reading and processing too much data in App

41. Memory Leaks in Cache Layer with Production Data. Still crashes. Problem fixed! Fixed Version Deployed

42. Synchronization Issues under real load

43. BLOATED Web Sites. 17! JS Files – 1.7MB in Size. Useless Information! Might even be a security risk!

44. Missing or incorrectly configured browser caches. 62! Resources not cached. 49! Resources with short expiration

45. SLOW or Failing 3rd Party Content

46. Want MORE of these and more details? http://apmblog.compuware.com

47. Lots of Problems that could have been avoided • BUT WHY are they still making it to Production?

48. Missing Focus on Performance

49. Different Goals for Dev and Ops

50. Disconnected Teams despite “Shared Responsibility”

51.

52. How to make the Enterprise Crew happy?

53.

54. Solution: DevOps + Performance Focus. Culture: “Shared Responsibility”, Agile Process for ALL Teams, Performance as Key Requirement, X-Team Collaboration and Education. Automation: Measurement, Collaboration and Deployment, Automate Performance and Architectural Problem Detection. Measurement: “Visible” KPIs for each Team, Focus on Performance, Architectural and Deployment Measures. Sharing: Expertise, Tool and Data Sharing, “Easy” sharing of Performance, Deployment and Production Data. http://www.opscode.com/blog/2010/07/16/what-devops-means-to-me/

55. Culture: EXTEND Requirements with … Performance Scalability, Testability, Deployability

56. Sharing: DON’T EXCLUDE anyone from Agile Process. Stand-Ups Sharing Tools, Feedback

57. Measurement: Define KPIs accepted by all teams. # of SQL Executions, # of Log Lines, MBs / Uses, Time for Deployment, Time for Rollback, Response Times, Perf Test Code Coverage

58. AUTOMATION, AUTOMATION, AUTOMATION. Performance Scalability, Shared Tools Automatic Feedback

59. DevOps Collaboration – TODO LIST FOR YOU!! Access to Production Data, Shared Reporting and Task Management, Diagnostic Tools, Shared Performance KPIs and Tooling, Know-How Exchange

60. Recap – Problem – Root Cause – Solution - Result. DevOps + Performance Culture, Automation, Measurement, Collaboration

61. TIPS FOR DEVS

62. Performance Focus in Test Automation. Test Framework Results and Architectural Data (columns: Build #, Test Case, Status, # SQL, # Excep, CPU)
    Build 17: testPurchase OK, 12, 0, 120ms; testSearch OK, 3, 1, 68ms
    Build 18: testPurchase FAILED, 12, 5, 60ms; testSearch OK, 3, 1, 68ms
    Build 19: testPurchase OK, 75, 0, 230ms; testSearch OK, 3, 1, 68ms
    Build 20: testPurchase OK, 12, 0, 120ms; testSearch OK, 3, 1, 68ms
    We identified a regression. Problem solved. Let's look behind the scenes. Exceptions probably reason for failed tests. Problem fixed but now we have an architectural regression. Now we have the functional and architectural confidence.

63. Performance Focus in Test Automation. Analyzing All Unit / Performance Tests. Analyzing Metrics such as DB Exec Count. Jump in DB Calls from one Build to the next

64. Performance Focus in Test Automation. Cross Impact of KPIs

65. Performance Focus in Test Automation. Embed your Architectural Results in Jenkins

66. Performance Focus in Test Automation. Here is the difference! Compare Build that shows BAD Behavior! With Build that shows GOOD Behavior!

67. Performance Focus in Test Automation. CalculateUserStats is the new Plugin that causes problems

68. Remember – DevOps requires Cultural Change. Share, Integrate, Collaborate, Performance

69. Elevate our DevOps Investment - REDUCE 80% Dev Time for Bug Fixing, $60B Costs by Defects

70. © 2011 Compuware Corporation. All Rights Reserved. Participate in Compuware APM Discussion Forums: apmcommunity.compuware.com. Like us on Facebook: facebook.com/CompuwareAPM. Join our LinkedIn group: Compuware APM User Group. Follow us on Twitter: twitter.com/CompuwareAPM. Read our Blog: About:Performance. Watch our Videos & product Demos: youtube.com/Compuware. www.compuware.com/APM. Thank You

Editor's Notes

Who knows what that is? It’s the FIFA World Cup Trophy.
Teams are currently competing in the qualifiers for Brazil 2014.
This is “my” Austrian national soccer team. Their GOAL is to qualify for Brazil 2014. After many failed attempts in the past, we hired a new coach whose goal is to form a new team that PERFORMs well enough to qualify.
To get there, the team competed in many test games, which gave them a lot of confidence because they played against teams that were “easier” to beat. At the end of these tests we even started the qualification with some wins against teams that we were expected to beat. So at the end of these “test and easy qualification games” we thought: “ALL GOOD – THE ROAD IS OPEN FOR 2014 – NOT ONLY WILL WE QUALIFY BUT WE ALSO BELIEVE WE HAVE SUCH A STRONG TEAM THAT WILL ALSO DO WELL AT THE WORLD CUP”
Then reality kicked in when we faced our first “real competitor”: it was the first qualification game against a team whose quality is at the level we have to expect at the World Cup. The competing team was Germany, and based on these images you can see how the game went.
The coach is responsible for watching the game and seeing how things are going. Like other sports, soccer has a couple of Key Performance Indicators such as ball possession, fouls and the actual score. The first 5 minutes actually didn’t look too bad.
After the first 5 minutes the game changes, with Germany taking over the game in their typical way. The KPIs make this very clear. The coach is responsible for reacting based on these values and on how the game goes.
The coach should use more data for a detailed analysis of what is going wrong in the game.
One of his options is to substitute players, or even change tactics. Does this succeed based on the KPIs that we have seen before?
Well, not always. Just replacing players, putting in some who are faster at chasing the ball, doesn’t always help.
Story
• New build deployed on Thursday evening
• Everything runs smoothly on Friday daytime
• An ad campaign hits the air Friday night
• The site crashes under load -> ALERTS GO OFF
• Restarting server -> SERVER DOESN’T START
• Adding more servers -> PROBLEM REMAINS
• Calling in the “App Experts” and pizza delivery!
They getOps’ problem description: “App Server crashed”, “Out of file handles”Users’ problem description: “It is slow”, “It crashed”
They get
High CPU, Memory or Bandwidth Issues
Log files: GBs of log files with 99.9% “useless” information
There is lots of data – but – does a high CPU Utilization really mean that this machine has a problem and needs to be restarted?
What could be the problem if your user experience tool tells you that people have bad response times?
But what do we do with all this disconnected data?
They need
Application data: Executed Transactions, Load, CPU, Memory, Disk usage, ...
Impacted transactions with context information: User Actions, Call stack, Thread Overview, Method Parameters, SQL Calls, Invoked Service Calls
Involved Application Components: Web Server, App Servers, Database
Impact of service calls: Performance, Availability, Response Codes
Error Details: HTTP Errors, Exceptions, warning/severe log messages
30%: What we hear from talking to people is that a lot of production problems happen at times that are not very “developer friendly” -> RUN THROUGH STORY
60%: Restarting a crashed application server or adding an additional server to handle the load often doesn’t solve the problem either -> That’s when it’s time to call in the Application Experts (Developers or Testers)
100%: Devs (and probably anybody else as well) are not happy to get called at 2AM to look at a problem. They also know that it’s not going to be an easy fix because there is probably not enough data available to fix this, so it’s going to be a lot of trial and error with a team (Ops) that is reluctant to do trial and error.
More talking points:
The Challenge with Outside Business Hours problems
Restarts are not the silver bullet
Application Experts to fix problems unlikely to be available at 2AM
This leads to “CritSit”, “War Room”, … including Dev, Test, Ops …
The Challenge with Production Problem Analysis
Ops often doesn’t know what information is required by Dev & Test
Ops typically doesn’t want to give Devs access to machines for triage
Leads to Tension between Dev, Test and Ops
INTERESTING FACT: 80/20 Rule
20% of Problem Patterns responsible for 80% of Problems
Most problems could have been found early on
PREVENTION is POSSIBLE
Because RESTARTING Applications IS NOT the solution
Best Case: You are just “hiding” a problem
Worst Case: App doesn’t start anymore
Because ROOT CAUSE is often NOT FOUND in log files
Which log files to look at? App Server, Web Server, OS Event Log, …?
Even Splunk can’t help if there is not sufficient information
Because CHANGING APP behavior CANNOT always be done through config files
You can’t turn off a Memory Leak via a switch
Trial & Error Changes, e.g. increasing pool sizes, will just “shift” the problem
Because They (DEV, TEST, ARCHITECTS) are the APPLICATION EXPERTS
They know WHERE to look and WHAT to look for
They can fix the code and advise on other deployment options
Well – I guess there is not much more to say about this. The attitude between these teams doesn’t help in solving issues any faster.
We all know this statistic in one form or another – so – it is clear that the problems handled in War Rooms are VERY EXPENSIVE.
BUT
What is interesting is that these problems are typically not detected earlier because the focus of engineering is on building new features instead of on performance and scalable architecture.
What’s interesting though is that many of these problems could easily be found earlier on – LET’S have a look at these common problems that we constantly run into …
Depending on the audience you want to show or hide some of the following slides
Resource Pool Exhaustion
Misconfiguration or failed deployment, e.g: default config from dev
Actual resource leak -> can be identified with Unit/Integration Tests
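The note above says an actual resource leak can be identified with Unit/Integration Tests. A minimal sketch of that idea in Java follows. `TrackedPool`, `PoolLeakCheck` and the two "work" methods are hypothetical names made up for this illustration and are not taken from the talk or from any real pooling library; a real test would wrap the connection or thread pool the application actually uses.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical pool that counts outstanding leases so a test can spot leaks. */
final class TrackedPool {
    /** A lease gives its slot back to the pool when it is closed. */
    final class Lease implements AutoCloseable {
        private boolean released;

        @Override
        public void close() {
            if (!released) {
                released = true;
                leased.decrementAndGet();
            }
        }
    }

    private final int capacity;
    private final AtomicInteger leased = new AtomicInteger();

    TrackedPool(int capacity) {
        this.capacity = capacity;
    }

    Lease acquire() {
        if (leased.incrementAndGet() > capacity) {
            leased.decrementAndGet();
            throw new IllegalStateException("pool exhausted");
        }
        return new Lease();
    }

    int leasedCount() {
        return leased.get();
    }
}

final class PoolLeakCheck {
    /** Code under test that forgets to release its lease (the leak). */
    static void leakyWork(TrackedPool pool) {
        pool.acquire();
    }

    /** Code under test that releases its lease via try-with-resources. */
    static void carefulWork(TrackedPool pool) {
        try (TrackedPool.Lease lease = pool.acquire()) {
            // use the pooled resource here
        }
    }

    /** Runs the work several times and reports how many leases were never returned. */
    static int leasesLeftAfter(TrackedPool pool, Consumer<TrackedPool> work, int runs) {
        for (int i = 0; i < runs; i++) {
            work.accept(pool);
        }
        return pool.leasedCount();
    }

    public static void main(String[] args) {
        System.out.println("leaky:   " + leasesLeftAfter(new TrackedPool(10), PoolLeakCheck::leakyWork, 3));
        System.out.println("careful: " + leasesLeftAfter(new TrackedPool(10), PoolLeakCheck::carefulWork, 3));
    }
}
```

A unit test would assert that the count of outstanding leases is zero after the code under test has run. The misconfiguration case (default pool size from dev) is not covered by such a test and still needs the deployment checks mentioned above.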
Resource Pool Exhaustion (same as before – just a different Pool)
Using the same deployment tools in Test and Ops can prevent this
Testing with real load can detect that
Deployment Issues leading to heavy logging resulting in high I/O and CPU
Using the same deployment tools in Test and Ops can prevent this
Analyzing Log Output per Component in Dev prevents this problem
Too many and too slow Database Queries
Dev and Test need to have a “production-like” database – Updates on a “Sample Database” won’t show slow updates
Access Patterns such as N+1 can be identified with Unit Tests
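The claim above that access patterns such as N+1 can be identified with Unit Tests can be sketched as follows. This is an illustration only: `CountingDb`, `OrderReport`, `NPlusOneCheck`, the SQL strings and the budget of 3 statements are all assumptions made up for this example. A real test would count statements on the actual data access layer (for example through a wrapped DataSource or an APM tool) rather than on a fake.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a data access layer that counts every statement it runs. */
final class CountingDb {
    private int statements;

    /** Pretends to run a query and returns the requested number of fake rows. */
    List<String> query(String sql, int rows) {
        statements++;
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            result.add(sql + "#" + i);
        }
        return result;
    }

    int statementCount() {
        return statements;
    }
}

final class OrderReport {
    /** N+1 pattern: one query for the orders, then one more query per order. */
    static int loadNaive(CountingDb db, int orders) {
        int items = 0;
        for (String order : db.query("SELECT id FROM orders", orders)) {
            items += db.query("SELECT * FROM items WHERE order_id = " + order.length(), 2).size();
        }
        return items;
    }

    /** Same data fetched with a single joined query. */
    static int loadJoined(CountingDb db, int orders) {
        return db.query("SELECT i.* FROM items i JOIN orders o ON i.order_id = o.id", orders * 2).size();
    }
}

final class NPlusOneCheck {
    /** Assumed statement budget for this use case; pick a value that fits the real feature. */
    static final int MAX_STATEMENTS = 3;

    static boolean withinBudget(CountingDb db) {
        return db.statementCount() <= MAX_STATEMENTS;
    }

    public static void main(String[] args) {
        CountingDb naive = new CountingDb();
        OrderReport.loadNaive(naive, 10);
        CountingDb joined = new CountingDb();
        OrderReport.loadJoined(joined, 10);
        System.out.println("naive statements:  " + naive.statementCount() + ", within budget: " + withinBudget(naive));
        System.out.println("joined statements: " + joined.statementCount() + ", within budget: " + withinBudget(joined));
    }
}
```

The point of the budget check is that the statement count grows with the number of rows in the N+1 variant, so the test fails as soon as the data set is larger than a toy sample, which ties in with the need for a “production-like” database.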
Too much data requested from the Database
Dev and Test need to have a “production-like” database – Otherwise these problem patterns can only be found in prod
Educate Developers on “the power of SQL” – instead of loading everything into memory and performing filters/aggregations/… in the App
Memory Leaks: Too much data in Cache
Can be found in test with “production-like” data sets and tests that do not only test the same “search” query -> get feedback from Prod
Educate developers on memory and cache strategies
Synchronization issues caused by deadlocks
Can be found with small scale performance unit tests by developers
Educate developers on synchronization/multi-threading strategies
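One way a small scale test can surface a deadlock is to ask the JVM itself after exercising the code with a few threads. The sketch below uses the standard `ThreadMXBean.findDeadlockedThreads()` call from `java.lang.management`. The class `DeadlockProbe`, its method names and the two "locker" threads are hypothetical and only provoke a deadlock on purpose for demonstration; in a real test the threads would run the application code under test.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockProbe {
    /** Polls the JVM for deadlocked threads until the timeout expires; returns how many were found. */
    static int waitForDeadlock(long timeoutMillis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    /** Demo only: starts two daemon threads that take the same two locks in opposite order. */
    static void startOpposingLockers(Object a, Object b) {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startLocker("locker-ab", a, b, bothHoldFirstLock);
        startLocker("locker-ba", b, a, bothHoldFirstLock);
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock too
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this lock and waits for ours
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive after the check
        t.start();
    }

    public static void main(String[] args) {
        startOpposingLockers(new Object(), new Object());
        System.out.println("deadlocked threads: " + waitForDeadlock(2000));
    }
}
```

In a unit test the assertion would be the opposite of the demo: after running the concurrent scenario, `waitForDeadlock` should report zero deadlocked threads.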
Not following WPO (Web Performance Optimization) Rules
Non-optimized content, e.g: compression, merging, …
Educate developers and automate WPO checks
Not leveraging Browser-side Caching
Misconfigured CDNs or missing cache settings -> automate cache configuration deployment
Educate developers; Educate testers to do “real life” testing (CDN, …)
Slow or failing 3rd party content
Impacts page load time; Ops is required to monitor 3rd party services
Educate devs to optimize loading; Educate test to include 3rd party testing
Why is this a problem?
Biz pushes features. To deliver more features in a more agile way, development adopted agile development methodologies to deliver more releases with more features in a shorter timeframe.
To save costs we outsource. Some companies also organically grew by acquisition, leaving us with dev teams that are distributed across the globe.
To be faster we use 3rd Party Code as we do not want to re-invent the wheel. However, not every 3rd party component or service is really fit for the requirements we have in our production environment. It may work well on the workstation for a single user but often fails in a larger environment.
3rd Party Services or Content
Average US Sports Website loads content from 29! domains
3rd Party Components in Application Code
Hibernate, Spring, .NET Enterprise Blocks …
GWT, ExtJS, jQuery
Amazon Web Services, Google API, …
Feature – richness vs. NO CHANGE
Not well communicated what change is ahead. No “Integration” of Ops Teams in Agile Process
A big step is to tear down these walls between these teams.
CAMS is taken from the OpsCode (Creators of Chef) Blog: http://www.opscode.com/blog/2010/07/16/what-devops-means-to-me/
Culture
People and process first. If you don’t have culture, all automation attempts will be fruitless.
Automation
This is one of the places you start once you understand your culture. At this point, the tools can start to stitch together an automation fabric for Devops. Tools for release management, provisioning, configuration management, systems integration, monitoring and control, and orchestration become important pieces in building a Devops fabric.
Measurement
If you can’t measure, you can’t improve. A successful Devops implementation will measure everything it can as often as it can… performance metrics, process metrics, and even people metrics.
Sharing
Sharing is the loopback in the CAMS cycle. Creating a culture where people share ideas and problems is critical. Jody Mulkey, the CIO at Shopzilla, told me that when they get in the war room the developers and operations teams describe the problem as the enemy, not each other. Another interesting motivation in the Devops movement is the way sharing Devops success stories helps others. First, it attracts talent, and second, there is a belief that by exposing ideas you can create a great open feedback loop that in the end helps them improve.
The change that is required is already well understood in the DevOps movement that’s been going on for years – BUT – it is important to add Performance as a Key Requirement to Culture, Automation, Measurement and Sharing.
Culture: PERFORMANCE is a key requirement for everything that is done throughout the delivery chain. We have heard that a lot of the problems that lead to a War Room scenario are problems that could be found earlier if there were a focus on Performance and Quality throughout the organization.
Automation: Automation is Key for DevOps and Agile Development. What needs to change is that performance and architectural problems are automatically detected in the development and delivery process. This can be achieved by focusing automated testing on exactly these problems – whether it is in C/I or in the “traditional” test area.
Measurement: We can only measure success if we have Key Performance Indicators for each team, e.g: Test Coverage %, Number of Tests Executed, Throughput, Response Time, Number of Deployments, … – an additional focus must be on measures that allow us to track performance and architectural issues. This allows us to identify and prevent any performance regressions as soon as they get introduced.
Sharing:
Agile Development (Stories & Tasks) excludes
Performance and Scalability Requirements from Test
Testability Requirements from Test
Deployment and Stability Requirements from Ops
Requirements: These are currently mainly brought in by the business side, which demands more features. What is missing are the requirements from Test and Ops.
Agile Process excludes Test and Ops
Not part of Standups, Reviews, Plannings
No active sharing of data, requirements, feedback
No common toolset/platform/metrics that makes sharing easy
Collaboration: Test & Ops are not part of the agile process. There is no active involvement in the standups, reviews or planning meetings. The lack of common tools and a different understanding of quality, metrics and requirements also make it hard to share data.
Sharing Tools
The different teams currently use their own set of tools that help them in their day-to-day work in their “local” environment.
Developers focus on development tools that help them with developing code, debugging and analyzing the basic problems.
Testers use their load testing tools and combine them with some system monitoring tools to e.g: capture CPU, Memory, Network Utilization.
Ops uses their tools to analyze network traffic, host health, log files, …
When these teams need to collaborate in order to identify the root cause of a problem they typically speak a different language. Developers are used to debuggers, thread and memory dumps. But what they get is things like “the system is slow with that many Virtual Users on the system where Host CPU starts showing a problem”.
When there is a production problem both developers and testers are typically not satisfied with network statistics or operating system event logs that don’t tell them what really went on in the application. Test won’t be able to reproduce the problem with that information, nor will devs be able to debug through their code based on that information.
In order to make it easier to troubleshoot the issue, developers would like to install their tools in test and ops – but these tools are typically not fit for high load and production environments. Debuggers have too much overhead, and they require restarts and changes to the system -> Ops doesn’t like change!!
Some examples of KPIs
Number of SQL Statements executed
-> tells Ops what to expect in production
-> tells architects whether to optimize this with a cache or a different DB Access Strategy
Number of Log Lines
-> tells Ops how to optimize storage for Logging
-> tells architects whether there is LOG SPAM happening
Memory Consumption per User Session
-> tells Ops how to scale their production environment
-> tells architects whether you are “wasteful” with heap space
Time for a single Deployment
Time for rollbacks
Automation: C/I currently only executes tests that cover functionality, e.g: unit and maybe integration or some functional tests (Selenium, …). What is missing is the concept of already executing small scale performance and scalability tests that would allow us to automatically detect those problem patterns discussed earlier. With that we could already eliminate the need for MOST War Room situations.
So – to sum up – here are some action items (ToDo List)
1a: Share and develop tools that are used across team boundaries, e.g: Add more diagnostics tools to test or share deployment tools developed by test with Ops
1b: A critical component is testing early and testing in real life environments. Test Teams need to empower developers by giving them access to their performance test frameworks so that Devs can also test performance and architectural KPIs early on. Ops needs to work with Test and provide access to production or staging so that Test can perform their large scale load and performance tests in a realistic environment
2a: It is important to establish a shared reporting and task management system. Very often we see companies that share Wiki Instances and Task Tracking systems where they have status report pages from all teams as well as tracking of issues that are found in dev, test and production
2b: It is also important to share the same toolset across all tiers so that Dev, Test and Ops get the data they need, in a form that can easily be shared and is understood by everybody
Recapping the initial problem that we described and the root causes for it, we can say we have a good solution to these problems.
DevOps is the way to go – BUT – it requires a big focus on Performance, Architecture, Scalability and Deployment.
It requires more Automation to find these problems early on
It requires more Measurement as measures allow us to identify these deficiencies throughout the Agile process
It requires active sharing of the data which will bring the teams even closer together so that they are working on a “SHARED GOAL”
Following all of this will result in 100% confidence when rolling out a production release – without the need of a war room
When we look at the results of your Testing Framework from Build to Build we can easily spot functional regressions. In our example we see that testPurchase fails in Build 18. We notify the developer, the problem gets fixed and with Build 19 we are back to functional correctness.
Looking behind the scenes
The problem is that Functional Testing only verifies the functionality to the caller of the tested function. Using dynaTrace we are able to analyze the internals of the tested code. We analyze metrics such as Number of Executed SQL Statements, Number of Exceptions thrown, Time spent on CPU, Memory Consumption, Number of Remoting Calls, Transferred Bytes, …
In Build 18 we can see a nice correlation of Exceptions to the failed functional test. We can assume that one of these exceptions caused the problem. For a developer it would be very helpful to get exception information which helps to quickly identify the root cause of the problem and solve it faster.
In Build 19 the Testing Framework indicates ALL GREEN. When we look behind the scenes we see that we have a big jump in SQL Statements as well as CPU Usage. What just happened? The Developer fixed the functional problem but introduced an architectural regression. This needs to be looked into – otherwise this change will have a negative impact on the application once tested under load.
In Build 20 all these problems are fixed. We are still meeting our functional goals and are back to an acceptable number of SQL Statements, Exceptions, CPU Usage, …
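The build-over-build comparison described above can be sketched as a simple check that runs next to the functional tests. This is not how dynaTrace implements it. `ArchitecturalRegressionCheck`, the metric names, the 10% tolerance and all numbers are assumptions made up for this illustration; in practice the values would come from whatever tool captures the metrics per test run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ArchitecturalRegressionCheck {
    /** Returns the names of metrics that grew by more than the tolerated fraction over the baseline build. */
    static List<String> regressions(Map<String, Long> baseline, Map<String, Long> current, double tolerance) {
        List<String> regressed = new ArrayList<>();
        for (Map.Entry<String, Long> metric : current.entrySet()) {
            Long before = baseline.get(metric.getKey());
            if (before == null) {
                continue; // new metric: nothing to compare against yet
            }
            if (metric.getValue() > before * (1.0 + tolerance)) {
                regressed.add(metric.getKey());
            }
        }
        return regressed;
    }

    /** Helper that bundles the per-test metrics of one build (values here are invented). */
    static Map<String, Long> metrics(long sqlStatements, long exceptions, long cpuMillis) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("sqlStatements", sqlStatements);
        m.put("exceptions", exceptions);
        m.put("cpuMillis", cpuMillis);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Long> build17 = metrics(12, 0, 120);  // last known good build
        Map<String, Long> build19 = metrics(171, 0, 350); // functionally green, architecturally worse
        Map<String, Long> build20 = metrics(12, 0, 125);  // back to acceptable values
        System.out.println("Build 19 regressions: " + regressions(build17, build19, 0.10));
        System.out.println("Build 20 regressions: " + regressions(build17, build20, 0.10));
    }
}
```

Failing the build when the returned list is not empty is what turns an "ALL GREEN" functional result like Build 19 into a visible architectural regression.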
Web Architectural Metrics
# of JS Files, # of CSS, # of redirects
Size of Images