Starting your DevOps Journey
Practical Tips for Ops
http://dynatrace.com/trial
Brian Chandler
Systems Engineer @ Raymond James
@Channer531
Andreas Grabner
Chief DevOps Activist @ Dynatrace
@grabnerandi
Promise of DevOps: Faster & Efficient Innovation
Smaller Apps, Micro-Services More Deployments
App-, Service- & End-User Feedback Loops
Happy Users
Lower Costs
Proof: DevOps Adopters Are …
More Agile: 200x more frequent deployments, 2,555x faster lead times than their peers
More Reliable: 3x lower change failure rate, 24x faster Mean Time to Recover
More Successful: 2x more likely to exceed market expectations, 50% higher market cap growth over 3 years
Source: Puppet Labs 2016 State Of DevOps Report: https://puppet.com/resources/white-paper/2016-state-of-devops-report
Dynatrace Transformation by the numbers
More Releases: 23x, 170 deployments / day
More Quality: 31000 Unit+Int tests / hour, 60h UI tests per build
More Agile: ~200 code commits / day, 340 stories per sprint
More Stability: 93% of production bugs found by Dev, 450 global EC2 instances, 99.998% global availability
Webinar @ https://info.dynatrace.com/17q3_wc_from_agile_to_cloudy_devops_na_registration.html
YET: “DevOps Adoption is only 2%” (Gene Kim, Nov 2016)
Interesting Ops Learnings from Adopters
New Technology Stack
New Architectural Patterns
End User Focused
New Deployment Models
DevOps Requirements and Engagement Options for Ops
Feedback through High Quality App & User Data
Ops as a Service: “Self-Service for Application Teams”
Bridge the Gap between Enterprise Stack and New Stack
Shift-Left: (No)Ops as “Part of Application Delivery”
Requirements / Engagement Options
Closing the Ops to Dev Feedback Loop: One Step at a Time!
Ops: Need answers to these questions!
1. Basic App Monitoring
Are our applications up and running?
What load patterns do we have per application?
What is the resource consumption per application?
2. App Dependencies
What are the dependencies between apps, services, DB and infra?
How to monitor “non custom app” tiers?
Where are the dependency bottlenecks? Where is the weakest link?
3. End User Monitoring
How to monitor mobile vs desktop vs tablet vs service endpoints?
How much network bandwidth is required per app, service and feature?
Where to start optimizing bandwidth: CDNs, Caching, Compression?
4. “Soft-Launch” Support
How to deploy and monitor multiple versions of the same app / service?
What and how to baseline?
Do we have a better or worse version of an app/service/feature?
5. Virtualization Monitoring
How to automatically monitor virtual and container instances?
What to monitor when deploying into public or private clouds?
6. Provide “Monitoring as a Service” for Cloud Native Application Teams (Ready for “Cloud Native”)
How to alert on real problems and not architectural patterns?
How to consolidate monitoring between Cloud Native and Enterprise?
Closing the gap to AppBizDev
Who is using our apps? Geo? Device?
Which features are used? What's the behavior?
Where to start optimizing? App Flow? Page Size?
Conversion Rates? Bounce Rates?
Where are the performance / resource hotspots?
When and where do applications break?
Do we have bad dependencies through code or config?
How does the system really behave in production?
What to learn for future architectures?
What are the usage patterns for A/B or Green/Blue?
Difference between different versions and features?
Does the architecture work in these dynamic environments?
Does scale up/down work as expected?
Today
Questions to Answer!
Are our applications up & running?
What are the real load patterns?
What is the resource consumption?
Where to start optimizing?
Are our Apps Up, Running & Accessible?
Availability dropped to 0%
Early Warning SLA Monitoring!
Quality of Connectivity & DNS
Quality of Content Delivery
3rd Party Impact
Delivery by Geo
Client Center Daily Traffic Pattern
Client Center sees a peak of about 3,800 req/min against its API.
60 unique calls/functions make up the Client Center API.
~20% of that traffic is ClientCenter/API/Holdings
~20% of that traffic is ClientCenter/API/ClientDetails
~20% of that traffic is ClientCenter/API/RecentSearch
Typical Peak Hour
If you’re not careful, it could look like this…
Rhythmic peaks and valleys suggest “lock-step” scripts (all virtual users start and end at the same time).
PRD usage is much more “fluid”: a steady stream and balance across transaction usage.
The total traffic load was met. However, the correct ratio of key transactions was not met.
Leveraging PRD data to tune QA Load Tests
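One way to check the transaction ratio described above is to compare each transaction's share of traffic in production with its share in the load test. The following is only a sketch: the function names, the 5% tolerance and the sample request lists are hypothetical, and the transaction names are borrowed from the Client Center slides purely for illustration.

```python
from collections import Counter

def transaction_mix(requests):
    """Return each transaction's share of total traffic (0.0 to 1.0)."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def mix_deviation(production, load_test, tolerance=0.05):
    """Return transactions whose load-test share differs from production by more than tolerance."""
    prod_mix = transaction_mix(production)
    test_mix = transaction_mix(load_test)
    deviations = {}
    for name in sorted(set(prod_mix) | set(test_mix)):
        delta = test_mix.get(name, 0.0) - prod_mix.get(name, 0.0)
        if abs(delta) > tolerance:
            deviations[name] = round(delta, 3)
    return deviations

# Hypothetical request logs: same total volume, different transaction mix.
production = ["Holdings"] * 20 + ["ClientDetails"] * 20 + ["RecentSearch"] * 20 + ["Other"] * 40
load_test = ["Holdings"] * 50 + ["ClientDetails"] * 5 + ["RecentSearch"] * 5 + ["Other"] * 40

print(mix_deviation(production, load_test))
```

Both lists contain 100 requests, so a check on total volume alone would pass, while the per-transaction comparison shows that the test over-exercises one call and under-exercises two others.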
Normal Production Distribution vs. Failed Load Test Distribution
Black: Overall application load and peak volume
Percentile breakdown of fast, warning, slow txs
Performance Differences Before and After Release
Occurrences of slow AccountList transactions from load testing
Distribution of “yellow” transactions for that time
AccountList makes up most of these transactions.
Normal distribution of “expected” slow transactions for this API function.
Distribution generated from load test. New code would greatly increase the occurrences of slow transactions in production!
What is making up all that yellow?
Detecting Load Distribution and Deployment Hotspots
Overall Load Distribution by SLA
Very Slow, Slow, Med, Fast
Tip: Logarithmic Y-Axis
Finding #1: Response time spikes at certain times, not related to load!
Finding #2a: Server #1 was put back in rotation HERE
Finding #2b: Server #2 saw fewer errors once #1 was up
Finding #3: Server #3 only gets load at certain times!
Validate Load Balancing
Tip: Load per Server!
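The "load per server" tip can be turned into a simple check. This is a minimal sketch that assumes the load balancer is meant to spread requests evenly; the function names, the 10% tolerance and the request counts are hypothetical.

```python
def load_share(requests_per_server):
    """Return each server's fraction of the total request count."""
    total = sum(requests_per_server.values())
    if total == 0:
        return {server: 0.0 for server in requests_per_server}
    return {server: count / total for server, count in requests_per_server.items()}

def unbalanced_servers(requests_per_server, tolerance=0.10):
    """Return servers whose share deviates from an even split by more than tolerance."""
    if not requests_per_server:
        return []
    shares = load_share(requests_per_server)
    expected = 1 / len(shares)
    return sorted(
        server for server, share in shares.items()
        if abs(share - expected) > tolerance
    )

# Hypothetical request counts for one time window: server1 is out of rotation.
window = {"server1": 0, "server2": 6000, "server3": 4000}
print(unbalanced_servers(window))
```

Running the check per time window rather than over a whole day is what would surface a server that only receives load at certain times.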
Detecting Load Distribution and Deployment Hotspots
Requests by App Server:
Tip: Percentage Bar Chart
Thread Usage:
Tip: Pool Size + Actual Use
Same for Web Server
Transfer Rate
Identify “heavy hitters”
Resource Utilization
Tip: CPU, Memory, I/O …
Detecting Resource Regression Hotspots
Time of Deployment
Other Resources: Bytes Transferred, Disk I/O, # of Log Messages, # of Open Connections, # of Calls …
Detecting Error Hotspots under Load
Automatic Hotspot Detection under Load
My Favorite: Layer Breakdown Chart
With increasing load: Which LAYER doesn’t SCALE?
Automatic Availability Root Cause Detection
Web Performance Optimization
Automated: List of root cause explanations for SLA violations
Automatic Baselining per Business Transaction
Response Time Baselines based on 50th & 90th Percentile
Smart Alerting based on Significant Measurement Violation
Direct link to Layer Breakdown and Method Hotspot!
Automatic Anomaly and Root Cause Detection
Automatic Anomaly Detection
Automatic Root Cause Information
Automatic Impact Details
Summary: Capabilities to Get Answers
Through Synthetic Monitoring: Are our applications up & running?
Availability, Response Time, CDN, Geo, …
Content Size and Content Validation
Through Endpoint Monitoring: What are the real load patterns?
Bucket by Response Time (Fast, Medium, Slow, Very Slow ...)
Bucket by Status Code (HTTP 2xx, 3xx, 4xx, 5xx, ...)
Through System Monitoring: What is the resource consumption?
CPU, Memory, Network and I/O
Through Basic Application Monitoring: Where to start optimizing?
Top Exceptions & Log Messages; # Thread (Idle, Busy)
Memory by Heap Space, Garbage Collection Activity
Execution Hotspots by Component
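The two endpoint-monitoring buckets listed in the summary above could be computed as follows. This is a sketch only: the threshold values, function names and sample requests are hypothetical and should be replaced with values that match your own SLAs.

```python
from collections import Counter

# Hypothetical upper bounds in milliseconds for each response-time bucket.
RESPONSE_TIME_BUCKETS = [(500, "Fast"), (1000, "Medium"), (3000, "Slow")]

def response_time_bucket(duration_ms):
    """Map a response time to Fast, Medium, Slow or Very Slow."""
    for upper_bound, label in RESPONSE_TIME_BUCKETS:
        if duration_ms < upper_bound:
            return label
    return "Very Slow"

def status_code_bucket(status):
    """Map an HTTP status code to its class, e.g. 404 -> '4xx'."""
    return f"{status // 100}xx"

def summarize(requests):
    """Count requests per response-time bucket and per status-code class."""
    by_time = Counter(response_time_bucket(ms) for _, ms in requests)
    by_status = Counter(status_code_bucket(code) for code, _ in requests)
    return by_time, by_status

# Hypothetical access-log entries as (status code, response time in ms).
requests = [(200, 120), (200, 750), (302, 90), (404, 40), (500, 4200), (200, 1800)]
by_time, by_status = summarize(requests)
print(dict(by_time))
print(dict(by_status))
```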
Which services do we actually host?
What is the health state of every component?
What are the dependencies?
What impacts the interconnected system health?
Questions to Answer!
Agent-Based Monitoring & Tracing:
Bridging Enterprise and New Stack
From Mobile
Via Middleware
To Mainframe
And Services
To SQL / NoSQL
To External Services
Analyzing Inter Tier Impact
#1: Load Spike. Direct correlation with # of SQL queries -> OK!
#2: Same Load Spike. Direct correlation with # of Exceptions -> OK!
#3: Starting with Load Spike. Time spent in JDBC (blue) stays very high -> NOT OK!
#4: Problem Solved. Issue on Oracle Server caused all SQL to be slow
Health State and Impact of Database!
DB-Related Blogs from Sonja: https://www.dynatrace.com/blog/author/sonja-chevre/
Proper Connection Pool Sizing!
Do we have enough DB CONNECTIONS per pool?
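The slides do not say how to size a pool. One common rule of thumb, offered here as an assumption rather than the presenters' method, is Little's Law: the average number of connections in use equals the query rate multiplied by the average time a connection is held. The function name, the 25% headroom and the example numbers are hypothetical.

```python
import math

def required_connections(queries_per_second, avg_hold_time_s, headroom=1.25):
    """Estimate pool size as rate * hold time (Little's Law) plus headroom."""
    return math.ceil(queries_per_second * avg_hold_time_s * headroom)

# Hypothetical workload: 200 queries/s, each holding a connection for 50 ms.
print(required_connections(200, 0.05))

# Same load, but the database has slowed down to 500 ms per query.
print(required_connections(200, 0.5))
```

The second call illustrates why a slow database can exhaust a pool that was sized correctly for normal conditions: the request rate is unchanged, but each connection is held ten times longer.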
Detecting Database Impact on Message Processing
#1: Cluster Failover Event
#2: System struggled but managed load
#3: DB Index Job with MAJOR impact on End Users
@ Dynatrace: Service Tier Monitoring
#1: Overall Tier Health
#2: Cassandra Health
#3: Queue Sizes
#4: Error States
What’s lurking under the water of the iceberg?
What is the cause of all performance problems?
Red wave of death appears on dashboard.
Conference Bridge/Crisis Center call with lots of “Smart Guy Correlation”
Application recovers.
Triaging w/o anomaly detection on app dependencies
[Diagram: App1 to App5, each drawn as its own stack of Web, AppSvc, MB, EntSvc and DB tiers]
DCRUM – True enterprise monitoring
[Diagram sequence: the per-application stacks of App1 to App5 are redrawn as shared tiers, with the Web front ends on top of Svc1 to Svc4, MB, ENTSvc1, EntSvc2, DB1 and DB2]
Successful application dependency monitoring will allow you to take a “bottom-up” approach to monitoring your enterprise.
“Bottom-up” Service View
Client Group 1, Servers A-D
Client Group 2, Servers E-H
Client Group 3, Servers I-L
Client Group 4, Servers M-Q
Client Group 5, Servers R-S
Different apps and services exercise enterprise services and databases in varying ways!
Lack of load from these peers against this service
Poor performing node in this client group
Link to the appropriate heat map
Alert sent based on deviation from the calculated baseline
Baseline alerting granularity down to the operation level, not just the Software Service
Delivering this data as actionable alerts
Usage and application behavior vary day-to-day.
A rolling average of services is not good enough
One week application usage trend
Monday Tuesday Wednesday Thursday Friday
The need for seasonal baselining
To achieve deeper statistical capabilities, we use a combination of the PureLytics stream and the DCRUM REST interface to pour data into analysis tools.
This allows us to reach back several weeks, on a single minute for the given day (e.g. Monday at 10:03am compared to the last 5 Mondays at 10:03am), to calculate our baselines for every unique operation in our enterprise (25k+ recorded). That is a great deal of data!
Dynatrace performance metrics streaming
By reaching that far back at granular 1-minute intervals, you can be very confident in the validity of your baseline values.
A 50ms-150ms deviation may not seem like a huge deal, but in the world of app dependency monitoring, it truly is!
Graphical View of deep seasonal baselining
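The "same minute on the last 5 Mondays" idea can be sketched as follows. The slides do not specify the statistics used, so the mean and standard deviation, the 3-sigma threshold, the function names and the sample values are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

def seasonal_baseline(history, at, weeks=5):
    """Return (mean, stdev) of the values seen at the same weekday and minute in prior weeks.

    history maps minute-resolution datetimes to response times in milliseconds.
    """
    samples = []
    for week in range(1, weeks + 1):
        key = at - timedelta(weeks=week)
        if key in history:
            samples.append(history[key])
    if len(samples) < 2:
        return None  # not enough data to estimate a spread
    return mean(samples), stdev(samples)

def deviates(value_ms, baseline, sigmas=3.0):
    """True when the value is more than `sigmas` standard deviations from the baseline mean."""
    avg, spread = baseline
    return abs(value_ms - avg) > sigmas * spread

# Hypothetical history for one operation: Monday 10:03am on the five previous Mondays.
now = datetime(2017, 6, 12, 10, 3)  # a Monday
history = {
    now - timedelta(weeks=week): value
    for week, value in enumerate([200, 210, 190, 205, 195], start=1)
}

baseline = seasonal_baseline(history, now)
print(baseline)
print(deviates(215, baseline))
print(deviates(350, baseline))
```

Because the lookup key is the exact same minute one to five weeks back, a Monday morning peak is only ever compared with other Monday morning peaks, which a rolling average cannot do.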
Service 1 needs to call Service 2 multiple times. If Service 2 slows down, it has an enormous impact on all upstream services.
A 150ms shift in Service 2 causes Service 1 to shift from 200ms to 2s.
Service 1
Service 2
Upstream impact of dependencies
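The amplification can be illustrated with simple arithmetic. The slides do not state how many times Service 1 calls Service 2, so the numbers below are assumptions picked so that the result matches the 200ms to 2s shift: 12 sequential calls, 80 ms of work inside Service 1, and a Service 2 latency that moves from 10 ms to 160 ms.

```python
def upstream_latency_ms(own_work_ms, downstream_latency_ms, sequential_calls):
    """Latency of a service that calls one dependency several times in sequence."""
    return own_work_ms + sequential_calls * downstream_latency_ms

# Hypothetical values, chosen to reproduce the figures quoted on the slide.
OWN_WORK_MS = 80
SEQUENTIAL_CALLS = 12

before = upstream_latency_ms(OWN_WORK_MS, 10, SEQUENTIAL_CALLS)
after = upstream_latency_ms(OWN_WORK_MS, 10 + 150, SEQUENTIAL_CALLS)

print(before, after)
```

Under these assumptions every millisecond added to Service 2 costs Service 1 twelve milliseconds, which is why a deviation that looks small on the dependency is large for its callers.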
Automatic Full Stack Monitoring
#1: All your Technologies
#2: All Key Metrics
#3: Physical, Virtual, Containers or Cloud
Smartscape: Real Time Service-Oriented CMDB
#1: Understand WHO talks with WHOM?
#2: Where are tiers deployed?
#3: WHO might be impacted by a failure?
Automatic Service Flow Tracing
#1: Understanding Flow
#2: Dependencies between Services
#3: Service Clustering
Automatic Architectural Pattern Detection
#1: Action initiated by the SPA (Single Page App)
#2: SPA was making 3 AJAX calls in total!
#3: One of the calls makes 13(!) backend REST calls to an external system on 13 asynchronous threads
Automatic Problem Pattern Detection
#1: Select Top Common Problem Patterns
#2: Explore which transactions have this and other problems
Automating Anomaly Detection
#1: All Root Cause Information “encapsulated” into a single Problem
#2: “Time-Lapse” of Problem Evolution
#3: All relevant Events: Infra, Logging, App, Service, End User …
Automatic Integration with ChatOps
Summary: Capabilities to get answers
Through Automatic Dependency Detection
Which services are hosted by which processes?
Where do these processes run?
Through Component Monitoring
Key metrics from Oracle, SQL, DB2, MySQL, Postgres
Throughput on your Message Broker / Bus, Firewalls / Proxies
Through End-to-End Tracing
Which services depend on each other for end-to-end use cases?
Where are our bottlenecks? How to optimize deployment and architecture?
Through Anomaly Detection
Which tiers are acting out of the norm after an update or under certain load?
Who is impacted when one tier has an issue?
Where to look for the real root cause when a service goes down?
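Once dependencies are known, the question "who is impacted when one tier has an issue?" becomes a graph traversal. The sketch below walks a call graph in reverse from the failed tier. The graph itself is hypothetical; it only reuses tier names such as Svc1, MB, EntSvc1 and DB1 from the diagrams, and the edges between them are invented for illustration.

```python
from collections import deque

# Hypothetical call graph: each caller maps to the tiers it depends on.
CALLS = {
    "App1": ["Svc1"],
    "App2": ["Svc1", "Svc2"],
    "App3": ["Svc3"],
    "Svc1": ["EntSvc1"],
    "Svc2": ["MB"],
    "Svc3": ["EntSvc2"],
    "MB": ["EntSvc1"],
    "EntSvc1": ["DB1"],
    "EntSvc2": ["DB2"],
}

def impacted_by(failed, calls):
    """Return every tier that directly or transitively depends on the failed tier."""
    # Invert the graph so we can walk from a dependency to its callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    impacted, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return sorted(impacted)

print(impacted_by("DB1", CALLS))
print(impacted_by("DB2", CALLS))
```

This is the "bottom-up" view in miniature: starting from a shared database or enterprise service and listing the applications above it, instead of starting from each application and looking down.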
DevOps Monitoring Maturity: What did we cover today?
1. Basic App Monitoring
2. App Dependencies
3. End User Monitoring
4. “Soft-Launch” Support
5. Virtualization Monitoring
6. Provide “Monitoring as a Service” for Cloud Native Application Teams
We have the experience.
• One of the largest health care insurance providers in the nation: to DevOps in two weeks
• One of the largest furniture retailers in the United States: to DevOps in two weeks
We have a proven approach: The DevOps Xcelerator
• Outline your digital performance management (DPM) strategy
• Build on what you already have
• Implement DPM to support DevOps
• Validate your success
DPM Vision & Strategy: Identify DPM goals that guide your implementation strategy in alignment with business objectives.
Discovery & Planning: Ask the right questions. Collect the information. Assemble required resources. Create your implementation plan.
Implementation: Follow the Dynatrace Expert Services (DXS) implementation framework to successfully execute your implementation plan.
Validate Success: Track, measure and report progress towards your DPM goals so that your digital performance investments add increasing value to the business.
Q & A
Brian Chandler, Systems Engineer @ Raymond James (@Channer531)
Andreas Grabner, Chief DevOps Activist @ Dynatrace (@grabnerandi)
Action Items for you!
Try Dynatrace SaaS: http://bit.ly/dtsaastrial
Try Dynatrace AppMon On Premise: http://bit.ly/dtpersonal
Listen to our Podcast: http://bit.ly/pureperf
Read more on our blog: http://blog.dynatrace.com
SEI: Faster innovation and better performance for the innovative sei wealth p...
 
SAP: How SAP fully automates the provisioning and operations of its dynatrace...
SAP: How SAP fully automates the provisioning and operations of its dynatrace...SAP: How SAP fully automates the provisioning and operations of its dynatrace...
SAP: How SAP fully automates the provisioning and operations of its dynatrace...
 
REI: Evolving performance engineering for the move to cloud, microservices, c...
REI: Evolving performance engineering for the move to cloud, microservices, c...REI: Evolving performance engineering for the move to cloud, microservices, c...
REI: Evolving performance engineering for the move to cloud, microservices, c...
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Starting Your DevOps Journey – Practical Tips for Ops

  • 1. Starting your DevOps Journey Practical Tips for Ops http://dynatrace.com/trial Brian Chandler Systems Engineer @ Raymond James @Channer531 Andreas Grabner Chief DevOps Activist @ Dynatrace @grabnerandi
  • 2. Promise of DevOps: Faster & Efficient Innovation Smaller Apps, Micro-Services More Deployments App-, Service- & End-User Feedback Loops Happy Users Lower Costs
  • 3. Proof: DevOps Adopters Are … 200x 2,555x more frequent deployments faster lead times than their peers More Agile 3x 24x lower change failure rate faster Mean Time to Recover More Reliable More Successful 2x 50% More likely to exceed market expectations Higher market cap growth over 3 years Source: Puppet Labs 2016 State Of DevOps Report: https://puppet.com/resources/white-paper/2016-state-of-devops-report
  • 4. Dynatrace Transformation by the numbers 23x 170 More releases Deployments / Day 31000 60h Unit+Int Tests / hour UI Tests per Build More Quality ~200 340 Code commits / day Stories per sprint More Agile 93% Production bugs found by Dev More Stability 450 99.998% Global EC2 Instances Global Availability Webinar @ https://info.dynatrace.com/17q3_wc_from_agile_to_cloudy_devops_na_registration.html
  • 5. YET: „DevOps Adoption is only 2%“ Gene Kim, Nov 2016
  • 6. Interesting Ops Learnings from Adopters New Technology Stack New Architectural Patterns End User Focused New Deployment Models
  • 7. DevOps Requirements and Engagement Options for Ops Feedback through High Quality App & User Data Ops as a Service: “Self-Service for Application Teams” Bridge the Gap between Enterprise Stack and New Stack Shift-Left: (No)Ops as “Part of Application Delivery” RequirementsEngagementOptions
  • 8. Basic App Monitoring (1) App Dependencies (2) End User Monitoring (3) How to monitor mobile vs desktop vs tablet vs service endpoints? How much network bandwidth is required per app, service and feature? Where to start optimizing bandwidth: CDNs, Caching, Compression? Are our applications up and running? What load patterns do we have per application? What is the resource consumption per application? What are the dependencies between apps, services, DB and infra? How to monitor „non custom app“ tiers? Where are the dependency bottlenecks? Where is the weakest link? Closing the Ops to Dev Feedback Loop: One Step at a Time! “Soft-Launch” Support (4) Virtualization Monitoring (5) How to automatically monitor virtual and container instances? What to monitor when deploying into public or private clouds? How to deploy and monitor multiple versions of the same app / service? What and how to baseline? Do we have a better or worse version of an app/service/feature? Ops: Need answers to these questions! Closing the gap to AppBizDev Ready for “Cloud Native” How to alert on real problems and not architectural patterns? How to consolidate monitoring between Cloud Native and Enterprise? Who is using our apps? Geo? Device? Which features are used? What's the behavior? Where to start optimizing? App Flow? Page Size? Conversion Rates? Bounce Rates? Where are the performance / resource hotspots? When and where do applications break? Do we have bad dependencies through code or config? How does the system really behave in production? What to learn for future architectures? What are the usage patterns for A/B or Green/Blue? Difference between different versions and features? Does the architecture work in these dynamic environments? Does scale up/down work as expected? Provide „Monitoring as a Service“ for Cloud Native Application Teams (6) Today
  • 9. Questions to Answer! Are our applications up & running? What are the real load patterns? What is the resource consumption? Where to start optimizing?
  • 10. Are our Apps Up, Running & Accessible? Availability dropped to 0%
  • 11. Early Warning SLA Monitoring! Quality of Connectivity, DNS Quality of Connectivity & DNS Quality of Content Delivery Quality of Content Delivery 3rd Party Impact Delivery by Geo Quality of Content Delivery
  • 12. Client Center Daily Traffic Pattern
  • 13. Client Center sees a peak of about 3,800 Req/min against its API. Client Center Daily Traffic Pattern
  • 14. Client Center sees a peak of about 3,800 Req/min against its API. 60 unique calls/functions that make up the Client Center API Client Center Daily Traffic Pattern
  • 15. ~20% of that traffic is ClientCenter/API/Holdings Client Center Daily Traffic Pattern
  • 16. ~20% of that traffic is ClientCenter/API/Holdings ~20% of that traffic is ClientCenter/API/ClientDetails Client Center Daily Traffic Pattern
  • 17. ~20% of that traffic is ClientCenter/API/Holdings ~20% of that traffic is ClientCenter/API/ClientDetails ~20% of that traffic is ClientCenter/API/RecentSearch Client Center Daily Traffic Pattern
  • 18. Typical Peak Hour If you’re not careful, it could look like this… Rhythmic peaks and valleys suggest “lock-step” scripts (all virtual users start and end at the same time). PRD usage is much more “fluid”: a steady stream and balance across transaction usage. Total sum of traffic load was met. However, the correct ratio of key transactions was not met. Leveraging PRD data to tune QA Load Tests
  • 19. Normal Production Distribution Failed Load Test Distribution Black: Overall application load and peak volume Percentile breakdown of fast, warning, slow txs VS. Performance Differences Before and After Release
  • 20. Occurrences of slow AccountList Transactions from load testing Distribution of “yellow” transactions for that time AccountList makes up most of these transactions. Normal distribution of “expected” slow transactions for this API function. Distribution generated from load test. New code would greatly increase the occurrences of slow transactions in production! What is making up all that yellow?
  • 21. Detection Load Distribution and Deployment Hotspots Overall Load Distribution by SLA Very Slow, Slow, Med, Fast Tip: Logarithmic Y-Axis Finding #3: Server #3 only gets load at certain times! Finding #2a: Server #1 was put back in rotation HERE Finding #2b: Server #2 saw fewer errors once #1 was up Finding #1: Response Time Spikes at certain times not related to load! Validate Load Balancing Tip: Load per Server!
  • 22. Detection Load Distribution and Deployment Hotspots Requests by App Server: Tip: Percentage Bar Chart Thread Usage: Tip: Pool Size + Actual Use Same for Web Server Transfer Rate Identify “heavy hitters” Resource Utilization Tip: CPU, Memory, I/O …
  • 23. Detecting Resource Regression Hotspots Time of Deployment Other Resources: Bytes Transferred, Disk I/O, # of Log Messages, # of Open Connections, # of Calls …
  • 25.
  • 26. Automatic Hotspot Detection under Load My Favorite: Layer Breakdown Chart With increasing load: Which LAYER doesn’t SCALE?
  • 27. Automatic Availability Root Cause Detection Web Performance Optimization Automated List of root cause explanations for SLA violations
  • 28. Automatic Baselining per Business Transaction Response Time Baselines based on 50th & 90th Percentile Smart Alerting based on Significant Measurement Violation Direct link to Layer Breakdown and Method Hotspot!
  • 29. Automatic Anomaly and Root Cause Detection Automatic Anomaly Detection Automatic Root Cause Information Automatic Impact Details
  • 30. Summary: Capabilities to Get Answers Through Synthetic Monitoring: Are our applications up & running? Availability, Response Time, CDN, Geo, … Content Size and Content Validation Through Endpoint Monitoring: What are the real load patterns? Bucket by Response Time (Fast, Medium, Slow, Very Slow ...) Bucket by Status Code (HTTP 2xx, 3xx, 4xx, 5xx, ...) Through System Monitoring: What is the resource consumption? CPU, Memory, Network and I/O Through Basic Application Monitoring: Where to start optimizing? Top Exceptions & Log Messages; # Thread (Idle, Busy) Memory by Heap Space, Garbage Collection Activity Execution Hotspots by Component
  • 31. Which services do we actually host? What is the health state of every component? What are the dependencies? What impacts the interconnected system health? Questions to Answer!
  • 32. Agent-Based Monitoring & Tracing: Bridging Enterprise and New Stack From Mobile Via Middleware To Mainframe And Services To SQL / NoSQL To SQL / NoSQL To SQL / NoSQL To External Services
  • 33. Analyzing Inter Tier Impact #1: Load Spike Direct correlation with # of SQL queries -> OK! #2: Same Load Spike Direct correlation with # of Exceptions -> OK! #3: Starting with Load Spike Time spent in JDBC (blue) stays very high -> NOT OK! #4: Problem Solved Issue on Oracle Server caused all SQL to be slow
  • 34. Health State and Impact of Database! DB-Related Blogs from Sonja: https://www.dynatrace.com/blog/author/sonja-chevre/
  • 35. Proper Connection Pool Sizing! Do we have enough DB CONNECTIONS per pool?
  • 36. Detecting Database Impact on Message Processing #1: Cluster Failover Event #2: System Struggled but managed load #3: DB Index Job with MAJOR impact on End Users
  • 37. @ Dynatrace: Service Tier Monitoring #3: Queue Sizes #1: Cassandra Health #2: Cassandra Health #1: Overall Tier Health #4: Error States
  • 38. What’s lurking under the water of the iceberg?
  • 39. What is the cause of all performance problems?
  • 40. Red wave of death appears on dashboard. Conference Bridge/Crisis Center call with lots of “Smart Guy Correlation” Application recovers. Triaging w/o anomaly detection on app dependencies
  • 43. DCRUM – True enterprise monitoring
  • 44. App1 App2 App5 App4 App3 Web Web Web Svc1 Web Web DB1 EntSvc2 DB2 ENTSvc1 MB Svc2 Svc4 Svc3 DCRUM – True enterprise monitoring
  • 46. DB1 EntSvc2 DB2 ENTSvc1 MB Svc2 Svc4 Svc3 DCRUM – True enterprise monitoring Successful application dependency monitoring will allow you to take a “bottom-up” approach to monitoring your enterprise.
  • 47. “Bottom-up” Service View Client Group 1, Servers A-D Client Group 2, Servers E-H Client Group 3, Servers I-L Client Group 4, Servers M-Q Client Group 5, Servers R-S Different Apps and services exercise enterprise services and databases in varying ways! Lack of load from these peers against this service Poor performing node in this client group
  • 48. Link to the appropriate heat map Alert sent based on deviation from calculated baseline Baseline alerting granularity down to the operation level, not just the Software Service Delivering this data as actionable alerts
  • 49. Usage and application behavior vary day-to-day. A rolling average of services is not good enough One week application usage trend Monday Tuesday Wednesday Thursday Friday The need for seasonal baselining
  • 50. To achieve deeper statistical capabilities, we use a combination of the PureLytics stream and DCRUM REST interface to pour data into analysis tools. This allows us to reach back several weeks, on a single minute for the given day (e.g. Monday at 10:03am compared to the last 5 Mondays at 10:03am) to calculate our baselines, for every unique operation in our enterprise (25k+ recorded). That is a great deal of data! Dynatrace performance metrics streaming
  • 51. By reaching that far back at granular 1-minute intervals, you can be very confident with the validity of your baseline values A 50ms-150ms deviation may not seem like a huge deal – but in the world of app dependency monitoring, it truly is! Graphical View of deep seasonal baselining
  • 52. Service 1 needs to call Service 2 multiple times. If Service 2 slows down, it has an enormous impact on all upstream services. A 150ms shift in Service 2 causes Service 1 to shift from 200ms to 2s Service 1 Service 2 Upstream impact of dependencies
  • 53.
  • 54. Automatic Full Stack Monitoring #1: All your Technologies #2: All Key Metrics #3: Physical, Virtual, Containers or Cloud
  • 55. Smartscape: Real Time Service-Oriented CMDB #1: Understand WHO talks with WHOM? #2: Where are tiers deployed? #3: WHO might be impacted by a failure?
  • 56. Automatic Service Flow Tracing #1: Understanding Flow #2: Dependencies between Service #3: Service Clustering
  • 57. Automatic Architectural Pattern Detection #1: Action initiated by the SPA (Single Page App) #2: SPA was making 3 AJAX Calls in total! #3: One of the calls makes 13! Backend REST Calls to external system on 13 asynchronous threads
  • 58. Automatic Problem Pattern Detection #1: Select Top Common Problem Patterns #1: Explore which transactions have this and other problems
  • 59. Automating Anomaly Detection #1: All Root Cause Information „encapsulated“ into a single Problem #2: “Time-Lapse” of Problem Evolution #3: All relevant Events: Infra, Logging, App, Service, End User …
  • 61. Summary: Capabilities to get answers Through Automatic Dependency Detection Which services are hosted by which processes? Where do these processes run? Through Component Monitoring Key metrics from Oracle, SQL, DB2, MySQL, Postgres Throughput on your Message Broker / Bus, Firewalls / Proxies Through End-to-End Tracing Which services depend on each other for end-to-end use cases? Where are our bottlenecks? How to optimize deployment and architecture? Through Anomaly Detection Which tiers are acting out of the norm after an update or under certain load? Who is impacted when one tier has an issue? Where to look for the real root cause when a service goes down?
  • 62. Promise of DevOps: Faster & Efficient Innovation Smaller Apps, Micro-Services More Deployments App-, Service- & End-User Feedback Loops Happy Users Lower Costs
  • 63. Basic App Monitoring (1) App Dependencies (2) End User Monitoring (3) How to monitor mobile vs desktop vs tablet vs service endpoints? How much network bandwidth is required per app, service and feature? Where to start optimizing bandwidth: CDNs, Caching, Compression? Are our applications up and running? What load patterns do we have per application? What is the resource consumption per application? What are the dependencies between apps, services, DB and infra? How to monitor „non custom app“ tiers? Where are the dependency bottlenecks? Where is the weakest link? DevOps Monitoring Maturity: What we covered today? “Soft-Launch” Support (4) Virtualization Monitoring (5) How to automatically monitor virtual and container instances? What to monitor when deploying into public or private clouds? How to deploy and monitor multiple versions of the same app / service? What and how to baseline? Do we have a better or worse version of an app/service/feature? Ops: Need answers to these questions! Closing the gap to AppBizDev Ready for “Cloud Native” How to alert on real problems and not architectural patterns? How to consolidate monitoring between Cloud Native and Enterprise? Who is using our apps? Geo? Device? Which features are used? What's the behavior? Where to start optimizing? App Flow? Page Size? Conversion Rates? Bounce Rates? Where are the performance / resource hotspots? When and where do applications break? Do we have bad dependencies through code or config? How does the system really behave in production? What to learn for future architectures? What are the usage patterns for A/B or Green/Blue? Difference between different versions and features? Does the architecture work in these dynamic environments? Does scale up/down work as expected? Provide „Monitoring as a Service“ for Cloud Native Application Teams (6)
  • 64. We have the experience. One of the largest health care insurance providers in the nation: to DevOps in two weeks. One of the largest furniture retailers in the United States: to DevOps in two weeks.
  • 65. We have a proven approach: The DevOps Xcelerator. Outline your digital performance management (DPM) strategy. Build on what you already have. Implement DPM to support DevOps. Validate your success. DPM Vision & Strategy: Identify DPM goals that guide your implementation strategy in alignment with business objectives. Discovery & Planning: Ask the right questions. Collect the information. Assemble required resources. Create your implementation plan. Implementation: Follow the Dynatrace Expert Services (DXS) implementation framework to successfully execute your implementation plan. Validate Success: Track, measure and report progress towards your DPM goals so that your digital performance investments add increasing value to the business.
  • 66. Q & A Brian Chandler Systems Engineer @ Raymond James @Channer531 Andreas Grabner Chief DevOps Activist @ Dynatrace @grabnerandi Action Items for you! Try Dynatrace SaaS: http://bit.ly/dtsaastrial Try Dynatrace AppMon On Premise: http://bit.ly/dtpersonal Listen to our Podcast: http://bit.ly/pureperf Read more on our blog: http://blog.dynatrace.com
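
Slide 50 describes seasonal baselining: a measurement taken at a given weekday and minute (e.g. Monday at 10:03am) is compared with the same minute on the previous five Mondays. A minimal Python sketch of that idea follows. The function names, the in-memory `samples` dictionary, and the deviation rule (mean plus or minus a multiple of the standard deviation) are assumptions made for illustration; the deck does not say how the baseline or the alert threshold is actually computed, and the sketch does not use the PureLytics stream or the DCRUM REST interface.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def seasonal_baseline(samples, at, weeks=5):
    """Return (mean, stdev) of the values recorded at the same weekday and
    minute in each of the previous `weeks` weeks, or None if fewer than two
    historical points exist.

    `samples` is a hypothetical dict mapping minute-resolution datetimes to
    response times in milliseconds.
    """
    history = []
    for k in range(1, weeks + 1):
        ts = at - timedelta(weeks=k)  # same weekday, same minute, k weeks ago
        if ts in samples:
            history.append(samples[ts])
    if len(history) < 2:
        return None
    return mean(history), stdev(history)


def is_anomalous(samples, at, value, weeks=5, tolerance=3.0):
    """Flag `value` if it deviates from the seasonal mean by more than
    `tolerance` standard deviations (an assumed rule, not the deck's)."""
    baseline = seasonal_baseline(samples, at, weeks)
    if baseline is None:
        return False  # not enough history to judge
    avg, spread = baseline
    # Floor the spread at 1 ms so a perfectly flat history does not
    # turn every tiny fluctuation into an alert.
    return abs(value - avg) > tolerance * max(spread, 1.0)


# Usage: five previous Mondays at 10:03, then a new Monday reading.
now = datetime(2017, 6, 5, 10, 3)  # a Monday
samples = {
    now - timedelta(weeks=k): ms
    for k, ms in zip(range(1, 6), [200, 210, 195, 205, 190])
}
print(seasonal_baseline(samples, now))
print(is_anomalous(samples, now, 350))
```

Keying the history on the exact weekday and minute, rather than on a rolling average, is what lets the baseline follow the weekly usage pattern shown on slide 49.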

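Slide 52 states that a 150ms shift in Service 2 moves Service 1 from 200ms to 2s because Service 1 calls Service 2 multiple times. The arithmetic can be sketched in Python as below. The call count (12 sequential calls), Service 1's own processing time (80 ms) and Service 2's original latency (10 ms) are hypothetical values chosen only because they reproduce the slide's figures; the deck gives none of them, and real calls may overlap rather than run strictly in sequence.

```python
def upstream_latency_ms(own_work_ms, calls, downstream_ms):
    """Latency of an upstream service that does `own_work_ms` of local work
    and makes `calls` sequential calls to a downstream service."""
    return own_work_ms + calls * downstream_ms


# Hypothetical numbers chosen to match the slide's 200ms -> 2s example.
OWN_WORK_MS = 80
CALLS = 12
before = upstream_latency_ms(OWN_WORK_MS, CALLS, downstream_ms=10)
after = upstream_latency_ms(OWN_WORK_MS, CALLS, downstream_ms=10 + 150)

print(before, after)   # 200 2000
print(after - before)  # 1800, i.e. 12 calls x 150 ms each
```

Under these assumptions the downstream shift is multiplied by the number of calls, which is why a deviation that looks small on Service 2 alone is worth alerting on.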
Editor's notes

  1. Source: Puppet Labs 2016 State Of DevOps Report: https://puppet.com/resources/white-paper/2016-state-of-devops-report