Continuous Performance Monitoring of a Distributed Application [CON4730]
- 2. Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Continuous Performance Monitoring of a Distributed Application
Ashish Srivastava
Principal Member of Technical Staff
Diana Yuryeva
Senior Member of Technical Staff
- 3.
The following is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract.
It is not a commitment to deliver any material, code, or functionality, and
should not be relied upon in making purchasing decisions. The development,
release, and timing of any features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
- 4.
Session Goal
Arrive at a solution for extreme performance monitoring
§ Components:
– Design patterns
– Tools
§ Qualities:
– Continuous
– Light-weight
– Recordable
- 5.
Session Agenda
§ Use Case
§ Software Patterns
§ Tools
§ Pitfalls and Advice
§ Q&A
- 6.
About Us
§ Oracle Billing and Revenue Management Elastic Charging Engine
– 100% real-time charging application
– Java
– Distributed grid
– Oracle Coherence
– Oracle NoSQL
– Focus on extreme performance
- 7.
Operating Conditions
§ Low latency expectations
§ Heavy system load
§ Distributed environment
§ Multi-level software stack
- 8.
Monitoring Requirements
Functional
§ Detailed insight about performance
– Latency
– Throughput
§ View over time
§ Reporting
§ Bottleneck detection
§ View of system as cohesive unit
- 9.
Monitoring Requirements
Non-Functional
§ Minimal impact on processing
§ Ease of use
§ Separation of concerns
- 10.
Session Agenda
§ Use Case
§ Software Patterns
§ Tools
§ Pitfalls and Advice
§ Q&A
- 11.
Approach
How do I address these requirements?
§ Off-the-shelf software not sufficient
§ Custom development needed
– Incorporate monitoring into system
– Collect, analyze and present metrics
- 12.
Collecting Metrics
Problem overview
§ Goal
– Incorporate metrics collection into general processing
§ Approach
– Enhance domain model with monitoring-related data structures
- 13.
Collecting Metrics
Model of sample system
[Figure: request/response flow between the ECE Client (Network Mediation) and Node1, Node2, and Node3, with points labeled A, B, C and A', B', C'.]
Processing steps:
– Debatch the requests
– Data lookups
– Apply Tariff
– Save Session
– Prepare Response
– ..
- 14.
Collecting Metrics
Solution
[Figure: the sample-system request/response flow (ECE Client, Network Mediation, Node1, Node2, Node3; points A, B, C and A', B', C'), with the same processing steps: Debatch the requests, Data lookups, Apply Tariff, Save Session, Prepare Response, ..]
- 15.
[Figure: an Envelope exchanged between the Client Node and the Processing Node.]
Envelope
– Routing Context
– Payload
– Tracking Context
Chronicler
– TimePoints
Stat Reporter (harvest)
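A minimal sketch of how the Envelope, Chronicler, TimePoints, and Stat Reporter named on this slide could fit together. The slide gives only the names, so every field and method below, and the `routingKey` stand-in for the Routing Context, is an assumption made for illustration rather than ECE code.

```java
import java.util.ArrayList;
import java.util.List;

/** One labeled timestamp recorded while a request is processed. */
final class TimePoint {
    final String label;
    final long nanos;

    TimePoint(String label, long nanos) {
        this.label = label;
        this.nanos = nanos;
    }
}

/** Hypothetical Chronicler: an append-only list of TimePoints. */
final class Chronicler {
    private final List<TimePoint> points = new ArrayList<>();

    /** Records a point at the current time. */
    void mark(String label) {
        mark(label, System.nanoTime());
    }

    /** Records a point at an explicit time (handy for tests). */
    void mark(String label, long nanos) {
        points.add(new TimePoint(label, nanos));
    }

    List<TimePoint> points() {
        return points;
    }

    /** Time between the first and the last recorded point. */
    long elapsedNanos() {
        if (points.size() < 2) {
            return 0L;
        }
        return points.get(points.size() - 1).nanos - points.get(0).nanos;
    }
}

/** Hypothetical Envelope: routing data, payload, and tracking data travel together. */
final class Envelope<P> {
    final String routingKey;                        // assumed stand-in for the Routing Context
    final P payload;
    final Chronicler chronicler = new Chronicler(); // assumed content of the Tracking Context

    Envelope(String routingKey, P payload) {
        this.routingKey = routingKey;
        this.payload = payload;
    }
}

/** Hypothetical Stat Reporter: harvests finished envelopes into running totals. */
final class StatReporter {
    private long count;
    private long totalNanos;

    synchronized void harvest(Envelope<?> envelope) {
        count++;
        totalNanos += envelope.chronicler.elapsedNanos();
    }

    synchronized long count() {
        return count;
    }

    synchronized double avgLatencyMillis() {
        return count == 0 ? 0.0 : totalNanos / 1_000_000.0 / count;
    }
}
```

Note that `System.nanoTime()` values are only comparable within one JVM. A distributed version of this sketch would have to record per-node durations or rely on synchronized wall clocks.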
- 16.
Collecting Metrics
Result
##### Elapsed time = 3600 seconds
##### Avg throughput = 20000 ops/sec
##### Avg latency = 50 ms
- 17.
Granular Reporting
Problem overview
§ Goal
– I need more granular reporting of performance over time
§ Approach
– Enhance reporting of collected metrics
- 18.
Granular Reporting
Solution – data structure
§ A moving reporting window
§ 100% reporting
§ Sampled reporting
§ Stats exposed over JMX
§ Fixed data set for a window
[Figure labels: Chronicler added, Chronicler removed]
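The "moving reporting window" with a "fixed data set" can be sketched as a handful of counters that are reset each time a window closes, so memory use stays constant however many requests arrive. The class names, the snapshot fields, and the choice to pass timestamps in explicitly are assumptions for illustration. JMX exposure is left out to keep the sketch dependency-free.

```java
/** Immutable summary of one closed reporting window. */
final class WindowSnapshot {
    final long count;
    final long minNanos;
    final long maxNanos;
    final double avgNanos;
    final double opsPerSec;

    WindowSnapshot(long count, long minNanos, long maxNanos, double avgNanos, double opsPerSec) {
        this.count = count;
        this.minNanos = minNanos;
        this.maxNanos = maxNanos;
        this.avgNanos = avgNanos;
        this.opsPerSec = opsPerSec;
    }
}

/** Hypothetical fixed-size accumulator for one reporting window at a time. */
final class WindowStats {
    private final long windowNanos;
    private long windowStart;
    private long count;
    private long sum;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private WindowSnapshot lastClosed; // null until the first window closes

    WindowStats(long windowNanos, long startNanos) {
        if (windowNanos <= 0) {
            throw new IllegalArgumentException("windowNanos must be positive");
        }
        this.windowNanos = windowNanos;
        this.windowStart = startNanos;
    }

    /** Records one latency sample observed at nowNanos. */
    synchronized void record(long nowNanos, long latencyNanos) {
        if (nowNanos - windowStart >= windowNanos) {
            closeWindow(nowNanos);
        }
        count++;
        sum += latencyNanos;
        min = Math.min(min, latencyNanos);
        max = Math.max(max, latencyNanos);
    }

    /** Publishes the finished window and resets the counters. */
    private void closeWindow(long nowNanos) {
        if (count > 0) {
            double seconds = windowNanos / 1_000_000_000.0;
            lastClosed = new WindowSnapshot(count, min, max, (double) sum / count, count / seconds);
        }
        // Idle windows are skipped; the previous snapshot stays in place.
        long elapsedWindows = (nowNanos - windowStart) / windowNanos;
        windowStart += elapsedWindows * windowNanos;
        count = 0;
        sum = 0;
        min = Long.MAX_VALUE;
        max = Long.MIN_VALUE;
    }

    synchronized WindowSnapshot lastClosed() {
        return lastClosed;
    }
}
```

A reporter thread could read `lastClosed()` once per window and publish the values, for example through a JMX MBean as the slide suggests.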
- 19.
Granular Reporting
Solution – class diagram
- 20.
Granular Reporting
Result
§ I can see min/max/avg latency and throughput over time
§ My throughput reporting is quite good: I can see whether I had stable or erratic throughput
- 21.
Latency Percentile Report
Problem overview
§ Goal
– Latencies are still not detailed enough. I need to know more than the average/min/max latencies
– Need to guarantee that 99.999% of the requests take less than 55ms
§ Approach
– Introduce range bucketing to count latencies
- 22.
Latency Percentile Report
Solution
§ Pre-defined latency buckets used to derive percentiles
§ Data set does not grow; each bucket is updated in place
§ Multiple percentile breakdown
– End-to-end
– Server side processing
– Per batch reporting
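Range bucketing can be sketched as a fixed array of counters, one per millisecond, plus an overflow bucket. Recording a latency only increments a counter, and a percentile is read by walking the cumulative counts. The 1 ms bucket width, the class name, and the method names are assumptions, since the slides do not say how the buckets were defined.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical fixed-size latency histogram with 1 ms buckets. */
final class LatencyHistogram {
    // Bucket i counts latencies of i ms; the last bucket also absorbs anything larger.
    private final AtomicLongArray buckets;

    LatencyHistogram(int maxMillis) {
        if (maxMillis < 1) {
            throw new IllegalArgumentException("maxMillis must be at least 1");
        }
        this.buckets = new AtomicLongArray(maxMillis + 1);
    }

    /** Counts one request; the data set never grows. */
    void record(long latencyMillis) {
        long clamped = Math.max(0L, Math.min(latencyMillis, buckets.length() - 1));
        buckets.incrementAndGet((int) clamped);
    }

    long total() {
        long total = 0;
        for (int i = 0; i < buckets.length(); i++) {
            total += buckets.get(i);
        }
        return total;
    }

    /** Smallest bucket (in ms) at which the cumulative count reaches the given percentile. */
    long percentileMillis(double percentile) {
        long total = total();
        if (total == 0) {
            return 0L;
        }
        long target = Math.max(1L, (long) Math.ceil(percentile * total / 100.0));
        long seen = 0;
        for (int i = 0; i < buckets.length(); i++) {
            seen += buckets.get(i);
            if (seen >= target) {
                return i;
            }
        }
        return buckets.length() - 1;
    }

    /** True if the given percentile of requests finished in under limitMillis. */
    boolean meets(double percentile, long limitMillis) {
        return percentileMillis(percentile) < limitMillis;
    }
}
```

With this shape, a requirement such as "99.999% of the requests take less than 55ms" becomes a single call, `histogram.meets(99.999, 55)`. Separate histogram instances could back the end-to-end, server-side, and per-batch breakdowns.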
- 23.
Latency Percentile Report
Result – printed report
2013-03-21 08:44:29.112 PDT INFO ##### Latency statistics based on percentiles:
Percentile: 0.1, Latency: 1ms, Total Count: 1173148
Percentile: 1.0, Latency: 2ms, Total Count: 100909763
Percentile: 10.0, Latency: 2ms, Total Count: 100909763
Percentile: 95.0, Latency: 26ms, Total Count: 685176664
Percentile: 99.0, Latency: 50ms, Total Count: 713029967
Percentile: 99.5, Latency: 58ms, Total Count: 716355711
Percentile: 99.9, Latency: 78ms, Total Count: 719217619
Percentile: 99.99, Latency: 104ms, Total Count: 719836971
Percentile: 99.999, Latency: 128ms, Total Count: 719897850
Percentile: 100.0, Latency: 169ms, Total Count: 719904814
- 24.
Latency Percentile Report
Result – heat map
- 25.
Method Breakdown
Problem overview
§ Goal
– I want to measure the impact of a new method under varying load
– End-to-end latency always ON
– Minimum performance impact
§ Approach
– Method annotations
– Aspect
- 26.
Method Breakdown
Solution
public enum LabelEnum {
    APPLY_TARIFF,
    ...
    DEBATCH
}

public class ClassToBeTracked {
    @Track(pointLabel = LabelEnum.APPLY_TARIFF)
    private <ReturnObject> method(<Parameters>) {
        ...
    }
}

<pointcut name="scope" expression="within(ClassName)"/>
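The slide shows the `@Track` annotation in use but not its declaration or the aspect that acts on it. The sketch below declares a plausible `@Track` and, to stay runnable without an aspect weaver, swaps the aspect for a JDK dynamic proxy. That substitution means it only intercepts interface methods, unlike the private method shown on the slide. `Charging`, `SimpleCharging`, and `TrackingHandler` are hypothetical names; only `Track`, `pointLabel`, and `LabelEnum` come from the slide.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.EnumMap;
import java.util.Map;

enum LabelEnum { DEBATCH, APPLY_TARIFF }

/** Assumed declaration: must be retained at runtime to be visible via reflection. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Track {
    LabelEnum pointLabel();
}

/** Hypothetical service interface with one tracked and one untracked method. */
interface Charging {
    @Track(pointLabel = LabelEnum.APPLY_TARIFF)
    long applyTariff(long units);

    long untracked(long units);
}

final class SimpleCharging implements Charging {
    @Override
    public long applyTariff(long units) {
        return units * 2;
    }

    @Override
    public long untracked(long units) {
        return units;
    }
}

/** Times only the methods that carry @Track; others pass straight through. */
final class TrackingHandler implements InvocationHandler {
    private final Object target;
    // Not thread-safe; a real collector would need concurrent accumulation.
    final Map<LabelEnum, Long> nanosByLabel = new EnumMap<>(LabelEnum.class);

    TrackingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Track track = method.getAnnotation(Track.class);
        long start = (track != null) ? System.nanoTime() : 0L;
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the original exception from the target
        } finally {
            if (track != null) {
                nanosByLabel.merge(track.pointLabel(), System.nanoTime() - start, Long::sum);
            }
        }
    }

    /** Wraps the handler's target in a proxy implementing the given interface. */
    static <T> T wrap(Class<T> iface, TrackingHandler handler) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

Untracked methods skip both `System.nanoTime()` calls, which mirrors the slide's goal of minimum performance impact. A woven aspect would avoid the reflective call as well.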
- 27.
Method Breakdown
Result – detailed breakdown report
2013-07-15 16:29:24.953 PDT
Chronicler Breakdown:
DEBATCH -> 64149 nanoseconds
LOOKUP_DATA -> 1056748 nanoseconds
APPLY_TARIFF -> 99994 nanoseconds
SAVE_SESSION -> 12989 nanoseconds
PREPARE_RESPONSE -> 15998 nanoseconds
- 28.
Session Agenda
§ Use Case
§ Software Patterns
§ Tools
§ Pitfalls and Advice
§ Q&A
- 29.
Storing and Presenting Metrics
Problem overview
§ Goal
– I collect detailed performance metrics, but I need to report them too
– I need a tool which stores these metrics and presents them in a unified view
§ Approach
– Create monitoring dashboard
– Technologies: JRDS, RRD, and in-house development
- 30.
Storing and Presenting Metrics
Result – monitoring dashboard
Configuration:
– Topology: 24 servers
– Throughput: 20000 ops/sec
– Duration: 10 hrs
- 31.
Storing and Presenting Metrics
Solution qualities
§ Graphical
§ Supports various metrics
– Application-specific
– Machine-specific
– JVM-specific
§ Consolidated view
– All graphs on one page
- 32.
Storing and Presenting Metrics
Solution qualities
§ Easy to use
– Collects and saves data automatically
§ Easy to share
– Includes configuration for future reference
– Send links to web pages
– Print page as PDF
- 33.
Storing and Presenting Metrics
Solution qualities
§ Stores data without losing precision
§ Supports drilling down
§ Light-weight
§ Customizable
- 34.
Session Agenda
§ Use Case
§ Software Patterns
§ Tools
§ Pitfalls and Advice
§ Q&A
- 35.
Pitfalls and Advice
Take into consideration
§ Distributed system monitoring != Single JVM monitoring
– Consolidated view is critical
§ Consistency of tools across team is important
– Same language across development, QE and Performance teams saves hours
§ Solution should enable you to be agile
– Run monitoring on laptop AND realistic setup
- 36.
Pitfalls and Advice
Some things to pay attention to
§ These hide problems
– Averaging
– Sampling
§ GC has a big impact, so include it in your metrics
§ Watch out for processes sharing the same host
§ Always run long-duration tests
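One way to include GC in the metrics is to read the JVM's own collector counters through the standard `java.lang.management` API and report the delta per reporting window. The `GcSnapshot` class and its method names are assumptions for illustration. The `GarbageCollectorMXBean` calls are part of the JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical point-in-time total of GC activity across all collectors in this JVM. */
final class GcSnapshot {
    final long collections;
    final long millis;

    GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    /** Sums the cumulative collection count and time of every collector. */
    static GcSnapshot take() {
        long collections = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            collections += Math.max(0L, gc.getCollectionCount());
            millis += Math.max(0L, gc.getCollectionTime());
        }
        return new GcSnapshot(collections, millis);
    }

    /** GC activity between an earlier snapshot and this one. */
    GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections, millis - earlier.millis);
    }
}
```

Taking a snapshot at the start and end of each reporting window, and plotting the difference next to latency, makes it easier to see whether a latency spike coincides with a collection.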
- 37.
Session Summary
Let's see how we addressed the original requirements
§ Detailed insight about performance
– Latency
– Throughput
§ View over time
§ Reporting
§ Bottleneck detection
§ View of system as cohesive unit
- 43.
Session Agenda
§ Use Case
§ Software Patterns
§ Tools
§ Pitfalls and Advice
§ Q&A
- 45.
Links
§ ECE
– http://www.oracle.com/us/products/applications/communications/elastic-charging-engine
§ JRDS
– http://www.jrds.fr