SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Understanding Latency and
Response Time Behavior
Pitfalls, Lessons and Tools
Matt Schuetze
Director of Product Management
Azul Systems
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency Behavior
Latency: The time it took one operation to happen
Each operation occurrence has its own latency
So we need to measure again, and again, and again...
What we care about is how latency behaves
Behavior is more than “what was the common case?”
©2014 Azul Systems, Inc.
What do you care about?
Do you :
Care about latency in your system?
Care about the worst case?
Care about the 99.99%?
Only care about the fastest thing in the day?
Only care about the best 50%
Only need 90% of operations to meet needs?
Care if “only” 1% of operations are painfully slow?
Care if 90% of users see outliers every hour?
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency “wishful thinking”
We know how to compute
averages & std. deviation, etc.
Wouldn’t it be nice if latency had a
normal distribution?
The average, 90%, 99%, std.
deviation, etc. can give us a “feel”
for the rest of the distribution,
right?
If 99% of the stuff behaves well,
how bad can the rest be, really?
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
99% better than 0.5 msec
99.99% better than 5 msec
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: latency distribution
Mean = 0.06 msec
Std. Deviation (𝞂) = 0.21msec
99.999% = 38.66msec
~184 σ (!!!) away from the mean
In a normal distribution,
the 99.999%’ile falls within 4.5 σ
These are NOT normal distributions
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The real world: “outliers”
99%‘ile is ~60 usec
Max is ~30,000% higher
than “typical”
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
A classic look at response time
behavior
Response time as a function of load
source: IBM CICS server documentation, “understanding response times”
Average?
Max?
Median?
90%?
99.9%
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Response time over time
When we measure behavior over time, we often see:
source: ZOHO QEngine White Paper: performance testing report analysis
“Hiccups”
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Hiccups are [typically]
strongly multi-modal
They don’t look anything like a normal distribution
They usually look like periodic freezes
A complete shift from one mode/behavior to another
Mode A: “good”.
Mode B: “Somewhat bad”
Mode C: “terrible”, ...
....
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Common ways people deal with hiccups
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Common ways people deal with hiccups
Averages and Standard Deviation
Always
Wrong!
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Better ways people can deal with hiccups
Actually characterizing latency
Requirements
Response Time
Percentile plot
line
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Requirements
Why we measure latency and response times
to begin with...
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency:
Stating Requirements
Requirements describe how latency should behave
Useful Latency requirements are usually stated as a
PASS/FAIL test against some predefined criteria
Different applications have different needs
Requirements should reflect application needs
Measurements should provide data to evaluate requirements
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Establishing Requirements
an interactive interview (or thought) process
Q: What are your latency requirements?
A: We need an avg. response of 20 msec
Q: Ok. Typical/average of 20 msec... So what is the worst case requirement?
A: We don’t have one
Q: So it’s ok for some things to take more than 5 hours?
A: No way in H%%&!
Q: So I’ll write down “5 hours worst case...”
A: No. That’s not what I said. Make that “nothing worse than 100 msec”
Q: Are you sure? Even if it’s only two times a day?
A: Ok... Make it “nothing worse than 2 seconds...”
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Establishing Requirements
an interactive interview (or thought) process
Ok. So we need a typical of 20msec, and a worst case of 2 seconds. How often is it ok to
have a 1 second response?
A: (Annoyed) I thought you said only a few times a day
Q: That was for the worst case. But if half the results are better than 20 msec, is it ok for the
other half to be just short of 2 seconds? What % of the time are you willing to take a 1
second, or a half second hiccup? Or some other level?
A: Oh. Let’s see. We have to better than 50 msec 90% of the time, or we’ll be losing money
even when we are fast the rest of the time. We need to be better than 500 msec 99.9% of
the time, or our customers will complain and go elsewhere
Now we have a service level expectation:
50% better than 20 msec
90% better than 50 msec
99.9% better than 500 msec
100% better than 2 seconds
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency does not live in a vacuum
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Remember this?
How much load can this system handle?
Where the
sysadmin is
willing to go
What the
marketing
benchmarks
will say
Where users
complain
Sustainable
Throughput
Level
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Comparing behavior under different throughputs
and/or configurations
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Latency behavior under different throughputs, configurations
latency sensitive messaging distribution application
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
The coordinated omission problem
An accidental conspiracy...
©2014 Azul Systems, Inc.
Coordinated Omission (“CO”) is the measurement error which
is introduced by naively recording requests, sorting them and
reporting the result as the percentile distribution of the request
latency.
Any recording method, synchronous or asynchronous, which
results in even partially coordinating sample times with the
system under test, and as a result avoids recording some of
the originally intended samples will exhibit CO. Synchronous
methods tend to naturally exhibit this sort of nativity when
intended sample time are not kept far apart from each other.
Synopsis of Coordinated Omission
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Real World Coordinated Omission effects
Uncorrected Data
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Real World Coordinated Omission effects
Uncorrected Data
Corrected for
Coordinated
Omission
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Real World Coordinated Omission effects
(Why I care)
A ~2500x
difference in
reported
percentile levels
for the problem
that Zing
eliminates
Zing
“other” JVM
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Suggestions
Whatever your measurement technique is, TEST IT.
Run your measurement method against artificial systems
that create hypothetical pauses scenarios. See if your
reported results agree with how you would describe that
system behavior
Don’t waste time analyzing until you establish sanity
Don’t EVER use or derive from std. deviation
ALWAYS measure Max time. Consider what it means... Be
suspicious.
Measure %‘iles. Lots of them.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram
If you want to be able to produce graphs like this...
You need both good dynamic range
and good resolution
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram background
Goal: Collect data for good latency characterization...
Including acceptable precision at and between varying percentile levels
Existing alternatives
Record all data, analyze later (e.g. sort and get 99.9%‘ile).
Record in traditional histograms
Traditional Histograms: Linear bins, Logarithmic bins, or
Arbitrary bins
Linear requires lots of storage to cover range with good resolution
Logarithmic covers wide range but has terrible precisions
Arbitrary is.... arbitrary. Works only when you have a good feel for the
interesting parts of the value range
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
HdrHistogram
A High Dynamic Range Histogram
Covers a configurable dynamic value range
At configurable precision (expressed as number of significant digits)
For Example:
Track values between 1 microsecond and 1 hour
With 3 decimal points of resolution
Built-in [optional] compensation for Coordinated
Omission
Open Source
On github, released to the public domain, creative commons CC0
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Plotting HdrHistogram output
Convenient for plotting and comparing test results
X-Axis
Y-Axis
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
jHiccup
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
jHiccup
A tool for capturing and displaying platform hiccups
Records any observed non-continuity of the underlying platform
Plots results in simple, consistent format
Simple, non-intrusive
As simple as adding jHiccup.jar as a java agent:
% java -javaagent=jHiccup.jar myApp myflags
or attaching jHiccup to a running process:
% jHiccup -p <pid>
Adds a background thread that samples time @ 1000/sec into an
HdrHistogram
Open Source. Released to the public domain
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Examples
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Drawn to scale
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Oracle HotSpot (pure newgen) Zing
Low latency trading application
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Oracle HotSpot (pure newgen) Zing
Low latency trading application
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Low latency - Drawn to scale
Oracle HotSpot (pure newgen) Zing
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Shameless bragging
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Zing
A JVM for Linux/x86 servers
ELIMINATES Garbage Collection as a concern for enterprise
applications
Very wide operating range: Used in both low latency and
large scale enterprise application spaces
Decouples scale metrics from response time concerns
Transaction rate, data set size, concurrent users, heap
size, allocation rate, mutation rate, etc.
Leverages elastic memory for resilient operation
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Zing in Low Latency Java
A full solution to GC-related issues
Allows you to code in Java instead of “Java”
You can actually use third party code, even if it allocates stuff
You can code in idiomatic Java without worries
Faster time-to-market
less duct-tape engineering
Not just about GC.
We look at anything that makes a JVM “hiccup”.
“ReadyNow!”: Addresses “Warmup” problems
E.g. deoptimization storms at market open
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Takeaways
Standard Deviation and application latency should never
show up on the same page...
If you haven’t stated percentiles and a Max, you haven’t
specified your requirements
Measuring throughput without latency behavior is [usually]
meaningless
Mistakes in measurement/analysis can cause orders-of-
magnitude errors and lead to bad business decisions
jHiccup and HdrHistogram are pretty useful
The Zing JVM is cool...
©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
Q & A
http://www.azul.com
http://www.jhiccup.com
https://giltene.github.com/HdrHistogram
https://github.com/LatencyUtils/LatencyUtils

Weitere ähnliche Inhalte

Andere mochten auch

Attitudes and job_satisfaction
Attitudes and job_satisfactionAttitudes and job_satisfaction
Attitudes and job_satisfactionAbhishek Bhalla
 
Attitude - Organizational Behaviour
Attitude - Organizational Behaviour Attitude - Organizational Behaviour
Attitude - Organizational Behaviour Dr. Rajasshrie Pillai
 
Conflict management1
Conflict management1Conflict management1
Conflict management1renudhawan
 
Chapter 3 attitudes & job satisfaction (2)
Chapter 3  attitudes & job satisfaction (2)Chapter 3  attitudes & job satisfaction (2)
Chapter 3 attitudes & job satisfaction (2)Qamar Farooq
 
Attitude & Job Satisfaction
Attitude & Job SatisfactionAttitude & Job Satisfaction
Attitude & Job SatisfactionSaravanan rulez
 
Job satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviourJob satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviourRajesh Gautham
 

Andere mochten auch (6)

Attitudes and job_satisfaction
Attitudes and job_satisfactionAttitudes and job_satisfaction
Attitudes and job_satisfaction
 
Attitude - Organizational Behaviour
Attitude - Organizational Behaviour Attitude - Organizational Behaviour
Attitude - Organizational Behaviour
 
Conflict management1
Conflict management1Conflict management1
Conflict management1
 
Chapter 3 attitudes & job satisfaction (2)
Chapter 3  attitudes & job satisfaction (2)Chapter 3  attitudes & job satisfaction (2)
Chapter 3 attitudes & job satisfaction (2)
 
Attitude & Job Satisfaction
Attitude & Job SatisfactionAttitude & Job Satisfaction
Attitude & Job Satisfaction
 
Job satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviourJob satisfaction in Organizational behaviour
Job satisfaction in Organizational behaviour
 

Ähnlich wie Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools

Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul SystemsEnabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul SystemszuluJDK
 
Enabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive EnvironmentsEnabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive EnvironmentsC4Media
 
Understanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemUnderstanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemAzul Systems Inc.
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
Architectural Patterns of Resilient Distributed Systems
 Architectural Patterns of Resilient Distributed Systems Architectural Patterns of Resilient Distributed Systems
Architectural Patterns of Resilient Distributed SystemsInes Sombra
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Brian Troutwine
 
Understanding Hardware Transactional Memory
Understanding Hardware Transactional MemoryUnderstanding Hardware Transactional Memory
Understanding Hardware Transactional MemoryC4Media
 
Ruby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3xRuby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3xMatthew Gaudet
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...Puppet
 
Automatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang ApplicationsAutomatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang ApplicationsJan Henry Nystrom
 
[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason Larsen[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason LarsenTI Safe
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Brian Brazil
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)Brian Brazil
 
Misery Metrics & Consequences
Misery Metrics & ConsequencesMisery Metrics & Consequences
Misery Metrics & ConsequencesScyllaDB
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)Siglos
 
DC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage CollectionDC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage CollectionAzul Systems, Inc.
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesJonathan Creasy
 

Ähnlich wie Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools (20)

Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul SystemsEnabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
Enabling Java in Latency Sensitive Applications by Gil Tene, CTO, Azul Systems
 
Enabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive EnvironmentsEnabling Java in Latency Sensitive Environments
Enabling Java in Latency Sensitive Environments
 
Understanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemUnderstanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About Them
 
How NOT to Measure Latency
How NOT to Measure LatencyHow NOT to Measure Latency
How NOT to Measure Latency
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
Architectural Patterns of Resilient Distributed Systems
 Architectural Patterns of Resilient Distributed Systems Architectural Patterns of Resilient Distributed Systems
Architectural Patterns of Resilient Distributed Systems
 
Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014Monitoring Complex Systems - Chicago Erlang, 2014
Monitoring Complex Systems - Chicago Erlang, 2014
 
Understanding Hardware Transactional Memory
Understanding Hardware Transactional MemoryUnderstanding Hardware Transactional Memory
Understanding Hardware Transactional Memory
 
Ruby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3xRuby3x3: How are we going to measure 3x
Ruby3x3: How are we going to measure 3x
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
“Sensu and Sensibility” - The Story of a Journey From #monitoringsucks to #mo...
 
Automatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang ApplicationsAutomatic Assessment of Failure Recovery in Erlang Applications
Automatic Assessment of Failure Recovery in Erlang Applications
 
[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason Larsen[CLASS 2014] Palestra Técnica - Jason Larsen
[CLASS 2014] Palestra Técnica - Jason Larsen
 
Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)Provisioning and Capacity Planning (Travel Meets Big Data)
Provisioning and Capacity Planning (Travel Meets Big Data)
 
An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)An Introduction to Prometheus (GrafanaCon 2016)
An Introduction to Prometheus (GrafanaCon 2016)
 
Misery Metrics & Consequences
Misery Metrics & ConsequencesMisery Metrics & Consequences
Misery Metrics & Consequences
 
How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)How to Monitoring the SRE Golden Signals (E-Book)
How to Monitoring the SRE Golden Signals (E-Book)
 
DC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage CollectionDC JUG: Understanding Java Garbage Collection
DC JUG: Understanding Java Garbage Collection
 
Normal accidents and outpatient surgeries
Normal accidents and outpatient surgeriesNormal accidents and outpatient surgeries
Normal accidents and outpatient surgeries
 

Mehr von Azul Systems Inc.

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Azul Systems Inc.
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Azul Systems Inc.
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?Azul Systems Inc.
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapAzul Systems Inc.
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market OpenAzul Systems Inc.
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guideAzul Systems Inc.
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage CollectionAzul Systems Inc.
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014Azul Systems Inc.
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingAzul Systems Inc.
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Azul Systems Inc.
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Azul Systems Inc.
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMAzul Systems Inc.
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleAzul Systems Inc.
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data RacesAzul Systems Inc.
 
Lock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash TableLock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash TableAzul Systems Inc.
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkAzul Systems Inc.
 
The Art of Java Benchmarking
The Art of Java BenchmarkingThe Art of Java Benchmarking
The Art of Java BenchmarkingAzul Systems Inc.
 
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Systems Inc.
 

Mehr von Azul Systems Inc. (20)

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market Open
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guide
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage Collection
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
 
Java vs. C/C++
Java vs. C/C++Java vs. C/C++
Java vs. C/C++
 
What's Inside a JVM?
What's Inside a JVM?What's Inside a JVM?
What's Inside a JVM?
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVM
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding Style
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data Races
 
Lock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash TableLock-Free, Wait-Free Hash Table
Lock-Free, Wait-Free Hash Table
 
How NOT to Write a Microbenchmark
How NOT to Write a MicrobenchmarkHow NOT to Write a Microbenchmark
How NOT to Write a Microbenchmark
 
The Art of Java Benchmarking
The Art of Java BenchmarkingThe Art of Java Benchmarking
The Art of Java Benchmarking
 
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, PolandAzul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
Azul Zulu on Azure Overview -- OpenTech CEE Workshop, Warsaw, Poland
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools

  • 1. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Understanding Latency and Response Time Behavior Pitfalls, Lessons and Tools Matt Schuetze Director of Product Management Azul Systems
  • 2. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency Behavior Latency: The time it took one operation to happen Each operation occurrence has its own latency So we need to measure again, and again, and again... What we care about is how latency behaves Behavior is more than “what was the common case?”
  • 3. ©2014 Azul Systems, Inc. What do you care about? Do you : Care about latency in your system? Care about the worst case? Care about the 99.99%? Only care about the fastest thing in the day? Only care about the best 50% Only need 90% of operations to meet needs? Care if “only” 1% of operations are painfully slow? Care if 90% of users see outliers every hour?
  • 4. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency “wishful thinking” We know how to compute averages & std. deviation, etc. Wouldn’t it be nice if latency had a normal distribution? The average, 90%, 99%, std. deviation, etc. can give us a “feel” for the rest of the distribution, right? If 99% of the stuff behaves well, how bad can the rest be, really?
  • 5. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution
  • 6. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution
  • 7. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution 99% better than 0.5 msec 99.99% better than 5 msec
  • 8. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution
  • 9. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: latency distribution Mean = 0.06 msec Std. Deviation (𝞂) = 0.21msec 99.999% = 38.66msec ~184 σ (!!!) away from the mean In a normal distribution, the 99.999%’ile falls within 4.5 σ These are NOT normal distributions
  • 10. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The real world: “outliers” 99%‘ile is ~60 usec Max is ~30,000% higher than “typical”
  • 11. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. A classic look at response time behavior Response time as a function of load source: IBM CICS server documentation, “understanding response times” Average? Max? Median? 90%? 99.9%
  • 12. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Response time over time When we measure behavior over time, we often see: source: ZOHO QEngine White Paper: performance testing report analysis “Hiccups”
  • 13. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Hiccups are [typically] strongly multi-modal They don’t look anything like a normal distribution They usually look like periodic freezes A complete shift from one mode/behavior to another Mode A: “good”. Mode B: “Somewhat bad” Mode C: “terrible”, ... ....
  • 14. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Common ways people deal with hiccups
  • 15. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Common ways people deal with hiccups Averages and Standard Deviation Always Wrong!
  • 16. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Better ways people can deal with hiccups Actually characterizing latency Requirements Response Time Percentile plot line
  • 17. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Requirements Why we measure latency and response times to begin with...
  • 18. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency: Stating Requirements Requirements describe how latency should behave Useful Latency requirements are usually stated as a PASS/FAIL test against some predefined criteria Different applications have different needs Requirements should reflect application needs Measurements should provide data to evaluate requirements
  • 19. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Establishing Requirements an interactive interview (or thought) process Q: What are your latency requirements? A: We need an avg. response of 20 msec Q: Ok. Typical/average of 20 msec... So what is the worst case requirement? A: We don’t have one Q: So it’s ok for some things to take more than 5 hours? A: No way in H%%&! Q: So I’ll write down “5 hours worst case...” A: No. That’s not what I said. Make that “nothing worse than 100 msec” Q: Are you sure? Even if it’s only two times a day? A: Ok... Make it “nothing worse than 2 seconds...”
  • 20. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Establishing Requirements an interactive interview (or thought) process Ok. So we need a typical of 20msec, and a worst case of 2 seconds. How often is it ok to have a 1 second response? A: (Annoyed) I thought you said only a few times a day Q: That was for the worst case. But if half the results are better than 20 msec, is it ok for the other half to be just short of 2 seconds? What % of the time are you willing to take a 1 second, or a half second hiccup? Or some other level? A: Oh. Let’s see. We have to better than 50 msec 90% of the time, or we’ll be losing money even when we are fast the rest of the time. We need to be better than 500 msec 99.9% of the time, or our customers will complain and go elsewhere Now we have a service level expectation: 50% better than 20 msec 90% better than 50 msec 99.9% better than 500 msec 100% better than 2 seconds
  • 21. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency does not live in a vacuum
  • 22. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Remember this? How much load can this system handle? Where the sysadmin is willing to go What the marketing benchmarks will say Where users complain Sustainable Throughput Level
  • 23. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Comparing behavior under different throughputs and/or configurations
  • 24. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Latency behavior under different throughputs, configurations latency sensitive messaging distribution application
  • 25. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. The coordinated omission problem An accidental conspiracy...
  • 26. ©2014 Azul Systems, Inc. Coordinated Omission (“CO”) is the measurement error which is introduced by naively recording requests, sorting them and reporting the result as the percentile distribution of the request latency. Any recording method, synchronous or asynchronous, which results in even partially coordinating sample times with the system under test, and as a result avoids recording some of the originally intended samples will exhibit CO. Synchronous methods tend to naturally exhibit this sort of nativity when intended sample time are not kept far apart from each other. Synopsis of Coordinated Omission
  • 27. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Real World Coordinated Omission effects Uncorrected Data
  • 28. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Real World Coordinated Omission effects Uncorrected Data Corrected for Coordinated Omission
  • 29. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Real World Coordinated Omission effects (Why I care) A ~2500x difference in reported percentile levels for the problem that Zing eliminates Zing “other” JVM
  • 30. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Suggestions Whatever your measurement technique is, TEST IT. Run your measurement method against artificial systems that create hypothetical pauses scenarios. See if your reported results agree with how you would describe that system behavior Don’t waste time analyzing until you establish sanity Don’t EVER use or derive from std. deviation ALWAYS measure Max time. Consider what it means... Be suspicious. Measure %‘iles. Lots of them.
  • 31. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram
  • 32. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram If you want to be able to produce graphs like this... You need both good dynamic range and good resolution
  • 33. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram background Goal: Collect data for good latency characterization... Including acceptable precision at and between varying percentile levels Existing alternatives Record all data, analyze later (e.g. sort and get 99.9%‘ile). Record in traditional histograms Traditional Histograms: Linear bins, Logarithmic bins, or Arbitrary bins Linear requires lots of storage to cover range with good resolution Logarithmic covers wide range but has terrible precisions Arbitrary is.... arbitrary. Works only when you have a good feel for the interesting parts of the value range
  • 34. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. HdrHistogram A High Dynamic Range Histogram Covers a configurable dynamic value range At configurable precision (expressed as number of significant digits) For Example: Track values between 1 microsecond and 1 hour With 3 decimal points of resolution Built-in [optional] compensation for Coordinated Omission Open Source On github, released to the public domain, creative commons CC0
  • 35. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Plotting HdrHistogram output Convenient for plotting and comparing test results X-Axis Y-Axis
  • 36. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. jHiccup
  • 37. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. jHiccup A tool for capturing and displaying platform hiccups Records any observed non-continuity of the underlying platform Plots results in simple, consistent format Simple, non-intrusive As simple as adding jHiccup.jar as a java agent: % java -javaagent=jHiccup.jar myApp myflags or attaching jHiccup to a running process: % jHiccup -p <pid> Adds a background thread that samples time @ 1000/sec into an HdrHistogram Open Source. Released to the public domain
  • 38. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Examples
  • 39. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
  • 40. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
  • 41. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc.
  • 42. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Drawn to scale
  • 43. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Oracle HotSpot (pure newgen) Zing Low latency trading application
  • 44. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Oracle HotSpot (pure newgen) Zing Low latency trading application
  • 45. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Low latency - Drawn to scale Oracle HotSpot (pure newgen) Zing
  • 46. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Shameless bragging
  • 47. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Zing A JVM for Linux/x86 servers ELIMINATES Garbage Collection as a concern for enterprise applications Very wide operating range: Used in both low latency and large scale enterprise application spaces Decouples scale metrics from response time concerns Transaction rate, data set size, concurrent users, heap size, allocation rate, mutation rate, etc. Leverages elastic memory for resilient operation
  • 48. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Zing in Low Latency Java A full solution to GC-related issues Allows you to code in Java instead of “Java” You can actually use third party code, even if it allocates stuff You can code in idiomatic Java without worries Faster time-to-market less duct-tape engineering Not just about GC. We look at anything that makes a JVM “hiccup”. “ReadyNow!”: Addresses “Warmup” problems E.g. deoptimization storms at market open
  • 49. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Takeaways Standard Deviation and application latency should never show up on the same page... If you haven’t stated percentiles and a Max, you haven’t specified your requirements Measuring throughput without latency behavior is [usually] meaningless Mistakes in measurement/analysis can cause orders-of- magnitude errors and lead to bad business decisions jHiccup and HdrHistogram are pretty useful The Zing JVM is cool...
  • 50. ©2014 Azul Systems, Inc.©2013 Azul Systems, Inc. Q & A http://www.azul.com http://www.jhiccup.com https://giltene.github.com/HdrHistogram https://github.com/LatencyUtils/LatencyUtils