This presentation was delivered by Tom Kleingarn at HP Software Universe 2010 in Washington DC. It describes basic statistical tests that can be applied to any performance engineering practice to improve accuracy and confidence in your test results.
3. About Me
> Tom Kleingarn
> Lead, Performance Engineering - Digital River
> 4 years in performance engineering
> Tested over 100 systems/applications
> Hundreds of performance tests
> Tools
> LoadRunner
> JMeter
> Webmetrics, Keynote, Gomez
> ‘R’ and Excel
> Quality Center
> QuickTest Professional
4. > Leading provider of global e-commerce solutions
> Builds and manages online businesses for software and game publishers, consumer electronics manufacturers, distributors, online retailers and affiliates
> Comprehensive platform offers:
> Site development and hosting
> Order management
> Fraud management
> Export control
> Tax management
> Physical and digital product fulfillment
> Multi-lingual customer service
> Advanced reporting and strategic marketing
5. Performance Engineering
> The process of experimental design, test execution, and results analysis, utilized to validate system performance as part of the Software Development Lifecycle (SDLC).
> Performance requirements – measurable targets of speed, reliability, and/or capacity used in performance validation.
> Latency < 10ms, measured at the 99th percentile
> 99.95% uptime
> Throughput of 1,000 requests per second
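Requirements like "latency < 10ms at the 99th percentile" are checked against measured samples. A minimal sketch in Python, using hypothetical latency values and a simple nearest-rank percentile:

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least
    pct% of the samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[int(rank) - 1]

# Hypothetical latency samples in milliseconds
latencies = [4, 5, 5, 6, 7, 8, 9, 9, 10, 25]

p99 = percentile(latencies, 99)
meets_requirement = p99 < 10  # requirement: < 10 ms at the 99th percentile
```

Note how a single outlier (25 ms) fails the 99th-percentile requirement even though the median looks healthy, which is exactly why percentile targets are stricter than averages.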
6. Performance Testing Cycle
1. Requirements Analysis
2. Create test plan
3. Create automated scripts
4. Define workload model
5. Execute scenarios
6. Analyze results
> Rinse and repeat if…
> Defects identified
> Change in requirements
> Setup or environment issues
> Performance requirement not met
Digital River Test Automation
7. Agile
> A software development paradigm that emphasizes rapid
process cycles, cross-functional teams, frequent
examination of progress, and adaptability.
[Diagram: Initial Plan → Scrum → Deploy]
8. Agile Performance Engineering
> Clear and constant communication
> Involvement in initial requirements and design phase
> Identify key business processes before they are built
> Coordinate with analysts and development to build key
business processes first
> Integrate load generation requirements into project schedule
> Test immediately with v1.0
> Schedule tests to auto-start, run independently
> Identify invalid test results before deep analysis
9. LoadRunner Results
> Measures of central tendency
> Average = ∑(all samples) / (sample size)
> Median = 50th percentile
> Mode – highest frequency, the value that occurred the most
> Measures of variability
> Min, max
> Standard Deviation = √( ∑(xᵢ − x̄)² / (n − 1) )
> 90th percentile
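These same statistics can be computed outside LoadRunner with Python's standard library (hypothetical response-time samples, in seconds):

```python
import statistics

# Hypothetical response times in seconds from a load test
samples = [2.1, 2.4, 2.4, 2.9, 3.1, 3.4, 3.4, 3.4, 4.0, 5.6]

avg    = statistics.mean(samples)    # sum of all samples / sample size
median = statistics.median(samples)  # 50th percentile
mode   = statistics.mode(samples)    # most frequent value
spread = statistics.stdev(samples)   # sample standard deviation, n - 1 in the denominator
lo, hi = min(samples), max(samples)

# 90th percentile: quantiles() splits the data into 10 groups;
# the 9th cut point is the 90th percentile
p90 = statistics.quantiles(samples, n=10)[8]
```

`statistics.quantiles` requires Python 3.8+; its default (exclusive) method interpolates between the two highest samples here.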
11. Basic Statistics – Sample vs. Population
> Performance requirement: average latency < 3 seconds
> What if you ran 50 rounds? 100 rounds?
12. Basic Statistics – Sample vs. Population
> Sample – set of values, subset of population
> Population – all potentially observable values
> Measurements
> Statistic – the estimated value from a collection of samples
> Parameter – the “true” value you are attempting to estimate
Not a representative sample!
13. Basic Statistics – Sample vs. Population
> Sampling distribution – the probability distribution of a given
statistic based on a random sample of size n
> Dependent on the underlying population
> How do you know the system under test met the performance requirement?
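The sampling distribution can be made concrete by simulation. A sketch with hypothetical population parameters: repeatedly "run a test round" of n samples and look at how the round averages themselves are distributed:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

POP_MEAN, POP_SD, N, ROUNDS = 3.0, 1.5, 100, 1000

# Simulate many test "rounds": each round draws N samples from the
# population and records that round's average
round_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(ROUNDS)
]

# The round averages cluster around the population mean, and their
# spread is the standard error of the mean: sd / sqrt(n)
observed_sd = statistics.stdev(round_means)
expected_sd = POP_SD / N ** 0.5   # 1.5 / 10 = 0.15
```

Each individual sample varies by 1.5 seconds, but the round averages vary by only about 0.15 seconds, which is why statistics on the sample mean can be far tighter than the raw data suggest.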
14. Basic Statistics – Normal Distribution
> With larger samples, data tend to cluster around the mean
15. Basic Statistics – Normal Distribution
Sir Francis Galton’s “Bean Machine”
16. Confidence Intervals
> The probability that an interval made up of two endpoints will contain the true mean parameter μ
> 95% confidence interval: x̄ ± 1.96 × s / √n
> … where 1.96 is the score from the normal distribution associated with 95% probability
17. Confidence Intervals
> In repeated rounds of testing, a confidence interval will contain the true mean parameter with a certain probability
[Chart: confidence intervals from repeated rounds plotted around the True Average]
18. Confidence Intervals in Excel
Statistic          | Value 95% | Value 99% | Formula
Average            | 3.40      | 3.40      |
Standard Deviation | 1.45      | 1.45      |
Sample size        | 500       | 500       |
Confidence Level   | 0.95      | 0.99      |
Significance Level | 0.05      | 0.01      | =1-(Confidence Level)
Margin of Error    | 0.127     | 0.167     | =CONFIDENCE(Sig. Level, Std Dev, Sample Size)
Lower Bound        | 3.273     | 3.233     | =Average - Margin of Error
Upper Bound        | 3.527     | 3.567     | =Average + Margin of Error
> 95% confidence – true average latency 3.273 to 3.527 seconds
> 99% confidence – true average latency 3.233 to 3.567 seconds
> The range is wider at 99% than at 95%: 0.334 sec vs. 0.254 sec
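The Excel calculation above can be reproduced with a few lines of Python, using the same average, standard deviation, and sample size:

```python
import math

# Values from the Excel example
avg, sd, n = 3.40, 1.45, 500

z95 = 1.959964  # normal score for 95% confidence
z99 = 2.575829  # normal score for 99% confidence

margin95 = z95 * sd / math.sqrt(n)  # matches =CONFIDENCE(0.05, 1.45, 500)
margin99 = z99 * sd / math.sqrt(n)  # matches =CONFIDENCE(0.01, 1.45, 500)

ci95 = (avg - margin95, avg + margin95)
ci99 = (avg - margin99, avg + margin99)
```

The wider z score at 99% confidence is what widens the interval: more confidence costs a bigger range.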
19. The T-test
> Test that your sample mean is greater than/less than a certain value
> Performance requirement: mean latency < 3 seconds
> Null hypothesis: mean latency >= 3 seconds
> Alternative hypothesis: mean latency < 3 seconds
21. T-test in ‘R’
> ‘R’ for statistical analysis
> http://www.r-project.org/
Load the test data from a file:
> datafile <- read.table("C:/Data/test.data", header = FALSE, col.names = c("latency"))
Attach the dataframe:
> attach(datafile)
Create a vector from the dataframe:
> latency <- datafile$latency
22. T-test in ‘R’
> t.test(latency, alternative="less", mu=3)
One Sample t-test
data: latency
t = -2.9968, df = 499, p-value = 0.001432
alternative hypothesis: true mean is less than 3
> The p-value is the probability of observing a sample mean this small if the true average latency were actually 3 seconds.
> Since 0.001432 < 0.05, we reject the null hypothesis and conclude the true average latency is less than 3 seconds.
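The same one-sample test can be sketched in plain Python (hypothetical latency data; with 499 degrees of freedom the t distribution is nearly normal, so the one-sided p-value is approximated with the normal CDF via `math.erf`):

```python
import math
import random

random.seed(7)
# Hypothetical latency samples; true mean set below the 3-second requirement
latency = [random.gauss(2.7, 1.45) for _ in range(500)]

n = len(latency)
mean = sum(latency) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in latency) / (n - 1))

# One-sample t statistic against mu = 3 (H0: mean >= 3, H1: mean < 3)
t = (mean - 3) / (sd / math.sqrt(n))

# One-sided p-value, normal approximation of the t distribution
p_value = 0.5 * (1 + math.erf(t / math.sqrt(2)))

reject_null = p_value < 0.05
```

In R the exact t distribution is used; for sample sizes in the hundreds the difference from the normal approximation is negligible.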
23. T-test – Number of Samples Required
> power.t.test(sd=sd(latency), sig.level=0.05, power=0.90, delta=mean(latency)*0.01, type="one.sample")
One-sample t test power calculation
n = 215.5319
delta = 0.03241267
sd = 0.1461401
sig.level = 0.05
power = 0.9
alternative = two.sided
> We need at least 216 samples
> Our sample size is 500, so we have enough samples to proceed
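The number `power.t.test` reports can be approximated by hand with the standard normal-approximation formula n = ((z₁₋α/₂ + z₁₋β) · sd / delta)², using the sd and delta values from the output above:

```python
import math

sd, delta = 0.1461401, 0.03241267  # from the power.t.test output above

z_alpha = 1.959964  # normal score for two-sided sig.level = 0.05
z_beta  = 1.281552  # normal score for power = 0.90

n = ((z_alpha + z_beta) * sd / delta) ** 2  # about 213.6
n_required = math.ceil(n)

# R's power.t.test iterates on the exact t distribution and reports
# n = 215.5, slightly larger than the normal approximation
```

The approximation slightly undershoots the t-based answer; when in doubt, round up and collect a few extra samples.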
24. Test for Normality
> Test that the data is “normal”
> Clustered around a central value, no outliers
> Roughly fits the normal distribution
> shapiro.test(latency)
Shapiro-Wilk normality test
data: latency
p-value = 0.8943
> Our sample distribution is approximately normal
> p-value < 0.05 indicates the distribution is not normal
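Outside R, the same Shapiro-Wilk test is available as `scipy.stats.shapiro` (assuming SciPy is installed; hypothetical data below):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical data: one roughly normal sample, one heavily skewed sample
normal_data = rng.normal(loc=3.0, scale=1.45, size=500)
skewed_data = rng.exponential(scale=1.0, size=500)

_, p_normal = stats.shapiro(normal_data)
_, p_skewed = stats.shapiro(skewed_data)

# As on the slide: p-value < 0.05 indicates the distribution is not normal,
# so the skewed sample should fail the test
```

Skewed latency distributions are common in practice (long tails from garbage collection, retries, timeouts), so this check is worth running before leaning on mean-based tests.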
25. Review
> Sample vs. Population
> Normal distribution
> Confidence intervals
> T-test
> Sample size
> Test for normality
> Practical application
> Performance requirements
> Compare two code builds
> Compare system infrastructure changes
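Comparing two code builds is a two-sample version of the same idea. A sketch using Welch's t statistic (no equal-variance assumption) on hypothetical per-build latency samples, with a normal-approximation two-sided p-value:

```python
import math
import random

random.seed(3)
# Hypothetical latency samples from two code builds
build_a = [random.gauss(3.0, 1.0) for _ in range(300)]
build_b = [random.gauss(3.6, 1.0) for _ in range(300)]

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

ma, va = mean_var(build_a)
mb, vb = mean_var(build_b)

# Welch's two-sample t statistic
t = (ma - mb) / math.sqrt(va / len(build_a) + vb / len(build_b))

# Two-sided p-value, normal approximation: 2 * (1 - CDF(|t|)) = erfc(|t|/sqrt(2))
p_value = math.erfc(abs(t) / math.sqrt(2))

builds_differ = p_value < 0.05
```

In R this is simply `t.test(build_a, build_b)`; the point is that "build B feels slower" becomes a testable claim with a stated error rate.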
26. Case Study
> Engaged in a new web service project
> Average latency < 25ms
> Applied statistical analysis
> System did not meet requirement
> Identified problem transaction
> Development fix applied
> Additional test, requirement met
> Prevented a failure in production
27. Implementation in Agile Projects
> Involvement in early design stages
> Identify performance requirements
> Build key business processes first
> Calculate required sample size
> Apply statistical analysis
> Run fewer tests with greater confidence in your results
> Prevent performance defects from entering production
> Prevent SLA violations in production
Editor's note
The t-statistic was introduced in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name).[1][2][3] Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes.[2] Gosset devised the t-test as a way to cheaply monitor the quality of stout. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was unknown to fellow statisticians.