I gave this talk at the 2013 Meeting on Algorithm Engineering and Experiments (ALENEX).
Find my other talks and the corresponding papers on my web page:
http://wwwagak.cs.uni-kl.de/sebastian-wild.html
Engineering Java 7's Dual Pivot Quicksort Using MaLiJAn
1. Engineering Java 7’s Dual Pivot Quicksort
Using MaLiJAn
Sebastian Wild, Markus E. Nebel, Raphael Reitzig, Ulrich Laube
[wild, nebel, r_reitzi, laube] @cs.uni-kl.de
Computer Science Department
University of Kaiserslautern
January 7, 2013
Meeting on Algorithm Engineering & Experiments 2013
Sebastian Wild Java 7’s Dual Pivot Quicksort 2012/09/11 1 / 23
2. Background
Since Java 7: new dual pivot Quicksort in JRE library
Basic algorithm by Vladimir Yaroslavskiy
Optimizations by Jon Bentley, Joshua Bloch and others
(see java.core-libs.devel mailing list)
Motivated by experience with classic Quicksort
Validated by running time benchmark
In this talk:
Can we exploit special properties of dual pivot Quicksort?
Can we get more insight than running time measurements?
. . . stay tuned
3. Java 7’s Dual Pivot Quicksort – Example
Yaroslavskiy’s Dual Pivot Quicksort
(used in Oracle’s Java 7 Arrays.sort(int[]))
Invariant: | < p | ℓ → | p ≤ ◦ ≤ q | k → | ? | ← g | > q |
Input: 3 5 1 8 4 7 2 9 6
1 Select two elements as pivots: p = 3, q = 6.
2 Only value relative to pivot counts.
3 k at 5: A[k] is medium → go on.
4 k at 1: A[k] is small → swap small element to left end.
  3 1 5 8 4 7 2 9 6
5 k at 8: A[k] is large → find swap partner: g skips over large elements (9) and stops at 2. Swap.
  3 1 5 2 4 7 8 9 6
6 A[k] is old A[g], small → swap to left.
  3 1 2 5 4 7 8 9 6
7 k at 4: A[k] is medium → go on.
8 k at 7: A[k] is large → find swap partner: g skips over large elements.
9 g and k have crossed! Swap pivots in place.
  2 1 3 5 4 6 8 9 7
10 Partitioning done! Recursively sort three sublists.
  1 2 3 4 5 6 7 8 9
11 Done.
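The partitioning steps of this example can be written down as a minimal Java sketch. It follows the basic form of Yaroslavskiy's partitioning with the first and last element as pivots. The class name, the method names and the pointer name `l` (for ℓ) are illustrative assumptions; the actual JRE 7 code adds pivot sampling and further optimizations that are not shown here.

```java
import java.util.Arrays;

public class DualPivotSketch {

    /** Sorts a[left..right] (inclusive) with a basic dual pivot Quicksort. */
    static void sort(int[] a, int left, int right) {
        if (right <= left) return;
        int[] pos = partition(a, left, right);
        sort(a, left, pos[0] - 1);        // elements < p
        sort(a, pos[0] + 1, pos[1] - 1);  // elements between p and q
        sort(a, pos[1] + 1, right);       // elements >= q
    }

    /** Partitions a[left..right]; returns the final positions of p and q. */
    static int[] partition(int[] a, int left, int right) {
        if (a[left] > a[right]) swap(a, left, right);
        int p = a[left], q = a[right];
        int l = left + 1, k = left + 1, g = right - 1;
        while (k <= g) {
            int t = a[k];
            if (t < p) {                       // A[k] is small: swap to left
                a[k] = a[l]; a[l] = t; l++;
            } else if (t >= q) {               // A[k] is large: find swap partner
                while (a[g] > q && k < g) g--; // g skips over large elements
                if (a[g] < p) {                // partner is small
                    a[k] = a[l]; a[l] = a[g]; l++;
                } else {                       // partner is medium
                    a[k] = a[g];
                }
                a[g] = t; g--;
            }
            k++;                               // A[k] is medium: go on
        }
        l--; g++;
        swap(a, left, l);                      // swap pivots in place
        swap(a, right, g);
        return new int[] {l, g};
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {3, 5, 1, 8, 4, 7, 2, 9, 6};
        partition(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // state after one partitioning step
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```

Running `main` on the example input reproduces the array state after partitioning and then the sorted list.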
23. Control Flow Graph of Partitioning Loop
Basic blocks of the partitioning loop (bc = number of Bytecode instructions in the block):
 1  bc: 3   k ≤ g?                                    yes → 2, no → exit
 2  bc: 7   t := A[k]; t < p?                         yes → 3, no → 4
 3  bc: 12  A[k] := A[ℓ]; A[ℓ] := t; ℓ := ℓ + 1;      → 12
 4  bc: 3   t ≥ q?                                    yes → 5, no → 12
 5  bc: 5   A[g] > q?                                 yes → 6, no → 8
 6  bc: 3   k < g?                                    yes → 7, no → 8
 7  bc: 2   g := g − 1;                               → 5
 8  bc: 5   A[g] < p?                                 yes → 9, no → 10
 9  bc: 14  A[k] := A[ℓ]; A[ℓ] := A[g]; ℓ := ℓ + 1;   → 11
10  bc: 6   A[k] := A[g]                              → 11
11  bc: 5   A[g] := t; g := g − 1;                    → 12
12  bc: 2   k := k + 1                                → 1
24. Control Flow Graph of Partitioning Loop
The loop decomposes into five cycles, selected by the classes of A[k] and A[g]:
Cycle  A[k]    A[g]    ∆(g − k)  Bytecode instructions
1      small   n/a     1         24
2      medium  n/a     1         15
3      large   large   1         10
4      large   small   2         44
5      large   medium  2         36
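The per-cycle bytecode counts can be checked by summing the block costs along each cycle. The sketch below does this arithmetic; the block costs are the bc values from the control flow graph, while the assignment of blocks to cycles is an assumed reading of the graph (the slides only highlight the cycles graphically), and the class and method names are hypothetical.

```java
public class CycleCosts {

    // bc(i) for basic blocks 1..12 as given in the control flow graph; index 0 is unused.
    static final int[] BC = {0, 3, 7, 12, 3, 5, 3, 2, 5, 14, 6, 5, 2};

    // Assumed block sequence of each cycle (Cycle 1 first).
    static final int[][] CYCLES = {
        {1, 2, 3, 12},                // Cycle 1: A[k] small
        {1, 2, 4, 12},                // Cycle 2: A[k] medium
        {5, 6, 7},                    // Cycle 3: A[k] large, A[g] large (inner loop)
        {1, 2, 4, 5, 8, 9, 11, 12},   // Cycle 4: A[k] large, A[g] small
        {1, 2, 4, 5, 8, 10, 11, 12},  // Cycle 5: A[k] large, A[g] medium
    };

    /** Sums the bytecode counts of the given blocks. */
    static int cost(int[] blocks) {
        int sum = 0;
        for (int b : blocks) sum += BC[b];
        return sum;
    }

    public static void main(String[] args) {
        for (int c = 0; c < CYCLES.length; c++) {
            System.out.println("Cycle " + (c + 1) + ": " + cost(CYCLES[c]) + " bytecodes");
        }
    }
}
```

Under this reading the sums agree with the instruction counts stated for the five cycles.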
29. Asymmetry
Algorithm is asymmetric:
Cycles have different cost
Would rather execute cheap ones often
Cycles chosen by classes small, medium or large
Probability for classes depends on pivot values
Maybe we can “influence pivot values accordingly”?
30. Pivot Sampling
Well-known optimization for classic Quicksort: median-of-three
pivot closer to median of whole list
In JRE7 Quicksort implementation: natural extension for 2 pivots:
tertiles-of-five
pivots closer to tertiles of whole list
9 other possibilities to pick p and q out of 5 elements (10 pairs in total).
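A minimal sketch of such a pivot selection scheme is given below. It picks the i-th and j-th smallest of five sample elements, so that (2,4) corresponds to tertiles-of-five. Reading JRE7(1,3) as ranks (1,3) with 1-based ranks is an assumption, as are the evenly spaced sample positions and all names; the JRE 7 implementation chooses and sorts its sample differently in detail.

```java
import java.util.Arrays;

public class PivotSampling {

    /**
     * Returns {p, q}: the i-th and j-th smallest (1-based, i < j) of five
     * sample elements taken at evenly spaced positions of a[left..right].
     */
    static int[] choosePivots(int[] a, int left, int right, int i, int j) {
        int n = right - left + 1;
        if (n < 5 || i < 1 || j > 5 || i >= j) {
            throw new IllegalArgumentException("need n >= 5 and 1 <= i < j <= 5");
        }
        int step = (n - 1) / 4;                 // assumed spacing of the sample
        int[] sample = new int[5];
        for (int s = 0; s < 5; s++) sample[s] = a[left + s * step];
        Arrays.sort(sample);
        return new int[] {sample[i - 1], sample[j - 1]};
    }

    public static void main(String[] args) {
        int[] a = {9, 1, 8, 2, 7, 3, 6, 4, 5};
        System.out.println(Arrays.toString(choosePivots(a, 0, a.length - 1, 2, 4)));
        System.out.println(Arrays.toString(choosePivots(a, 0, a.length - 1, 1, 3)));
    }
}
```

Choosing smaller ranks shifts both pivots towards the small end of the list, which changes how often elements fall into the classes small, medium and large.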
34. Optimizing Pivot Sampling
Which are “good” pivot selection schemes?
Is the symmetric choice best possible?
Need objective function to optimize
Typical approaches to judge efficiency:
A Count number of basic operations.
(Here: number of executed Java Bytecode instructions.)
B Measure total running time.
35. Optimizing Pivot Sampling
Relative performance of pivot sampling schemes compared to tertiles-of-five (JRE7):
Pivot Selection Scheme     A [1]                  B [2]
JRE7 (tertiles-of-five)    baseline               baseline
(diagram only)             +5.14%                 +0.80%
JRE7(1,3)                  −1.85%                 −0.44%
(diagram only)             +3.34%                 −0.42%
(diagram only)             n/a (stack overflow!)  +10.6%
(diagram only)             +2.48%                 +2.73%
(diagram only)             +11.3%                 +3.31%
(diagram only)             +12.7%                 +3.29%
(diagram only)             +16.4%                 +2.48%
(diagram only)             +39.0%                 +5.87%
[1] Average number of executed bytecodes on almost sorted lists of length 10^5.
[2] Average running time on random permutations of length 10^6.
37. Model and Method
What made JRE7(1,3) faster than JRE7?
. . . hard to tell from total time/bytecodes.
Need a more detailed model of the program.
Idea: Decompose along control flow graph!
View program as Markov chain over blocks
Termination via absorbing state
Transition i → j has probability $p^{(n)}_{i \to j}$ depending on input size n
Visiting block i incurs constant costs c(i)
Total cost is sum of block costs
Expected costs of program = expected costs of run of Markov chain
Latter easy to compute
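To illustrate why the expected cost of a run of the Markov chain is easy to compute: if x(i) is the expected cost of a run starting in block i, then x(i) = c(i) + Σ_j p(i→j) · x(j), which is a linear system. The sketch below solves it by Gaussian elimination. It is a generic illustration under these assumptions and not the method implemented in MaLiJAn; all names are hypothetical.

```java
public class MarkovCost {

    /**
     * Expected total cost of a run starting in block `start`.
     * q[i][j] is the probability of transition i -> j between non-absorbing
     * blocks; the remaining probability mass of a row leads to the absorbing
     * state. Solves (I - Q) x = c with Gauss-Jordan elimination.
     */
    static double expectedCost(double[][] q, double[] c, int start) {
        int n = c.length;
        double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) m[i][j] = (i == j ? 1.0 : 0.0) - q[i][j];
            m[i][n] = c[i];
        }
        for (int col = 0; col < n; col++) {
            int piv = col;                       // partial pivoting
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            }
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = m[r][col] / m[col][col];
                for (int k = col; k <= n; k++) m[r][k] -= f * m[col][k];
            }
        }
        return m[start][n] / m[start][start];
    }

    public static void main(String[] args) {
        // Toy loop: head (cost 3) enters the body (cost 7) with probability 0.9,
        // otherwise terminates; the body always returns to the head.
        double[][] q = {{0.0, 0.9}, {1.0, 0.0}};
        double[] c = {3.0, 7.0};
        System.out.println(expectedCost(q, c, 0));
    }
}
```

In the toy example the head is visited 10 times and the body 9 times in expectation, so the expected cost is 10 · 3 + 9 · 7 = 93.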
39. Maximum Likelihood Analysis
How to determine block costs and transition probabilities?
Transition Probabilities
1 Count transitions in executions on sample data
  Allows arbitrary input distributions!
2 Take relative frequency as estimate for $p^{(n)}_{i \to j}$
3 Extrapolate $p^{(n)}_{i \to j}$ to a function $p_{i \to j}(n)$ in n
Block Costs
We consider two cost measures:
A bc(i) = number of Bytecode instructions in block i.
B t(i) = running time of block i
All steps are automated in our tool MaLiJAn [3]
[3] http://wwwagak.cs.uni-kl.de/malijan.html
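The counting step can be sketched as follows: given a recorded sequence of executed blocks, the relative frequency of each transition i → j among all transitions leaving i serves as the estimate. This is a simplified illustration with hypothetical names, not MaLiJAn's code, and it leaves out the extrapolation to a function in n.

```java
public class TransitionEstimator {

    /**
     * Estimates transition probabilities from one execution trace.
     * trace holds block ids in 0..numBlocks-1 in execution order;
     * result[i][j] is the relative frequency of i -> j among all
     * observed transitions leaving block i (0 if i was never left).
     */
    static double[][] estimate(int[] trace, int numBlocks) {
        long[][] count = new long[numBlocks][numBlocks];
        long[] leaving = new long[numBlocks];
        for (int s = 0; s + 1 < trace.length; s++) {
            count[trace[s]][trace[s + 1]]++;
            leaving[trace[s]]++;
        }
        double[][] p = new double[numBlocks][numBlocks];
        for (int i = 0; i < numBlocks; i++) {
            for (int j = 0; j < numBlocks; j++) {
                if (leaving[i] > 0) p[i][j] = (double) count[i][j] / leaving[i];
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up trace: one pass through blocks 1, 2, 3, 12 and one through 1, 2, 4, 12.
        int[] trace = {1, 2, 3, 12, 1, 2, 4, 12, 1};
        double[][] p = estimate(trace, 13);
        System.out.println("p(2 -> 3) = " + p[2][3] + ", p(2 -> 4) = " + p[2][4]);
    }
}
```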
40. Block Sampling
Running times t(i) in B are typically a few nanoseconds
→ direct measurement not possible.
Idea: Sampling Based Approach
[Figure: timeline of executed blocks (nanosecond scale) with samples taken at regular intervals (microsecond scale)]
In regular intervals, store current basic block (concurrently)
We observe only a small fraction of all block executions → repeat execution
Relative frequencies of observed samples approach
relative running time contribution of blocks.
Count in separate run how often block i gets executed in total
Together, this allows us to compute t(i)
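The last step can be sketched as simple arithmetic: the share of samples that hit block i, multiplied by the total running time, estimates the time spent in block i overall, and dividing by the number of executions gives t(i). The formula and all names below are an illustrative reading of the slide, not code from MaLiJAn.

```java
public class BlockTimes {

    /**
     * Estimates the running time per execution of each block.
     * samples[i]    = number of samples that observed block i
     * executions[i] = how often block i was executed (from a separate counting run)
     * totalTime     = total running time of the sampled run (any fixed unit)
     */
    static double[] blockTimes(long[] samples, long[] executions, double totalTime) {
        long totalSamples = 0;
        for (long s : samples) totalSamples += s;
        double[] t = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            if (totalSamples > 0 && executions[i] > 0) {
                t[i] = totalTime * samples[i] / totalSamples / executions[i];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Made-up numbers: 100 samples over a run of 10000 ns, two blocks.
        double[] t = blockTimes(new long[] {30, 70}, new long[] {1000, 1000}, 10000.0);
        System.out.println("t(0) = " + t[0] + " ns, t(1) = " + t[1] + " ns");
    }
}
```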
41. A Decent Word of Caution
1 Determining current block adds a small systematic error.
2 Java Specialty: Just-in-time Compilation
Running time heavily influenced by HotSpot JIT compiler
JIT collects profiling information at beginning
First input determines which optimizations are found
. . . more details in the paper
42. Input Distributions
We consider 2 different input distributions:
1 Random Permutations
well-studied in literature
2 Almost Sorted Lists
Random model by Brodal et al. [4]:
A[i] chosen i.i.d. uniform in [i − d, i + d]
for constant d (here d = 100)
[4] G. Brodal, R. Fagerberg, G. Moruz: On the Adaptiveness of Quicksort,
J. Exp. Algorithmics 12 (2008), pp. 3.2:1–3.2:20
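A generator for almost sorted lists in this model could look like the following sketch. It assumes integer keys drawn uniformly from the integers in [i − d, i + d]; the class and method names are hypothetical.

```java
import java.util.Random;

public class AlmostSorted {

    /** Returns a list with a[i] drawn independently and uniformly from {i - d, ..., i + d}. */
    static int[] generate(int n, int d, Random rnd) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i - d + rnd.nextInt(2 * d + 1);
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = generate(20, 3, new Random(42));
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

With d = 100 as in the talk, every element is at most 100 away from its index, so the list is sorted up to local disorder.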
44. Asymptotic Expected Costs
Measure        Algorithm   Random Permutations    Almost Sorted Lists
Bytecodes (A)  JRE7        19.40 n ln n + 51 n    15.10 n ln n + 68 n
               JRE7(1,3)   18.73 n ln n + 62 n    13.52 n ln n + 85 n
[Plots: bytecodes normalized by n ln n against n from 10^5 to 10^8 (logarithmic axis) for JRE7 and JRE7(1,3), one plot per input distribution]
model fits data well!
asymptotically, JRE7(1,3) executes fewer Bytecodes!
Can we explain why?
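Because JRE7(1,3) has the smaller leading coefficient but the larger linear term, its advantage only shows for large enough n. As a rough worked example, the sketch below solves a1 · ln n + b1 = a2 · ln n + b2 for n, using the fitted bytecode formulas from the table. It treats the fits as exact, which is an assumption; the class and method names are hypothetical.

```java
public class Crossover {

    /**
     * Input size n at which a1 n ln n + b1 n equals a2 n ln n + b2 n,
     * i.e. ln n = (b2 - b1) / (a1 - a2). Requires a1 != a2.
     */
    static double crossover(double a1, double b1, double a2, double b2) {
        return Math.exp((b2 - b1) / (a1 - a2));
    }

    public static void main(String[] args) {
        // Bytecode fits: JRE7 versus JRE7(1,3).
        double random = crossover(19.40, 51, 18.73, 62);
        double almostSorted = crossover(15.10, 68, 13.52, 85);
        System.out.printf("random permutations: n = %.3g%n", random);
        System.out.printf("almost sorted lists: n = %.3g%n", almostSorted);
    }
}
```

Under these fits the break-even point lies on the order of 10^7 elements for random permutations and between 10^4 and 10^5 for almost sorted lists.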
51. Asymptotic Cycle Frequencies
[Bar chart: asymptotic frequencies of Cycles 1 to 5, as coefficients of n ln n + O(n), for JRE7 and JRE7(1,3) on random permutations and on almost sorted lists]
JRE7(1,3) executes cheap Cycle 3 more often
and expensive Cycle 1 less often than JRE7.
Asymptotically, fewer executed Bytecodes!
52. Running Time Results
How about running time?
HotSpot JIT compiler has two modes
-Xcomp: JIT compiler without profiling information
warmup: profiling JIT with warmup on fixed input to trigger JIT compilation
Do Block Sampling for both modes
Should we expect same block running times?
. . . stay tuned
55. Asymptotic Expected Costs
Measure          Algorithm   Random Permutations    Almost Sorted Lists
Bytecodes (A)    JRE7        19.40 n ln n + 51 n    15.10 n ln n + 68 n
                 JRE7(1,3)   18.73 n ln n + 62 n    13.52 n ln n + 85 n
time -Xcomp (B)  JRE7        20.10 n ln n + 26 n    11.95 n ln n + 54 n
                 JRE7(1,3)   19.95 n ln n + 32 n    11.09 n ln n + 64 n
time warmup (B)  JRE7        10.02 n ln n + 9 n     5.52 n ln n + 13 n
                 JRE7(1,3)   11.39 n ln n + 15 n    5.38 n ln n + 19 n
[Plots: running time normalized by n ln n against n from 10^5 to 10^8, for both input distributions and both JIT modes]
JIT without profiling (-Xcomp): asymptotically, JRE7(1,3) faster!
JIT with profiling and warmup: asymptotically, JRE7(1,3) slower!
What changes with profiling enabled?
62. Cycle Costs
[Bar chart: costs of Cycles 1 to 5 relative to cost(Cycle 5), measured in bytecodes (bc) and in running time t for JRE7 and JRE7(1,3), each with -Xcomp and with warmup]
measures agree qualitatively,
except for JRE7(1,3) with profiling JIT!
For JRE7(1,3), the code created by profiling JIT
for Cycle 3 is much slower than for JRE7!
That’s the place to focus future research on.
64. Conclusion
Summary
Java 7’s dual pivot Quicksort is highly asymmetric.
JRE7(1,3) executes fewer Bytecodes than JRE7.
Almost sorted inputs amplify impact of pivot sampling.
Oracle’s profiling JIT compiler creates different code for JRE7(1,3) ,
which potentially overcompensates gains.
Control flow graph decomposition supported by MaLiJAn makes
difference in code efficiency directly visible.
Open Problems
? What causes different costs for Cycle 3?
? Are the differences idiosyncrasies of Java / Oracle’s JRE?
? Performance of JRE7(1,3) on other inputs, especially with equal keys?