1. COMP 5704 Project Presentation
Towards Using Smart Hill Climbing Heuristic Search Strategy in Hadoop/YARN Dynamic Parameter Tuning
Ali Davoudian, Pablo Navarro
School of Computer Science
Carleton University, Ottawa, Canada
3. COMP 5704 Project Presentation – Slide 2
Hadoop/YARN Configuration Parameters
• Hadoop/YARN configuration parameters have a significant effect on the cost of MapReduce jobs.
[Diagram: MapReduce data flow — input is read from HDFS, collected into the circular memory buffer (sized by io.sort.mb), sorted and spilled to disk, then merged]
4. COMP 5704 Project Presentation – Slide 3
Configuration Parameter Tuning
• Which configuration gives the minimum MR job cost?
1. Manual tuning
• Challenge: combinatorial explosion problem
2. Auto-tuning
I. Static
II. Dynamic
• Search-based methods
5. COMP 5704 Project Presentation – Slide 4
Static Parameter Tuning
[Diagram: static tuning workflow — an initial MR job configuration is used to execute a test run with profiling enabled; the profiling outputs are fed to a performance analyzer, which produces the MR job configuration used for the actual MR job]
Drawbacks:
• Time-consuming
• Not cost-effective
• When the data set or hardware changes, the tests must be repeated
7. COMP 5704 Project Presentation – Slide 6
Search-based Auto-tuning
• Define an objective function Y as a measure of the cost of an MR job, e.g., the average execution time of its containers.
• Assumption: Y = f(C), where C is the vector of configuration parameter values.
• Problem: find the optimal configuration, i.e., the C that gives the minimum or a near-minimum value of Y.
• Challenge: f is unknown (a black box).
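For illustration only, a minimal Java sketch of this black-box view (not from the original slides; the interface name is an assumption): the tuner can only observe Y by evaluating a configuration C, for example by running a test MR job and measuring the average container execution time.

    import java.util.function.ToDoubleFunction;

    // Black-box view assumed above: the tuner has no closed form for f and can
    // only observe Y = f(C) by running the job under configuration C.
    // The interface name is illustrative, not part of Hadoop/YARN.
    public interface MrJobCost extends ToDoubleFunction<double[]> {
        // applyAsDouble(C) returns Y, e.g. the average container execution
        // time measured from a test run with configuration C.
    }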
8. COMP 5704 Project Presentation – Slide 7
Heuristic Search Methods
I. Simulated annealing
• Uses the Metropolis Monte Carlo sampling strategy
• Guarantees a global optimum (asymptotically)
• Converges slowly to the solution
II. Recursive random search
• Uses the recursive random sampling strategy
• May be inefficient, as restarts of naïve random sampling can waste effort
III. Hill climbing
• Uses a gradient-based sampling strategy
• May get stuck in a local optimum
IV. Genetic algorithms
V. Particle swarm optimization
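To make the hill-climbing idea concrete, here is a minimal, generic Java sketch for a single parameter (an illustrative assumption; this is plain hill climbing, not the SHC variant discussed later): it accepts a neighbour only when the cost improves, which is exactly why it can get stuck in a local optimum.

    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    // Plain hill climbing on one parameter over [lo, hi]. Step size, bounds and
    // evaluation budget are illustrative assumptions.
    public class PlainHillClimbing {
        public static double climb(DoubleUnaryOperator cost, double start,
                                    double lo, double hi, double step, int maxEvals) {
            Random rnd = new Random();
            double best = start;
            double bestCost = cost.applyAsDouble(best);
            for (int i = 1; i < maxEvals; i++) {
                double candidate = best + (rnd.nextBoolean() ? step : -step);
                candidate = Math.max(lo, Math.min(hi, candidate)); // stay inside [lo, hi]
                double candidateCost = cost.applyAsDouble(candidate);
                if (candidateCost < bestCost) {                    // accept only improvements
                    best = candidate;
                    bestCost = candidateCost;
                }
            }
            return best;
        }
    }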
9. COMP 5704 Project Presentation – Slide 8
Smart Hill Climbing Exploration - SHC
[Diagram: the SHC exploration loop]
1. Collect m sample configurations c1, …, cm in the whole configuration space S.
2. From the obtained sample points and their costs, determine a reduced or shifted subspace S' that is most likely to contain an optimal or near-optimal configuration (the Focus step).
3. Collect m sample configurations c1, …, cm in the subspace S'.
4. Determine the optimal configuration with respect to all the obtained sample points, and restart the exploration from the whole space S (the Restart step).
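A simplified one-dimensional Java sketch of this restart/focus loop (illustrative assumptions: plain uniform sampling instead of wLHS, a fixed shrink factor for the focused subspace, and a simple restart when the subspace becomes too small):

    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    // Sample m points in the current (sub)space, keep the best point seen so far,
    // then "focus" by shrinking the space around it; restart from the whole space
    // when the focused subspace is exhausted.
    public class ShcOuterLoopSketch {
        public static double search(DoubleUnaryOperator cost, double lo, double hi,
                                     int m, int rounds, double shrink) {
            Random rnd = new Random();
            double bestX = lo, bestY = Double.POSITIVE_INFINITY;
            double curLo = lo, curHi = hi;
            for (int r = 0; r < rounds; r++) {
                for (int i = 0; i < m; i++) {                        // collect m samples
                    double x = curLo + rnd.nextDouble() * (curHi - curLo);
                    double y = cost.applyAsDouble(x);
                    if (y < bestY) { bestY = y; bestX = x; }
                }
                double width = (curHi - curLo) * shrink;             // focus around the best point
                curLo = Math.max(lo, bestX - width / 2);
                curHi = Math.min(hi, bestX + width / 2);
                if (curHi - curLo < 1e-6) { curLo = lo; curHi = hi; } // restart from the whole space
            }
            return bestX;
        }
    }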
10. COMP 5704 Project Presentation – Slide 9
Bypassing Local Searches – Approach 1
• Reduces the overhead, but is less noise resilient
11. COMP 5704 Project Presentation – Slide 10
Bypassing Local Searches – Approach 2
• More noise resilient, but increases the overhead
12. COMP 5704 Project Presentation – Slide 11
Weighted Latin HyperCube Sampling - wLHS
1. Determine K equi-sized, non-overlapping intervals I1, …, IK in the space of each parameter Pi.
2. Calculate the cost Y of each configuration.
3. Determine the general trend and the correlation of Y with each configuration parameter Pi.
4. Determine K equi-probability, non-overlapping intervals I1, …, IK in the space of each parameter Pi according to its PDF.
5. Randomly select one parameter value from each interval.
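As a concrete illustration of the basic sampling step for a single parameter, here is a small Java sketch (a simplification under stated assumptions: equal-width intervals only; the weighting step that derives equi-probability intervals from the learned PDF is omitted):

    import java.util.Random;

    // Split [lo, hi] into k equal-width, non-overlapping intervals and draw one
    // random value from each, so every region of the space is covered.
    public class LatinHypercubeSketch {
        public static double[] sampleOnePerInterval(double lo, double hi, int k, Random rnd) {
            double width = (hi - lo) / k;
            double[] samples = new double[k];
            for (int i = 0; i < k; i++) {
                double intervalStart = lo + i * width;          // start of interval I_(i+1)
                samples[i] = intervalStart + rnd.nextDouble() * width;
            }
            return samples;
        }
    }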
13. COMP 5704 Project Presentation – Slide 12
wLHS – Example
Assumptions:
• One-dimensional configurations
• Parameter space: [A, B]
• Number of samples: 3
[Figure: the space of parameter p, [A, B] = [Z0, Z3], is divided into three equi-probability intervals I1, I2, I3 with boundaries Z1 and Z2 derived from PDF(p) (cumulative probability levels 1/3, 2/3, 1); random draws r1, r2, r3 pick one value, C1.p, C2.p, C3.p, from each interval]
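A small Java sketch of how the boundaries Z0, …, Z3 in this example could be derived, under the assumption that the inverse of the cumulative distribution of p is available; the parameter name inverseCdf is hypothetical:

    import java.util.function.DoubleUnaryOperator;

    // Equi-probability interval boundaries: Z_j is the point where the cumulative
    // probability of p reaches j/k, so each of the k intervals carries probability 1/k.
    public class EquiProbabilityIntervals {
        public static double[] boundaries(DoubleUnaryOperator inverseCdf, int k) {
            double[] z = new double[k + 1];
            for (int j = 0; j <= k; j++) {
                z[j] = inverseCdf.applyAsDouble((double) j / k);  // Z_j = F^{-1}(j/k)
            }
            return z;
        }
    }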
14. COMP 5704 Project Presentation – Slide 13
Implementation and Experiments
• The algorithms were implemented in Java for one-dimensional configuration optimization, outside of the Hadoop/YARN context, to reduce complexity and make testing easier.
• Experiments were conducted on artificial cost functions to search for optimal configurations.
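For instance, an artificial one-dimensional cost function of the kind used here might look like the following Java sketch (an illustrative function with several local minima over [-3, 3], not the exact function from our experiments):

    import java.util.function.DoubleUnaryOperator;

    // A smooth, multi-modal 1-D cost function: the oscillating sine term creates
    // several local minima, so a naive local search can get trapped.
    public class ArtificialCost {
        public static final DoubleUnaryOperator COMPLEX =
                x -> Math.sin(3 * x) + 0.3 * x * x + 2;

        public static void main(String[] args) {
            System.out.println("cost(0.5) = " + COMPLEX.applyAsDouble(0.5));
        }
    }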
15. COMP 5704 Project Presentation – Slide 14
Weighted Latin Hypercube Sampling
• wLHS was tested to verify whether the interval sizes shift once the algorithm learns where good (low-cost) values are found.
• A simple artificial cost function was used for these experiments.
20. COMP 5704 Project Presentation – Slide 19
SHC Experiments
• Smart hill climbing was implemented and tested using more complex cost functions.
• Two different versions of SHC were implemented:
• SHC original version (Approach 1)
• SHC MROnline version (Approach 2)
• The focus of the tests was:
• Finding the best parameters (lowest cost)
• Number of cost function executions
• Noise resilience
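For the noise-resilience tests, a noisy cost function can be derived from a clean one as in the following Java sketch (illustrative: the noise is scaled as a percentage of the clean value, using Gaussian or uniform perturbations; the exact scaling in our experiments may differ):

    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    // Wraps a clean cost function so that every evaluation is perturbed by
    // Gaussian or uniform noise proportional to the clean value.
    public class NoisyCost {
        public static DoubleUnaryOperator withNoise(DoubleUnaryOperator clean,
                                                     double noisePercent,
                                                     boolean gaussian, Random rnd) {
            return x -> {
                double y = clean.applyAsDouble(x);
                double amplitude = y * noisePercent / 100.0;
                double noise = gaussian ? rnd.nextGaussian() * amplitude
                                        : (rnd.nextDouble() * 2 - 1) * amplitude;
                return y + noise;
            };
        }
    }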
21. COMP 5704 Project Presentation – Slide 20
SHC Used for a Complex Function
[Plot: the complex test cost function over [-3, 3], with its optimum marked]
Real optimum = 2.2615652875
22. COMP 5704 Project Presentation – Slide 21
Average Number of Cost Function Executions
[Bar chart: average number of cost function executions for SHC MROnline, SHCO with 80% acceptance, and SHCO with 20% acceptance]
23. COMP 5704 Project Presentation – Slide 22
The Effect of Noise on the Precision of SHC
[Plot: proportional average distance to the global optimum vs. percentage of noise (0–70%), for Gaussian and uniform noise]
24. COMP 5704 Project Presentation – Slide 23
Distribution on the Curve for 20% Gaussian Noise
[Plot: the shape of the function over [-3, 3], overlaid with the results obtained with no noise and with 20% Gaussian noise]
25. COMP 5704 Project Presentation – Slide 24
Distribution on the Curve for 40% Gaussian Noise
[Plot: the shape of the function over [-3, 3], overlaid with the results obtained with no noise and with 40% Gaussian noise]
26. COMP 5704 Project Presentation – Slide 25
Distribution on the Curve for 60% Gaussian Noise
[Plot: the shape of the function over [-3, 3], overlaid with the results obtained with no noise and with 60% Gaussian noise]
27. COMP 5704 Project Presentation – Slide 26
Future Work
• Implementing SHC inside the Hadoop/YARN environment
• Enhancing our current version to tune N-dimensional Hadoop/YARN configurations
• Incorporating tuning rules into our tuning algorithms
• Assessing the feasibility of other heuristic search algorithms, such as the MOWILE (More With Less) heuristic search algorithm
28. COMP 5704 Project Presentation – Slide 27
Questions
1. What does auto-tuning mean?
2. What is the dynamic auto-tuning technique?
3. Were our tests executed in the Hadoop environment or in
a simulation environment?
4. What kind of distributions are being used in our noise
generation?