1. COMP 5704 Project Presentation
Towards Using Smart Hill Climbing Heuristic Search Strategy in Hadoop/YARN Dynamic Parameter Tuning
Ali Davoudian, Pablo Navarro
School of Computer Science
Carleton University, Ottawa, Canada
3. COMP 5704 Project Presentation – Slide 2
Hadoop/YARN Configuration Parameters
• Hadoop/YARN configuration parameters have a significant effect on the cost of MapReduce jobs.
[Diagram: MapReduce data flow — input is read from HDFS, collected into the circular memory buffer (sized by io.sort.mb), sorted and spilled to disk, then merged]
4. COMP 5704 Project Presentation – Slide 3
Configuration Parameter Tuning
• Which configuration gives the minimum MR job cost?
1. Manual tuning
• Challenge: combinatorial explosion problem
2. Auto-tuning
I. Static
II. Dynamic
• Search-based methods
5. COMP 5704 Project Presentation – Slide 4
Static Parameter Tuning
[Diagram: static tuning workflow — an initial MR job configuration is used to execute a test run with profiling enabled; the profiling outputs are fed to a performance analyzer, which produces the MR job configuration used for the actual MR job]
Drawbacks:
• Time-consuming
• Not cost-effective
• When the data set or hardware changes, the tests must be repeated
7. COMP 5704 Project Presentation – Slide 6
Search-based Auto-tuning
• Define an objective function Y as a measure of the cost of an MR job, e.g., the average execution time of its containers.
• Assumption: Y = f(C), where C is the vector of configuration parameter values.
• Problem: find the optimal configuration, i.e., the C that gives the minimum or a near-minimum value of Y.
• Challenge: f is unknown (a black box).
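For illustration only, a minimal Java sketch of this black-box view (not from the original slides; the interface name is an assumption): the tuner can only observe Y by evaluating a configuration C, for example by running a test MR job and measuring the average container execution time.

    import java.util.function.ToDoubleFunction;

    // Black-box view assumed above: the tuner has no closed form for f and can
    // only observe Y = f(C) by running the job under configuration C.
    // The interface name is illustrative, not part of Hadoop/YARN.
    public interface MrJobCost extends ToDoubleFunction<double[]> {
        // applyAsDouble(C) returns Y, e.g. the average container execution
        // time measured from a test run with configuration C.
    }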
8. COMP 5704 Project Presentation – Slide 7
Heuristic Search Methods
I. Simulated annealing
• Uses the Metropolis Monte Carlo sampling strategy
• Guarantees a global optimum (asymptotically)
• Converges slowly to the solution
II. Recursive random search
• Uses the recursive random sampling strategy
• May be inefficient, as restarts of naïve random sampling can waste effort
III. Hill climbing
• Uses a gradient-based sampling strategy
• May get stuck in a local optimum
IV. Genetic algorithms
V. Particle swarm optimization
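To make the hill-climbing idea concrete, here is a minimal, generic Java sketch for a single parameter (an illustrative assumption; this is plain hill climbing, not the SHC variant discussed later): it accepts a neighbour only when the cost improves, which is exactly why it can get stuck in a local optimum.

    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    // Plain hill climbing on one parameter over [lo, hi]. Step size, bounds and
    // evaluation budget are illustrative assumptions.
    public class PlainHillClimbing {
        public static double climb(DoubleUnaryOperator cost, double start,
                                    double lo, double hi, double step, int maxEvals) {
            Random rnd = new Random();
            double best = start;
            double bestCost = cost.applyAsDouble(best);
            for (int i = 1; i < maxEvals; i++) {
                double candidate = best + (rnd.nextBoolean() ? step : -step);
                candidate = Math.max(lo, Math.min(hi, candidate)); // stay inside [lo, hi]
                double candidateCost = cost.applyAsDouble(candidate);
                if (candidateCost < bestCost) {                    // accept only improvements
                    best = candidate;
                    bestCost = candidateCost;
                }
            }
            return best;
        }
    }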
9. COMP 5704 Project Presentation – Slide 8
Smart Hill Climbing Exploration - SHC
[Diagram: the SHC exploration loop]
1. Collect m sample configurations c1, …, cm in the whole configuration space S.
2. From the obtained sample points and their costs, determine a reduced or shifted subspace S' that is most likely to contain an optimal or near-optimal configuration (the Focus step).
3. Collect m sample configurations c1, …, cm in the subspace S'.
4. Determine the optimal configuration with respect to all the obtained sample points, and restart the exploration from the whole space S (the Restart step).
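A simplified one-dimensional Java sketch of this restart/focus loop (illustrative assumptions: plain uniform sampling instead of wLHS, a fixed shrink factor for the focused subspace, and a simple restart when the subspace becomes too small):

    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    // Sample m points in the current (sub)space, keep the best point seen so far,
    // then "focus" by shrinking the space around it; restart from the whole space
    // when the focused subspace is exhausted.
    public class ShcOuterLoopSketch {
        public static double search(DoubleUnaryOperator cost, double lo, double hi,
                                     int m, int rounds, double shrink) {
            Random rnd = new Random();
            double bestX = lo, bestY = Double.POSITIVE_INFINITY;
            double curLo = lo, curHi = hi;
            for (int r = 0; r < rounds; r++) {
                for (int i = 0; i < m; i++) {                        // collect m samples
                    double x = curLo + rnd.nextDouble() * (curHi - curLo);
                    double y = cost.applyAsDouble(x);
                    if (y < bestY) { bestY = y; bestX = x; }
                }
                double width = (curHi - curLo) * shrink;             // focus around the best point
                curLo = Math.max(lo, bestX - width / 2);
                curHi = Math.min(hi, bestX + width / 2);
                if (curHi - curLo < 1e-6) { curLo = lo; curHi = hi; } // restart from the whole space
            }
            return bestX;
        }
    }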
10. COMP 5704 Project Presentation – Slide 9
Bypassing Local Searches – Approach 1
• Reduces the overhead, but is less noise resilient
11. COMP 5704 Project Presentation – Slide 10
Bypassing Local Searches – Approach 2
• More noise resilient, but increases the overhead
12. COMP 5704 Project Presentation – Slide 11
Weighted Latin HyperCube Sampling - wLHS
1. Determine K equi-sized, non-overlapping intervals I1, …, IK in the space of each parameter Pi.
2. Calculate the cost Y of each configuration.
3. Determine the general trend and the correlation of Y with each configuration parameter Pi.
4. Determine K equi-probability, non-overlapping intervals I1, …, IK in the space of each parameter Pi according to its PDF.
5. Randomly select one parameter value from each interval.
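As a concrete illustration of the basic sampling step for a single parameter, here is a small Java sketch (a simplification under stated assumptions: equal-width intervals only; the weighting step that derives equi-probability intervals from the learned PDF is omitted):

    import java.util.Random;

    // Split [lo, hi] into k equal-width, non-overlapping intervals and draw one
    // random value from each, so every region of the space is covered.
    public class LatinHypercubeSketch {
        public static double[] sampleOnePerInterval(double lo, double hi, int k, Random rnd) {
            double width = (hi - lo) / k;
            double[] samples = new double[k];
            for (int i = 0; i < k; i++) {
                double intervalStart = lo + i * width;          // start of interval I_(i+1)
                samples[i] = intervalStart + rnd.nextDouble() * width;
            }
            return samples;
        }
    }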
13. COMP 5704 Project Presentation – Slide 12
wLHS – Example
Assumptions:
• One-dimensional configurations
• Parameter space: [A, B]
• Number of samples: 3
[Figure: the space of parameter p, [A, B] = [Z0, Z3], is divided into three equi-probability intervals I1, I2, I3 with boundaries Z1 and Z2 derived from PDF(p) (cumulative probability levels 1/3, 2/3, 1); random draws r1, r2, r3 pick one value, C1.p, C2.p, C3.p, from each interval]
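A small Java sketch of how the boundaries Z0, …, Z3 in this example could be derived, under the assumption that the inverse of the cumulative distribution of p is available; the parameter name inverseCdf is hypothetical:

    import java.util.function.DoubleUnaryOperator;

    // Equi-probability interval boundaries: Z_j is the point where the cumulative
    // probability of p reaches j/k, so each of the k intervals carries probability 1/k.
    public class EquiProbabilityIntervals {
        public static double[] boundaries(DoubleUnaryOperator inverseCdf, int k) {
            double[] z = new double[k + 1];
            for (int j = 0; j <= k; j++) {
                z[j] = inverseCdf.applyAsDouble((double) j / k);  // Z_j = F^{-1}(j/k)
            }
            return z;
        }
    }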
14. COMP 5704 Project Presentation – Slide 13
Implementation and Experiments
• The algorithms were implemented in Java for one-dimensional configuration optimization, outside of the Hadoop/YARN context, to reduce complexity and make testing easier.
• Experiments were conducted on artificial cost functions to search for optimal configurations.
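For instance, an artificial one-dimensional cost function of the kind used here might look like the following Java sketch (an illustrative function with several local minima over [-3, 3], not the exact function from our experiments):

    import java.util.function.DoubleUnaryOperator;

    // A smooth, multi-modal 1-D cost function: the oscillating sine term creates
    // several local minima, so a naive local search can get trapped.
    public class ArtificialCost {
        public static final DoubleUnaryOperator COMPLEX =
                x -> Math.sin(3 * x) + 0.3 * x * x + 2;

        public static void main(String[] args) {
            System.out.println("cost(0.5) = " + COMPLEX.applyAsDouble(0.5));
        }
    }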
15. COMP 5704 Project Presentation – Slide 14
Weighted Latin Hypercube Sampling
• wLHS was tested to verify whether the interval sizes shift once the algorithm learns where good (low-cost) values are found.
• A simple artificial cost function was used for these experiments.
20. COMP 5704 Project Presentation – Slide 19
SHC Experiments
• Smart hill climbing was implemented and tested using more complex cost functions.
• Two different versions of SHC were implemented:
• SHC original version (Approach 1)
• SHC MROnline version (Approach 2)
• The focus of the tests was:
• Finding the best parameters (lowest cost)
• Number of cost function executions
• Noise resilience
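For the noise-resilience tests, a noisy cost function can be derived from a clean one as in the following Java sketch (illustrative: the noise is scaled as a percentage of the clean value, using Gaussian or uniform perturbations; the exact scaling in our experiments may differ):

    import java.util.Random;
    import java.util.function.DoubleUnaryOperator;

    // Wraps a clean cost function so that every evaluation is perturbed by
    // Gaussian or uniform noise proportional to the clean value.
    public class NoisyCost {
        public static DoubleUnaryOperator withNoise(DoubleUnaryOperator clean,
                                                     double noisePercent,
                                                     boolean gaussian, Random rnd) {
            return x -> {
                double y = clean.applyAsDouble(x);
                double amplitude = y * noisePercent / 100.0;
                double noise = gaussian ? rnd.nextGaussian() * amplitude
                                        : (rnd.nextDouble() * 2 - 1) * amplitude;
                return y + noise;
            };
        }
    }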
21. COMP 5704 Project Presentation – Slide 20
SHC Used for a Complex Function
[Plot: the complex test cost function over [-3, 3], with its optimum marked]
Real optimum = 2.2615652875
22. COMP 5704 Project Presentation – Slide 21
Average Number of Cost Function Executions
[Bar chart: average number of cost function executions for SHC MROnline, SHCO with 80% acceptance, and SHCO with 20% acceptance]
23. COMP 5704 Project Presentation – Slide 22
The Effect of Noise on the Precision of SHC
[Plot: proportional average distance to the global optimum vs. percentage of noise (0–70%), for Gaussian and uniform noise]
24. COMP 5704 Project Presentation – Slide 23
Distribution on the Curve for 20% Gaussian Noise
[Plot: the shape of the function over [-3, 3], overlaid with the results obtained with no noise and with 20% Gaussian noise]
25. COMP 5704 Project Presentation – Slide 24
Distribution on the Curve for 40% Gaussian Noise
[Plot: the shape of the function over [-3, 3], overlaid with the results obtained with no noise and with 40% Gaussian noise]
26. COMP 5704 Project Presentation – Slide 25
Distribution on the Curve for 60% Gaussian Noise
[Plot: the shape of the function over [-3, 3], overlaid with the results obtained with no noise and with 60% Gaussian noise]
27. COMP 5704 Project Presentation – Slide 26
Future Work
• Implementing SHC inside the Hadoop/YARN environment
• Enhancing our current version to tune N-dimensional Hadoop/YARN configurations
• Incorporating tuning rules into our tuning algorithms
• Assessing the feasibility of other heuristic search algorithms, such as the MOWILE (More With Less) heuristic search algorithm
28. COMP 5704 Project Presentation – Slide 27
Questions
1. What does auto-tuning mean?
2. What is the dynamic auto-tuning technique?
3. Were our tests executed in the Hadoop environment or in
a simulation environment?
4. What kind of distributions are being used in our noise
generation?