As cloud services continue to grow rapidly, the need for more efficient resource utilization and better workload distribution grows with them. Our aim is to address this optimization problem from three perspectives: (1) the infrastructure level, (2) the application level, and (3) the platform level.
At the infrastructure level, the problem is to distribute virtual machines (VMs) to geographically distributed datacenters (DCs) in such a way that latency is minimized while WAN bandwidth and datacenter capacity limits are respected. We plan to model latency as a function of DC load, inter-DC communication, and proximity to the user. Both VM requests and the cloud infrastructure can be represented by graphs in which vertices correspond to machines (either physical or virtual) and edges correspond to network connections between them. Our approach will employ weighted graph similarity and subgraph matching to suggest an efficient placement, or a list of migrations that leads to one.
The VM request is an input to our infrastructure-level optimization, but the requested number and capacity of VMs, as well as their interconnections, may not be optimal for a specific cloud service. At the application level, on the other hand, we plan to deal with the configuration of the MapReduce programming model (more specifically, one of its implementations: Apache Hadoop), especially in terms of the numbers of maps and reduces. Previous work has shown that the optimum configuration depends on the resource consumption of the cloud service, and that it can be determined by executing a fraction of the job to generate a signature. Our aim is to determine the numbers of maps and reduces statically, by analyzing the design model, the source code, and/or the input data.
Furthermore, at the platform level, we aim to analyze the shuffle and scheduling algorithms of the MapReduce system to achieve better load balancing and less communication between nodes. Shuffle is the phase in which the outputs of all map processors with the same key are assigned to the same reduce processor. We will carry out a cost-based formal analysis of the existing Hadoop schedulers to determine which scheduler is most appropriate for given costs of the map, reduce, and shuffle phases.
In summary, a holistic approach will be taken to improve the performance of cloud software by optimizing resource allocation. The intended optimizations are at the infrastructure level (resource selection and assignment), the application level (MapReduce configuration), and the platform level (work distribution and load balancing). The cloud environment and the resource allocation problem will be modeled with graphs and automata, and the solutions will be formulated within this context as well.
1. Introduction
Main Topics of the Thesis
Methods and Techniques
Time Plan
Conclusion
Modeling and Optimization of Resource Allocation in Cloud
PhD Thesis Proposal
Atakan Aral
Thesis Advisor: Asst. Prof. Dr. Tolga Ovatman
Istanbul Technical University – Department of Computer Engineering
June 25, 2014
Outline
1 Introduction
2 Main Topics of the Thesis
3 Methods and Techniques
MapReduce Configuration
Resource Selection and Optimization
Work Distribution to Resources
4 Time Plan
5 Conclusion
Cloud Computing
Definition
Applications and services that run on a distributed network using virtualized resources and are accessed by common Internet protocols and networking standards.
Broad network access: Platform-independent, via standard methods
Measured service: Pay-per-use, e.g. amount of storage/processing power,
number of transactions, bandwidth etc.
On-demand self-service: No need to contact provider to provision resources
Rapid elasticity: Automatic scale up/out, illusion of infinite resources
Resource pooling: Abstraction, virtualization, multi-tenancy
Cloud Computing Roots
The cloud computing paradigm is revolutionary; however, the technology it is built on is only evolutionary.
Cloud Computing Benefits
Lower costs
Ease of utilization
Quality of Service
Reliability
Outsourced IT management
Simplified maintenance and upgrade
Lower barrier to entry
Cloud Computing Architecture
MapReduce
Definition
A programming model for processing large data sets with a parallel, distributed
algorithm on a cluster.
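The model can be illustrated with a minimal, single-process sketch (a toy stand-in for a real Hadoop job; the function and variable names are our own):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["cloud cloud compute", "compute cloud"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # {'cloud': 3, 'compute': 2}
```

In a real deployment the map and reduce calls run on different TaskTracker nodes and the shuffle moves data over the network; here all three phases run in one process only to show the data flow.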
MapReduce Entities
DataNode: Stores blocks of data in the distributed filesystem (HDFS).
NameNode: Holds the metadata (i.e. location information) of the files in HDFS.
JobTracker: Coordinates the MapReduce job.
TaskTracker: Runs the tasks that the MapReduce job has been split into.
Life Cycle of Cloud Software
Development of Distributed Software -> MapReduce Configuration
Resource Allocation -> Resource Selection and Optimization
Load Balancing -> Work Distribution to Resources
MapReduce Configuration
Motivation
The cost of using 1000 machines for 1 hour is the same as using 1 machine for 1000 hours in the cloud paradigm (cost associativity).
The optimum numbers of maps and reduces that maximize resource utilization depend on the resource consumption profile of the cloud software.
Bottleneck resources should be identified.
Optimization at Application level
Resource Selection and Optimization
Motivation
In distributed computing environments, up to 85 percent of computing capacity remains idle, mainly due to poorly optimized placement.
Better assignment of virtual nodes to physical nodes may result in more
efficient use of resources.
There are several possible constraints and objectives to consider, e.g. capacity limits, proximity to the user, and latency.
Optimization at Infrastructure level
Work Distribution to Resources
Motivation
Data flows between nodes only in the shuffle step of the MapReduce job, and
tuning it can have a big impact on job execution time.
Mapper and reducer nodes should be selected carefully to minimize network
traffic.
Dynamic load balancing should be ensured.
Optimization at Platform level
Problem
Aim
Maximizing the utilization of all nodes for a Hadoop job by calculating the optimum parameters, i.e. the numbers of mappers and reducers.
Higher values mean higher parallelism but may cause resource contention
and coordination problems.
Optimum parameters depend on the resource consumption of the software.
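The parallelism-vs-overhead tradeoff can be sketched with a deliberately simplified cost model (all names and constants below are invented for illustration and are not taken from Hadoop):

```python
import math

def job_time(n_tasks, total_work, per_task_overhead, slots):
    """Toy model: work parallelizes across cluster slots, but every task
    adds a fixed coordination cost (hypothetical, for illustration only)."""
    waves = math.ceil(n_tasks / slots)          # waves of tasks on the cluster
    return waves * (total_work / n_tasks) + n_tasks * per_task_overhead

# For 3600 s of work, 2 s of overhead per task, and 10 slots, sweep the
# task count to balance parallelism against coordination cost.
best = min(range(1, 101), key=lambda n: job_time(n, 3600.0, 2.0, 10))
```

In this toy model the optimum lands at one full wave of tasks (one task per slot): more tasks add overhead without more parallelism, fewer tasks leave slots idle.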
Previous Solutions
Kambatla, K., Pathak, A., and Pucha, H. (2009). Towards Optimizing Hadoop Provisioning in the
Cloud. In Proceedings of the 1st USENIX Workshop on Hot Topics in Cloud Computing (HotCloud),
118-122.
Calculates the optimum parameters (number of M/R) for the Hadoop job such
that each resource set is fully utilized.
Each application has a different bottleneck resource and utilization.
Requires running a small chunk of the application to create a signature:
Split the job into n intervals of the same duration.
Calculate the average consumption of all 3 resources (CPU, disk, and network) in each interval.
Uses the configuration of the most similar signature in the database.
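A simplified version of this signature-matching step might look as follows (the sample data and configuration names are hypothetical; the real system profiles CPU, disk, and network on a live run):

```python
def signature(samples, n_intervals):
    """Average (cpu, disk, net) consumption over n equal-length intervals.
    samples: chronological list of (cpu, disk, net) measurements."""
    size = len(samples) // n_intervals
    sig = []
    for i in range(n_intervals):
        chunk = samples[i * size:(i + 1) * size]
        sig.append(tuple(sum(s[r] for s in chunk) / len(chunk) for r in range(3)))
    return sig

def distance(a, b):
    """Squared Euclidean distance between two signatures."""
    return sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))

def most_similar(sig, database):
    """database: {config_name: signature}; return the closest entry."""
    return min(database, key=lambda cfg: distance(sig, database[cfg]))

# Hypothetical profile: CPU-heavy first half, disk-heavy second half.
sig = signature([(0.9, 0.1, 0.1)] * 4 + [(0.2, 0.8, 0.1)] * 4, n_intervals=2)
db = {"cpu-bound-config": [(0.9, 0.1, 0.1), (0.3, 0.7, 0.1)],
      "io-bound-config":  [(0.1, 0.9, 0.2), (0.1, 0.9, 0.2)]}
print(most_similar(sig, db))  # cpu-bound-config
```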
Suggested Solution
The design model of the software will be statically analyzed in order to estimate the resource consumption pattern.
Critical (bottleneck) resources will be identified.
If required, the source code or the input data may also be included in the analysis.
Output
An algorithm that calculates optimum Hadoop configuration for a given
software
An API that receives software model and outputs a Hadoop configuration
suggestion
Problem
Aim
Optimally assigning interconnected virtual nodes to a substrate network under constraints
Constraints:
Datacenter capacities and bandwidth
Virtual topology requests and incompletely known cloud topology
Locality and jurisdiction
Application interaction and scalability rules
Objectives (Minimization):
Inter-DC communication
Geographical proximity to user and latency
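A feasibility check against the capacity and bandwidth constraints listed above could be sketched as follows (the data layout and names are our own, not from a specific system):

```python
def feasible(assignment, vn_demand, vn_links, dc_capacity, dc_bandwidth):
    """Check capacity and bandwidth constraints for a VN -> DC assignment.
    assignment: {vnode: dc}; vn_demand: {vnode: cpu}; vn_links: {(u, v): bw};
    dc_capacity: {dc: cpu}; dc_bandwidth: {(dc1, dc2): bw}, keys sorted."""
    used = {}
    for vnode, dc in assignment.items():
        used[dc] = used.get(dc, 0) + vn_demand[vnode]
        if used[dc] > dc_capacity[dc]:          # DC capacity exceeded
            return False
    wan = {}
    for (u, v), bw in vn_links.items():
        a, b = sorted((assignment[u], assignment[v]))
        if a == b:                              # co-located: no WAN traffic
            continue
        wan[(a, b)] = wan.get((a, b), 0) + bw
        if wan[(a, b)] > dc_bandwidth[(a, b)]:  # WAN bandwidth exceeded
            return False
    return True
```

Objectives such as inter-DC traffic and user latency would then be evaluated only over assignments that pass this check.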
Previous Solutions
Papagianni, C., Leivadeas, A., Papavassiliou, S., Maglaris, V., Cervello-Pastor, C., and Monje, A.
(2013). On the Optimal Allocation of Virtual Resources in Cloud Computing Networks. IEEE
Transactions on Computers (TC), 62(6):1060-1071.
Mapping user requests for virtual resources onto shared substrate resources.
The problem is solved in two coordinated phases: node mapping and link mapping.
In node mapping, the objective is to minimize the cost of the mapping, and the method is relaxation of the MIP to an LP with randomized rounding.
In link mapping, the objective is to minimize the number of hops, and the method is shortest path or minimum cost flow algorithms.
The suggested algorithm is compared with greedy heuristics in terms of acceptance ratio and number of hops.
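The link-mapping phase reduces to a minimum-hop shortest-path search on the substrate graph, which can be sketched with a standard Dijkstra-style routine (the adjacency-dict format is an assumption of this sketch):

```python
import heapq

def shortest_path_hops(adj, src, dst):
    """Minimum-hop path from src to dst in an unweighted substrate graph.
    adj: {node: [neighbour, ...]}. Assumes dst is reachable from src."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue                      # stale heap entry
        for nb in adj[node]:
            if d + 1 < dist.get(nb, float("inf")):
                dist[nb] = d + 1
                prev[nb] = node
                heapq.heappush(heap, (d + 1, nb))
    path, node = [dst], dst               # reconstruct the path backwards
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

With edge weights instead of hop counts, the same routine would serve a minimum-cost variant.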
Previous Solutions
Larumbe, F. and Sanso, B. (2013). A Tabu Search Algorithm for the Location of Data Centers
and Software Components in Green Cloud Computing Networks. IEEE Transactions on Cloud
Computing (TCC), 1(1):22-35.
Which component of the software should be hosted at which datacenter?
Minimize delay, cost, energy consumption, and CO2 emission.
A greedy initial solution is improved by moving one component at each step.
A random subset of neighbours is analyzed for the best improvement.
Suggested tabu search heuristic is compared with MIP formulation and
greedy heuristic in terms of execution time and optimality.
Tradeoff analysis between the multiple objectives
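The move structure described above (greedy start, one component moved per step, random neighbour subset, tabu list) fits the generic tabu-search skeleton below; this is a sketch of the metaheuristic, not the authors' implementation:

```python
import random

def tabu_search(cost, initial, neighbours, iters=100, tabu_len=5,
                sample_size=10, seed=0):
    """Generic tabu search: from the current solution, evaluate a random
    subset of its neighbours and move to the best one not on the tabu list."""
    rnd = random.Random(seed)
    current = best = initial
    tabu = []
    for _ in range(iters):
        nbrs = neighbours(current)
        cands = [n for n in rnd.sample(nbrs, min(sample_size, len(nbrs)))
                 if n not in tabu]
        if not cands:
            continue                      # all sampled moves are tabu
        current = min(cands, key=cost)    # best (possibly worsening) move
        tabu.append(current)
        if len(tabu) > tabu_len:
            tabu.pop(0)                   # forget the oldest tabu entry
        if cost(current) < cost(best):
            best = current
    return best
```

Accepting worsening moves while forbidding recent states is what lets the search escape the local optima a plain greedy improvement would get stuck in.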
Previous Solutions
Agarwal, S., Dunagan, J., Jain, N., Saroiu, S., Wolman, A., and Bhogan, H. (2011). Volley:
Automated Data Placement for Geo-Distributed Cloud Services. In Proceedings of the 7th
USENIX Symposium on Networked Systems Design and Implementation (NSDI), 17-32.
Volley analyzes trace data and suggests migrations to improve DC capacity skew, inter-DC traffic, and user latency.
It considers client geographic diversity, client mobility, and data dependency.
The method consists of 3 sequential phases:
1 Compute initial placement using weighted spherical mean,
2 Iteratively move data to reduce latency,
3 Iteratively collapse data to datacenters.
Compared against 3 simple heuristics: oneDC, commonIP and hash.
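Phase 1 can be approximated by a normalized 3-D centroid of the weighted request locations; note that Volley's actual weighted spherical mean is defined via iterative great-circle interpolation, so this is only an illustrative approximation:

```python
import math

def weighted_spherical_mean(points):
    """points: list of (lat_deg, lon_deg, weight). Convert to unit vectors,
    take the weighted 3-D centroid, normalize, and convert back."""
    x = y = z = 0.0
    for lat, lon, w in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += w * math.cos(la) * math.cos(lo)
        y += w * math.cos(la) * math.sin(lo)
        z += w * math.sin(la)
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    return math.degrees(math.asin(z)), math.degrees(math.atan2(y, x))

# Two equally weighted request sources on the equator, at 0° and 90° E:
lat, lon = weighted_spherical_mean([(0.0, 0.0, 1.0), (0.0, 90.0, 1.0)])
```

Working on the sphere (rather than averaging raw latitude/longitude) avoids distortion near the poles and across the antimeridian.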
Suggested Solution
VN requests and DC network are represented as undirected weighted graphs.
Nodes store computational capacities (i.e. CPU, memory) and storage.
Edge weights represent bandwidth capacities.
Graph similarity and subgraph matching algorithms will be employed.
Suggested solution will be static.
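As a tiny illustration of the matching criterion (capacities on nodes, bandwidths on edges), a brute-force subgraph matcher could look like this; it is exponential and only meant to make the graph model concrete, not a scalable algorithm:

```python
from itertools import permutations

def find_embedding(vn_nodes, vn_edges, sn_nodes, sn_edges):
    """Map each virtual node to a distinct substrate node with enough
    capacity, such that every virtual edge lands on a substrate edge with
    enough bandwidth. Returns {vnode: snode} or None.
    vn_nodes/sn_nodes: {name: capacity}; vn_edges/sn_edges: {(u, v): bw}."""
    vns, sns = list(vn_nodes), list(sn_nodes)
    edge_bw = {frozenset(e): bw for e, bw in sn_edges.items()}  # undirected
    for perm in permutations(sns, len(vns)):
        m = dict(zip(vns, perm))
        if all(vn_nodes[v] <= sn_nodes[m[v]] for v in vns) and \
           all(edge_bw.get(frozenset((m[u], m[v])), 0) >= bw
               for (u, v), bw in vn_edges.items()):
            return m
    return None
```

The thesis targets heuristic graph-similarity methods precisely because this exhaustive search explodes combinatorially on realistic topologies.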
Output
An algorithm that optimizes the resource provisioning
An API that receives graph and constraint inputs and outputs a matching
Aral, A., and Ovatman, T. (2014). Improving Resource Utilization in Cloud
Environments using Application Placement Heuristics. In Proceedings of
the 4th International Conference on Cloud Computing and Services Science
(CLOSER), 527-534.
Problem
Aim
Analysis of the shuffle and scheduling algorithms of the Apache Hadoop framework.
Initial selection of DataNodes, Mappers and Reducers is critical to reduce
network traffic.
Dynamic re-distribution of work during execution results in better load
balancing and better performance.
Previous Solutions
FIFO scheduler: Integrated scheduling algorithm, manual prioritization
Fair scheduler: Each job receives an equal share of resources over time, on
average. Organizes jobs into user pools that are resourced fairly.
Capacity scheduler: Allows sharing a large cluster while giving each user a
minimum capacity guarantee. Priority queues are used instead of pools
where excess capacity is reused.
Hadoop On Demand: Provisions virtual clusters from within larger physical
clusters. Adapts and scales when the workload changes.
Learning scheduler: Chooses the task that is expected to provide max utility.
Adaptive scheduler: User specifies a deadline at job submission time, and
the scheduler adjusts the resources to meet that deadline.
Suggested Solution
Cost based analysis of the existing schedulers
Which scheduler is more appropriate for given costs of map, reduce, shuffle
phases?
How can DataNode selection for blocks and clones be optimized to reduce
network traffic?
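As a toy example of such a cost-based comparison, the average job completion time of a FIFO scheduler and an idealized fair scheduler (processor sharing) can be contrasted when each job's cost is taken as the sum of its map, shuffle, and reduce costs (the model and numbers are ours, not Hadoop's):

```python
def fifo_avg_completion(jobs):
    """FIFO: jobs run back to back in submission order.
    jobs: list of total costs (map + shuffle + reduce, toy aggregation)."""
    t = total = 0.0
    for cost in jobs:
        t += cost
        total += t
    return total / len(jobs)

def fair_avg_completion(jobs):
    """Idealized fair scheduler (processor sharing): each of the k
    currently running jobs receives 1/k of the cluster until it finishes."""
    remaining = sorted(jobs)
    n = len(remaining)
    t = total = 0.0
    while remaining:
        k = len(remaining)
        shortest = remaining[0]
        t += shortest * k                 # time until the smallest job ends
        total += t
        remaining = [r - shortest for r in remaining[1:]]
    return total / n
```

For job costs [1, 3], FIFO in shortest-first order gives a lower average completion time (2.5 vs 3.0), while for the order [3, 1] fair sharing wins (3.0 vs 3.5): which scheduler is preferable depends on the cost mix, which is exactly what the proposed analysis would formalize.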
Output
Formal modeling and analysis of Hadoop schedulers
An API that receives costs of MapReduce phases and outputs a scheduler
suggestion
A patch for open-source Apache Hadoop framework
Time Plan
[Gantt chart, monthly from 2014 to 2016] Each of the three topics (Resource Selection, MapReduce Configuration, Work Distribution) proceeds through four overlapping phases: Research, Modeling, Development, and Documentation.
Summary
Optimization on 3 related phases of the cloud software life cycle:
1 MapReduce Configuration
2 Resource Selection and Assignment
3 Work Distribution and Load Balancing
Graph-based modeling of the cloud environment and the resource allocation problem
Holistic approach that aims to increase the performance of cloud software (MapReduce) by optimizing resource allocation
Unique Aspects
Inclusion of cloud computing aspects in the RA problem:
Virtual topology requests and incompletely known network topology
Locality and jurisdiction
Scalability rules
Inclusion of software characteristics (e.g. the design model) in the RA problem
Optimization with different objectives at each level
Impact
Our goal is to contribute to the long-held goal of utility computing.
Cloud providers will benefit since they will be able to use their resources more
efficiently and will be able to serve more customers without violating the
service level agreements.
Cloud users will also benefit, in terms of shorter application execution times and smaller infrastructure requirements (lower cost).
Thank you for your time.
Appendix II: Resource Allocation Problem in Cloud
Conceptual phase
1 Resource modeling: Notations only exist for concrete resources. Different
levels of granularity can be chosen. Vertical and horizontal heterogeneity cause
interoperability problems.
2 Resource offering and treatment: Independent of modeling. New
requirements arise in addition to the common network and computation ones.
Operational phase
3 Resource discovery and monitoring: Finding suitable candidate resources
based on proximity or impact on network. Passive or active monitoring
strategies exist.
4 Resource selection and optimization: Finding a configuration that fulfills all
requirements and optimizes infrastructure usage. Dynamicity, algorithm
complexity, and large number of constraints are the main problems.