1. Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data
Mostafa Abbass
American University of Culture and
Education (AUCE)
Nabatieh Campus
Cloud Computing & Security
CSI549
Dr. Abbass ZEIN EDDINE
2. Outline
1. Challenges
2. Hadoop for Geospatial Data Processing
3. Auto-Scaling Framework
4. CoveringHDFS
5. Auto-Scaling Algorithm
6. Experimental Result and Discussion
7. Conclusion
8. Critique
3. 1. Challenges
While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in the geoscience communities for handling big geospatial data.
The massive data volume and the intrinsic complexity and high dimensionality of geospatial datasets pose challenges to the efficient processing of big geospatial data, which is crucial for tackling global and regional challenges. To accelerate geospatial data processing, distributed computing infrastructures are widely used.
4. 2. Hadoop for Geospatial Data Processing
Hadoop is an open-source implementation of the MapReduce framework.
Hadoop has been adapted to create geospatial gazetteers from large volumes of volunteered geospatial data.
Hadoop has been leveraged to store and process bulky remote sensing images while supporting large numbers of concurrent user requests.
Hadoop MapReduce has been utilized to enable parallelization of big climate data processing.
HadoopGIS offers a scalable, high-performance spatial query system over MapReduce to accelerate geospatial data analysis.
5. 3. Auto-Scaling Framework
The goal of the framework is to dynamically adjust computing resources based on the processing workload, handling spikes in demand for computing power while minimizing resource consumption.
6. 4. CoveringHDFS
CoveringHDFS is a mechanism for scaling down the cluster safely and in a timely manner without losing data, implemented without modifying the underlying Hadoop software.
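The slide gives no implementation details, so the following is a hypothetical Python sketch of one way the "covering" idea could be checked: a fixed set of core-slaves is assumed to hold at least one replica of every block, and a slave is treated as removable only when all of its blocks are also covered by the core set. The function name, the block_replicas mapping, and the core/compute split are assumptions for illustration, not the paper's code.

```python
# Hypothetical illustration of the covering idea behind CoveringHDFS (not the paper's code):
# core-slaves together hold at least one replica of every HDFS block, so any other slave
# whose blocks are all covered by the core set can be decommissioned without data loss.

def removable_slaves(block_replicas: dict, core_slaves: set) -> set:
    """block_replicas maps a block id to the set of slaves holding a replica of it."""
    all_slaves = set().union(*block_replicas.values()) if block_replicas else set()
    removable = set()
    for slave in all_slaves - core_slaves:
        blocks_on_slave = [holders for holders in block_replicas.values() if slave in holders]
        # Safe to remove only if every block on this slave also has a replica on a core-slave.
        if all(core_slaves & holders for holders in blocks_on_slave):
            removable.add(slave)
    return removable

# Example: both blocks keep a replica on a core-slave (s1 or s2), so s3 can be removed safely.
blocks = {"b1": {"s1", "s3"}, "b2": {"s2", "s3"}}
print(removable_slaves(blocks, core_slaves={"s1", "s2"}))  # {'s3'}
```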
7. 5. Auto-Scaling Algorithm
a. Scaling up
N = (N_pending − N_finished) / n
b. Scaling down
Removing compute-slaves is straightforward: when the idle time of a compute-slave exceeds a user-specified threshold, that slave is terminated.
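As a comprehension aid, here is a minimal Python sketch of the two rules above. The parameter names (pending and finished task counts, tasks per node, idle threshold) and the rounding-up of N are assumptions for illustration; the paper's actual implementation may differ.

```python
import math
import time

def nodes_to_add(pending_tasks, finished_tasks, tasks_per_node):
    """Scaling up: N = (N_pending - N_finished) / n, rounded up and never negative (assumption)."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    return max(0, math.ceil((pending_tasks - finished_tasks) / tasks_per_node))

def slaves_to_remove(idle_since, idle_threshold_s, now=None):
    """Scaling down: terminate any compute-slave idle longer than the user-specified threshold."""
    now = time.time() if now is None else now
    return [slave for slave, since in idle_since.items() if now - since > idle_threshold_s]

# Example with made-up numbers: 50 pending tasks, 10 finished, 4 task slots per node -> add 10 nodes.
print(nodes_to_add(50, 10, 4))  # 10

# slave-1 has been idle for 20 minutes, slave-2 for about 3 minutes; with a 10-minute
# threshold only slave-1 is terminated.
print(slaves_to_remove({"slave-1": 0.0, "slave-2": 1000.0}, 600.0, now=1200.0))  # ['slave-1']
```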
8. 6. Auto-Scaling Prototype and Experimental Results
a. Prototype Implementation
The proposed auto-scaling framework is able to work with cloud platforms that allow users to provision VMs through an API (IaaS).
Testbed: six physical machines, each with an 8-core CPU running at 2.35 GHz and 16 GB of RAM, connected by 1 Gigabit Ethernet (Gbps).
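Because the slide only states that the framework targets IaaS clouds exposing a VM-provisioning API, the sketch below uses a made-up CloudClient interface to show where such API calls would sit in the scaling actions; provision_vm, terminate_vm, and the image/flavor names are hypothetical and do not correspond to any real SDK.

```python
# Hypothetical IaaS client: the class and method names are illustrative only.

class CloudClient:
    """Thin stand-in for an IaaS API (e.g., a provider SDK or REST client)."""

    def provision_vm(self, image_id, flavor):
        """Request a new VM from the cloud and return its identifier (stub)."""
        raise NotImplementedError("replace with a real IaaS API call")

    def terminate_vm(self, vm_id):
        """Release a VM back to the cloud (stub)."""
        raise NotImplementedError("replace with a real IaaS API call")

def scale_up(client, count, image_id="hadoop-compute-slave", flavor="small"):
    """Provision `count` compute-slaves; image_id and flavor are assumed names."""
    return [client.provision_vm(image_id, flavor) for _ in range(count)]

def scale_down(client, idle_vm_ids):
    """Terminate the compute-slaves flagged as idle by the scaling-down check."""
    for vm_id in idle_vm_ids:
        client.terminate_vm(vm_id)
```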
9. b. Experimental Design
Cluster Type | Master | Slaves | HDFS
Auto-scaling cluster | One medium instance | Dynamic: starts with three core-slaves on medium instances; can scale up to 12 compute-slaves on small instances | CoveringHDFS, starting with 3 core-slaves
Seven-slave cluster | One medium instance | Static: 7 slaves (three medium instances and four small instances) | Traditional HDFS with 7 slaves
Fourteen-slave cluster | One medium instance | Static: 14 slaves (three medium instances and 11 small instances) | Traditional HDFS with 14 slaves
Hadoop Cluster Setup
11. 7. Conclusion
Such a cloud-enabled, auto-scalable
computing cluster offers a powerful
tool to process big geoscience data
with optimized performance and
reduced resource consumption.
While DEM interpolation is used as an
example, the proposed framework can
be extended to handle other
geoprocessing applications that run on
Hadoop, such as the climate data
analytical services powered by
Hadoop and cloud computing.
12. 8. Critique
The importance of this research lies in helping to address global and regional challenges, such as climate change and natural disasters, through the effective processing of big geospatial data. The results show that the auto-scaling framework can significantly reduce the use of computing resources by 80% while ensuring that the processing is completed within a reasonable time.
I recommend reading this paper because it is highly relevant to the field of cloud computing and reports good results.
THANK YOU