Cloud DBMS for large scale data analysis
1.
2. The concept of ‘cloud computing’ is currently receiving considerable attention, both in the research and commercial arenas.
Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet).
3. In this paper we discuss the limitations and opportunities of deploying data management issues on these emerging cloud computing platforms.
4. We present a list of features that a DBMS designed for large scale data analysis tasks running on an Amazon-style offering should contain.
We thus express the need for a new DBMS, designed specifically for cloud computing environments.
5. Data management applications are potential
candidates for deployment in the cloud.
Cloud computing vendors typically maintain little
more than the hardware, and give customers a
set of virtual machines in which to install their
own software.
Cloud-based DBMS are extremely scalable. They
are able to handle volumes of data and
processes that would exhaust a typical DBMS.
6. • We thus foreground a research objective for
large scale data analysis in the cloud,
showing why currently available systems are
not ideally suited for cloud deployment, and
arguing that there is a need for a newly
designed DBMS, architected specifically for
cloud computing platforms.
7. • Cloud computing is a subscription-based service where you can obtain networked storage space and computer resources.
• There are different types of clouds that you can subscribe to depending on your needs. As a home user or small business owner, you will most likely use public cloud services.
• Public Cloud - A public cloud can be accessed by any subscriber with an internet connection and access to the cloud space.
• Private Cloud - A private cloud is established for a specific group or organization and limits access to just that group.
8. • Community Cloud - A
community cloud is shared
among two or more
organizations that have similar
cloud requirements.
• Hybrid Cloud - A hybrid cloud is
essentially a combination of at
least two clouds, where the
clouds included are a mixture of
public, private, or community.
9. • Compute power is elastic, but only if the workload is parallelizable.
• Agility
• Cost
• Reliability
• Data is stored at an untrusted host.
• Data is replicated, often across large geographic distances.
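The first point, that elastic compute only helps a parallelizable workload, can be illustrated with Amdahl's law. This is a standard formula rather than something the slides state; the function name `amdahl_speedup` and the 95% parallel fraction are assumptions chosen purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when only part of a job can be spread across nodes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The serial part runs at the same speed no matter how many nodes are rented.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical job: 95% of the work is parallelizable.
    for nodes in (1, 10, 100, 1000):
        print(nodes, round(amdahl_speedup(0.95, nodes), 2))
```

Under this assumption the bound stays below 20x however many nodes are added, so the serial part of an analysis job limits what elastic compute can buy.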
10.
11. Transactional data management
• Shared-nothing architecture is typically not used in transactional data management.
• ACID properties are hard to maintain in transactional data management.
• Transactional database systems are generally small.
• There are enormous risks in storing transactional data on an untrusted host.
Analytical data management
• Shared-nothing architecture is a good match for analytical data management.
• ACID properties are not needed.
• Analytical data management systems are generally larger than transactional systems.
• Particularly sensitive data can often be left out of the analysis data.
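Why a shared-nothing architecture suits analytical work can be sketched as follows: each node owns one partition of the data, computes a partial aggregate from its own rows only, and a coordinator merges the small partial results. This is a minimal single-process simulation, not a real DBMS; the names `partition_rows`, `local_aggregate` and `merge_partials`, the `(region, amount)` schema and the sample rows are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (region, sale_amount); hypothetical schema


def partition_rows(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    """Assign rows to nodes round-robin; each node owns its partition exclusively."""
    partitions: List[List[Row]] = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def local_aggregate(partition: List[Row]) -> Dict[str, float]:
    """Work done by one node using only its own data: SUM(amount) GROUP BY region."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in partition:
        totals[region] += amount
    return dict(totals)


def merge_partials(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Coordinator step: combine the per-node partial sums into the final answer."""
    merged: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] += subtotal
    return dict(merged)


rows = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0), ("west", 1.0)]
# In a real deployment each local_aggregate call would run on a separate machine.
partials = [local_aggregate(p) for p in partition_rows(rows, num_nodes=3)]
result = merge_partials(partials)
```

Because the nodes exchange only the partial sums, adding nodes in this sketch adds capacity without adding contention over shared disk or memory.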
12. There is currently an implicit need to build a new database designed specifically for clouds, taking into account their applications, needs and compatibility…
Such a database needs an architecture that can detect and prevent the threats, attacks and other security-related issues that continuously deplete the efficiency and productivity of the cloud, so that it can serve as a platform for cloud computing in the future.
The next step is to propose a model for grid computing as well.
13.