4. Cloud computing: broader than any one app
Cloud computing is a method to address
scalability and availability concerns
for enterprise applications.
5. The take-away
Cloud computing represents a new approach to scalability
problems.
Reusable infrastructure components let your organization
build rapidly and scale gracefully.
6. Outline
Introduction
More data than you’ve ever seen before
Processing large data volumes
Hosting large-scale applications
An evolving ecosystem of components
7. Data volumes are growing
Amount of data one computer can store: 10,000 GB
Amount of data one computer can process at a time: 32 GB
Amount of data processed by Google per month: 400,000,000 GB
… in 2007
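As a rough back-of-the-envelope check on the figures above, the sketch below computes how many single machines it would take just to hold one month of that data. The constant and function names are illustrative, and the calculation assumes no replication, compression, or overhead.

```python
# Figures taken from the slide, all in GB.
STORAGE_PER_COMPUTER_GB = 10_000
MEMORY_PER_COMPUTER_GB = 32
GOOGLE_MONTHLY_GB = 400_000_000  # 2007 figure


def machines_needed(total_gb, per_machine_gb):
    """Smallest number of machines whose combined capacity covers total_gb."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_gb // per_machine_gb)


machines_to_store = machines_needed(GOOGLE_MONTHLY_GB, STORAGE_PER_COMPUTER_GB)
machines_to_hold_in_memory = machines_needed(GOOGLE_MONTHLY_GB, MEMORY_PER_COMPUTER_GB)

print(machines_to_store)           # 40000
print(machines_to_hold_in_memory)  # 12500000
```

Even under these generous assumptions, a month of data needs tens of thousands of disks' worth of machines, which is why a single computer is not an option.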
8. Where does data come from?
Watching your users
(clicks on the web site, pages viewed, items purchased…)
Simulations, scientific/experimental data
(genome sequences, medical imaging, wireless sensor grids…)
User-provided content
(billions of Flickr images, YouTube videos, blog posts…)
Your infrastructure itself
(10,000 computers reporting their status every second…)
Existing databases
(product catalogs, historical sales data, surveys…)
9. Large-scale data processing lessons
You can generate vastly more data than you can process with
conventional tools
No relational database handles petabytes gracefully
Data processing must involve many machines working in parallel
10. Hadoop: an active storage platform
A community-driven, commercially supported, extensible
system.
Based on techniques developed by Google.
Separates the problem of extracting information from large
data sets from the problem of performing reliable computation.
Combines a scalable, reliable compute framework with
self-healing, high-bandwidth storage.
11. Putting it together: active storage
Data automatically distributed to nodes at load time
Load balancing implicitly managed by Hadoop
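To make the idea of distributing data at load time concrete, here is a toy sketch that cuts a data set into fixed-size blocks and spreads them over nodes round-robin. It is not Hadoop's actual placement logic; the function names, node names, and block size are all hypothetical.

```python
def split_into_blocks(records, block_size):
    """Cut a list of records into consecutive blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def distribute(blocks, node_names):
    """Assign blocks to nodes round-robin so each node gets a similar share."""
    placement = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(block)
    return placement


records = list(range(10))            # stand-in for a loaded data set
blocks = split_into_blocks(records, 3)
placement = distribute(blocks, ["node-a", "node-b", "node-c"])

for node, held in placement.items():
    print(node, held)
```

The point of the sketch is that the person loading the data never chooses a node; the placement falls out of the loading step itself.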
12. Automatic parallel processing
Data elements processed locally, in parallel
Reliable computation implicitly managed by Hadoop
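The local, parallel processing described above follows the map/reduce pattern. The sketch below simulates it in plain Python with a word count: each node counts words in the data it already holds, and the partial results are then merged. It runs sequentially in one process and does not use Hadoop's API; the node names and sample lines are made up for illustration.

```python
from collections import Counter


def map_local(lines):
    """Map step: count words in the lines held by a single node."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-node counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical data, already distributed across two nodes.
node_data = {
    "node-a": ["the cloud scales", "the data grows"],
    "node-b": ["data moves to the cloud"],
}

# In a real cluster each map_local call would run on its own node, in parallel.
partials = [map_local(lines) for lines in node_data.values()]
totals = reduce_counts(partials)

print(totals["the"])    # 3
print(totals["cloud"])  # 2
```

Because each map step touches only its own node's data, adding nodes adds processing capacity without moving the input across the network first.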
13. Distributed data, single volume
Output data is written to local disks and forms a single
user-accessible volume
A high-level abstraction for engineers and analysts
14. A self-healing system
Loss of nodes triggers automatic data rebalancing
Automatic recovery managed by Hadoop
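One way to picture this self-healing behavior is a toy model in which every block is kept on a fixed number of nodes and, when a node is lost, under-replicated blocks are copied to the least-loaded survivors. This is a simplified illustration, not Hadoop's real recovery algorithm; the function name, replication factor, and node and block names are assumptions.

```python
def re_replicate(placement, nodes, failed_node, replication=2):
    """Return a new block -> set-of-nodes map after failed_node is lost."""
    survivors = sorted(n for n in nodes if n != failed_node)
    # Drop the failed node from every block's replica set.
    healed = {block: set(held) - {failed_node} for block, held in placement.items()}
    # Track how many blocks each surviving node currently holds.
    load = {n: sum(n in held for held in healed.values()) for n in survivors}

    for block in sorted(healed):
        held = healed[block]
        while len(held) < replication:
            candidates = [n for n in survivors if n not in held]
            if not candidates:
                break  # not enough nodes left to restore full replication
            target = min(candidates, key=lambda n: (load[n], n))
            held.add(target)
            load[target] += 1
    return healed


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = {
    "block-1": {"node-a", "node-b"},
    "block-2": {"node-b", "node-c"},
    "block-3": {"node-c", "node-d"},
}

healed = re_replicate(placement, nodes, failed_node="node-b")
for block in sorted(healed):
    print(block, sorted(healed[block]))
```

After the simulated failure every block is back to two copies, and no operator had to decide where the new copies go.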
17. Hosting infrastructure
Managed cloud platforms provide hardware resources for rent.
Think cycles and bytes, not months and machines.
They provide on-demand, low-level infrastructure for hosting
applications.
20. Conclusions
Cloud computing makes resources available on demand,
from raw hardware up to fully configured applications.
The range of resources available is increasing, with new tools
being aimed at different levels of the hardware/software stack.
These tools allow you to rapidly integrate disparate components
of your infrastructure and handle vastly more data than before.
21. (c) 2008 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0
Iceberg by wikipedia user Calyponte