BigData processing in the cloud – Guest Lecture - University of Applied Sciences Rapperswil - 29.4.14
- 1. © 2013 IBM Corporation
BigData processing in the cloud – Guest Lecture -
University of Applied Sciences Rapperswil - 29.4.14
Romeo Kienzler
IBM Innovation Center
Source: http://res.sys-con.com/story/oct12/2398990/Cloud_BigData_468.jpg
- 4.
What is BIG data?
Big Data
Hadoop
- 5.
What is BIG data?
Business Intelligence
Data Warehouse
- 6.
Map-Reduce → Hadoop → BigInsights
- 7.
BigData UseCases
● Google Index
  ● 40 × 10^9 = 40,000,000,000 => 40 billion pages indexed
  ● Will break 100 PB barrier soon
  ● Derived from MapReduce
  ● Now “Caffeine”, based on “Percolator”
  ● Incremental vs. batch
  ● In-memory vs. disk
- 8.
BigData UseCases
● CERN LHC
  ● 25 petabytes per year
● Facebook
  ● Hive data warehouse
  ● 300 PB, growing 600 TB / d
  ● > 100 k servers
● Genomics
● Enterprises
  ● Data center analytics (log files, OS/NW monitors, ...)
  ● Predictive maintenance, cybersecurity
  ● Social media analytics
  ● DWH offload
  ● Call Detail Record (CDR) data preservation
    http://www.balthasar-glaettli.ch/vorratsdaten/
- 10.
BigData Analytics – Predictive Analytics
"sometimes it's not who has the best algorithm that wins; it's who has the most data."
(C) Google Inc.
The Unreasonable Effectiveness of Data¹
¹http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf
No sampling => work with the full dataset => no p-values/z-scores anymore
- 12.
Aggregated Bandwidth between CPU, Main Memory and Hard Drive
1 TB (at 10 GByte/s per node)
- 1 Node - 100 sec
- 10 Nodes - 10 sec
- 100 Nodes - 1 sec
- 1000 Nodes - 100 msec
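The scan times listed on this slide follow from dividing the data volume by the aggregated bandwidth of all nodes. A minimal sketch of that arithmetic, assuming each node contributes 10 GByte/s and the data is spread evenly across nodes (the function name `scan_time_seconds` is illustrative, not from the slides):

```python
# Decimal units, as on the slide: 1 TB = 10^12 bytes, 1 GB = 10^9 bytes.
TB = 10**12
GB = 10**9


def scan_time_seconds(data_bytes, nodes, per_node_bandwidth):
    """Time to read data_bytes once when every node reads its share in parallel.

    Assumes perfectly even data distribution and no coordination overhead.
    """
    aggregated_bandwidth = nodes * per_node_bandwidth
    return data_bytes / aggregated_bandwidth


if __name__ == "__main__":
    for nodes in (1, 10, 100, 1000):
        seconds = scan_time_seconds(1 * TB, nodes, 10 * GB)
        print(f"{nodes:>4} nodes: {seconds} s")
```

In practice, skew and network overhead keep real clusters below this ideal linear speed-up.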
- 13.
Fault Tolerance / Commodity Hardware
AMD Turion II Neo N40L (2x 1.5 GHz / 2 MB / 15 W), 8 GB RAM,
3 TB Seagate Barracuda 7200.14
< CHF 500
100 K => 200 X (2, 4, 3) => 400 Cores, 1.6 TB RAM, 200 TB HD
MTBF ~ 365 d > 1.5 d
Source: http://www.cloudcomputingpatterns.org/Watchdog
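The cluster totals and the failure interval on this slide can be approximated with simple arithmetic. The sketch below is one reading of the slide, not the author's calculation. It assumes 2 cores, 8 GB RAM and 3 TB disk per node (taken from the hardware line), independent node failures, and a per-node MTBF of 365 days. The function names are hypothetical.

```python
def cluster_totals(budget_chf, node_price_chf, cores, ram_gb, disk_tb):
    """Aggregate resources of a cluster bought with a fixed budget."""
    nodes = budget_chf // node_price_chf
    return {
        "nodes": nodes,
        "cores": nodes * cores,
        "ram_tb": nodes * ram_gb / 1000,   # decimal TB
        "raw_disk_tb": nodes * disk_tb,    # before any replication
    }


def days_between_failures(node_mtbf_days, nodes):
    """Expected time between node failures somewhere in the cluster.

    Assumes independent failures at a constant rate, so the failure
    rates of the nodes simply add up.
    """
    return node_mtbf_days / nodes


if __name__ == "__main__":
    totals = cluster_totals(100_000, 500, cores=2, ram_gb=8, disk_tb=3)
    print(totals)
    print(days_between_failures(365, totals["nodes"]), "days")
```

Under these assumptions the raw disk capacity comes out at 600 TB. The slide's 200 TB would match that figure divided by three, which is consistent with threefold replication, although the slide does not say so. The computed failure interval is about 1.8 days, the same order of magnitude as the slide's figure. Either way, hardware failure becomes a routine event once a cluster has a few hundred nodes.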
- 16.
HDFS – Hadoop Distributed File System
- 35.
Map-Reduce
Source: http://www.cloudcomputingpatterns.org/Map_Reduce
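To make the Map-Reduce pattern concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) using the classic word-count example. It illustrates the programming model only and does not use the Hadoop API. All function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data in the cloud", "big data big clusters"]))
```

In a real cluster the map and reduce calls run on many nodes in parallel and the shuffle moves data across the network. The sketch keeps everything in one process to show only the data flow.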
- 77.
What role is the cloud playing here?
- 78.
“Elastic” Scale-Out
Source: http://www.cloudcomputingpatterns.org/Continuously_Changing_Workload
- 83.
“Elastic” Scale-Out
of CPU Cores, Storage, Memory
- 85.
“Elastic” Scale-Out
linear
Source: http://www.cloudcomputingpatterns.org/Elastic_Platform
- 87.
BigData Scale-Out
How do Databases Scale-Out?
- 89.
How do Databases Scale-Out?
Shared Disk Architectures
- 91.
How do Databases Scale-Out?
Shared Nothing Architectures
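In a shared-nothing architecture each node owns a disjoint slice of the data and no disk or memory is shared, so every request for a key has to be routed to the node that owns it. A minimal sketch, assuming hash partitioning by key (the slide does not name a partitioning scheme; the class `ShardedStore` is hypothetical):

```python
import zlib


class ShardedStore:
    """Toy key-value store in which every key lives on exactly one node."""

    def __init__(self, num_nodes):
        # Each dict stands in for the private storage of one node.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_nodes=4)
    for user in ("alice", "bob", "carol", "dave"):
        store.put(user, {"name": user})
    print(store.get("carol"))
    print([len(node) for node in store.nodes])  # keys held per node
```

Plain modulo hashing keeps the example short, but adding or removing a node remaps most keys. Schemes such as consistent hashing reduce that data movement.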
- 92.
Born-on-the-Cloud Databases
Source: http://www.constructioncloudcomputing.com/wp-content/uploads/2010/10/dreamstime_7360880-480x300.jpg
Source: http://www.cloudcomputingpatterns.org/Execution_Environment
- 93.
Google App Engine
Google App Engine is a Platform as a Service (PaaS) offering that lets
you build and run applications on Google’s infrastructure. App Engine
applications are easy to build, easy to maintain, and easy to scale as
your traffic and data storage needs change. With App Engine, there are
no servers for you to maintain. You simply upload your application and
it’s ready to go.
Source: http://www.cloudcomputingpatterns.org/Platform_as_a_Service_%28PaaS%29
- 94.
Google App Engine Database Services
- 96.
IBM BlueMix
BlueMix is a Platform as a Service (PaaS) cloud based on Cloud Foundry.
It offers enterprise-grade services enriched with IBM software and is
hosted at SoftLayer.
- 97.
IBM BlueMix, a Cloud Foundry runtime
[Diagram: Code + Runtime + Framework = Droplet; Droplets run in Containers on Linux VMs; Services: SQL, Push, SSO, ...]
- 98.
● Summary
  ● BigData is born on the cloud
  ● Cloud facilitates resource provisioning, configuration and deployment
  ● Highly innovative area
    ● Technology
    ● UseCases
● Links
  ● http://en.wikipedia.org/wiki/MapReduce
  ● http://www.se-radio.net/2013/12/episode-199-michael-stonebraker/
● Sign up for the free BlueMix beta
  ● http://bluemix.net
● Come to the BlueMix Days
  ● http://bit.ly/1lsIY8J
● Use our software
  ● BigInsights: http://www.ibm.com/software/data/infosphere/biginsights/quick-start/