4. 4
Data!
The New York Stock Exchange generates about 1 TB of new trade data per day.
A commercial aircraft generates 3 GB of flight sensor data per hour.
Vodafone generates 3 TB of Call Detail Records (CDRs) per day.
Between 2009 and 2014, the total number of U.S. online banking households will increase from 54 million to 66 million.
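To put the per-day and per-hour rates above on a common scale, the following minimal sketch extrapolates them with simple arithmetic. The function names, the 365-day year, and the 10-hour flight are illustrative assumptions, not figures from the slide.

```python
DAYS_PER_YEAR = 365  # assumption: data is produced every calendar day


def yearly_volume_tb(tb_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Scale a daily volume in TB to a yearly volume in TB."""
    return tb_per_day * days


def flight_volume_gb(gb_per_hour: float, flight_hours: float) -> float:
    """Sensor data produced over one flight, in GB."""
    return gb_per_hour * flight_hours


if __name__ == "__main__":
    # Rates quoted on the slide: 1 TB/day (NYSE), 3 TB/day (Vodafone), 3 GB/hour (aircraft).
    print("NYSE, TB per year:", yearly_volume_tb(1))
    print("Vodafone, TB per year:", yearly_volume_tb(3))
    # The 10-hour flight duration is a hypothetical input.
    print("10-hour flight, GB:", flight_volume_gb(3, 10))
```

Under these assumptions a single source producing 3 TB per day passes the petabyte mark in about a year.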
7. 7
Sample Data vs. Big Data
Can you judge a person's life expectancy?
Given:
– DNA
– Medical records
– Food
– Lifestyle (smoking, drinking, driving, exercise)
12. 12
What is Hadoop?
A scalable, fault-tolerant distributed system for data storage and processing
Written entirely in Java
Open source and distributed under the Apache License
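Hadoop's processing side is commonly programmed with the MapReduce model. The sketch below imitates that model (map, shuffle, reduce) for a word count in a single Python process. It is only an illustration of the idea: it does not use Hadoop's Java API, it is neither distributed nor fault-tolerant, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce functions run on many nodes in parallel, and the framework handles the shuffle step and the re-execution of failed tasks.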
15. 15
Big Data Future Architecture
[Architecture diagram; labels recovered from the figure]
Data sources: Social Media, Images, E-mails, Crawlers; ERP, CRM, LOB Apps
Unstructured and Structured Data
Storage and processing: Hadoop on Cloud, Hadoop on Private Server, NoSQL, Data Warehouse / NewSQL
Petabytes of Data (unstructured); Hundreds of TB of Data (structured)
Connectors, SSRS
BI Platform
Familiar End User Tools: Spreadsheet, Predictive Analytics
Data Market Place
16. 16
Issues with Big Data Infrastructure
Large investment
Scalability
ROI
Business Cases
17. 17
Big Data on Cloud
Using IaaS to leverage cloud VMs
Using Big Data as a Service
27. 27
Hadoop as a Service
Amazon Elastic MapReduce (EMR)
Rackspace Cloud Big Data Platform
Qubole
Google Cloud Platform
IBM Bluemix: Analytics for Hadoop
Microsoft Azure HDInsight
40. 40
Big Data on Cloud Roadmap
Step 1: Build the business case
Step 2: Assess your Big Data application workloads
Step 3: Develop a technical approach for deploying and managing Big Data in the cloud
Step 4: Address governance, security, privacy, and risk
Step 5: Deploy, integrate, and operationalize your cloud-based Big Data infrastructure
Source: Deploying Big Data Analytics Applications to the Cloud: Roadmap for Success, CSCC
41. 41
Sample applications
Enterprise applications already hosted in the cloud
High-volume external data sources that require considerable preprocessing
Tactical applications beyond your on-premises Big Data capabilities
Elastic provisioning of very large but short-lived analytic sandboxes
Source: Deploying Big Data Analytics Applications to the Cloud: Roadmap for Success, CSCC
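The case for short-lived analytic sandboxes rests on pay-per-use arithmetic. As a rough sketch, the cost of a cluster can be modeled as nodes × hours × hourly rate. The rate, node count, and durations below are hypothetical placeholders, not prices from any provider or from the source.

```python
def cluster_cost(nodes: int, hours: float, rate_per_node_hour: float) -> float:
    """Pay-per-use cost of a cluster: nodes x hours x hourly rate per node."""
    return nodes * hours * rate_per_node_hour


# All inputs below are hypothetical, chosen only to illustrate the comparison.
RATE = 0.50            # assumed price per node-hour
NODES = 200            # a "very large" sandbox
SANDBOX_HOURS = 8      # provisioned for one working day, then released
MONTH_HOURS = 24 * 30  # the same cluster kept running for 30 days

sandbox = cluster_cost(NODES, SANDBOX_HOURS, RATE)
always_on = cluster_cost(NODES, MONTH_HOURS, RATE)

if __name__ == "__main__":
    print("Short-lived sandbox:", sandbox)
    print("Always-on for a month:", always_on)
```

With these placeholder inputs the short-lived sandbox costs a small fraction of keeping the same capacity running all month, which is the kind of comparison a business case for elastic provisioning would make with real prices.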