Cw13 big data and apache hadoop by amr awadallah-cloudera
- 1. Intro to Big Data and Apache Hadoop
Dr. Amr Awadallah, CTO/Founder
@awadallah, aaa@cloudera.com
- 2. Who is Cloudera?
What the Enterprise Requires
• The market-leading Hadoop-based platform with batch and real-time processing frameworks
• A comprehensive suite of system and data management software
• Training and certification programs
• Comprehensive support and consulting services
Extensive Partner Ecosystem
• Over 400 partners across hardware, software and services
The Leader in Big Data Management
• Deliver a revolutionary data management platform based on Apache Hadoop
• Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
• More production deployments than all other vendors combined
©2013 Cloudera, Inc. All Rights Reserved.
- 3. Data Has Changed in the Last 30 Years
[Chart: data growth from 1980 to 2012, driven by end-user applications, the Internet, mobile devices, and sophisticated machines. Structured data: 10%. Unstructured data: 90%.]
- 4. What if you wanted to…
Data
Question
Speed
Usage
Type/Form
- 5. So what is Apache Hadoop?
Self-Healing, High-Bandwidth Clustered Storage
Byte Streams
Fault-Tolerant Distributed Processing
Schema-on-Read
[Figure: HDFS storage distribution (Storage). An input file is split into blocks 1 to 5, and each block is stored on three of the five nodes. Block groups as extracted, in node order: Node A (2, 4, 5), Node B (1, 2, 5), Node C (1, 3, 4), Node D (2, 3, 5), Node E (1, 3, 4).]
[Figure: MapReduce compute distribution (Compute). Processing runs over the same block layout on Nodes A to E and produces an output file.]
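The storage figure for this slide shows each block of the input file held on three of five nodes. A minimal sketch of that idea follows, assuming a simple round-robin placement. This is not HDFS's actual placement policy, and the names `place_blocks` and `readable_blocks` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin is used here only to keep the sketch short;
    a real cluster uses its own placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = defaultdict(list)
    for block in range(1, num_blocks + 1):
        for r in range(replication):
            node = nodes[(block - 1 + r) % len(nodes)]
            placement[node].append(block)
    return dict(placement)


def readable_blocks(placement, failed_nodes):
    """Return the set of blocks that still have a replica on a live node."""
    return {
        block
        for node, blocks in placement.items()
        if node not in failed_nodes
        for block in blocks
    }


nodes = ["A", "B", "C", "D", "E"]
placement = place_blocks(5, nodes)
```

With three replicas per block, any two nodes can fail and every block stays readable, which is the "self-healing" property in miniature: `readable_blocks(placement, {"A", "B"})` still covers all five blocks.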
- 7. The Key Benefit: Agility/Flexibility
Schema-on-Write (RDBMS):
• Prescriptive Data Modeling:
  • Create static DB schema
  • Transform data into RDBMS
  • Query data in RDBMS format
• New columns must be added explicitly before new data can propagate into the system.
• Good for Known Unknowns (Repetition)
Schema-on-Read (Hadoop):
• Descriptive Data Modeling:
  • Copy data in its native format
  • Create schema + parser
  • Query data in its native format (does ETL on the fly)
• New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
• Good for Unknown Unknowns (Exploration)
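The schema-on-read idea can be sketched in a few lines: raw records are kept in their native format, and a schema is applied only when the data is read. The example below is an illustration under assumptions, not how Hadoop itself parses data. The JSON event format, the field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw events are stored exactly as they arrived; nothing is dropped at load time.
RAW_EVENTS = [
    '{"user": "ann", "action": "login"}',
    '{"user": "bob", "action": "view", "device": "mobile"}',
    '{"user": "ann", "action": "view", "device": "tablet"}',
]


def read_with_schema(raw_lines, fields):
    """Parse raw lines at query time, projecting only the requested fields."""
    for line in raw_lines:
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        yield {name: record.get(name) for name in fields}


# The first schema only describes "user" and "action".
v1 = list(read_with_schema(RAW_EVENTS, ["user", "action"]))

# A later schema adds "device"; records stored earlier expose it retroactively.
v2 = list(read_with_schema(RAW_EVENTS, ["user", "action", "device"]))
```

No data had to be reloaded between `v1` and `v2`. Only the description of the data changed, which is the contrast with adding a column to a static RDBMS schema.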
- 8. Scalable Technology + Scalable Development
Grows without requiring developers to
re-architect their algorithms/application
AUTO SCALE
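One way to picture "scalable development" is a word count written as a map step and a reduce step: the two functions stay the same whether the input is processed as one partition or many. The sketch below runs in a single process and only imitates the programming model. The function names and the sample lines are assumptions made for illustration, not Hadoop APIs.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(lines, num_partitions):
    """Split the input, map each partition, then reduce the partial results."""
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    # On a cluster each partition would be mapped on a different node.
    partials = [map_partition(partition) for partition in partitions]
    return reduce(merge_counts, partials, Counter())


LINES = ["big data", "big questions", "all their data"]
```

Calling `word_count(LINES, 1)` and `word_count(LINES, 5)` gives the same counts. Only the degree of partitioning changes, not the map and reduce logic the developer wrote.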
- 9. Economics: Return on Byte
High ROB
Low ROB (but still a ton of aggregate value)
- 10. CDH: Cloudera Distribution incl. Apache Hadoop
Hadoop Core Kernel: MapReduce, HDFS
Coordination: Apache ZooKeeper
Data Integration: Apache Flume, Apache Sqoop
Fast Read/Write Access: Apache HBase
Batch Processing Languages: Apache Pig, Apache Hive
Data Processing Lib: DataFu for Pig
Interactive SQL: Impala
Data Mining Lib: Apache Mahout
Metadata: Apache Hive MetaStore
Job Workflow: Apache Oozie
Web Console: Hue
Connectivity: ODBC/JDBC/FUSE/HTTPS
Cloud Deployment: Apache Whirr
Build/Test: Apache Bigtop
Cloudera Manager Free Edition (Installation Wizard)
- 12. The Cloudera Solution Stack
CLOUDERA UNIVERSITY: DEVELOPER TRAINING, ADMINISTRATOR TRAINING, DATA SCIENCE TRAINING, CERTIFICATION PROGRAMS
PROFESSIONAL SERVICES: USE CASE DISCOVERY, NEW HADOOP DEPLOYMENT, PROOF-OF-CONCEPT, DEPLOYMENT CERTIFICATION, PROCESS & TEAM DEVELOPMENT, PRODUCTION PILOTS
MANAGEMENT SOFTWARE & TECHNICAL SUPPORT (SUBSCRIPTION)
CDH: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
CM: CLOUDERA MANAGER
CS: CLOUDERA SUPPORT
OSS: APACHE HADOOP & OPEN SOURCE SOFTWARE
- 13. Powered by Cloudera Impala
• With Impala:
  • Interactive ANSI-92 SQL queries
  • Native distributed query engine
  • Optimized for low latency
• Provides:
  • Answers as fast as you can ask
  • Everyone can ask questions of all data
  • Big data storage and analytics together
• Unified storage:
  • Supports HDFS and HBase
  • Flexible file formats and schemas
• Unified Metastore
• Unified Security
• Unified Client Interfaces:
  • ODBC/JDBC
  • SQL syntax
  • Hue Beeswax Web UI
[Diagram: "Before Impala" and "With Impala" stacks, with layers labeled User Interface, Batch Processing, and Real-Time Access.]
- 14. Cloudera in the Enterprise Stack
- 15. Use Case: A Major Financial Institution
The Challenge:
• Current EDW is at capacity and cannot support growing data depth and width
• Performance issues in business-critical apps; little room for innovation
The Solution:
• Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW.
• EDW resources can now be focused on repeatable operational analytics.
• A scan of one month of data takes 4 seconds instead of 4 hours.
The Result:
• New solution saves tens of millions by optimizing the existing EDW for analytics & reducing data storage costs by 99%
[Diagram: data warehouse workload before: Operational (44%), ELT Processing (42%), Analytics (11%). After: the data warehouse runs Operational (50%) and Analytics (50%), while Cloudera handles Analytics, Processing, and Storage.]
- 16. Beyond Data Warehousing
COMMUNICATIONS: Location-based advertising
HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
LAW ENFORCEMENT & DEFENSE: Threat analysis; Social media monitoring; Photo analysis
EDUCATION & RESEARCH: Experiment sensor analysis
FINANCIAL SERVICES: Risk & portfolio analysis; New products
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
UTILITIES: Smart Meter analysis for network capacity
CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
LIFE SCIENCES: Clinical trials; Genomics
RETAIL: Consumer sentiment; Optimized marketing
AUTOMOTIVE: Auto sensors reporting location, problems
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
OIL & GAS: Drilling exploration sensor analysis
- 18. The Cloudera Platform for Big Data
Flexibility
• Store any data
• Run any analysis
• Keeps pace with the rate of change of incoming data
Scalability
• Proven growth to PBs/1,000s of nodes
• No need to rewrite queries; scales automatically
• Keeps pace with the rate of growth of incoming data
Economics
• Cost per TB at a fraction of other options
• Keep all of your data alive in an active archive
• Powering the "data beats algorithm" movement