Intro to Big Data and Apache Hadoop
Dr. Amr Awadallah, CTO/Founder
@awadallah, aaa@cloudera.com
Who is Cloudera?
What the Enterprise Requires
• The market-leading Hadoop-based platform with batch and real-time processing frameworks
• A comprehensive suite of system and data management software
• Training and certification programs
• Comprehensive support and consulting services
Extensive Partner Ecosystem
• Over 400 partners across hardware, software and services
The Leader in Big Data Management
• Deliver a revolutionary data management platform based on Apache Hadoop
• Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
• More production deployments than all other vendors combined
©2013 Cloudera, Inc. All Rights Reserved.
Data Has Changed in the Last 30 Years
[Chart of data growth, 1980 to 2012, annotated with END-USER APPLICATIONS, THE INTERNET, MOBILE DEVICES, and SOPHISTICATED MACHINES. STRUCTURED DATA: 10%; UNSTRUCTURED DATA: 90%.]
What if you wanted to…
Data
Question
Speed
Usage
Type/Form
So what is Apache Hadoop?
• Self-Healing, High-Bandwidth Clustered Storage
• Byte Streams
• Fault-Tolerant Distributed Processing
• Schema-on-Read
[Diagram: HDFS storage distribution (Storage). The Input File is split into blocks 1 to 5, and each block is stored on three of the five nodes. Node A: 2, 4, 5; Node B: 1, 2, 5; Node C: 1, 3, 4; Node D: 2, 3, 5; Node E: 1, 3, 4.]
[Diagram: MapReduce compute distribution (Compute). The Output File is produced from blocks 1 to 5 on the same nodes, with the same block layout as the storage diagram.]
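The storage diagram can be approximated with a short sketch. This is a toy model, not HDFS code: the function names (`place_blocks`, `re_replicate`) and the round-robin policy are assumptions made for illustration, and the replication factor of 3 is taken from the three copies per block shown in the diagram. Real HDFS placement is rack-aware and more involved.

```python
from collections import defaultdict

REPLICATION = 3  # assumed: three copies per block, as in the diagram
NODES = ["A", "B", "C", "D", "E"]


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    placement = defaultdict(set)
    for block in range(1, num_blocks + 1):
        for r in range(replication):
            node = nodes[(block - 1 + r) % len(nodes)]
            placement[node].add(block)
    return placement


def re_replicate(placement, failed_node, replication=REPLICATION):
    """Model self-healing: copy blocks lost on `failed_node` to surviving nodes."""
    live = {n: set(b) for n, b in placement.items() if n != failed_node}
    for block in sorted(placement[failed_node]):
        holders = [n for n in live if block in live[n]]
        # Prefer the least-loaded surviving nodes that lack this block.
        candidates = sorted(
            (n for n in live if block not in live[n]),
            key=lambda n: len(live[n]),
        )
        for target in candidates[: replication - len(holders)]:
            live[target].add(block)
    return live


placement = place_blocks(5, NODES)
healed = re_replicate(placement, "C")

for node in sorted(healed):
    print(node, sorted(healed[node]))
```

Because every block lives on three nodes, losing one node leaves at least two copies of each block, and the sketch restores the third copy on the remaining nodes.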
Next-Gen Data Management
The Key Benefit: Agility/Flexibility
Schema-on-Write (RDBMS):
• Prescriptive Data Modeling:
  • Create static DB schema
  • Transform data into RDBMS
  • Query data in RDBMS format
• New columns must be added explicitly before new data can propagate into the system.
• Good for Known Unknowns (Repetition)
Schema-on-Read (Hadoop):
• Descriptive Data Modeling:
  • Copy data in its native format
  • Create schema + parser
  • Query Data in its native format (does ETL on the fly)
• New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
• Good for Unknown Unknowns (Exploration)
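A minimal sketch of the schema-on-read idea, assuming raw events are kept as JSON lines. The sample data, the field names, and the helper `read_with_schema` are hypothetical and only illustrate how a field can "appear retroactively" once the schema describes it.

```python
import json

# Raw events stored in their native format (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "ann", "action": "login"}',
    '{"user": "bob", "action": "view", "device": "mobile"}',
    '{"user": "ann", "action": "view", "device": "desktop"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: parse each raw line and project the named fields.

    Fields missing from a record come back as None instead of failing the load.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in schema}


# First schema: written before anyone described the "device" field.
v1 = list(read_with_schema(RAW_EVENTS, ["user", "action"]))

# Updated schema: "device" becomes visible for data that was already stored.
v2 = list(read_with_schema(RAW_EVENTS, ["user", "action", "device"]))

for row in v2:
    print(row)
```

The stored data never changes between the two reads; only the schema passed to the reader does. In a schema-on-write system the new column would have to exist before the second and third events could be loaded.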
Scalable Technology + Scalable Development
Grows without requiring developers to re-architect their algorithms/application
AUTO SCALE
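The scaling claim can be illustrated with a small MapReduce-style word count. This is a sketch under stated assumptions: threads stand in for cluster nodes, and `map_block`, `merge_counts`, and `word_count` are hypothetical names, not Hadoop APIs. The point is that the map and reduce functions stay the same when the worker count changes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_block(block):
    """Map step: count words within one block of text."""
    return Counter(block.split())


def merge_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def word_count(blocks, workers):
    """Run the same map and reduce functions on `workers` parallel workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce(merge_counts, partials, Counter())


BLOCKS = ["big data big", "data hadoop", "hadoop big", "data", "hadoop hadoop"]

small_cluster = word_count(BLOCKS, workers=1)
large_cluster = word_count(BLOCKS, workers=5)

print(small_cluster == large_cluster)
```

Only the `workers` argument differs between the two runs; the application logic is not re-architected.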
Economics: Return on Byte
• High ROB
• Low ROB (but still a ton of aggregate value)
CDH: Cloudera Distribution incl. Apache Hadoop
• Cloud Deployment: APACHE WHIRR
• Build/Test: APACHE BIGTOP
• Coordination: APACHE ZOOKEEPER
• Data Integration: APACHE FLUME, APACHE SQOOP
• Fast Read/Write Access: APACHE HBASE
• Batch Processing Languages: APACHE PIG, APACHE HIVE
• Web Console: HUE
• Job Workflow: APACHE OOZIE
• Metadata: APACHE HIVE MetaStore
• Interactive SQL: Impala
• Data Mining Lib: APACHE MAHOUT
• Data Processing Lib: DataFu for Pig
• Hadoop Core Kernel: MapReduce, HDFS
• Connectivity: ODBC/JDBC/FUSE/HTTPS
• Cloudera Manager Free Edition (Installation Wizard)
Cloudera Enterprise
The Cloudera Solution Stack
CLOUDERA UNIVERSITY
• DEVELOPER TRAINING
• ADMINISTRATOR TRAINING
• DATA SCIENCE TRAINING
• CERTIFICATION PROGRAMS
PROFESSIONAL SERVICES
• USE CASE DISCOVERY
• NEW HADOOP DEPLOYMENT
• PROOF-OF-CONCEPT
• DEPLOYMENT CERTIFICATION
• PROCESS & TEAM DEVELOPMENT
• PRODUCTION PILOTS
MANAGEMENT SOFTWARE & TECHNICAL SUPPORT (SUBSCRIPTION)
CDH: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE
CM: CLOUDERA MANAGER
CS: CLOUDERA SUPPORT
OSS: APACHE HADOOP & OPEN SOURCE SOFTWARE
Powered by Cloudera Impala
• With Impala:
  • Interactive ANSI-92 SQL queries
  • Native distributed query engine
  • Optimized for low-latency
• Provides:
  • Answers as fast as you can ask
  • Everyone can ask questions of all data
  • Big data storage and analytics together
• Unified storage:
  • Supports HDFS and HBase
  • Flexible file formats and schemas
• Unified Metastore
• Unified Security
• Unified Client Interfaces:
  • ODBC/JDBC
  • SQL syntax
  • Hue Beeswax Web UI
[Diagram labels: BEFORE IMPALA, WITH IMPALA; BATCH PROCESSING, USER INTERFACE, REAL-TIME ACCESS]
Cloudera in the Enterprise Stack
Use Case: A Major Financial Institution
The Challenge:
• Current EDW at capacity; cannot support growing data depth and width
• Performance issues in business-critical apps; little room for innovation
New solution saves tens of millions by optimizing existing EDW for analytics & reducing data storage costs by 99%
The Solution:
• Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW.
• EDW resources can now be focused on repeatable operational analytics.
• A scan of one month of data takes 4 secs vs. 4 hours
[Diagram: DATA WAREHOUSE workload of Operational (44%), ELT Processing (42%), Analytics (11%), shown alongside CLOUDERA handling Analytics, Processing, and Storage with a DATA WAREHOUSE workload of Operational (50%), Analytics (50%).]
Beyond Data Warehousing
• COMMUNICATIONS: Location-based advertising
• HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
• LAW ENFORCEMENT & DEFENSE: Threat analysis; Social media monitoring; Photo analysis
• EDUCATION & RESEARCH: Experiment sensor analysis
• FINANCIAL SERVICES: Risk & portfolio analysis; New products
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Website optimization
• UTILITIES: Smart Meter analysis for network capacity
• CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
• MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
• LIFE SCIENCES: Clinical trials; Genomics
• RETAIL: Consumer sentiment; Optimized marketing
• AUTOMOTIVE: Auto sensors reporting location, problems
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
• OIL & GAS: Drilling exploration sensor analysis
The Road Ahead
• 2006-2012: Bringing Compute to Data
• 2013-???: Bringing Applications to Data
The Cloudera Platform for Big Data
Flexibility
• Store any data
• Run any analysis
• Keeps pace with the rate of change of incoming data
Scalability
• Proven growth to PBs/1,000s of nodes
• No need to rewrite queries, automatically scales
• Keeps pace with the rate of growth of incoming data
Economics
• Cost per TB at a fraction of other options
• Keep all of your data alive in an active archive
• Powering the data beats algorithm movement
Dr. Amr Awadallah
CTO/Founder
@awadallah
aaa@cloudera.com

Weitere ähnliche Inhalte

Was ist angesagt?

Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...
Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...
Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...Principled Technologies
 
Cloudera - Enabling the IoT Revolution Driving Insights in a Connected World
Cloudera - Enabling the IoT Revolution Driving Insights in a Connected WorldCloudera - Enabling the IoT Revolution Driving Insights in a Connected World
Cloudera - Enabling the IoT Revolution Driving Insights in a Connected Worldandreas kuncoro
 
IoT-Enabled Predictive Maintenance
IoT-Enabled Predictive MaintenanceIoT-Enabled Predictive Maintenance
IoT-Enabled Predictive MaintenanceCloudera, Inc.
 
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAPAlan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAPMIT Enterprise Forum Cambridge
 
Peak 10 Cloud Delivered Desktop
Peak 10 Cloud Delivered DesktopPeak 10 Cloud Delivered Desktop
Peak 10 Cloud Delivered DesktopPeak 10
 
Doing DevOps for Big Data? What You Need to Know About AIOps
Doing DevOps for Big Data? What You Need to Know About AIOpsDoing DevOps for Big Data? What You Need to Know About AIOps
Doing DevOps for Big Data? What You Need to Know About AIOpsDevOps.com
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWDataWorks Summit
 
TOP 10 Reasons to Make Peak 10 Your Cloud Provider of Choice
TOP 10 Reasons to Make Peak 10 Your Cloud Provider of ChoiceTOP 10 Reasons to Make Peak 10 Your Cloud Provider of Choice
TOP 10 Reasons to Make Peak 10 Your Cloud Provider of ChoicePeak 10
 
Why and-how-to-choose-an-iot-platforms-201701
Why and-how-to-choose-an-iot-platforms-201701Why and-how-to-choose-an-iot-platforms-201701
Why and-how-to-choose-an-iot-platforms-201701Omar Nawaz
 
Device to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in OracleDevice to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in OracleJunSeok Seo
 
Big Data Analytics in Healthcare
Big Data Analytics in HealthcareBig Data Analytics in Healthcare
Big Data Analytics in HealthcareAltoros
 
Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18Cloudera, Inc.
 
The Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyThe Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyCloudera, Inc.
 
CL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCisco
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedCloudera, Inc.
 
Splunk for AIOps: Reduce IT outages through prediction with machine learning
Splunk for AIOps: Reduce IT outages through prediction with machine learningSplunk for AIOps: Reduce IT outages through prediction with machine learning
Splunk for AIOps: Reduce IT outages through prediction with machine learningDigital Transformation EXPO Event Series
 
Top 10 Reasons for Colocation
Top 10 Reasons for ColocationTop 10 Reasons for Colocation
Top 10 Reasons for ColocationPeak 10
 
AI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseAI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseThe Hive
 

Was ist angesagt? (20)

Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...
Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...
Design advantages of Hadoop ETL offload with the Intel processor-powered Dell...
 
Cloudera - Enabling the IoT Revolution Driving Insights in a Connected World
Cloudera - Enabling the IoT Revolution Driving Insights in a Connected WorldCloudera - Enabling the IoT Revolution Driving Insights in a Connected World
Cloudera - Enabling the IoT Revolution Driving Insights in a Connected World
 
IoT-Enabled Predictive Maintenance
IoT-Enabled Predictive MaintenanceIoT-Enabled Predictive Maintenance
IoT-Enabled Predictive Maintenance
 
IoT Data as Service with Hadoop
IoT Data as Service with HadoopIoT Data as Service with Hadoop
IoT Data as Service with Hadoop
 
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAPAlan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
Alan Southall, SVP of Engineering, Head of IoT Predictive Maintenance, SAP
 
Peak 10 Cloud Delivered Desktop
Peak 10 Cloud Delivered DesktopPeak 10 Cloud Delivered Desktop
Peak 10 Cloud Delivered Desktop
 
Doing DevOps for Big Data? What You Need to Know About AIOps
Doing DevOps for Big Data? What You Need to Know About AIOpsDoing DevOps for Big Data? What You Need to Know About AIOps
Doing DevOps for Big Data? What You Need to Know About AIOps
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
TOP 10 Reasons to Make Peak 10 Your Cloud Provider of Choice
TOP 10 Reasons to Make Peak 10 Your Cloud Provider of ChoiceTOP 10 Reasons to Make Peak 10 Your Cloud Provider of Choice
TOP 10 Reasons to Make Peak 10 Your Cloud Provider of Choice
 
Why and-how-to-choose-an-iot-platforms-201701
Why and-how-to-choose-an-iot-platforms-201701Why and-how-to-choose-an-iot-platforms-201701
Why and-how-to-choose-an-iot-platforms-201701
 
IoTMeetup
IoTMeetupIoTMeetup
IoTMeetup
 
Device to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in OracleDevice to Intelligence, IOT and Big Data in Oracle
Device to Intelligence, IOT and Big Data in Oracle
 
Big Data Analytics in Healthcare
Big Data Analytics in HealthcareBig Data Analytics in Healthcare
Big Data Analytics in Healthcare
 
Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18
 
The Five Markers on Your Big Data Journey
The Five Markers on Your Big Data JourneyThe Five Markers on Your Big Data Journey
The Five Markers on Your Big Data Journey
 
CL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and PlanningCL2015 - Datacenter and Cloud Strategy and Planning
CL2015 - Datacenter and Cloud Strategy and Planning
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: Exposed
 
Splunk for AIOps: Reduce IT outages through prediction with machine learning
Splunk for AIOps: Reduce IT outages through prediction with machine learningSplunk for AIOps: Reduce IT outages through prediction with machine learning
Splunk for AIOps: Reduce IT outages through prediction with machine learning
 
Top 10 Reasons for Colocation
Top 10 Reasons for ColocationTop 10 Reasons for Colocation
Top 10 Reasons for Colocation
 
AI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the EnterpriseAI in Software for Augmenting Intelligence Across the Enterprise
AI in Software for Augmenting Intelligence Across the Enterprise
 

Andere mochten auch

What is big data
What is big dataWhat is big data
What is big dataCnu Federer
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceDr Ganesh Iyer
 
What is hadoop and how it works?
What is hadoop and how it works?What is hadoop and how it works?
What is hadoop and how it works?Cnu Federer
 
Putting Hadoop To Work In The Enterprise
Putting Hadoop To Work In The EnterprisePutting Hadoop To Work In The Enterprise
Putting Hadoop To Work In The EnterpriseDataWorks Summit
 
An introduction to Apache Cassandra
An introduction to Apache CassandraAn introduction to Apache Cassandra
An introduction to Apache CassandraMike Frampton
 
Intro to big data and hadoop ubc cs lecture series - g fawkes
Intro to big data and hadoop   ubc cs lecture series - g fawkesIntro to big data and hadoop   ubc cs lecture series - g fawkes
Intro to big data and hadoop ubc cs lecture series - g fawkesgfawkesnew2
 
Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2IMC Institute
 
A (very) short intro to Hadoop
A (very) short intro to HadoopA (very) short intro to Hadoop
A (very) short intro to HadoopKen Krugler
 
Li-Fi Technology (Perfect slides)
Li-Fi Technology (Perfect slides)Li-Fi Technology (Perfect slides)
Li-Fi Technology (Perfect slides)UzmaRuhy
 
Machine learning pour tous
Machine learning pour tousMachine learning pour tous
Machine learning pour tousDamien Seguy
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle CompetitionsDataRobot
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsDavid Pittman
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientistryanorban
 
Data science a machine learning tour (french)
Data science a machine learning tour (french)Data science a machine learning tour (french)
Data science a machine learning tour (french)Franck Bardol
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural networkDEEPASHRI HK
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentationlpaviglianiti
 

Andere mochten auch (20)

Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
What is big data
What is big dataWhat is big data
What is big data
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Hadoop - How It Works
Hadoop - How It WorksHadoop - How It Works
Hadoop - How It Works
 
What is hadoop and how it works?
What is hadoop and how it works?What is hadoop and how it works?
What is hadoop and how it works?
 
Putting Hadoop To Work In The Enterprise
Putting Hadoop To Work In The EnterprisePutting Hadoop To Work In The Enterprise
Putting Hadoop To Work In The Enterprise
 
An introduction to Apache Cassandra
An introduction to Apache CassandraAn introduction to Apache Cassandra
An introduction to Apache Cassandra
 
Intro to big data and hadoop ubc cs lecture series - g fawkes
Intro to big data and hadoop   ubc cs lecture series - g fawkesIntro to big data and hadoop   ubc cs lecture series - g fawkes
Intro to big data and hadoop ubc cs lecture series - g fawkes
 
Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2Hadoop Hand-on Lab: Installing Hadoop 2
Hadoop Hand-on Lab: Installing Hadoop 2
 
Nuclear Weapons
Nuclear WeaponsNuclear Weapons
Nuclear Weapons
 
Hyperloop
HyperloopHyperloop
Hyperloop
 
A (very) short intro to Hadoop
A (very) short intro to HadoopA (very) short intro to Hadoop
A (very) short intro to Hadoop
 
Li-Fi Technology (Perfect slides)
Li-Fi Technology (Perfect slides)Li-Fi Technology (Perfect slides)
Li-Fi Technology (Perfect slides)
 
Machine learning pour tous
Machine learning pour tousMachine learning pour tous
Machine learning pour tous
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Data science a machine learning tour (french)
Data science a machine learning tour (french)Data science a machine learning tour (french)
Data science a machine learning tour (french)
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 

Ähnlich wie Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13 from the Inevitable Cloud Community

Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseCloudera, Inc.
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Pactera_US
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksHortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsjdijcks
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Datajdijcks
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Barijaxconf
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldCA Technologies
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformCloudera, Inc.
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Jeffrey T. Pollock
 
Yahoo! Hack Europe
Yahoo! Hack EuropeYahoo! Hack Europe
Yahoo! Hack EuropeHortonworks
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Jeffrey T. Pollock
 

Ähnlich wie Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13 from the Inevitable Cloud Community (20)

Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks Transform Your Business with Big Data and Hortonworks
Transform Your Business with Big Data and Hortonworks
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Expand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big DataExpand a Data warehouse with Hadoop and Big Data
Expand a Data warehouse with Hadoop and Big Data
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Big Data: Myths and Realities
Big Data: Myths and RealitiesBig Data: Myths and Realities
Big Data: Myths and Realities
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data Platform
 
Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)Tapping into the Big Data Reservoir (CON7934)
Tapping into the Big Data Reservoir (CON7934)
 
Yahoo! Hack Europe
Yahoo! Hack EuropeYahoo! Hack Europe
Yahoo! Hack Europe
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
 

Mehr von TheInevitableCloud

Cw13 the rising stack-how & why open stack is changing it by mark collier-ope...
Cw13 the rising stack-how & why open stack is changing it by mark collier-ope...Cw13 the rising stack-how & why open stack is changing it by mark collier-ope...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13 from the Inevitable Cloud Community

  • 1. Intro to Big Data and Apache Hadoop Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com
  • 2. Who is Cloudera? What the Enterprise Requires: the market-leading Hadoop-based platform with batch and real-time processing frameworks; a comprehensive suite of system and data management software; training and certification programs; comprehensive support and consulting services. Extensive Partner Ecosystem: over 400 partners across hardware, software and services. The Leader in Big Data Management: deliver a revolutionary data management platform based on Apache Hadoop; enable organizations to improve operational efficiency and Ask Bigger Questions of all their data. Customers & Users Across Industries: more production deployments than all other vendors combined. ©2013 Cloudera, Inc. All Rights Reserved.
  • 3. Data Has Changed in the Last 30 Years. [Chart: data growth from 1980 to 2012, driven by end-user applications, the Internet, mobile devices, and sophisticated machines. Structured data: 10%. Unstructured data: 90%.]
  • 4. What if you wanted to… [Figure labels: Data, Question, Speed, Usage, Type/Form.]
  • 5. So what is Apache Hadoop? Storage: self-healing, high-bandwidth clustered storage (byte streams). Compute: fault-tolerant distributed processing (schema-on-read). [Diagram: an input file is split into blocks 1 to 5; under HDFS storage distribution each block is stored on three of the five nodes A to E. Under MapReduce compute distribution, processing runs across the same nodes to produce the output file.]
  • 6. Next-Gen Data Management
  • 7. The Key Benefit: Agility/Flexibility. Schema-on-Write (RDBMS): prescriptive data modeling. Create a static DB schema; transform data into the RDBMS; query data in RDBMS format. New columns must be added explicitly before new data can propagate into the system. Good for Known Unknowns (Repetition). Schema-on-Read (Hadoop): descriptive data modeling. Copy data in its native format; create schema + parser; query data in its native format (does ETL on the fly). New data can start flowing any time and will appear retroactively once the schema/parser properly describes it. Good for Unknown Unknowns (Exploration).
  • 8. Scalable Technology + Scalable Development. Auto scale: grows without requiring developers to re-architect their algorithms/application.
  • 9. Economics: Return on Byte (ROB). [Chart labels: High ROB; Low ROB (but still a ton of aggregate value).]
  • 10. CDH: Cloudera Distribution incl. Apache Hadoop. Hadoop Core Kernel: MapReduce, HDFS. Coordination: Apache ZooKeeper. Data Integration: Apache Flume, Apache Sqoop. Fast Read/Write Access: Apache HBase. Batch Processing Languages: Apache Pig, Apache Hive. Web Console: Hue. Job Workflow: Apache Oozie. Metadata: Apache Hive MetaStore. Interactive SQL: Impala. Data Mining Lib: Apache Mahout. Data Processing Lib: DataFu for Pig. Connectivity: ODBC/JDBC/FUSE/HTTPS. Cloud Deployment: Apache Whirr. Build/Test: Apache Bigtop. Cloudera Manager Free Edition (Installation Wizard).
  • 11. Cloudera Enterprise
  • 12. The Cloudera Solution Stack. Cloudera University: developer training, administrator training, data science training, certification programs. Professional Services: use case discovery, new Hadoop deployment, proof-of-concept, deployment certification, process & team development, production pilots. Management Software & Technical Support (subscription). CDH: ingest, store, explore, process, analyze, serve. CM: Cloudera Manager. CS: Cloudera Support. OSS: Apache Hadoop & open source software.
  • 13. Powered by Cloudera Impala. [Diagram: before Impala vs. with Impala, showing the user interface, batch processing, and real-time access layers.] With Impala: interactive ANSI-92 SQL queries; native distributed query engine; optimized for low latency. Provides: answers as fast as you can ask; everyone can ask questions of all data; big data storage and analytics together. Unified storage: supports HDFS and HBase; flexible file formats and schemas. Unified Metastore. Unified Security. Unified Client Interfaces: ODBC/JDBC; SQL syntax; Hue Beeswax Web UI.
  • 14. Cloudera in the Enterprise Stack
  • 15. Use Case: A Major Financial Institution. The Challenge: current EDW at capacity; cannot support growing data depth and width. Performance issues in business-critical apps; little room for innovation. The Solution: Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW. EDW resources can now be focused on repeatable operational analytics. Month data scan in 4 secs vs. 4 hours. New solution saves tens of millions by optimizing the existing EDW for analytics & reducing data storage costs by 99%. [Chart: data warehouse workload split of Operational (44%), ELT Processing (42%), Analytics (11%), compared with a split of Operational (50%), Analytics (50%) once Cloudera takes on analytics, processing, and storage.]
  • 16. Beyond Data Warehousing. Communications: location-based advertising. Health care: patient sensors, monitoring, EHRs; quality of care. Law enforcement & defense: threat analysis, social media monitoring, photo analysis. Education & research: experiment sensor analysis. Financial services: risk & portfolio analysis; new products. On-line services / social media: people & career matching; website optimization. Utilities: smart meter analysis for network capacity. Consumer packaged goods: sentiment analysis of what's hot, customer service. Media / entertainment: viewers / advertising effectiveness. Travel & transportation: sensor analysis for optimal traffic flows; customer sentiment. Life sciences: clinical trials; genomics. Retail: consumer sentiment; optimized marketing. Automotive: auto sensors reporting location, problems. High technology / industrial mfg.: mfg quality; warranty analysis. Oil & gas: drilling exploration sensor analysis.
  • 17. The Road Ahead. 2006-2012: Bringing Compute to Data. 2013-???: Bringing Applications to Data.
  • 18. The Cloudera Platform for Big Data. Flexibility: store any data; run any analysis; keeps pace with the rate of change of incoming data. Scalability: proven growth to PBs/1,000s of nodes; no need to rewrite queries, automatically scales; keeps pace with the rate of growth of incoming data. Economics: cost per TB at a fraction of other options; keep all of your data alive in an active archive; powering the "data beats algorithm" movement.
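
The schema-on-read idea from slide 7 can be illustrated with a minimal Python sketch. Everything in it is hypothetical (the log lines, the `parse_v1` and `parse_v2` functions, the field names); it is not Hadoop or Hive code, only an illustration of how data kept in its native format can "appear retroactively" once a newer parser describes a field that the first parser ignored.

```python
# Raw records kept in their native format (hypothetical log lines).
RAW_LINES = [
    "2013-01-05 alice login",
    "2013-01-06 bob purchase amount=30",
    "2013-01-07 alice purchase amount=12",
]


def parse_v1(line):
    """First schema: date, user, action. Any extra tokens are ignored."""
    date, user, action = line.split()[:3]
    return {"date": date, "user": user, "action": action}


def parse_v2(line):
    """Later schema: additionally describes the optional amount field."""
    record = parse_v1(line)
    record["amount"] = None
    for token in line.split()[3:]:
        key, _, value = token.partition("=")
        if key == "amount":
            record["amount"] = int(value)
    return record


def query(lines, parser, predicate):
    """Apply the schema at read time, then filter (ETL on the fly)."""
    return [rec for rec in map(parser, lines) if predicate(rec)]


def is_purchase(rec):
    return rec["action"] == "purchase"


# The same stored lines answer both queries; no data was reloaded.
purchases_v1 = query(RAW_LINES, parse_v1, is_purchase)
purchases_v2 = query(RAW_LINES, parse_v2, is_purchase)
total = sum(rec["amount"] for rec in purchases_v2)
```

With a schema-on-write system, the `amount` column would have to be added and the older rows reloaded before `total` could be computed; here only the parser changes, and the raw lines stay untouched.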