Hadoop vs. RDBMS for
Advanced Analytics
Josh Wills
April 26th, 2012
About Me

• jwills@cloudera.com
• Formerly of Google (2008 – 2011)
   • Worked on the ad auction
   • Led the team that built the data infrastructure for Google+
• Before that: a bunch of startups
   • Sometimes as a software engineer, sometimes as a statistician
• Math degree from Duke and a half-finished PhD from The
  University of Texas at Austin
• Now: Director of Data Science at Cloudera




                       Copyright 2012 Cloudera Inc. All rights reserved
Getting Started with Hadoop: Apache Hive

 • Stick with the relational
   models that you are
   used to working with

 • Great for the common
   starter use cases
    • Logs processing
    • Online data archival
    • ETL/ELT


Hadoop for Advanced Analytics




When Should I Use Hadoop instead of an RDBMS?




First Symptom: COUNT DISTINCT
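The slide names only the symptom. One hedged reading: queries that lean on COUNT DISTINCT over an entity key (for example, distinct visitors per page) are really questions about entities, and in a MapReduce-style job the distinct count falls out of grouping by key, because every value for a key reaches the same reducer. The sketch below simulates map, shuffle, and reduce in plain Python; the record layout and the field names `page` and `visitor_id` are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    # Emit (key, value) pairs: the page is the grouping key, the visitor is the value.
    for rec in records:
        yield rec["page"], rec["visitor_id"]

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reducer sees every visitor for one page, so deduplication is local.
    return {page: len(set(visitors)) for page, visitors in groups.items()}

def count_distinct_visitors(records):
    return reduce_phase(shuffle(map_phase(records)))

logs = [
    {"page": "/home", "visitor_id": "a"},
    {"page": "/home", "visitor_id": "b"},
    {"page": "/home", "visitor_id": "a"},
    {"page": "/cart", "visitor_id": "b"},
]
print(count_distinct_visitors(logs))  # {'/home': 2, '/cart': 1}
```

A real job would typically avoid materializing each group in memory (for example, by sorting values within a key), but the grouping logic is the same.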




Second Symptom: Cursors
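Cursors tend to appear when a query has to walk one entity's rows in order while carrying state from row to row. A hedged sketch of the alternative: sort records by (entity, timestamp) and hand each entity's full history to one function, which is roughly what a reducer receives after a shuffle with a secondary sort. The sessionization rule and the 30-minute gap are illustrative assumptions, not taken from the talk.

```python
from itertools import groupby
from operator import itemgetter

SESSION_GAP = 30 * 60  # hypothetical: seconds of inactivity that start a new session

def sessions_for_user(events):
    """Count sessions in one user's time-ordered events (the stateful loop a cursor would run)."""
    count, last_ts = 0, None
    for ev in events:
        if last_ts is None or ev["ts"] - last_ts > SESSION_GAP:
            count += 1
        last_ts = ev["ts"]
    return count

def sessions_per_user(events):
    # Sorting by (user, ts) plays the role of the shuffle plus secondary sort.
    ordered = sorted(events, key=itemgetter("user", "ts"))
    return {user: sessions_for_user(group)
            for user, group in groupby(ordered, key=itemgetter("user"))}

events = [
    {"user": "u1", "ts": 0},
    {"user": "u1", "ts": 600},
    {"user": "u1", "ts": 5000},
    {"user": "u2", "ts": 100},
]
print(sessions_per_user(events))  # {'u1': 2, 'u2': 1}
```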




Third Symptom: ALTER TABLE OF_DOOM
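The usual pain point here is adding a column to a very large table. When records live in self-describing files on HDFS, a common alternative is schema-on-read: newer writers simply include the new field, and readers fill in a default for records written earlier. The sketch illustrates that idea with JSON lines and a hypothetical reader schema; formats such as Avro formalize the same default-value resolution, but this code does not use Avro's API.

```python
import json

# Hypothetical reader schema: field name -> default used when a record lacks the field.
READER_SCHEMA = {"customer_id": None, "country": "unknown", "loyalty_tier": "none"}

def read_record(line):
    raw = json.loads(line)
    # Project onto the reader schema: unknown fields are dropped, missing ones get defaults.
    return {name: raw.get(name, default) for name, default in READER_SCHEMA.items()}

old_line = '{"customer_id": 1, "country": "US"}'  # written before loyalty_tier existed
new_line = '{"customer_id": 2, "country": "DE", "loyalty_tier": "gold"}'

records = [read_record(line) for line in (old_line, new_line)]
print(records[0]["loyalty_tier"], records[1]["loyalty_tier"])  # none gold
```

No existing data is rewritten; the schema change lives entirely in the reader.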




The Unit of Analysis Problem

 • Data warehouses are
   optimized to analyze
   transactions
   • Awesome for finance
     and ERP
   • Not ideal for product
     and marketing
 • A function of what
   databases are good at
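To make the contrast concrete: a warehouse fact table holds one row per transaction, while product and marketing questions are usually about customers. A hedged sketch of changing the unit of analysis, rolling transaction rows up into one object per customer that keeps the full history rather than a few pre-computed aggregates; the field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    customer_id: str
    orders: list = field(default_factory=list)  # full transaction history for this customer

    @property
    def lifetime_value(self):
        # Derived on demand from the history instead of stored as a column.
        return sum(o["amount"] for o in self.orders)

def build_customers(transactions):
    customers = {}
    for tx in transactions:
        cid = tx["customer_id"]
        if cid not in customers:
            customers[cid] = Customer(cid)
        customers[cid].orders.append(tx)
    return customers

txs = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c2", "amount": 5.0},
    {"customer_id": "c1", "amount": 7.5},
]
customers = build_customers(txs)
print(customers["c1"].lifetime_value)  # 27.5
```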


What Are You Trying to Analyze?

           Simple Entities                                    Complex Entities
 •   Static attributes                               •    Evolving attributes
 •   Flat data structure                             •    Hierarchical data structure
 •   Transient                                       •    Persistent
 •   Examples                                        •    Examples
     • SKUs                                                 • Customers
     • Line items from an invoice                           • Suppliers
     • Log messages                                         • Website visitors




Rods and Cones vs. Facial Recognition




Structure the Data to Fit the Problem

 • HDFS Lets Us Store Our
   Data However We Want
 • We can choose storage
   schemas that are:
   •   Flexible
   •   Evolvable
   •   Compact
   •   Fast
       serialization/deserialization
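As one illustration of storing a complex entity in a shape that fits the problem, each customer's nested, hierarchical history can be kept in a single record and serialized as one blob. The sketch uses compact JSON plus `zlib` from the Python standard library purely for illustration; Hadoop pipelines would more likely use a binary format such as Avro, Protocol Buffers, or Thrift, and the record layout is hypothetical.

```python
import json
import zlib

def serialize(entity):
    # Compact JSON (no extra whitespace), then zlib compression: one blob per entity.
    return zlib.compress(json.dumps(entity, separators=(",", ":")).encode("utf-8"))

def deserialize(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))

customer = {
    "customer_id": "c1",
    "attributes": {"country": "US", "segment": "smb"},
    "visits": [
        {"ts": 1335398400, "pages": ["/home", "/pricing"]},
        {"ts": 1335484800, "pages": ["/home", "/signup"]},
    ],
}
blob = serialize(customer)
print(deserialize(blob) == customer)  # True: the nested structure round-trips intact
```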

Advanced Analytics: Use Cases




Simple Counts on Complex Objects
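Once each entity is stored as one object, a question that is awkward in SQL, such as "how many customers viewed pricing and later signed up?", reduces to a predicate evaluated per object followed by a simple count. A hedged sketch; the predicate, page names, and record layout are hypothetical.

```python
def viewed_then_signed_up(customer):
    """True if a '/pricing' view occurs before a '/signup' view in the customer's history."""
    seen_pricing = False
    for visit in sorted(customer["visits"], key=lambda v: v["ts"]):
        for page in visit["pages"]:
            if page == "/pricing":
                seen_pricing = True
            elif page == "/signup" and seen_pricing:
                return True
    return False

def count_matching(customers, predicate):
    # Evaluate the predicate per entity (the map side), then sum (a trivial reduce).
    return sum(1 for c in customers if predicate(c))

customers = [
    {"visits": [{"ts": 1, "pages": ["/home", "/pricing"]}, {"ts": 2, "pages": ["/signup"]}]},
    {"visits": [{"ts": 1, "pages": ["/signup"]}, {"ts": 2, "pages": ["/pricing"]}]},
    {"visits": [{"ts": 1, "pages": ["/home"]}]},
]
print(count_matching(customers, viewed_then_signed_up))  # 1
```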




Self-Self-Self-Joins
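In SQL, asking whether an entity's events follow a pattern such as A, then B, then C typically means joining the events table to itself once per step. When each entity's events are already grouped and time-ordered, the same question becomes a single linear scan. The sketch checks for an ordered, not necessarily contiguous, subsequence; the event names are hypothetical.

```python
def contains_in_order(events, pattern):
    """True if pattern appears as an ordered subsequence of events (gaps allowed)."""
    it = iter(events)
    # 'step in it' consumes the iterator up to the first match, so order is enforced.
    return all(step in it for step in pattern)

def users_matching(histories, pattern):
    # histories: user -> list of event names, assumed already sorted by time.
    return sorted(user for user, events in histories.items()
                  if contains_in_order(events, pattern))

histories = {
    "u1": ["search", "view", "add_to_cart", "purchase"],
    "u2": ["view", "search", "purchase"],
    "u3": ["search", "purchase", "view"],
}
print(users_matching(histories, ["search", "view", "purchase"]))  # ['u1']
```

Each additional step in the pattern adds another self-join in SQL, but costs nothing extra here beyond the same single pass.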




Matching Problems




We’re Hiring.
jwills@cloudera.com

Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

Hadoop vs. RDBMS for Advanced Analytics

  • 1. Hadoop vs. RDBMS for Advanced Analytics (Josh Wills, April 26th, 2012)
  • 2. About Me
      • jwills@cloudera.com
      • Formerly of Google (2008 – 2011)
          • Worked on the ad auction
          • Led the team that built the data infrastructure for Google+
      • Before that: a bunch of startups
          • Sometimes as a software engineer, sometimes as a statistician
      • Math degree from Duke and a half-finished PhD from The University of Texas at Austin
      • Now: Director of Data Science at Cloudera
      Copyright 2012 Cloudera Inc. All rights reserved
  • 3. Getting Started with Hadoop: Apache Hive
      • Stick with the relational models that you are used to working with
      • Great for the common starter use cases
          • Logs processing
          • Online data archival
          • ETL/ELT
  • 4. Hadoop for Advanced Analytics: When Should I Use Hadoop Instead of an RDBMS?
  • 5. First Symptom: COUNT DISTINCT
  • 6. Second Symptom: Cursors
  • 7. Third Symptom: ALTER TABLE OF_DOOM
  • 8. The Unit of Analysis Problem
      • Data warehouses are optimized to analyze transactions
          • Awesome for finance and ERP
          • Not ideal for product and marketing
      • A function of what databases are good at
  • 9. What Are You Trying to Analyze?
      • Simple Entities: static attributes, flat data structure, transient
          • Examples: SKUs, line items from an invoice, log messages
      • Complex Entities: evolving attributes, hierarchical data structure, persistent
          • Examples: customers, suppliers, website visitors
  • 10. Rods and Cones vs. Facial Recognition
  • 11. Structure the Data to Fit the Problem
      • HDFS Lets Us Store Our Data However We Want
      • We can choose storage schemas that are:
          • Flexible
          • Evolvable
          • Compact
          • Fast to serialize and deserialize
  • 12. Advanced Analytics: Use Cases
  • 13. Simple Counts on Complex Objects
  • 14. Self-Self-Self-Joins
  • 15. Matching Problems
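The deck contrasts ALTER TABLE migrations with storage schemas that can evolve. The following is a toy illustration of that idea in Python. It does not show the API of any particular serialization library. It assumes records are stored as dictionaries and written by different versions of a hypothetical `CustomerV2` record. The reader fills in defaults for fields that older records lack and ignores fields it does not know. As a result, adding an attribute never requires rewriting data that is already stored.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class CustomerV2:
    """Hypothetical reader schema: version 2 of a customer record."""
    customer_id: str
    name: str
    # Fields added after version 1. Records written earlier lack them,
    # so each gets a default instead of an ALTER TABLE backfill.
    segments: List[str] = field(default_factory=list)
    loyalty_tier: str = "none"


def read_customer(raw: Dict[str, Any]) -> CustomerV2:
    """Decode a stored record written by any version of the schema.

    Fields unknown to this reader are dropped; fields missing from the
    stored record fall back to the defaults declared above.
    """
    known = {f.name for f in fields(CustomerV2)}
    return CustomerV2(**{k: v for k, v in raw.items() if k in known})


if __name__ == "__main__":
    old = {"customer_id": "c1", "name": "Ann"}  # written before v2 existed
    new = {"customer_id": "c2", "name": "Bo", "loyalty_tier": "gold",
           "referrer": "c1"}  # written by a newer writer with an extra field
    print(read_customer(old))
    print(read_customer(new))
```

The design choice here is to resolve differences at read time rather than migrating stored data. That is one way to get the "flexible, evolvable" property listed on slide 11.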
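Slides 13 and 14 point at queries that are awkward over flat transaction rows. The following is a minimal Python sketch that assumes hypothetical `(user_id, timestamp, action)` event rows. It first groups the rows into one time-ordered history per user, which is the complex entity from slide 9. It then answers "who did search, then view, then buy?" with one pass over each history. A SQL version over the flat table would typically need one self-join per step.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flat event rows: (user_id, timestamp, action), one row per event,
# the way a transaction-oriented warehouse table would store them.
Event = Tuple[str, int, str]


def group_by_user(events: List[Event]) -> Dict[str, List[Tuple[int, str]]]:
    """Turn flat rows into one record per user: a time-ordered event history."""
    histories: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user_id, ts, action in events:
        histories[user_id].append((ts, action))
    for history in histories.values():
        history.sort()  # order each user's events by timestamp
    return dict(histories)


def did_in_order(history: List[Tuple[int, str]], steps: List[str]) -> bool:
    """True if the steps occur in this order (not necessarily adjacent)."""
    i = 0
    for _, action in history:
        if i < len(steps) and action == steps[i]:
            i += 1
    return i == len(steps)


def funnel_users(events: List[Event], steps: List[str]) -> List[str]:
    """Users who performed every step in order, found with one scan per user."""
    return sorted(
        user for user, history in group_by_user(events).items()
        if did_in_order(history, steps)
    )


if __name__ == "__main__":
    rows = [
        ("alice", 1, "search"), ("alice", 2, "view"), ("alice", 5, "buy"),
        ("bob", 1, "view"), ("bob", 3, "search"),
        ("carol", 2, "search"), ("carol", 4, "buy"), ("carol", 3, "view"),
    ]
    print(funnel_users(rows, ["search", "view", "buy"]))
```

Adding a fourth step only lengthens the `steps` list. It does not add another join.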

Editor's Notes

  1. How do you know you have a unit of analysis problem? You’re doing a bunch of COUNT DISTINCT queries. You’re doing LAG/LEAD-style queries, or using a cursor.
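The symptoms in this note (COUNT DISTINCT, LAG/LEAD-style queries, cursors) all compute something per entity across that entity's rows. The following is a minimal Python sketch built on two assumptions: the input is hypothetical `(user_id, timestamp_seconds, page)` page-view rows, and a new session starts after a hypothetical 30-minute gap. Once each user's rows are grouped and sorted, three things fall out of one loop per user:

- the distinct-page count;
- the gap to the previous view, which is what LAG computes;
- a session count.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical page-view rows: (user_id, timestamp_seconds, page).
Row = Tuple[str, int, str]

# Hypothetical threshold: a gap longer than this starts a new session.
SESSION_TIMEOUT_SECONDS = 30 * 60


def per_user_summary(rows: List[Row],
                     timeout: int = SESSION_TIMEOUT_SECONDS) -> Dict[str, dict]:
    """Group rows by user, then compute per-user metrics in a single pass."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, page in rows:
        by_user[user].append((ts, page))

    summary: Dict[str, dict] = {}
    for user, views in by_user.items():
        views.sort()  # time order, as a cursor over ORDER BY ts would see it
        # LAG-style: seconds since this user's previous view.
        gaps = [cur[0] - prev[0] for prev, cur in zip(views, views[1:])]
        summary[user] = {
            "distinct_pages": len({page for _, page in views}),  # COUNT DISTINCT page
            "gaps": gaps,
            "sessions": 1 + sum(1 for g in gaps if g > timeout),
        }
    return summary


if __name__ == "__main__":
    rows = [
        ("u1", 0, "home"), ("u1", 600, "search"), ("u1", 4000, "home"),
        ("u2", 100, "home"),
    ]
    for user, stats in sorted(per_user_summary(rows).items()):
        print(user, stats)
```

With these rows, `u1` has two distinct pages and gaps of 600 and 3400 seconds. Only the second gap exceeds the timeout, so `u1` has two sessions.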