It takes two to tango! Is SQL-on-Hadoop the next big step?
Big Data Crunching: A Retrospective
Three Phases
What was it like before Hadoop?
The Phylogenetic Tree of Elephants
Tech before Hadoop
Partitioned or Sharded RDBMSs
Data Warehouses
Massively Parallel Databases
Massively Parallel Databases
Shared-Nothing Architecture
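A minimal Python sketch of the shared-nothing idea, under stated assumptions: each hypothetical node owns a disjoint hash partition of the rows, aggregates only its own data, and a coordinator merges the small partial results. The same hash-modulo routing is roughly what a simple sharded RDBMS uses to pick a shard for a key. The function names and the choice of CRC32 as the hash are illustrative assumptions, not any particular product's scheme.

```python
"""Toy shared-nothing aggregation: each node owns a disjoint partition of the rows."""
from collections import Counter
from zlib import crc32


def partition_of(key: str, num_nodes: int) -> int:
    # Deterministic hash (CRC32) so every node agrees on where a row lives.
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Route each (key, value) row to exactly one node's local storage."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[partition_of(key, num_nodes)].append((key, value))
    return nodes


def local_sum(partition):
    """Runs on one node and touches only that node's rows (no shared disk or memory)."""
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def parallel_sum(rows, num_nodes=4):
    partials = [local_sum(p) for p in distribute(rows, num_nodes)]
    result = Counter()
    for partial in partials:  # the coordinator merges the per-node partial results
        result.update(partial)
    return dict(result)


if __name__ == "__main__":
    sales = [("DE", 10), ("US", 5), ("DE", 7), ("IN", 3), ("US", 1)]
    print(parallel_sum(sales))
```

Because rows are partitioned on the grouping key, each key's rows sit on a single node; `Counter.update` adds values, so the merge would still be correct even if a key were spread over several nodes.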
Hadoop: Early Days
Acceptance Life Cycle
Resistance
Exploration
Acceptance
Complementary over Competitive
Split by Structure
What’s the best way to answer questions that span these two worlds?
Can we interface SQL atop Hadoop?
Can we combine the strengths of parallel databases with those of Hadoop?
SQL-on-Hadoop: Technology
SQL-on-Hadoop: Technical Approaches
Distributed Query Processing
  Cloudera’s Impala
  MapR-supported Apache Drill, and more
Split Query Processing
  Microsoft Polybase
  Hadapt
Faster Hive
  Hortonworks’ Stinger initiative
  Qubole’s Hive-on-the-Cloud
Distributed Query Processing
Cloudera Impala: Architecture
Clients: Impala Shell, JDBC/ODBC client, SQL tools
Impala Daemon on each Data Node: Query Planning, Query Coordination, Query Execution
Unified Metadata Store: State Store, Metadata Catalog, HDFS Name Node
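A toy sketch (not Impala's implementation) of one job of the State Store: remembering which Impala Daemons are alive, so that whichever daemon coordinates a query knows where it can send work. The class, the `heartbeat` method and the timeout value are hypothetical and chosen only for illustration.

```python
"""Toy state store: remembers the last heartbeat of each Impala Daemon."""
from dataclasses import dataclass, field


@dataclass
class StateStore:
    # Hypothetical timeout: a daemon silent for longer is treated as down.
    timeout_s: float = 10.0
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, daemon: str, now: float) -> None:
        """Called periodically by each daemon to report that it is alive."""
        self.last_seen[daemon] = now

    def live_daemons(self, now: float) -> list:
        """Membership a coordinator would consult when assigning plan fragments."""
        return sorted(d for d, t in self.last_seen.items() if now - t <= self.timeout_s)


if __name__ == "__main__":
    store = StateStore(timeout_s=5)
    store.heartbeat("impalad-1", now=0)
    store.heartbeat("impalad-2", now=3)
    store.heartbeat("impalad-3", now=4)
    print(store.live_daemons(now=6))  # prints ['impalad-2', 'impalad-3']
```

A real deployment also distributes metadata updates through this channel; the sketch keeps only the membership part.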
Life Cycle of an Impala Query
Clients: Impala Shell, JDBC/ODBC client, SQL tools
Impala Daemon on each Data Node; State Store, Metadata Catalog, HDFS Name Node
Parse Query → Plan and Optimize → Coordinate Execution
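The three steps of the query life cycle can be sketched as a toy coordinator in Python. Everything here is hypothetical: only one query shape (`SELECT k, SUM(v) FROM t GROUP BY k`) is parsed, the table name is parsed but ignored because there is a single in-memory table, and the only "optimization" is choosing a two-phase aggregation (a partial SUM on every daemon, then a merge on the coordinator). It illustrates the flow, not Impala's planner.

```python
"""Toy Impala-style query life cycle: parse -> plan and optimize -> coordinate execution."""
import re
from collections import Counter

# Hypothetical rows stored on each Data Node; each daemon scans only its local rows.
LOCAL_ROWS = {
    "impalad-1": [{"category": "books", "qty": 2}, {"category": "music", "qty": 1}],
    "impalad-2": [{"category": "books", "qty": 4}],
    "impalad-3": [{"category": "music", "qty": 5}, {"category": "games", "qty": 1}],
}

# Only one query shape is supported: SELECT k, SUM(v) FROM t GROUP BY k
QUERY = re.compile(
    r"SELECT\s+(\w+)\s*,\s*SUM\((\w+)\)\s+FROM\s+(\w+)\s+GROUP\s+BY\s+\1",
    re.IGNORECASE,
)


def parse(sql: str) -> dict:
    """Step 1: turn SQL text into a (very small) syntax tree."""
    match = QUERY.fullmatch(sql.strip().rstrip(";").strip())
    if match is None:
        raise ValueError("unsupported query")
    key, value, table = match.groups()
    return {"key": key, "value": value, "table": table}


def plan(ast: dict, daemons: list) -> list:
    """Step 2: two-phase aggregation, one partial-SUM fragment per daemon."""
    return [{"daemon": d, "key": ast["key"], "value": ast["value"]} for d in daemons]


def execute_fragment(fragment: dict) -> Counter:
    """Runs on one daemon against its local data only."""
    partial = Counter()
    for row in LOCAL_ROWS[fragment["daemon"]]:
        partial[row[fragment["key"]]] += row[fragment["value"]]
    return partial


def coordinate(fragments: list) -> dict:
    """Step 3: the coordinator dispatches fragments and merges partial results."""
    result = Counter()
    for fragment in fragments:  # a real engine runs these in parallel
        result.update(execute_fragment(fragment))
    return dict(sorted(result.items()))


def run(sql: str) -> dict:
    return coordinate(plan(parse(sql), sorted(LOCAL_ROWS)))


if __name__ == "__main__":
    print(run("SELECT category, SUM(qty) FROM sales GROUP BY category;"))
    # prints {'books': 6, 'games': 1, 'music': 6}
```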
Split Query Processing
Polybase + SQL Server PDW: Architecture
Clients: ADO.NET, JDBC/ODBC client, OLEDB
PDW Cluster
Control Node: PDW Engine Service, DMS Controller, Loader Manager, SQL Server
Compute Nodes: each runs SQL Server and the Data Move Service with an HDFS Bridge
Hadoop Cluster
Name Node and Job Tracker
Data Nodes: each runs a Task Tracker
Register the Hadoop Cluster with PDW
CREATE HADOOP_CLUSTER GSL_CLUSTER WITH
  (namenode='hadoop-head', namenode_port=9000,
   jobtracker='hadoop-head', jobtracker_port=9010);
Map an HDFS File to an External Table in PDW
-- TEXT_FORMAT names a Hadoop file format that must already be defined; its definition is not shown here.
CREATE EXTERNAL TABLE hdfsCustomer
  ( c_custkey     bigint         not null,
    c_name        varchar(25)    not null,
    c_address     varchar(40)    not null,
    c_nationkey   integer        not null,
    c_phone       char(15)       not null,
    c_acctbal     decimal(15,2)  not null,
    c_mktsegment  char(10)       not null,
    c_comment     varchar(117)   not null)
WITH (LOCATION='/tpch1gb/customer.tbl',
      FORMAT_OPTIONS (EXTERNAL_CLUSTER = GSL_CLUSTER,
                      EXTERNAL_FILEFORMAT = TEXT_FORMAT));
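The external table only describes how PDW should interpret the bytes in /tpch1gb/customer.tbl. A minimal Python sketch of that interpretation, assuming the file is TPC-H dbgen output ('|' between fields and a trailing '|' on each row) and that TEXT_FORMAT describes that layout: each field is checked against the NOT NULL and length constraints of hdfsCustomer and converted to the declared type. The function name and the sample row are hypothetical, and this is not how Polybase itself reads files; decimal precision and CHAR padding rules are not enforced.

```python
"""Sketch: read one '|'-delimited customer.tbl line against the hdfsCustomer schema."""
from decimal import Decimal

# (column, converter, max_length) mirroring the external table definition.
SCHEMA = [
    ("c_custkey", int, None),
    ("c_name", str, 25),
    ("c_address", str, 40),
    ("c_nationkey", int, None),
    ("c_phone", str, 15),
    ("c_acctbal", Decimal, None),
    ("c_mktsegment", str, 10),
    ("c_comment", str, 117),
]


def parse_customer(line: str) -> dict:
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()  # dbgen rows end with a trailing '|'
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    row = {}
    for (name, convert, max_len), raw in zip(SCHEMA, fields):
        if raw == "":
            raise ValueError(f"{name} is declared NOT NULL but is empty")
        if max_len is not None and len(raw) > max_len:
            raise ValueError(f"{name} exceeds {max_len} characters")
        row[name] = convert(raw)
    return row
```

For example, the hypothetical line `42|Customer#000000042|Some Street 1|7|17-555-010-0000|1234.50|MACHINERY|sample comment|` yields a dict with `c_custkey == 42` and `c_acctbal == Decimal("1234.50")`.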
Life Cycle of a Split Query
Clients: ADO.NET, JDBC/ODBC client, OLEDB
Control Node: Engine Service, DMS Controller, Loader Manager, SQL Server
PDW Cluster: Compute Nodes, each running SQL Server and the Data Move Service with an HDFS Bridge
Hadoop Cluster: Name Node, Job Tracker, and Data Nodes each running a Task Tracker
Plan: produced on the Control Node and executed across both clusters
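The core decision in a split query is where to run each part of the plan. A hedged Python sketch of a cost-based choice for a single selection over an HDFS-backed external table: either import the whole file through the HDFS Bridge and filter in PDW, or run the filter as a MapReduce job on Hadoop and import only the matching rows. The cost model, its parameter names and all numbers are hypothetical assumptions, not Polybase's actual optimizer.

```python
"""Hypothetical cost comparison for one selection over an HDFS-backed external table."""
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Illustrative numbers only; they are NOT Polybase's real cost parameters.
    mr_startup_s: float = 20.0        # fixed cost of launching a MapReduce job
    hadoop_scan_mb_s: float = 2000.0  # aggregate scan rate of the Hadoop cluster
    import_mb_s: float = 500.0        # HDFS Bridge -> PDW compute node transfer rate


def choose_strategy(file_mb: float, selectivity: float, cost: CostModel = CostModel()) -> str:
    """Return 'push_to_hadoop' or 'import_all' for a filter with the given selectivity."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    # Strategy A: stream the whole file into PDW and filter inside SQL Server.
    import_all = file_mb / cost.import_mb_s
    # Strategy B: filter on Hadoop with a MapReduce job, then import only matching rows.
    push = (cost.mr_startup_s
            + file_mb / cost.hadoop_scan_mb_s
            + file_mb * selectivity / cost.import_mb_s)
    return "push_to_hadoop" if push < import_all else "import_all"
```

With these made-up numbers, a 100 MB file is cheaper to import (job startup dominates), a 1,000,000 MB file with a 1% selective filter is cheaper to push to Hadoop, and the same large file with no filtering (selectivity 1.0) is again cheaper to import.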
SQL-on-Hadoop: The Technology
Faster Hive
Distributed Query Processors
Split Query Processors
SQL-on-Hadoop or MapReduce?
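To make the closing question concrete, a sketch of the same GROUP BY written both ways. The MapReduce half is plain Python simulating the map, shuffle and reduce phases (it is not Hadoop's API), and SQLite through Python's standard `sqlite3` module stands in for a SQL-on-Hadoop engine. The sample rows are hypothetical.

```python
"""The same aggregation MapReduce-style and as one SQL statement (SQLite as a stand-in)."""
import sqlite3
from collections import defaultdict
from itertools import chain

ROWS = [("DE", 10), ("US", 5), ("DE", 7), ("IN", 3), ("US", 1)]  # hypothetical (country, amount)


def map_phase(record):
    country, amount = record
    yield country, amount  # emit a (key, value) pair


def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:  # group all values by key, as the framework would
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


def mapreduce_totals(rows):
    groups = shuffle(chain.from_iterable(map_phase(r) for r in rows))
    return dict(sorted(reduce_phase(k, v) for k, v in groups.items()))


def sql_totals(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (country TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        result = conn.execute(
            "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"
        ).fetchall()
    finally:
        conn.close()
    return dict(result)
```

Both functions return `{"DE": 17, "IN": 3, "US": 6}` for `ROWS`; the SQL version leaves grouping and scheduling to the engine, while the MapReduce version spells out every phase by hand.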
</presentation>
More on www.systemswemake.com
Follow: @systems_we_make
