Low Latency SQL on Hadoop - What's best for your cluster
DataWorks Summit
1.
Low Latency SQL on Hadoop: What’s best for your cluster?
Prepared by Alan Gardner, June 2014
2.
Alan Gardner
@alanctgardner
gardner@pythian.com
© 2013 Pythian
5.
Overview
• Performance
• Architecture
• Features
• Vendor Support
• Conclusions
6.
Performance
7.
Berkeley Big Data Benchmark
• Hive, Hive-on-Tez, RedShift, Shark, Impala
• Tested on five m2.4xlarge EC2 instances
• Uses Intel’s Hadoop Benchmark, not TPC
• ~150GB of ...
8.
Berkeley Big Data Benchmark
• Finds Shark fastest at straight scans, and tied with Impala for aggregation and joining
• Hive-on-Tez is a distant third
• Not using the optimized, columnar formats
9.
Cloudera SQL Benchmark
• Impala, Hive-on-Tez, Shark and Presto
• Uses high-end hardware with relatively large memory, fastest data types for each engine
• 15TB scale factor for a TPC-DS based test
10.
Cloudera SQL Benchmark
• Finds Impala to be significantly faster across all data sizes
• Shark and Tez outperform Presto 0.60, with Tez performing better for larger result sets
• It’s unclear if table ...
11.
Our Configuration
• 9-node cluster of m2.2xlarge instances
• 4 cores, 34GB RAM
• 850GB of instance storage
• 100GB scale factor
  – only from disk, no RDDs
• Impala 1.3.1 on CDH 5.0.1
• Hive 0.13 from the ...
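As a rough sanity check on this configuration, the per-node figures can be multiplied out to cluster totals. This is a minimal sketch: the variable names are illustrative, and it assumes the 4 cores, 34GB RAM and 850GB figures are per node, which the slide implies but does not state outright.

```python
# Per-node figures as listed on the slide (assumed to be per node).
NODES = 9
CORES_PER_NODE = 4
RAM_GB_PER_NODE = 34
STORAGE_GB_PER_NODE = 850
SCALE_FACTOR_GB = 100  # size of the raw benchmark data set

total_cores = NODES * CORES_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE
total_storage_gb = NODES * STORAGE_GB_PER_NODE

# Aggregate memory relative to raw data size (ignores compression,
# replication and operating-system overhead).
ram_to_data_ratio = total_ram_gb / SCALE_FACTOR_GB

print(total_cores, total_ram_gb, total_storage_gb, ram_to_data_ratio)
```

Under these assumptions the aggregate RAM is roughly three times the raw data size, which is worth keeping in mind given that the test reads only from disk.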
12.
File Formats
• Hive, Shark - ORC (ZLIB)
• Presto - ORC (ZLIB)
  – RCFile (LazyBinarySerDe) was slower
  – RCFile (ColumnarSerDe) may be better
• Impala
  – Parquet (no compression)
14.
TPC-H Queries
• Query 1 – filtering and aggregation on a single table
• Query 8 – select two columns from joins across many-to-many relationships
• Query 10 – select and aggregate on eight ...
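To make the "filtering and aggregation on a single table" shape of Query 1 concrete, here is a heavily simplified sketch run against an in-memory SQLite table. It is not the official TPC-H query text: the rows are made-up toy data, only a few `lineitem` columns are included, and the cutoff date is illustrative.

```python
import sqlite3

# Toy stand-in for the TPC-H lineitem table (only the columns used here).
rows = [
    # returnflag, linestatus, quantity, extendedprice, shipdate
    ("A", "F", 10, 100.0, "1998-01-01"),
    ("A", "F", 20, 200.0, "1998-06-01"),
    ("N", "O", 5, 50.0, "1998-08-01"),
    ("N", "O", 7, 70.0, "1998-12-15"),  # removed by the date filter below
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_returnflag TEXT, l_linestatus TEXT, l_quantity INTEGER, "
    "l_extendedprice REAL, l_shipdate TEXT)"
)
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?)", rows)

# Filter one table, then group and aggregate: the general shape of Query 1.
query = """
    SELECT l_returnflag, l_linestatus,
           SUM(l_quantity)      AS sum_qty,
           SUM(l_extendedprice) AS sum_price,
           COUNT(*)             AS count_order
    FROM lineitem
    WHERE l_shipdate <= '1998-09-02'
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus
"""
result = conn.execute(query).fetchall()
print(result)
```

Because there is no join, a query of this shape mostly exercises scan and aggregation speed, whereas Queries 8 and 10 add join work on top.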
16.
Architecture
17.
• Hive 0.13 runs on Tez, which executes queries as DAGs
• DAGs are more efficient than MRv1 query plans
• Runs on YARN, so resources are shared between all jobs
• Individual node failures are tolerated; failed tasks are retried automatically
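The idea of executing a query as a DAG can be sketched with Python's standard `graphlib` module. This is only an illustration of dependency-ordered execution, not Tez's API: the stage names and the `run_dag` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical stages of a two-table join query; each key maps to the
# set of stages that must finish before it can start.
dag = {
    "scan_orders": set(),
    "scan_lineitem": set(),
    "join": {"scan_orders", "scan_lineitem"},
    "aggregate": {"join"},
    "output": {"aggregate"},
}

def run_dag(dependencies):
    """Run stages in dependency order and return the execution order."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    executed = []
    while sorter.is_active():
        # Every stage returned here has all of its inputs available,
        # so an engine could run the whole batch in parallel.
        for stage in sorted(sorter.get_ready()):
            executed.append(stage)  # stand-in for actually running the stage
            sorter.done(stage)
    return executed

order = run_dag(dag)
print(order)
```

Both scans become ready at the same time, and the join only starts once both have finished; the whole query is one graph rather than a chain of separate jobs.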
18.
• HiveServer creates a DAG from HQL submitted over JDBC
• HiveServer requests or reuses a Tez AM to run the query
• Tez handles placement of query fragments based on locality and resources
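Placement "based on locality and resources" can be pictured with a toy heuristic: prefer a node that already holds the data block, and break ties by free capacity. This is an assumption-laden sketch, not Tez's actual scheduling algorithm; the `Node` class, the `free_slots` unit and the `place_fragment` function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_slots: int          # hypothetical unit of available resources
    local_blocks: frozenset  # data blocks stored on this node

def place_fragment(block, nodes):
    """Pick a node for a fragment that reads `block`.

    Prefers nodes holding the block locally, then the most free slots.
    Returns None when no node has capacity.
    """
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    # Sort key: local nodes first (False sorts before True),
    # then the node with the most free slots.
    best = min(candidates,
               key=lambda n: (block not in n.local_blocks, -n.free_slots))
    best.free_slots -= 1
    return best.name

nodes = [
    Node("node1", free_slots=1, local_blocks=frozenset({"b1", "b2"})),
    Node("node2", free_slots=2, local_blocks=frozenset({"b2"})),
    Node("node3", free_slots=0, local_blocks=frozenset({"b3"})),
]

placements = {b: place_fragment(b, nodes) for b in ["b1", "b2", "b3"]}
print(placements)
```

In this toy run the fragment for `b3` cannot be placed on the node that stores it, because that node has no free capacity, so it falls back to a remote node.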
19.
• Shark uses the same core as Hive: the HQL parser and the file and UDF interfaces are compatible
• DAGs produced by Shark are optimized for Spark, rather than Tez
• Spark can be run on YARN for resource sharing, as well as Mesos or standalone
20.
• Spark is more mature and offers a wider range of optimizations right now
• Shark also supports storing results as an RDD within Spark
21.
• Impala runs as an engine ‘next to’ YARN, not on top of it
• To reduce resource contention and allow scheduling to be centralized in YARN, Llama was created
• Llama creates “fake” applications on YARN as placeholders for Impala
22.
• Impalad receives queries, plans and executes them
• Statestore broadcasts metadata updates and node status
• Catalog caches block metadata and Hive table metadata
23.
• Presto doesn’t interact with YARN at all
• cgroups are the only way to share resources between YARN jobs and Presto
• Presto also handles all scheduling and job placement by itself
24.
• Presto has a single coordinator which plans and distributes query fragments
• Workers are still co-located with DataNodes for locality
• Discovery service manages worker status
25.
Functionality
28.
File Formats
            Text  RCFile  Parquet  ORCFile  Avro  SequenceFile
Presto      R     R       R        R        R     R
Impala      R/W   R       R/W      -        R     R
Hive/Shark  R/W   R/W     R/W      R/W      R/W   R/W

Flexibility
            SerDes  Complex Data   UDFs  Spill to Disk  JOIN Reordering
Presto      Yes     Yes, but slow  No    No             None
Impala      No      No             Yes   No             Cost-based
Hive/Shark  Yes     Yes            Yes   Yes            Cardinality
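The file-format matrix on this slide can be encoded as a small lookup structure to answer questions such as "which engines can write Parquet?". This is a sketch only: the values are transcribed from the slide and reflect the versions discussed in this deck, while the helper names `engines_that_can` and `formats_all_can` are hypothetical.

```python
# Read/write support per engine, transcribed from the slide.
# "R" = read, "R/W" = read and write, "-" = not supported.
FORMATS = ["Text", "RCFile", "Parquet", "ORCFile", "Avro", "SequenceFile"]
SUPPORT = {
    "Presto":     ["R", "R", "R", "R", "R", "R"],
    "Impala":     ["R/W", "R", "R/W", "-", "R", "R"],
    "Hive/Shark": ["R/W", "R/W", "R/W", "R/W", "R/W", "R/W"],
}

def engines_that_can(action, file_format):
    """Return engines whose entry for `file_format` includes `action` ("R" or "W")."""
    column = FORMATS.index(file_format)
    return [engine for engine, row in SUPPORT.items() if action in row[column]]

def formats_all_can(action):
    """Return the formats on which every engine supports `action`."""
    return [f for f in FORMATS
            if len(engines_that_can(action, f)) == len(SUPPORT)]

parquet_writers = engines_that_can("W", "Parquet")
orc_readers = engines_that_can("R", "ORCFile")
readable_everywhere = formats_all_can("R")
print(parquet_writers, orc_readers, readable_everywhere)
```

A lookup like this makes the interoperability constraint visible: per the slide, ORCFile is the one format that not all three engines can read.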
31.
Vendor Support
32.
Official Support
        Cloudera  MapR    HortonWorks
Presto  No        No      No
Impala  Yes       Yes     No
Hive    No Tez    No Tez  Yes
Shark   Spark     Yes     Spark

Note: based on vendor documentation as of 31/05/2014
35.
Conclusions
36.
A giant, indecipherable flowchart
37.
Conclusions
• Shark provides a faster alternative to Hive 0.13 for ETL and analytics, but support is lacking and tuning is difficult
• Presto is still nascent – deployment is easy, but querying is not so simple
38.
Thank you – Q&A
To contact us:
gardner@pythian.com
1-877-PYTHIAN
@pythian
@alanctgardner