Hive for Analytic Workloads, Hadoop Summit San Jose 2014
1.
© Hortonworks Inc. 2013.
Hive for Analytic Workloads
Alan Gates (@alanfgates)
2.
Stinger Project (announced February 2013)
Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative: a broad, community-based effort to drive the next generation of Hive
Goals:
• Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL: Support broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop
Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN
Hive 0.13, April 2014:
• Hive on Apache Tez
• SQL standard authorization
• Permanent UDFs
• Vectorized Processing
3.
Stinger Highlights
• 13 months
• 145 separate contributors, from 44 separate entities
• 3 Hive releases: 0.11, 0.12, and 0.13
• 392,000 lines of new Java code
4.
"Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning."
-Winston Churchill
5.
Hive 0.13 Performance
• The TPC Benchmark™ DS (TPC-DS) is a decision support benchmark that models queries and data maintenance. It evaluates decision support systems that examine large volumes of data to answer real-world business questions.
• Test: 50 SQL queries on Hive 0.13
• Test Environment
– Driven by the Hive Testbench: https://github.com/cartershanklin/hive-testbench
– Nodes: 20 nodes, 256 GB per node (only 48 GB per node used for Hive)
– Drives: 6x 4TB WDC WD4000FYYZ-0 drives per node
– Interconnect: 10GB
– Processors: 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz, for a total of 16 CPU cores per machine
– Scale: 30K (30 TB total data)
6.
Benchmark Results
Queries modified to have a partition key that duplicates the join key, making it easier for the optimizer to choose which partitions to scan.
7.
Benchmark Results
Queries modified to have a partition key that duplicates the join key, making it easier for the optimizer to choose which partitions to scan.
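To illustrate why a partition key that duplicates the join key helps, here is a minimal Python sketch of partition pruning. The table layout, the date-valued partition key, and the function names are hypothetical and do not come from the benchmark; the point is only that a filter on the partition key lets a planner choose partitions without reading any rows from the others.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fact table: partition value -> rows stored in that partition.
# Each row is (sold_date, amount); sold_date duplicates the partition key.
Partitions = Dict[str, List[Tuple[str, float]]]


def prune_partitions(partitions: Partitions, wanted_dates: Set[str]) -> List[str]:
    """Return only the partition names whose key matches the filter."""
    return sorted(name for name in partitions if name in wanted_dates)


def scan(partitions: Partitions, wanted_dates: Set[str]) -> Tuple[float, int]:
    """Sum amounts for the wanted dates and report how many rows were read."""
    total, rows_read = 0.0, 0
    for name in prune_partitions(partitions, wanted_dates):
        for _date, amount in partitions[name]:
            rows_read += 1
            total += amount
    return total, rows_read


sales: Partitions = {
    "2014-01-01": [("2014-01-01", 10.0), ("2014-01-01", 5.0)],
    "2014-01-02": [("2014-01-02", 7.5)],
    "2014-01-03": [("2014-01-03", 1.0), ("2014-01-03", 2.0)],
}

total, rows_read = scan(sales, {"2014-01-02"})
```

Only the single matching partition is opened, so `rows_read` counts rows from that partition alone.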
8.
SQL Semantics (by release)
Hive 0.10 & before:
• SELECT, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, UNION, ROLLUP/CUBE, subqueries in FROM
Hive 0.11:
• Windowing functions (RANK, ROW_NUMBER) and OVER clause
Hive 0.13:
• Subqueries with IN, EXISTS in WHERE and HAVING
• Common table expressions (WITH clause)
• Join condition in WHERE
• CREATE FUNCTION (stored on cluster)
Next Steps:
• Temporary tables
• Subqueries with equality and inequality operators
• Full UNION support
• Set operators, EXCEPT and INTERSECT
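As a rough illustration of two of the Hive 0.13 additions listed above (a common table expression and an IN subquery in WHERE), the sketch below runs such a query with Python's built-in `sqlite3` module. SQLite is only a stand-in here, not Hive, and the `orders` and `vip` tables are hypothetical; Hive's dialect may differ in details.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE vip (customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10, 50.0), (2, 11, 20.0), (3, 10, 30.0), (4, 12, 5.0);
    INSERT INTO vip VALUES (10), (12);
""")

query = """
    WITH totals AS (                      -- common table expression
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, total
    FROM totals
    WHERE customer_id IN (SELECT customer_id FROM vip)   -- IN subquery
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
```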
9.
Security (by release)
Hive 0.12 & before:
• StorageBasedAuthorizationProvider, maps file level security
• secure, based on HDFS security
• coarse grained, no column or row level security
• default, all advisory
• everyone has grant permissions
Hive 0.13: SQL standard security for tables, views, and databases
• GRANT/REVOKE
• ROLEs
• Column and row level permissions via views
Next Steps:
• Integration with XA Secure
• Extend to cover execution of functions
10.
Data Type Conformance: available data types by release
Hive 0.10 & before: integer types, floating types, string, array, map, struct, timestamp, binary
Hive 0.11: decimal (default precision and scale only)
Hive 0.12: date, varchar
Hive 0.13: char, user defined precision and scale for decimal
11.
Read and Write, ACID: write capabilities and ACID compliance by release
Hive 0.12 & before:
• INSERT and INSERT OVERWRITE available
• Locking available, requires ZooKeeper for durability
• No ACID
Hive 0.13:
• ACID compliant ingestion of data from streaming sources such as Flume and Storm
• Snapshot isolation for readers
Next Steps:
• Addition of INSERT … VALUES, UPDATE, DELETE
• Multi-statement transactions: BEGIN, COMMIT, ROLLBACK
• Integration with HCatalog
Owen and I have a talk on this at 5:30 today.
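The "snapshot isolation for readers" point can be pictured with a small, self-contained Python sketch. This is a generic illustration of the idea, not Hive's implementation: the `Table` class, its methods, and the transaction-id scheme are all assumptions made for the example. A reader captures the set of committed transactions when it starts and ignores anything committed afterwards.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Table:
    # Every written row is tagged with the id of the transaction that wrote it.
    rows: List[Tuple[int, str]] = field(default_factory=list)
    committed: Set[int] = field(default_factory=set)
    next_txn: int = 1

    def begin(self) -> int:
        txn = self.next_txn
        self.next_txn += 1
        return txn

    def write(self, txn: int, value: str) -> None:
        self.rows.append((txn, value))

    def commit(self, txn: int) -> None:
        self.committed.add(txn)

    def snapshot(self) -> FrozenSet[int]:
        """Capture the set of transactions that are committed right now."""
        return frozenset(self.committed)

    def read(self, snapshot: FrozenSet[int]) -> List[str]:
        """Return only rows written by transactions visible in the snapshot."""
        return [value for txn, value in self.rows if txn in snapshot]


table = Table()
t1 = table.begin()
table.write(t1, "a")
table.commit(t1)

reader_snapshot = table.snapshot()  # a long-running reader starts here

t2 = table.begin()
table.write(t2, "b")
table.commit(t2)                    # committed after the reader started

t3 = table.begin()
table.write(t3, "c")                # still uncommitted
```

The earlier reader keeps seeing only the rows from `t1`, while a reader that starts later also sees `t2` but never the uncommitted `t3`.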
12.
Optimizer (by release)
Hive 0.11 & before: Rules based optimizer
• Mostly simple rules such as push filter below join
Hive 0.12: Correlation optimizer
• Where possible combine related execution into single job
Next Steps:
• Use Optiq for cost based optimization
• Join ordering and operator selection using statistics and cost estimates
• Expand statistics calculated and used in planning
Julian has a talk on this at 4:35 today.
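The "push filter below join" rule mentioned above can be sketched in a few lines of Python. The tables, column names, and helper functions are hypothetical, and the hash join is a simplified stand-in rather than Hive's operator; the sketch only shows that both plans return the same rows while the pushed-down plan feeds fewer rows into the join.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner equi-join: build a hash table on `right`, probe it with `left`."""
    index: Dict[Any, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def filter_after_join(left: List[Row], right: List[Row], key: str,
                      pred: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Join first, then filter. Second value: left rows that entered the join."""
    joined = hash_join(left, right, key)
    return [row for row in joined if pred(row)], len(left)


def filter_below_join(left: List[Row], right: List[Row], key: str,
                      pred: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Filter first (pred may only use left-side columns), then join."""
    kept = [row for row in left if pred(row)]
    return hash_join(kept, right, key), len(kept)


orders = [
    {"order_id": 1, "cust": "a", "amount": 250},
    {"order_id": 2, "cust": "b", "amount": 40},
    {"order_id": 3, "cust": "a", "amount": 15},
    {"order_id": 4, "cust": "c", "amount": 500},
]
customers = [
    {"cust": "a", "region": "west"},
    {"cust": "b", "region": "east"},
    {"cust": "c", "region": "west"},
]


def is_large(row: Row) -> bool:
    return row["amount"] > 100


late_rows, late_input = filter_after_join(orders, customers, "cust", is_large)
early_rows, early_input = filter_below_join(orders, customers, "cust", is_large)
```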
13.
MapReduce is dead, Long live Hadoop
14.
MapReduce is dead, Long live Hadoop
Tez Talks:
• A New Chapter in Hadoop Data Processing, today 12:05
• Hive on Apache Tez: Benchmarked at Yahoo! Scale, today 12:05
• Hive + Tez: A Performance Deep Dive, today 2:35
15.
ORC File Format
• Columnar format for complex data types
• Built into Hive from 0.11
• Support for Pig via OrcLoader/OrcStorer
• Support for MapReduce via HCat
• Two levels of compression
– Lightweight type-specific and generic
• Built in indexes
– Every 10,000 rows with position information
– Min, Max, Sum, Count of each column
– Supports seek to row number
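The built-in index described above (statistics every 10,000 rows) is what lets a reader skip data that cannot match a filter. The following Python sketch shows the idea on a single integer column; the `RowGroupStats` class and both functions are hypothetical names for illustration and do not reflect the actual ORC file layout or reader API.

```python
from dataclasses import dataclass
from typing import List, Tuple

ROW_GROUP_SIZE = 10_000  # index stride taken from the slide


@dataclass
class RowGroupStats:
    start: int    # position of the first row in the group
    count: int
    minimum: int
    maximum: int
    total: int    # sum of the values in the group


def build_index(column: List[int], stride: int = ROW_GROUP_SIZE) -> List[RowGroupStats]:
    """Compute min, max, sum, and count for every `stride` rows."""
    index = []
    for start in range(0, len(column), stride):
        chunk = column[start:start + stride]
        index.append(RowGroupStats(start, len(chunk), min(chunk), max(chunk), sum(chunk)))
    return index


def scan_greater_than(column: List[int], index: List[RowGroupStats],
                      threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for stats in index:
        if stats.maximum <= threshold:
            continue  # no row in this group can satisfy value > threshold
        groups_read += 1
        chunk = column[stats.start:stats.start + stats.count]
        matches.extend(v for v in chunk if v > threshold)
    return matches, groups_read


column = list(range(35_000))
index = build_index(column)
matches, groups_read = scan_greater_than(column, index, 29_990)
```

With sorted data like this, only the groups whose maximum exceeds the threshold are read; on unsorted data the statistics prune less.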
16.
ORC File Format
• Hive 0.12
– Predicate Push Down
– Improved run length encoding
– Adaptive string dictionaries
– Padding stripes to HDFS block boundaries
• Hive 0.13
– Stripe-based Input Splits
– Input Split elimination
– Vectorized Reader
– Customized Pig Load and Store functions
– ACID support
• Next Steps
– Faster writes
– Integer dictionaries
– Better block buffering
17.
Vectorized Query Execution
• Designed for Modern Processor Architectures
– Avoid branching in the inner loop.
– Make the most use of L1 and L2 cache.
• How It Works
– Process records in batches of 1,000 rows
– Generate code from templates to minimize branching.
• What It Gives
– 30x improvement in rows processed per second.
– Initial prototype: 100M rows/sec on laptop
• In Hive 0.13, initial (map) tasks vectorized
• Current work: vectorize shuffle and reduce tasks
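The batch-at-a-time structure described above can be sketched as follows. This is a conceptual Python illustration only: the function names are hypothetical, and Python will not show the cache and branch-prediction benefits that the slide attributes to the real implementation. It shows the shape of the approach: each operator handles a whole batch of 1,000 values in a tight loop and passes along a selection vector of surviving positions.

```python
from typing import Iterator, List

BATCH_SIZE = 1_000  # batch size taken from the slide


def batches(column: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Split a column into consecutive batches of at most `size` values."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def filter_batch(values: List[int], threshold: int) -> List[int]:
    """Return a selection vector: positions in the batch that pass the filter."""
    return [i for i, v in enumerate(values) if v > threshold]


def sum_selected(values: List[int], selected: List[int]) -> int:
    """Aggregate only the positions named in the selection vector."""
    return sum(values[i] for i in selected)


def vectorized_sum_where_gt(column: List[int], threshold: int) -> int:
    """Compute SUM(value) WHERE value > threshold, one batch at a time."""
    total = 0
    for batch in batches(column):
        selected = filter_batch(batch, threshold)
        total += sum_selected(batch, selected)
    return total


column = list(range(2_500))
result = vectorized_sum_where_gt(column, 1_999)
```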
18.
Try it Yourself
• Apache Hive 0.13
– http://hive.apache.org/downloads.html
• Download and play with HDP-2.1
– http://hortonworks.com/products/hortonworks-sandbox/ for use on your laptop
– http://hortonworks.com/hdp/ for use on your cluster
19.
Thank You!
@alanfgates
@hortonworks
Editor's notes
Query 21: 29 sec, scan one day of items table
Query 93: fact to fact left outer join over a year's data, finished in around an hour
Query 13: full year 6 way star join