LLAP: Locality is Dead
Uploaded by t3rmin4t0r
Hive LLAP locality processing: challenges and advantages
1.
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Locality is dead (in the cloud)
Gopal Vijayaraghavan
2.
Data Locality – as usually discussed
[Diagram labels: Disk, CPU, Memory, Network; Share-nothing vs. Shared]
3.
Cloud – The Network eats itself
[Diagram labels: Network, Processing, Memory, Network; Shared]
4.
Cutaway Demo – LLAP on Cloud
TL;DW: repeat the LLAP+S3 benchmark on HDC (Hortonworks Data Cloud) with 3 LLAP (m4.xlarge) nodes; the fact table has 864,001,869 rows
--------------------------------------------------------------------------------
VERTICES: 06/06 [==========================>>] 100% ELAPSED TIME: 1.68 s
--------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 1.63 seconds
Hortonworks Data Cloud LLAP is >25x faster than EMR
5.
Amdahl's law knocks, who answers?
• Wait a minute – is this a new problem?
• How well do we handle data locality on-prem?
• Fast BI tools: how long can they afford to wait for locality?
• We do have non-local readers sometimes, I know it
• I mean, that's why we have HDFS, right?
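The slide's title invokes Amdahl's law. As a reminder (the standard formula, not a figure from the talk): if a fraction p of a query's time is sped up by a factor s while the rest, for example time spent waiting on non-local reads, is not, the overall speedup is

```latex
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad \lim_{s \to \infty} S(s) = \frac{1}{1 - p}
```

With an assumed, purely illustrative p = 0.8 (20% of the time stuck on remote IO), a 4x faster engine gives S(4) = 1 / (0.2 + 0.2) = 2.5, and even infinitely fast processing caps out at 5x. This is why fast BI engines cannot simply ignore where the bytes come from.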
6.
Data Locality – BI tools fight you, even on-prem
[Diagram labels: Disk, CPU, Memory, Network; Share-nothing vs. Shared]
7.
Data Locality – what it looks like (sometimes)
[Diagram labels: HDFS, CPU, Memory, Network; Share-nothing vs. Shared]
8.
Evaluation week of cool new BI tool – easy mistakes
[Diagram: Rack#1, Rack#2, Rack#3]
☑ Mistake #1 – use the whole cluster to load sample data (to do it real fast, time is money)
☑ Mistake #2 – use the whole cluster to test the BI tool (let's really see how fast it can be)
☒ Mistake #3 – use exactly 1 rack (we're not going to make that one)
9.
Someone says “Data Lake” and sets up a river
[Diagram: Rack#1, Rack#2, Rack#3]
BI becomes a 30% tenant.
Arguments start on how to set it up. How about 1 node in every rack? We'll get lots of rack locality.
But not all joins can be co-located, so the shuffle is always cross-rack – SLOW!
And then you notice that the Kafka pipeline running on rack #2 is a big noisy neighbour.
Fast is what we're selling, so that won't do.
10.
“Noisy network neighbours? Get a dedicated rack”
[Diagram: Rack#1, Rack#2, Rack#3]
BI gets its own rack; ETL gets the other two.
All files have 3 replicas, but you might still not have rack locality.
3 replicas in HDFS always use 2 racks (the 3rd replica is in the same rack as the 2nd).
replication=10 on 20-node racks still uses only 2 racks (1+9 replicas).
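To make the rack counting concrete, here is a toy model that simply encodes the layout this slide describes: replica 1 on the writer's rack, every further replica on one randomly chosen remote rack. It is a sketch under that assumption, not HDFS's placement code; the real policy has additional rules that it omits, so check the 1+9 figure against your Hadoop version. The rack names and `place_replicas` helper are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def place_replicas(writer_rack: str, racks: Sequence[str], replication: int,
                   rng: random.Random) -> List[str]:
    """Toy model of the rack layout described on the slide (not HDFS source).

    Replica 1 goes to the writer's rack; all further replicas go to a single
    remote rack chosen at random ("3rd replica is in-rack to 2nd", and the
    1+9 split for replication=10).
    """
    remote = rng.choice([r for r in racks if r != writer_rack])
    return [writer_rack] + [remote] * (replication - 1)


if __name__ == "__main__":
    rng = random.Random(7)
    racks = ["rack1", "rack2", "rack3"]
    for replication in (3, 10):
        per_rack = Counter(place_replicas("rack2", racks, replication, rng))
        # Always 2 racks: 1 replica on the writer's rack, the rest on one remote rack
        print(replication, len(per_rack), sorted(per_rack.values()))
```

Whichever remote rack is drawn, this prints `3 2 [1, 2]` and `10 2 [1, 9]`: raising the replication factor adds copies but, in this model, never adds racks.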
11.
Dedicated rack – your DFS IO is now crossing racks
[Diagram: Rack#1, Rack#2, Rack#3]
The real victims are now broadcast joins, which scan fact tables over the network.
If your ETL is coming from off-rack, there is a 50% probability that your new data has no locality in rack #1: you either have 2 replicas in rack #1 or none.
If you try to fix this with a custom placement policy, the DataNodes on rack #1 will get extra writes.
Tail your DFS audit logs, folks – there's so much info there.
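The 50% figure follows from the 3-replica layout on the previous slide: an ETL writer on rack #2 or #3 puts replica 1 on its own rack, and replicas 2 and 3 land together on one of the two other racks, so rack #1 receives both of them or neither. The sketch below enumerates that exactly; it assumes, as a simplification, that writers are spread evenly over the ETL racks and that the remote rack is picked uniformly.

```python
from fractions import Fraction
from typing import Dict, Sequence


def bi_rack_replica_distribution(racks: Sequence[str], etl_racks: Sequence[str],
                                 bi_rack: str = "rack1") -> Dict[int, Fraction]:
    """Exact distribution of replica counts on the BI rack for 3 replicas.

    Toy model: replica 1 on the writer's rack, replicas 2 and 3 together on
    one other rack chosen uniformly; writers assumed uniform over ETL racks.
    """
    dist: Dict[int, Fraction] = {}
    p_writer = Fraction(1, len(etl_racks))
    for writer in etl_racks:
        remotes = [r for r in racks if r != writer]
        for remote in remotes:
            p = p_writer * Fraction(1, len(remotes))
            n = [writer, remote, remote].count(bi_rack)
            dist[n] = dist.get(n, 0) + p
    return dist


if __name__ == "__main__":
    d = bi_rack_replica_distribution(["rack1", "rack2", "rack3"], ["rack2", "rack3"])
    print(d)  # {2: Fraction(1, 2), 0: Fraction(1, 2)}
```

The outcome "exactly 1 replica on rack #1" never appears, which is the "2 replicas or none" point on the slide.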
12.
[Diagram: Rack#1, Rack#2, Rack#3]
However, you realize that you can make a “copy” in rack #1.
But for this cluster, until you setrep 7, there's no way to be sure rack #1 has a copy.
Cache!
13.
Caching efficiently – LLAP's tricks
LLAP's cache is decentralized, columnar, automatic, additive, packed and layered.
When a new column or partition is used, the cache adds to itself incrementally, unlike immutable caches.
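The "additive" property can be illustrated with a toy cache keyed by (partition, column): a new query only loads the pieces that are missing, instead of rebuilding an immutable snapshot. This is a hypothetical sketch; LLAP's real cache works at a finer grain and is also decentralized, packed and layered, none of which is modelled here. `AdditiveColumnCache` and `fake_loader` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache key: (partition, column). Illustration only.
Key = Tuple[str, str]


class AdditiveColumnCache:
    """Toy additive, columnar cache: only missing (partition, column) pieces are loaded."""

    def __init__(self, loader: Callable[[str, str], List[int]]):
        self._loader = loader                  # reads one column of one partition from storage
        self._data: Dict[Key, List[int]] = {}
        self.loads = 0                         # number of storage reads performed

    def read(self, partitions: List[str], columns: List[str]) -> Dict[Key, List[int]]:
        result: Dict[Key, List[int]] = {}
        for p in partitions:
            for c in columns:
                key = (p, c)
                if key not in self._data:      # miss: add just this piece to the cache
                    self._data[key] = self._loader(p, c)
                    self.loads += 1
                result[key] = self._data[key]
        return result


def fake_loader(partition: str, column: str) -> List[int]:
    # Stand-in for a remote (e.g. S3) read; returns deterministic dummy values.
    return [len(partition), len(column)]


if __name__ == "__main__":
    cache = AdditiveColumnCache(fake_loader)
    cache.read(["p=1"], ["a", "b"])           # 2 loads: (p=1,a), (p=1,b)
    cache.read(["p=1", "p=2"], ["a", "c"])    # 3 more: (p=1,c), (p=2,a), (p=2,c)
    print(cache.loads)  # 5
```

The second query reuses (p=1, a) and pays only for the new column and partition; an immutable cache would have to be rebuilt to cover them.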
14.
LLAP cache: ACID transactional snapshots
LLAP cache is built to handle Hive ACID 2.x data with overlapping read transactions, with failure tolerance across retries (HIVE-12631).
[Diagram: Q1 and Q2 read through LLAP. Q2 is a txn ahead of Q1: same partition, different data (in cache).
Q1: Partition=1 [txns=<1>] ✔; Partition=1 (retry) [txns=<1>] ✔
Q2: Partition=1 [txns=<1,2>] ✔; Partition=1 (retry) [txns=<1,2>] ✔]
This works with a single cached copy for any rows which are common across the transactions. The retries work even if txn=2 deleted a row which existed in txn=1.
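One way to picture "a single cached copy, different data per transaction" is to keep each row tagged with the transaction that wrote it, keep delete events tagged the same way, and filter at read time by the reader's set of valid transactions. The sketch below does exactly that; the row layout and delete map are simplified, hypothetical stand-ins, not Hive ACID's actual file format or LLAP's implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Row:
    row_id: int
    value: str
    insert_txn: int  # transaction that wrote the row


# One shared cached copy of partition=1 (hypothetical contents).
CACHED_ROWS: List[Row] = [
    Row(1, "a", insert_txn=1),
    Row(2, "b", insert_txn=1),
    Row(3, "c", insert_txn=2),
]
# Delete events: row_id -> txn that deleted it (row 2 deleted by txn 2).
DELETES: Dict[int, int] = {2: 2}


def snapshot_read(valid_txns: FrozenSet[int]) -> List[str]:
    """Return the values visible to a reader whose snapshot is `valid_txns`."""
    visible = []
    for row in CACHED_ROWS:
        if row.insert_txn not in valid_txns:
            continue  # written by a txn this reader cannot see
        deleter = DELETES.get(row.row_id)
        if deleter is not None and deleter in valid_txns:
            continue  # deleted by a txn this reader can see
        visible.append(row.value)
    return visible


if __name__ == "__main__":
    q1, q2 = frozenset({1}), frozenset({1, 2})
    print(snapshot_read(q1))  # ['a', 'b']
    print(snapshot_read(q2))  # ['a', 'c']
```

Because visibility is decided per read from the snapshot, a retry of Q1 still sees row "b" even after txn 2 deleted it, while Q2 never does, and both share the cached copy of row "a".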
15.
Locality is dead, long live cache affinity
[Diagram: Rack#1 with nodes 1, 3, 2]
Split #1 [2,3,1], Split #2 [3,1,2]
If node #2 fails or is too busy, the scheduler will skip it.
When a node reboots, it takes up the lowest open slot when it comes up.
A reboot might cause an empty slot, but won't cause cache misses on the other nodes.
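The split preference lists on the slide ([2,3,1] and [3,1,2]) look like rotations of one ring of node slots. The sketch below models that reading under stated assumptions: each split hashes to a starting slot and walks the ring, the scheduler skips dead or busy nodes, and a restarting node takes the lowest open slot. It is a hypothetical illustration, not LLAP's scheduler; slot indices are 0-based here, and CRC32 is just a stable hash chosen for the example.

```python
import zlib
from typing import AbstractSet, List, Optional


class AffinityScheduler:
    """Toy cache-affinity scheduler over numbered slots (hypothetical, not LLAP's code)."""

    def __init__(self, num_slots: int):
        self.slots: List[Optional[str]] = [None] * num_slots  # slot index -> node name

    def join(self, node: str) -> int:
        """A (re)starting node takes the lowest open slot."""
        idx = self.slots.index(None)  # raises ValueError if no slot is free
        self.slots[idx] = node
        return idx

    def leave(self, node: str) -> None:
        self.slots[self.slots.index(node)] = None

    def preference(self, split: str) -> List[int]:
        """Fixed slot order for a split: a rotation of the ring from its hash."""
        n = len(self.slots)
        start = zlib.crc32(split.encode()) % n  # stable across runs, unlike hash()
        return [(start + i) % n for i in range(n)]

    def assign(self, split: str, busy: AbstractSet[str] = frozenset()) -> Optional[str]:
        """Pick the first live, non-busy node in the split's preference order."""
        for idx in self.preference(split):
            node = self.slots[idx]
            if node is not None and node not in busy:
                return node
        return None


if __name__ == "__main__":
    sched = AffinityScheduler(3)
    for name in ("node1", "node2", "node3"):
        sched.join(name)
    first = sched.assign("split-1")
    fallback = sched.assign("split-1", busy={first})  # busy node is skipped
    sched.leave("node2")                              # node2 reboots...
    print(sched.join("node2-restarted"))              # ...and retakes slot 1
```

Since a rebooted node fills the hole it left, every other node keeps its slot, so splits that already preferred those nodes keep hitting the same warm caches.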
16.
Questions?
This presentation represents the work of several folks from the Hive community over several years: Sergey, Gunther, Prasanth, Rajesh, Nita, Ashutosh, Jesus, Deepak, Jason, Sid, Matt, Teddy, Eugene and Vikram.