SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Page1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP: Locality is dead (in the cloud)
Gopal Vijayaraghavan
Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data Locality – as usually discussed
Disk
CPU
Memory
Network
Share-nothing
Shared
Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Cloud – The Network eats itself
Network
Processing
Memory
Network
Shared
Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Cutaway Demo – LLAP on Cloud
TL;DW - repeat LLA+S3 benchmark on HDC
3 LLAP (m4.xlarge) nodes, Fact table has 864,001,869 rows
--------------------------------------------------------------------------------
VERTICES: 06/06 [==========================>>] 100% ELAPSED TIME: 1.68 s
--------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 1.63 seconds
INFO :
Hortonworks Data Cloud LLAP is >25x faster
EMR
Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
• Wait a minute – is this a new problem?
• How well do we handle data locality on-prem?
• Fast BI tools, how long can they afford to wait for locality?
• We do have non-local readers sometimes, I know it
• I mean, that’s why we have HDFS right?
Amdahl’s law knocks, who answers?
Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data Locality – BI tools fight you, even on-prem
Disk
CPU
Memory
Network
Share-nothing
Shared
Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Data Locality – what it looks like (sometimes)
HDFS
CPU
Memory
Network
Share-nothing
Shared
Page8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Evaluation week of cool new BI tool – easy mistakes
Rack#1
Mistake #1 – use whole cluster to load sample data (to do it real fast, time is money)
Mistake #2 – use whole cluster to test BI tool (let’s really see how fast it can be)
Mistake #3 – Use exactly 1 rack (we’re not going to make that one)
Rack#2
Rack#3
☑
☑
☒
Page9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Someone says “Data Lake” and sets up a river
Rack#1
BI – becomes a 30% tenant
Rack#2
Rack#3
Arguments start on how to set it up
How about 1 node every rack? We’ll get lots of rack locality
All joins can’t be co-located, so shuffle is always cross-rack – SLOW!
And you noticed that the Kafka pipeline running on rack #2 is a big noisy neigbhour
Fast is what we’re selling, so that won’t do
Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
“Noisy network neighbours? Get a dedicated rack”
Rack#1
BI – gets its own rack.
ETL – gets the other two
Rack#2
Rack#3
All files have 3 replicas - but you might still not have rack locality.
3 replicas in HDFS – always uses 2 racks
(3rd replica is in-rack to 2nd)
replication=10 on 20-node racks, uses 2 racks (1+9 replicas)
Page11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dedicated rack – your DFS IO is now crossing racks
Rack#1
The real victims are now broadcast joins – which scan fact tables over the network.
If your ETL is coming from off-rack – 50% probability that your new data has no locality in rack #1
You either have 2 replicas in Rack #1 or none.
Rack#2
Rack#3
If you try to fix this with a custom placement policy, the DNs on rack #1 will get extra writes
Tail your DFS audit logs folks – there’s so much info there
Page12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Rack#1
However, you realize that you can make a “copy” in rack #1
But for this cluster, until you setrep 7, there’s no way to be sure rack #1 has a copy.
Rack#2
Rack#3
Cache!
Page13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Caching efficiently - LLAP’s tricks
LLAP’s cache is decentralized, columnar, automatic, additive, packed and layered.
When a new column or partition
is used, the cache adds to itself
incrementally - unlike
immutable caches
Page14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
LLAP cache: ACID transactional snapshots
LLAP cache is built to handle Hive ACID 2.x data with overlapping read transactions.
With failure tolerance across retries
Q1Q2 LLAP
Partition=1 [txns=<1,2>]
Partition=1 [txns=<1>]
✔
✔
Partition=1 (retry) [txns=<1,2>]
Partition=1 (retry) [txns=<1>]
✔
✔
HIVE-12631
This works with a single cached
copy for any rows which are
common across the
transactions.
The retries work even if txn=2
deleted a row which existed in
txn=1.
Q2 is a txn ahead of Q1
Same partition, different data (in cache)
Page15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Locality is dead, long live cache affinity
Rack#1
1
3 2
Split #1 [2,3,1]
Split #2 [3,1,2]
If node #2 fails or is too busy, scheduler will skip.
When a node reboots, it takes up lowest open slot when it comes up
A reboot might cause an empty slot, but won’t cause cache misses on others
Page16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Questions?
This presentation represents the work of several folks from the Hive community over
several years – Sergey, Gunther, Prasanth, Rajesh, Nita, Ashutosh, Jesus, Deepak,
Jason, Sid, Matt, Teddy, Eugene and Vikram.

Weitere ähnliche Inhalte

Was ist angesagt?

Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem DataWorks Summit/Hadoop Summit
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerDataWorks Summit
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...DataWorks Summit
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015alanfgates
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Mich Talebzadeh (Ph.D.)
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownDataWorks Summit
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelt3rmin4t0r
 

Was ist angesagt? (20)

Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Hive Does ACID
Hive Does ACIDHive Does ACID
Hive Does ACID
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside DownHow the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
 
Tez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthelTez: Accelerating Data Pipelines - fifthel
Tez: Accelerating Data Pipelines - fifthel
 

Ähnlich wie Llap: Locality is Dead

LLAP: Locality is dead (in the cloud)
LLAP: Locality is dead (in the cloud)  LLAP: Locality is dead (in the cloud)
LLAP: Locality is dead (in the cloud) Future of Data Meetup
 
Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute OperatorIntegrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operatoralpinedatalabs
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...In-Memory Computing Summit
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - SlidesSeveralnines
 
The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...
The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...
The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...HostedbyConfluent
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveUlf Wendel
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
 
MySQL Replication: Pros and Cons
MySQL Replication: Pros and ConsMySQL Replication: Pros and Cons
MySQL Replication: Pros and ConsRachel Li
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
usenix
usenixusenix
usenixxlight
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyDataWorks Summit
 
Mgangler Virtualization
Mgangler VirtualizationMgangler Virtualization
Mgangler VirtualizationSecure-24
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)
20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)
20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)Darpan Dinker
 

Ähnlich wie Llap: Locality is Dead (20)

LLAP: Locality is dead (in the cloud)
LLAP: Locality is dead (in the cloud)  LLAP: Locality is dead (in the cloud)
LLAP: Locality is dead (in the cloud)
 
Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute OperatorIntegrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
Integrating R and the JVM Platform - Alpine Data Labs' R Execute Operator
 
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
IMCSummit 2015 - Day 2 IT Business Track - 4 Myths about In-Memory Databases ...
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
 
The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...
The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...
The Streaming Data Lake - What Do KIP-405 and KIP-833 Mean for Your Larger Da...
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
MySQL Replication: Pros and Cons
MySQL Replication: Pros and ConsMySQL Replication: Pros and Cons
MySQL Replication: Pros and Cons
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
usenix
usenixusenix
usenix
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage EfficiencyHDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
HDFS Erasure Code Storage - Same Reliability at Better Storage Efficiency
 
Mgangler Virtualization
Mgangler VirtualizationMgangler Virtualization
Mgangler Virtualization
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)
20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)
20 Real-World Use Cases to help pick a better MySQL Replication scheme (2012)
 

Kürzlich hochgeladen

%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 

Kürzlich hochgeladen (20)

%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 

Llap: Locality is Dead

  • 1. Page1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved LLAP: Locality is dead (in the cloud) Gopal Vijayaraghavan
  • 2. Page2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Data Locality – as usually discussed Disk CPU Memory Network Share-nothing Shared
  • 3. Page3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Cloud – The Network eats itself Network Processing Memory Network Shared
  • 4. Page4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Cutaway Demo – LLAP on Cloud TL;DW - repeat LLA+S3 benchmark on HDC 3 LLAP (m4.xlarge) nodes, Fact table has 864,001,869 rows -------------------------------------------------------------------------------- VERTICES: 06/06 [==========================>>] 100% ELAPSED TIME: 1.68 s -------------------------------------------------------------------------------- INFO : Status: DAG finished successfully in 1.63 seconds INFO : Hortonworks Data Cloud LLAP is >25x faster EMR
  • 5. Page5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved • Wait a minute – is this a new problem? • How well do we handle data locality on-prem? • Fast BI tools, how long can they afford to wait for locality? • We do have non-local readers sometimes, I know it • I mean, that’s why we have HDFS right? Amdahl’s law knocks, who answers?
  • 6. Page6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Data Locality – BI tools fight you, even on-prem Disk CPU Memory Network Share-nothing Shared
  • 7. Page7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Data Locality – what it looks like (sometimes) HDFS CPU Memory Network Share-nothing Shared
  • 8. Page8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Evaluation week of cool new BI tool – easy mistakes Rack#1 Mistake #1 – use whole cluster to load sample data (to do it real fast, time is money) Mistake #2 – use whole cluster to test BI tool (let’s really see how fast it can be) Mistake #3 – Use exactly 1 rack (we’re not going to make that one) Rack#2 Rack#3 ☑ ☑ ☒
  • 9. Page9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Someone says “Data Lake” and sets up a river Rack#1 BI – becomes a 30% tenant Rack#2 Rack#3 Arguments start on how to set it up How about 1 node every rack? We’ll get lots of rack locality All joins can’t be co-located, so shuffle is always cross-rack – SLOW! And you noticed that the Kafka pipeline running on rack #2 is a big noisy neigbhour Fast is what we’re selling, so that won’t do
  • 10. Page10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “Noisy network neighbours? Get a dedicated rack” Rack#1 BI – gets its own rack. ETL – gets the other two Rack#2 Rack#3 All files have 3 replicas - but you might still not have rack locality. 3 replicas in HDFS – always uses 2 racks (3rd replica is in-rack to 2nd) replication=10 on 20-node racks, uses 2 racks (1+9 replicas)
  • 11. Page11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dedicated rack – your DFS IO is now crossing racks Rack#1 The real victims are now broadcast joins – which scan fact tables over the network. If your ETL is coming from off-rack – 50% probability that your new data has no locality in rack #1 You either have 2 replicas in Rack #1 or none. Rack#2 Rack#3 If you try to fix this with a custom placement policy, the DNs on rack #1 will get extra writes Tail your DFS audit logs folks – there’s so much info there
  • 12. Page12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Rack#1 However, you realize that you can make a “copy” in rack #1 But for this cluster, until you setrep 7, there’s no way to be sure rack #1 has a copy. Rack#2 Rack#3 Cache!
  • 13. Page13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Caching efficiently - LLAP’s tricks LLAP’s cache is decentralized, columnar, automatic, additive, packed and layered. When a new column or partition is used, the cache adds to itself incrementally - unlike immutable caches
  • 14. Page14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved LLAP cache: ACID transactional snapshots LLAP cache is built to handle Hive ACID 2.x data with overlapping read transactions. With failure tolerance across retries Q1Q2 LLAP Partition=1 [txns=<1,2>] Partition=1 [txns=<1>] ✔ ✔ Partition=1 (retry) [txns=<1,2>] Partition=1 (retry) [txns=<1>] ✔ ✔ HIVE-12631 This works with a single cached copy for any rows which are common across the transactions. The retries work even if txn=2 deleted a row which existed in txn=1. Q2 is a txn ahead of Q1 Same partition, different data (in cache)
  • 15. Page15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Locality is dead, long live cache affinity Rack#1 1 3 2 Split #1 [2,3,1] Split #2 [3,1,2] If node #2 fails or is too busy, scheduler will skip. When a node reboots, it takes up lowest open slot when it comes up A reboot might cause an empty slot, but won’t cause cache misses on others
  • 16. Page16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Questions? This presentation represents the work of several folks from the Hive community over several years – Sergey, Gunther, Prasanth, Rajesh, Nita, Ashutosh, Jesus, Deepak, Jason, Sid, Matt, Teddy, Eugene and Vikram.