Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
LLAP: long-lived execution in Hive
Sergey Shelukhin
LLAP: long-lived execution in Hive
• Stinger recap and even faster queries
• LLAP: overview
• Query fragment execution
• IO elevator and caching
• Performance
• Current status and future directions
• Query fragment API
Hive performance recap
• Stinger: An Open Roadmap to improve Apache Hive’s performance 100x
• Delivered in 100% Apache Open Source
• Stinger.Next: Enterprise SQL at Hadoop Scale
• Launched in September 2014, phase 1 delivered in 2015
[Diagram: from Hive 0.10 (batch processing) to Hive 0.14 (human interactive, 5 seconds), a 100-150x query speedup delivered by the vectorized SQL engine, the Tez execution engine, the ORC columnar format and the cost-based optimizer.]
The road ahead to sub-second queries
• Startup costs are now a key bottleneck
• Example: JVM takes 100s of ms to start up
• Vectorized code can benefit from JIT optimization
• JIT optimizer needs (run)time to do its work
• Improved operator performance shifts the focus to IO
• Reading data is serialized with data processing
• Reading from HDFS is relatively expensive
• Large machines provide opportunities for data sharing
• Both between parallel computation (sharing) and serial (caching)
LLAP: overview
What is LLAP?
• Hybrid execution with daemons in Hive
• Eliminates startup costs for tasks
• Allows the JIT optimizer to have time to optimize
• Multi-threaded execution of vectorized operator pipelines
• Also allows sharing of metadata, map join tables, etc.
• Asynchronous IO elevator and caching
• Reduces IO cost and parallelizes IO and processing
• Can be spindle-aware; other IO optimizations
• Query fragment API
[Diagram: a node running an LLAP process that holds a cache and query fragments, reading from HDFS.]
What LLAP isn't
• Not a Hive execution engine (like Tez, MR, Spark…)
• Execution engines provide coordination and scheduling
• Some work (e.g. large shuffles) can still be scheduled in containers
• Not a storage layer
• Daemons are stateless and read (and cache) data from HDFS
• Does not supersede existing Hive
• Container-based execution still fully supported
Example execution: MR vs Tez vs Tez+LLAP
[Diagram: the same query drawn as three task graphs. Map – Reduce: intermediate results in HDFS. Tez: optimized pipeline. Tez with LLAP: resident process on nodes with an in-memory columnar cache; map tasks read HDFS.]
LLAP in your cluster
• LLAP daemons run on existing YARN
• Apache Slider is used for provisioning and recovery
• Easy to bring up, tear down, and share clusters
• Resource management via YARN delegation model (WIP)
• LLAP and containers dynamically balance resource usage (WIP)
Benefits unrelated to performance (WIP)
• Concurrent query execution and priority enforcement
• Access control, including column-level security
• ACID improvements
• Can be used externally via the API
• Will be usable e.g. by Spark, Pig, Cascading, …
Query fragment API
Query Fragment API - overview
• Hadoop RPC, protobuf are used to send fragments
• Fragments are "physical algebra": operators, metadata, input sources and output channels
• Results are returned asynchronously via output channels
• Hive will produce fragments for LLAP as part of physical optimization
• Other applications can compile their own physical algebra
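As a rough illustration of what "physical algebra" could look like to a client, the sketch below models a fragment as an operator tree plus metadata, input sources and output channels. The slides state that fragments are sent as protobuf over Hadoop RPC; the class names, fields, table name and path here are all hypothetical and only mirror the four parts listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operator:
    """One node of the physical algebra (hypothetical shape)."""
    kind: str                                   # e.g. "Scan", "Filter", "GroupBy"
    params: Dict[str, str] = field(default_factory=dict)
    children: List["Operator"] = field(default_factory=list)


@dataclass
class QueryFragment:
    """Illustrative container for the four parts a fragment carries."""
    root: Operator                  # operator tree to execute
    metadata: Dict[str, str]        # e.g. file format, statistics
    input_sources: List[str]        # e.g. HDFS paths or upstream channels
    output_channels: List[str]      # where results are streamed to


def operator_kinds(op: Operator) -> List[str]:
    """List operator kinds top-down (pre-order walk of the tree)."""
    kinds = [op.kind]
    for child in op.children:
        kinds.extend(operator_kinds(child))
    return kinds


# Build Scan -> Filter -> GroupBy, the way a compiler might emit it.
scan = Operator("Scan", {"table": "store_sales"})
flt = Operator("Filter", {"predicate": "ss_quantity > 10"}, [scan])
agg = Operator("GroupBy", {"keys": "ss_item_sk"}, [flt])
fragment = QueryFragment(
    root=agg,
    metadata={"format": "ORC"},
    input_sources=["hdfs:///warehouse/store_sales"],
    output_channels=["channel-0"],
)
print(operator_kinds(fragment.root))
```

An application other than Hive would build the same kind of tree from its own plan and hand it to the client API for submission.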
Query Fragment API – algebra
• Operators: Scan, Filter, Group By, Hash/Merge join, etc.
• Operators may include statistics for local optimization
• Expressions: comparison, arithmetic, Hive built-in functions
• All Hive datatypes
• Complex types like map/list/etc. – WIP
Query Fragment API – client API
• Encapsulates creation, submission of query fragments
• Also helps with IO from LLAP
• Getting vectorized record readers, batches, etc.
• Working with output channels (cancellation, availability of records, failure)
Query execution
LLAP: Query Execution
• Overview of Query Execution
• Scheduling
• Coordination via Tez
• What Fragments run in LLAP vs Containers
• Future work
Tez + LLAP – overview
• Hive on Tez already proven to perform well
• Tez being enhanced to allow it to coordinate work to external systems (TEZ-2003)
• Pluggable Scheduling
• Pluggable communication – custom execution specifications, protocols
• DAG coordination remains unchanged
• Hive Operators / Tez Runtime components used for Processing and data transfer
Deciding on where query components run
• Fragments can run in LLAP, regular containers, AM (as threads)
• Decision made by the Hive Client
• Configurable – all in LLAP, none in LLAP, intelligent mix
• Criteria for running in LLAP (in auto mode)
• No user code (or only blessed user code)
• Data source – HDFS
• ORC and vectorized execution (for now)
• Others can still run in LLAP in "all" mode, w/o IO elevator and cache
• Data size limitations (avoid heavy / long running processing within LLAP)
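The criteria above can be read as a simple predicate. The following is a minimal sketch under stated assumptions, not Hive's actual decision code: the `FragmentInfo` fields, the mode strings and the `MAX_LLAP_INPUT_BYTES` threshold are hypothetical names chosen for illustration, and the slides give no concrete size limit.

```python
from dataclasses import dataclass


@dataclass
class FragmentInfo:
    has_unblessed_user_code: bool   # user code that is not "blessed"
    data_source: str                # e.g. "hdfs"
    file_format: str                # e.g. "orc"
    vectorized: bool
    input_bytes: int


# Hypothetical stand-in for the "data size limitations" criterion.
MAX_LLAP_INPUT_BYTES = 1 << 30


def runs_in_llap(frag: FragmentInfo, mode: str) -> bool:
    """Decide placement for one fragment: True means LLAP, False means container."""
    if mode == "none":
        return False
    if mode == "all":
        return True                 # runs in LLAP, possibly without IO elevator and cache
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode}")
    return (
        not frag.has_unblessed_user_code
        and frag.data_source == "hdfs"
        and frag.file_format == "orc"
        and frag.vectorized
        and frag.input_bytes <= MAX_LLAP_INPUT_BYTES
    )


small_orc_scan = FragmentInfo(False, "hdfs", "orc", True, 64 * 1024 * 1024)
udf_heavy = FragmentInfo(True, "hdfs", "orc", True, 64 * 1024 * 1024)
print(runs_in_llap(small_orc_scan, "auto"), runs_in_llap(udf_heavy, "auto"))
```

In "all" mode the second fragment would still be sent to LLAP, which matches the note that other fragments can run there without the IO elevator and cache.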
So…
[Diagram, built up over three slides: the example query's task graph shown side by side for Tez, Tez with LLAP (auto) and Tez with LLAP (all), each with its AM.]
Scheduling for LLAP in Tez AM
• Greedy scheduling per query – assumes entire cluster available
• Schedule work to preferred location (HDFS locality)
• Multiple independent queries set the same preferred location if accessing the same data (improves cache locality)
• LLAP Daemons schedule fragments independently – across multiple queries
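One way independent queries could agree on a preferred location without talking to each other is to derive it deterministically from the data being read. The sketch below assumes a stable hash over the split's path and offset, restricted to the hosts that hold the HDFS block; the slides do not say how the Tez AM actually picks the location, so the function name and the hashing scheme are illustrative assumptions.

```python
import hashlib
from typing import List


def preferred_daemon(path: str, offset: int, replica_hosts: List[str]) -> str:
    """Pick one host deterministically from the hosts holding the split's block."""
    if not replica_hosts:
        raise ValueError("split has no candidate hosts")
    # A cryptographic digest is stable across processes, unlike Python's hash().
    key = f"{path}:{offset}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(replica_hosts)
    # Sorting makes the choice independent of the order replicas are reported in.
    return sorted(replica_hosts)[index]


# Two queries that see the replicas in a different order still agree.
q1 = preferred_daemon("/warehouse/store_sales/part-0.orc", 0, ["node3", "node7", "node9"])
q2 = preferred_daemon("/warehouse/store_sales/part-0.orc", 0, ["node9", "node3", "node7"])
print(q1 == q2)
```

Because both queries land on the same daemon, the second one can find the data already in that daemon's cache.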
Queuing fragments
• LLAP daemon has a number of executors (think containers)
• Wait queue with pluggable priority
• Geared towards low latency queries (default)
• Models estimated work left in query
• Sequencing within a query handled via topological order
• Fragment start time factors into scheduling decision
[Diagram: an LLAP daemon with four executors running Q1 Reducer 2, Q1 Map 1, Q1 Map 1 and Q3 Map 19, next to a queue listing Q1 Reducer 2, Q1 Map 1, Q3 Map 19 and Q1 Reducer 2.]
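A wait queue with pluggable priority can be sketched as a heap whose ordering function is passed in. The example below is an assumption-laden illustration rather than the daemon's implementation: `QueuedFragment`, `low_latency_priority` and the numeric estimates are hypothetical, and topological sequencing within a query is not modeled.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class QueuedFragment:
    name: str
    query_work_left: int    # estimated work remaining in the owning query
    start_time: float       # when the fragment was submitted


def low_latency_priority(f: QueuedFragment) -> Tuple[int, float]:
    """Assumed default: queries with the least work left first, then older fragments."""
    return (f.query_work_left, f.start_time)


class WaitQueue:
    """Priority queue whose ordering is supplied by a pluggable function."""

    def __init__(self, priority: Callable[[QueuedFragment], Tuple] = low_latency_priority):
        self._priority = priority
        self._heap: List[tuple] = []
        self._counter = itertools.count()   # tie-breaker, keeps insertion order

    def offer(self, fragment: QueuedFragment) -> None:
        entry = (self._priority(fragment), next(self._counter), fragment)
        heapq.heappush(self._heap, entry)

    def poll(self) -> QueuedFragment:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


wait_queue = WaitQueue()
wait_queue.offer(QueuedFragment("Q3 Map 19", query_work_left=40, start_time=1.0))
wait_queue.offer(QueuedFragment("Q1 Map 1", query_work_left=3, start_time=2.0))
wait_queue.offer(QueuedFragment("Q1 Reducer 2", query_work_left=3, start_time=2.5))
order = [wait_queue.poll().name for _ in range(len(wait_queue))]
print(order)
```

Swapping in a different priority function, for example one that orders purely by start time, changes the policy without touching the queue itself.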
LLAP Scheduling – pipelining and preemption
• A fragment can run when inputs are not yet available (for pipelining)
• A fragment is "finishable" if all the source data is ready
[Diagram: an LLAP daemon whose queue and executors hold interactive query maps 1/3, 2/3 and 3/3 and a wide query reduce, which reports: "Well, 10 mappers out of 100 are done!"]
• If the data is not ready, may never free the executor
• Non-finishable fragments can be preempted
• Improves throughput, prevents deadlocks
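The finishable test and the preemption rule can be sketched as follows. This is a simplified model under assumptions the slides do not state: that only a finishable incoming fragment triggers preemption, and that the victim is the non-finishable fragment with the smallest share of its sources ready. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningFragment:
    name: str
    sources_done: int
    sources_total: int

    @property
    def finishable(self) -> bool:
        """A fragment is finishable once all of its source data is ready."""
        return self.sources_done == self.sources_total


def pick_victim(executors: List[RunningFragment],
                incoming: RunningFragment) -> Optional[RunningFragment]:
    """Return the fragment to preempt so `incoming` can run, or None."""
    if not incoming.finishable:
        return None     # assumption: only finishable work justifies preemption
    victims = [f for f in executors if not f.finishable]
    if not victims:
        return None
    # Assumption: evict the fragment furthest from having its inputs ready.
    # Non-finishable fragments always have sources_total > 0, so this is safe.
    return min(victims, key=lambda f: f.sources_done / f.sources_total)


executors = [
    RunningFragment("Interactive query map 1/3", 1, 1),
    RunningFragment("Interactive query map 2/3", 1, 1),
    RunningFragment("Wide query reduce", 10, 100),
]
incoming = RunningFragment("Interactive query map 3/3", 1, 1)
victim = pick_victim(executors, incoming)
print(victim.name if victim else None)
```

Without this rule, a reducer waiting on 90 unfinished mappers could hold its executor indefinitely while the mappers it depends on sit in the queue, which is the deadlock the slide refers to.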
[Diagram: the LLAP daemon's queue and executors now hold only the interactive query maps 1/3, 2/3 and 3/3; the wide query reduce no longer appears.]
IO elevator and other internals
LLAP: IO elevator and other internals
• Asynchronous IO and decompression
• Off-heap data caching
• File metadata caching
• Map join table sharing
• Better JIT usage thanks to persistent daemon
Asynchronous IO
• Currently, Hive IO and input decoding are interleaved with processing
• Remote HDFS reads are expensive
• Even local disk reads might be
• Data decompression and decoding are expensive
Asynchronous IO
• With IO elevator, reading, decoding and processing are parallel
• IO threads can be spindle aware (WIP)
• Depending on workload, IO and processing threads can balance resource usage (throttle IO, etc.) (WIP)
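The idea of running reading, decoding and processing in parallel can be shown with a small three-stage pipeline. This is a generic producer/consumer sketch, not LLAP's IO elevator: in-memory byte strings stand in for HDFS reads, zlib stands in for ORC decompression, and the bounded queue size of 4 is an arbitrary assumption that illustrates throttling.

```python
import queue
import threading
import zlib
from typing import List

_DONE = object()    # sentinel marking the end of the stream


def io_stage(chunks: List[bytes], out_q: queue.Queue) -> None:
    """Stands in for blocking reads; blocks when the next stage falls behind."""
    for chunk in chunks:
        out_q.put(chunk)
    out_q.put(_DONE)


def decode_stage(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Decompresses chunks while the IO stage keeps reading ahead."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(zlib.decompress(item))


def run_pipeline(compressed_chunks: List[bytes]) -> int:
    """Process decoded chunks on the calling thread; returns total bytes seen."""
    raw_q: queue.Queue = queue.Queue(maxsize=4)       # bounded queues throttle IO
    decoded_q: queue.Queue = queue.Queue(maxsize=4)
    threads = [
        threading.Thread(target=io_stage, args=(compressed_chunks, raw_q)),
        threading.Thread(target=decode_stage, args=(raw_q, decoded_q)),
    ]
    for t in threads:
        t.start()
    total = 0
    while True:
        item = decoded_q.get()
        if item is _DONE:
            break
        total += len(item)      # the "processing" step
    for t in threads:
        t.join()
    return total


chunks = [zlib.compress(b"x" * 1000) for _ in range(10)]
print(run_pipeline(chunks))
```

The bounded queues mean a slow processing stage eventually stalls the reader, which is one simple way the stages can balance resource usage.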
Caching and off-heap data
• Decompressed data is cached off-heap
• Simplifies memory management, mitigates some GC problems
• Saves HDFS and decompression costs, esp. on dimension tables
• In the future, cached data could be processed directly to avoid copies
• Replacement policy is pluggable
• Currently, simple local policies are used, e.g. FIFO, LRFU
• Other policies possible (e.g. workflow-adaptable, or lazily coordinated for better cache affinity)
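For readers unfamiliar with LRFU, the sketch below shows the textbook form of the policy: each entry keeps a combined recency and frequency (CRF) score that decays by a factor of 0.5 ** (lam * age), so lam = 0 behaves like LFU and lam = 1 behaves like LRU. It is a toy model with a logical clock, not the LLAP cache code; the class and method names are hypothetical.

```python
from typing import Dict, Hashable, Tuple


class LrfuPolicy:
    """Toy LRFU replacement policy over cache keys."""

    def __init__(self, lam: float = 0.1):
        self._lam = lam
        # key -> (CRF score at last access, logical time of last access)
        self._entries: Dict[Hashable, Tuple[float, int]] = {}
        self._clock = 0

    def _decay(self, age: int) -> float:
        return 0.5 ** (self._lam * age)

    def touch(self, key: Hashable) -> None:
        """Record an access: decay the old score, then add 1 for this access."""
        self._clock += 1
        crf, last = self._entries.get(key, (0.0, self._clock))
        new_crf = 1.0 + crf * self._decay(self._clock - last)
        self._entries[key] = (new_crf, self._clock)

    def _current_score(self, key: Hashable) -> float:
        crf, last = self._entries[key]
        return crf * self._decay(self._clock - last)

    def evict(self) -> Hashable:
        """Remove and return the key with the lowest decayed score."""
        if not self._entries:
            raise KeyError("nothing to evict")
        victim = min(self._entries, key=self._current_score)
        del self._entries[victim]
        return victim


lfu_like = LrfuPolicy(lam=0.0)
lru_like = LrfuPolicy(lam=1.0)
for policy in (lfu_like, lru_like):
    for key in ("a", "a", "b"):
        policy.touch(key)
print(lfu_like.evict(), lru_like.evict())
```

With the same access sequence, the frequency-leaning setting evicts the rarely used "b" while the recency-leaning setting evicts the older "a", which is the trade-off a pluggable policy lets an operator tune.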
Cache size vs operator memory requirement
• Cache space takes away from operator space
• Sort buffers, hash join tables, GBY buffers take space
• Tradeoff between HDFS reads and operator speed
• Depends on workflow, dataset size, etc.
• New vectorization changes in Hive will speed up operators and allow for larger cache
Other benefits
• File metadata and indexes are cached
• Much faster PPD application for selective queries – no HDFS reads
• Same replacement as data cache (but higher priority)
• Map join hash tables, fragment plans are shared
• Multiple tasks do not all generate the table or deserialize the plans
• Better use of JIT optimizer
• Because the daemons are persistent, JIT has more time to kick in
• Especially good with vectorization!
Performance
Setup
• 13 physical machines (12 cores, 40 GB RAM each)
• Note – smaller cluster than previous Tez perf runs
• TPCDS 200, interactive queries
• Both – ORC, vectorized, Hadoop 2.8, queries via HS2 w/JMeter
• TEZ: Hive 1.2 + Tez 0.8 (snapshot)
• Pre-warm and container reuse enabled
• LLAP: Branch in pre-alpha stage + Tez 0.8 (snapshot)
• Bias towards executors – small cache
• Otherwise no tuning
Summary
• NOTE - in early stage – pre-alpha-release perf results
• Still, interactive queries are already 1.5-4 times faster
• First query result after launching CLI significantly improved
• In real life, LLAP daemons would also already be warm
• Parallel queries are already better
• Lots of work still ahead – epic locks in Kryo, Log4j, HDFS, HiveServer2;
better object sharing, better priority enforcement
• Should be much faster in short order
Query execution time
[Chart: execution time in seconds (axis 0 to 35) for query55, query42, query52, query3, query12, query27, query26, query7, query19, query96, query43, query15, query82 and query13, comparing Hive (1.2.0) with Hive (LLAP).]
Parallel query execution
• 8 users, 4 parallel executors on HS
• Tez: 50% of serial time; LLAP alpha: 41% of serial time
[Chart: total execution time (13 queries) in seconds (axis 0 to 300), serial vs. parallel, comparing Hive (1.2.0) with Hive (LLAP).]
Current status and future directions
Current status
• Putting the finishing touches on the CTP (alpha release)
• Watch Hortonworks blog, and Apache Hive mailing lists, for details!
• The basic features are functional
• Currently only on Tez; IO only on vectorized and ORC
• AKA the fastest Hive setup possible
• Lots of performance improvement not yet realized
• Lots of advanced features are WIP or planned
Work in progress
• Further performance improvement
• Concurrent query execution improvements
• Better vectorized operators (join, group by, …)
• Defining the API
Future work
• Security, including column level security
• Tighter integration with YARN, e.g. resource delegation
• Guaranteed Capacities for better SLA guarantee, maybe with central scheduler
• Dynamic daemon sizing with off-heap storage
• ACID support
• Better (maybe centrally coordinated) locality and caching
• Temp tables, intermediate query results in LLAP
• Interleaving of Fragment Execution
• Past processing is not lost (unlike with preemption)
• A rogue / badly scheduled query will not hog the system
Questions?
Interested? Stop by the Hortonworks booth to learn more

 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

LLAP: long-lived execution in Hive

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP: long-lived execution in Hive Sergey Shelukhin
  • 2. LLAP: long-lived execution in Hive
      • Stinger recap and even faster queries
      • LLAP: overview
      • Query fragment execution
      • IO elevator and caching
      • Performance
      • Current status and future directions
      • Query fragment API
  • 3. Hive performance recap
      • Stinger: an open roadmap to improve Apache Hive's performance 100x
      • Delivered in 100% Apache open source
      • Stinger.Next: enterprise SQL at Hadoop scale
      • Launched in September 2014, phase 1 delivered in 2015
      • [Diagram: Hive 0.10 (batch processing) to Hive 0.14 (human interactive, 5 seconds); 100-150x query speedup; vectorized SQL engine, Tez execution engine, ORC columnar format, cost-based optimizer]
  • 4. The road ahead to sub-second queries
      • Startup costs are now a key bottleneck
      • Example: the JVM takes 100s of ms to start up
      • Vectorized code can benefit from JIT optimization
      • The JIT optimizer needs (run)time to do its work
      • Improved operator performance shifts the focus to IO
      • Reading data is serialized with data processing
      • Reading from HDFS is relatively expensive
      • Large machines provide opportunities for data sharing
      • Both between parallel computations (sharing) and serial ones (caching)
  • 5. LLAP: overview
  • 6. What is LLAP?
      • Hybrid execution with daemons in Hive
      • Eliminates startup costs for tasks
      • Gives the JIT optimizer time to optimize
      • Multi-threaded execution of vectorized operator pipelines
      • Also allows sharing of metadata, map join tables, etc.
      • Asynchronous IO elevator and caching
      • Reduces IO cost and parallelizes IO and processing
      • Can be spindle-aware; other IO optimizations
      • Query fragment API
      • [Diagram: a node with an LLAP process containing a cache and query fragments, on top of HDFS]
  • 7. What LLAP isn't
      • Not a Hive execution engine (like Tez, MR, Spark…)
      • Execution engines provide coordination and scheduling
      • Some work (e.g. large shuffles) can still be scheduled in containers
      • Not a storage layer
      • Daemons are stateless and read (and cache) data from HDFS
      • Does not supersede existing Hive
      • Container-based execution still fully supported
  • 8. Example execution: MR vs Tez vs Tez+LLAP
      • [Diagram: three task graphs of map (M), reduce (R) and Tez (T) tasks. Labels: Map – Reduce, intermediate results in HDFS; Tez, optimized pipeline; Tez with LLAP, resident process on nodes; in-memory columnar cache; map tasks read HDFS]
  • 9. LLAP in your cluster
      • LLAP daemons run on existing YARN
      • Apache Slider is used for provisioning and recovery
      • Easy to bring up, tear down, and share clusters
      • Resource management via YARN delegation model (WIP)
      • LLAP and containers dynamically balance resource usage (WIP)
  • 10. Benefits unrelated to performance (WIP)
      • Concurrent query execution and priority enforcement
      • Access control, including column-level security
      • ACID improvements
      • Can be used externally via the API
      • Will be usable e.g. by Spark, Pig, Cascading, …
  • 11. Query fragment API
  • 12. Query Fragment API – overview
      • Hadoop RPC and protobuf are used to send fragments
      • Fragments are "physical algebra": operators, metadata, input sources and output channels
      • Results are returned asynchronously via output channels
      • Hive will produce fragments for LLAP as part of physical optimization
      • Other applications can compile their own physical algebra
  • 13. Query Fragment API – algebra
      • Operators: Scan, Filter, Group By, Hash/Merge join, etc.
      • Operators may include statistics for local optimization
      • Expressions: comparison, arithmetic, Hive built-in functions
      • All Hive datatypes
      • Complex types like map/list/etc. – WIP
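To make the "physical algebra" idea on slides 12 and 13 concrete, here is a minimal sketch of what a fragment description could hold. Every name in it (`Operator`, `QueryFragment`, the field names, the table and channel labels) is a hypothetical illustration. The slides only say that real fragments are sent as protobuf messages over Hadoop RPC; they do not show the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Operator:
    """One node of the physical plan (hypothetical shape, not the LLAP schema)."""
    kind: str                                             # e.g. "Scan", "Filter", "GroupBy"
    params: Dict[str, str] = field(default_factory=dict)
    stats: Dict[str, int] = field(default_factory=dict)   # optional hints for local optimization
    child: Optional["Operator"] = None                    # upstream operator feeding this one


@dataclass
class QueryFragment:
    """Operators plus the input sources and output channels they are wired to."""
    fragment_id: str
    root: Operator
    input_sources: List[str]
    output_channels: List[str]

    def pipeline(self) -> List[str]:
        """Operator kinds in execution order, from the scan upward."""
        kinds = []
        op = self.root
        while op is not None:
            kinds.append(op.kind)
            op = op.child
        return list(reversed(kinds))


# Illustrative values only; table, predicate and channel names are made up.
scan = Operator("Scan", {"table": "store_sales"}, stats={"rows": 1_000_000})
flt = Operator("Filter", {"predicate": "ss_quantity > 10"}, child=scan)
gby = Operator("GroupBy", {"keys": "ss_item_sk"}, child=flt)
fragment = QueryFragment("q1-map1", gby, ["hdfs:store_sales"], ["channel-0"])
print(fragment.pipeline())
```

The point of the sketch is that a fragment is self-describing: a client other than Hive could build the same structure and submit it, which is what "other applications can compile their own physical algebra" suggests.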
  • 14. Query Fragment API – client API
      • Encapsulates creation and submission of query fragments
      • Also helps with IO from LLAP
      • Getting vectorized record readers, batches, etc.
      • Working with output channels (cancellation, availability of records, failure)
  • 15. Query execution
  • 16. LLAP: Query Execution
      • Overview of query execution
      • Scheduling
      • Coordination via Tez
      • What fragments run in LLAP vs containers
      • Future work
  • 17. Tez + LLAP – overview
      • Hive on Tez is already proven to perform well
      • Tez is being enhanced to coordinate work on external systems (TEZ-2003)
      • Pluggable scheduling
      • Pluggable communication – custom execution specifications, protocols
      • DAG coordination remains unchanged
      • Hive operators / Tez runtime components are used for processing and data transfer
  • 18. Deciding on where query components run
      • Fragments can run in LLAP, regular containers, or the AM (as threads)
      • Decision made by the Hive client
      • Configurable – all in LLAP, none in LLAP, intelligent mix
      • Criteria for running in LLAP (in auto mode)
      • No user code (or only blessed user code)
      • Data source – HDFS
      • ORC and vectorized execution (for now)
      • Others can still run in LLAP in "all" mode, without the IO elevator and cache
      • Data size limitations (avoid heavy / long-running processing within LLAP)
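The auto-mode criteria on slide 18 amount to a short predicate. The sketch below is one way to write it, not Hive's actual logic. The `FragmentInfo` fields, the mode strings and the 10 GiB size cap are assumptions chosen for illustration; the slides give no concrete threshold.

```python
from dataclasses import dataclass


@dataclass
class FragmentInfo:
    """Facts about a fragment that the placement decision looks at (hypothetical)."""
    has_user_code: bool
    user_code_blessed: bool
    data_source: str     # e.g. "hdfs"
    file_format: str     # e.g. "orc"
    vectorized: bool
    input_bytes: int


MAX_LLAP_INPUT_BYTES = 10 * 1024**3  # hypothetical cap; the slides name no number


def runs_in_llap(frag, mode="auto", max_bytes=MAX_LLAP_INPUT_BYTES):
    """Return True if the fragment should be sent to an LLAP daemon."""
    if mode == "none":
        return False
    if mode == "all":
        return True                      # runs in LLAP, possibly without IO elevator and cache
    # mode == "auto": apply the criteria listed on the slide.
    if frag.has_user_code and not frag.user_code_blessed:
        return False
    if frag.data_source != "hdfs":
        return False
    if frag.file_format != "orc" or not frag.vectorized:
        return False
    return frag.input_bytes <= max_bytes  # keep heavy, long-running work in containers


small_orc_scan = FragmentInfo(False, False, "hdfs", "orc", True, 512 * 1024**2)
udf_fragment = FragmentInfo(True, False, "hdfs", "orc", True, 512 * 1024**2)
print(runs_in_llap(small_orc_scan), runs_in_llap(udf_fragment))
```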
  • 19. So…
      • [Diagram: Tez task graph of map (M) and reduce (R) tasks]
  • 20. So…
      • [Diagram: Tez task graph next to Tez with LLAP (auto), with the AM shown]
  • 21. So…
      • [Diagram: Tez, Tez with LLAP (auto) and Tez with LLAP (all) task graphs, with the AMs shown]
  • 22. Scheduling for LLAP in Tez AM
      • Greedy scheduling per query – assumes the entire cluster is available
      • Schedule work to the preferred location (HDFS locality)
      • Multiple independent queries set the same preferred location if accessing the same data (improves cache locality)
      • LLAP daemons schedule fragments independently – across multiple queries
  • 23. Queuing fragments
      • An LLAP daemon has a number of executors (think containers)
      • Wait queue with pluggable priority
      • Geared towards low-latency queries (default)
      • Models estimated work left in the query
      • Sequencing within a query is handled via topological order
      • Fragment start time factors into the scheduling decision
      • [Diagram: LLAP queue feeding executors; fragments shown: Q1 Map 1, Q1 Reducer 2, Q3 Map 19]
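The wait queue on slide 23 can be pictured as a heap whose ordering function is swappable. The sketch below is a guess at the shape, not the daemon's code. It assumes the default priority orders fragments by estimated work left in the owning query, then by start time. The class and field names and the numbers in the usage example are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class QueuedFragment:
    name: str
    query_work_left: int   # estimated remaining tasks in the owning query
    start_time: float      # when the fragment was first submitted


def low_latency_priority(frag):
    # Smaller tuple is served first: queries closest to finishing go first,
    # and older fragments win ties.
    return (frag.query_work_left, frag.start_time)


class WaitQueue:
    """Priority queue of fragments with a pluggable priority function."""

    def __init__(self, priority=low_latency_priority):
        self._priority = priority
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so fragments are never compared directly

    def offer(self, frag):
        heapq.heappush(self._heap, (self._priority(frag), next(self._seq), frag))

    def poll(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


q = WaitQueue()
q.offer(QueuedFragment("Q3 Map 19", query_work_left=40, start_time=1.0))
q.offer(QueuedFragment("Q1 Map 1", query_work_left=2, start_time=2.0))
q.offer(QueuedFragment("Q1 Reducer 2", query_work_left=2, start_time=3.0))
order = [q.poll().name for _ in range(3)]
print(order)
```

Passing a different `priority` callable to `WaitQueue` is what "pluggable" means in this sketch; a throughput-oriented policy could, for instance, ignore `query_work_left` entirely.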
  • 24. LLAP Scheduling – pipelining and preemption
      • A fragment can run when inputs are not yet available (for pipelining)
      • A fragment is "finishable" if all the source data is ready
      • [Diagram: LLAP queue and executors; labels: Interactive query map 1/3, 2/3, 3/3; Wide query reduce, with the callout "Well, 10 mappers out of 100 are done!"]
  • 25. LLAP Scheduling – pipelining and preemption
      • A fragment can run when inputs are not yet available (for pipelining)
      • A fragment is "finishable" if all the source data is ready
      • If the data is not ready, the fragment may never free the executor
      • Non-finishable fragments can be preempted
      • Improves throughput, prevents deadlocks
      • [Diagram: LLAP queue and executors; labels: Interactive query map 1/3, 2/3, 3/3; Wide query reduce]
  • 26. LLAP Scheduling – pipelining and preemption
      • [Diagram: LLAP queue and executors; labels: Interactive query map 1/3, 2/3, 3/3]
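The finishable/preemption rule from these slides can be sketched as follows. This is an illustration under assumptions, not the daemon's algorithm. It assumes a fragment tracks how many of its upstream sources are complete. The victim-selection tie-break (preempt whichever fragment is furthest from having its inputs ready) is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RunningFragment:
    name: str
    sources_done: int     # upstream tasks that have finished
    sources_total: int    # upstream tasks this fragment reads from

    @property
    def finishable(self):
        # All source data is ready, so the fragment is guaranteed to complete.
        return self.sources_done == self.sources_total


def pick_victim(executors):
    """Return a non-finishable fragment to preempt, or None if all can finish."""
    candidates = [f for f in executors if not f.finishable]
    if not candidates:
        return None
    # Hypothetical tie-break: preempt the fragment with the smallest share of inputs ready.
    return min(candidates, key=lambda f: f.sources_done / f.sources_total)


executors = [
    RunningFragment("Interactive query map 1/3", 1, 1),
    RunningFragment("Interactive query map 2/3", 1, 1),
    RunningFragment("Wide query reduce", 10, 100),   # 10 mappers out of 100 are done
]
victim = pick_victim(executors)
print(victim.name)
```

In the slide's scenario the wide-query reducer is holding an executor while waiting on 90 more mappers, so it is the one that gives way to the waiting interactive map fragment.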
  • 27. IO elevator and other internals
  • 28. LLAP: IO elevator and other internals
      • Asynchronous IO and decompression
      • Off-heap data caching
      • File metadata caching
      • Map join table sharing
      • Better JIT usage thanks to persistent daemon
  • 29. Asynchronous IO
      • Currently, Hive IO and input decoding are interleaved with processing
      • Remote HDFS reads are expensive
      • Even local disk reads might be
      • Data decompression and decoding are expensive
  • 30. Asynchronous IO
      • With the IO elevator, reading, decoding and processing run in parallel
      • IO threads can be spindle-aware (WIP)
      • Depending on workload, IO and processing threads can balance resource usage (throttle IO, etc.) (WIP)
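The idea of running reading, decoding and processing in parallel can be shown with a producer thread and a bounded queue. This is a toy model under assumptions, not LLAP's IO layer. `zlib` stands in for ORC decompression and decoding, summing lengths stands in for operator work, and the buffer size of 4 is arbitrary.

```python
import queue
import threading
import zlib

_DONE = object()  # sentinel marking the end of the stream


def io_thread(chunks, out):
    """Read and decode chunks ahead of the consumer (the 'elevator')."""
    for raw in chunks:
        out.put(zlib.decompress(raw))   # stands in for read + decompress + decode
    out.put(_DONE)


def process(compressed_chunks, max_buffered=4):
    """Consume decoded chunks while the IO thread keeps reading ahead."""
    buf = queue.Queue(maxsize=max_buffered)  # bounded: throttles IO if processing lags
    t = threading.Thread(target=io_thread, args=(compressed_chunks, buf))
    t.start()
    total = 0
    while True:
        item = buf.get()
        if item is _DONE:
            break
        total += len(item)              # stands in for operator processing
    t.join()
    return total


chunks = [zlib.compress(b"x" * n) for n in (100, 200, 300)]
print(process(chunks))
```

The bounded queue is a crude version of the throttling the slide mentions: when processing falls behind, `put` blocks and the IO thread stops reading ahead.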
  • 31. Caching and off-heap data
      • Decompressed data is cached off-heap
      • Simplifies memory management, mitigates some GC problems
      • Saves HDFS and decompression costs, esp. on dimension tables
      • In future, processing cached data directly will be possible, to avoid copies
      • Replacement policy is pluggable
      • Currently, simple local policies are used, e.g. FIFO, LRFU
      • Other policies possible (e.g. workflow-adaptable, or lazily coordinated for better cache affinity)
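A pluggable replacement policy can be sketched as a cache that delegates the eviction choice to a policy object. The code below is a simplified on-heap illustration, not LLAP's off-heap cache. The `on_insert`/`on_access`/`evict` interface is an assumption. `LrfuPolicy` follows the general LRFU idea (each access adds 1 to a score that decays exponentially with age), with a decay parameter `lam` picked arbitrarily.

```python
class FifoPolicy:
    """Evict in insertion order, ignoring hits."""

    def __init__(self):
        self._order = []              # keys, oldest first

    def on_insert(self, key, now):
        self._order.append(key)

    def on_access(self, key, now):
        pass                          # FIFO does not react to hits

    def evict(self, now):
        return self._order.pop(0)


class LrfuPolicy:
    """LRFU-style policy: score += 1 per access, decayed by 2**(-lam * age)."""

    def __init__(self, lam=0.1):
        self._lam = lam
        self._score = {}              # key -> (score, time of last update)

    def _decayed(self, key, now):
        score, last = self._score[key]
        return score * 2 ** (-self._lam * (now - last))

    def on_insert(self, key, now):
        self._score[key] = (1.0, now)

    def on_access(self, key, now):
        self._score[key] = (self._decayed(key, now) + 1.0, now)

    def evict(self, now):
        victim = min(self._score, key=lambda k: self._decayed(k, now))
        del self._score[victim]
        return victim


class Cache:
    """Fixed-capacity cache that asks its policy which key to evict."""

    def __init__(self, capacity, policy):
        self._capacity = capacity
        self._policy = policy
        self._data = {}
        self._clock = 0               # logical time: one tick per operation

    def get(self, key):
        self._clock += 1
        if key in self._data:
            self._policy.on_access(key, self._clock)
            return self._data[key]
        return None

    def put(self, key, value):
        self._clock += 1
        if key in self._data:
            self._data[key] = value
            self._policy.on_access(key, self._clock)
            return
        if len(self._data) >= self._capacity:
            del self._data[self._policy.evict(self._clock)]
        self._data[key] = value
        self._policy.on_insert(key, self._clock)


def run(policy):
    cache = Cache(2, policy)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")
    cache.get("a")
    cache.put("c", 3)                 # forces one eviction
    return cache.get("a"), cache.get("b")


print(run(FifoPolicy()), run(LrfuPolicy()))
```

With the same access pattern, FIFO throws out the hot key `a` because it was inserted first, while the LRFU-style policy keeps it and evicts `b`. That difference is the kind of effect that matters for frequently re-read dimension tables.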
  • 32. Cache size vs operator memory requirement
      • Cache space takes away from operator space
      • Sort buffers, hash join tables, GBY buffers take space
      • Tradeoff between HDFS reads and operator speed
      • Depends on workflow, dataset size, etc.
      • New vectorization changes in Hive will speed up operators and allow for a larger cache
  • 33. Other benefits
      • File metadata and indexes are cached
      • Much faster PPD application for selective queries – no HDFS reads
      • Same replacement as data cache (but higher priority)
      • Map join hash tables and fragment plans are shared
      • Multiple tasks do not all generate the table or deserialize the plans
      • Better use of the JIT optimizer
      • Because the daemons are persistent, JIT has more time to kick in
      • Especially good with vectorization!
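Sharing map join hash tables inside one long-lived process comes down to "build once, reuse across tasks". The sketch below shows that pattern with a lock-protected dictionary. It is an illustration, not Hive's implementation: `SharedTableCache`, the key `"dim_date"` and the table contents are all made up.

```python
import threading


class SharedTableCache:
    """Process-wide cache so concurrent tasks build each hash table only once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tables = {}
        self.builds = 0               # how many times a table was actually built

    def get_or_build(self, key, build):
        # Simplification: the lock is held while building, so builds for
        # different keys are serialized as well.
        with self._lock:
            if key not in self._tables:
                self._tables[key] = build()
                self.builds += 1
            return self._tables[key]


def build_dim_date():
    # Stand-in for loading the small side of a map join into a hash table.
    return {1: "2015-01-01", 2: "2015-01-02"}


shared = SharedTableCache()
tasks = [
    threading.Thread(target=shared.get_or_build, args=("dim_date", build_dim_date))
    for _ in range(4)
]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(shared.builds)
```

With per-task containers, each of the four tasks would have built its own copy; inside a daemon, three of them reuse the first task's table.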
  • 34. Performance
  • 35. Setup
      • 13 physical machines (12 cores, 40 GB RAM each)
      • Note – smaller cluster than previous Tez perf runs
      • TPC-DS 200, interactive queries
      • Both – ORC, vectorized, Hadoop 2.8, queries via HS2 w/JMeter
      • Tez: Hive 1.2 + Tez 0.8 (snapshot)
      • Pre-warm and container reuse enabled
      • LLAP: branch in pre-alpha stage + Tez 0.8 (snapshot)
      • Bias towards executors – small cache
      • Otherwise no tuning
  • 36. Summary
      • NOTE – early stage, pre-alpha-release perf results
      • Still, interactive queries are already 1.5-4 times faster
      • First query result after launching CLI significantly improved
      • In real life, LLAP daemons would also already be warm
      • Parallel queries are already better
      • Lots of work still ahead – epic locks in Kryo, Log4j, HDFS, HiveServer2; better object sharing, better priority enforcement
      • Should be much faster in short order
  • 37. Query execution time
      • [Chart: execution time in seconds (axis 0 to 35), Hive (1.2.0) vs Hive (LLAP), for query55, query42, query52, query3, query12, query27, query26, query7, query19, query96, query43, query15, query82, query13]
  • 38. Parallel query execution
      • 8 users, 4 parallel executors on HS
      • Tez: 50% of serial time; LLAP alpha: 41% of serial time
      • [Chart: total execution time in seconds for 13 queries (axis 0 to 300), serial vs parallel, Hive (1.2.0) vs Hive (LLAP)]
  • 39. Current status and future directions
  • 40. Current status
      • Putting the finishing touches on the CTP (alpha release)
      • Watch the Hortonworks blog and the Apache Hive mailing lists for details!
      • The basic features are functional
      • Currently only on Tez; IO only on vectorized and ORC
      • AKA the fastest Hive setup possible
      • Lots of performance improvement not yet realized
      • Lots of advanced features are WIP or planned
  • 41. Work in progress
      • Further performance improvement
      • Concurrent query execution improvements
      • Better vectorized operators (join, group by, …)
      • Defining the API
  • 42. Future work
      • Security, including column-level security
      • Tighter integration with YARN, e.g. resource delegation
      • Guaranteed capacities for better SLA guarantees, maybe with a central scheduler
      • Dynamic daemon sizing with off-heap storage
      • ACID support
      • Better (maybe centrally coordinated) locality and caching
      • Temp tables, intermediate query results in LLAP
      • Interleaving of fragment execution
      • Past processing is not lost (unlike with preemption)
      • A rogue / badly scheduled query will not hog the system
  • 43. Questions?
      • Interested? Stop by the Hortonworks booth to learn more