© Hortonworks Inc. 2011 – 2015. All Rights Reserved
LLAP: long-lived execution in Hive
Sergey Shelukhin
Hadoop Summit 2015

Published in: Technology

  1. LLAP: long-lived execution in Hive
     Sergey Shelukhin
  2. LLAP: long-lived execution in Hive
     • Stinger recap and even faster queries
     • LLAP: overview
     • Query fragment execution
     • IO elevator and caching
     • Performance
     • Current status and future directions
     • Query fragment API
  3. Hive performance recap
     • Stinger: An Open Roadmap to improve Apache Hive’s performance 100x
     • Delivered in 100% Apache Open Source
     • Stinger.Next: Enterprise SQL at Hadoop Scale
     • Launched in September 2014, phase 1 delivered in 2015
     [Diagram labels: Vectorized SQL Engine, Tez Execution Engine, ORC Columnar format, Cost Based Optimizer; Hive 0.10 (Batch Processing), 100-150x Query Speedup, Hive 0.14 (Human Interactive, 5 seconds)]
  4. The road ahead to sub-second queries
     • Startup costs are now a key bottleneck
     • Example: JVM takes 100s of ms to start up
     • Vectorized code can benefit from JIT optimization
     • JIT optimizer needs (run)time to do its work
     • Improved operator performance shifts focus on IO
     • Reading data is serialized with data processing
     • Reading from HDFS is relatively expensive
     • Large machines provide opportunities for data sharing
     • Both between parallel computation (sharing) and serial (caching)
  5. LLAP: overview
  6. What is LLAP?
     • Hybrid execution with daemons in Hive
     • Eliminates startup costs for tasks
     • Allows the JIT optimizer to have time to optimize
     • Multi-threaded execution of vectorized operator pipelines
     • Also allows sharing of metadata, map join tables, etc.
     • Asynchronous IO elevator and caching
     • Reduces IO cost and parallelizes IO and processing
     • Can be spindle-aware; other IO optimizations
     • Query fragment API
     [Diagram labels: Node, LLAP Process, Cache, Query Fragment, HDFS]
  7. What LLAP isn't
     • Not a Hive execution engine (like Tez, MR, Spark…)
     • Execution engines provide coordination and scheduling
     • Some work (e.g. large shuffles) can still be scheduled in containers
     • Not a storage layer
     • Daemons are stateless and read (and cache) data from HDFS
     • Does not supersede existing Hive
     • Container-based execution still fully supported
  8. Example execution: MR vs Tez vs Tez+LLAP
     [Diagram: three task graphs. Labels: Map – Reduce (intermediate results in HDFS); Tez (optimized pipeline); Tez with LLAP (resident process on nodes); map tasks read HDFS; in-memory columnar cache]
  9. LLAP in your cluster
     • LLAP daemons run on existing YARN
     • Apache Slider is used for provisioning and recovery
     • Easy to bring up, tear down, and share clusters
     • Resource management via YARN delegation model (WIP)
     • LLAP and containers dynamically balance resource usage (WIP)
  10. Benefits unrelated to performance (WIP)
      • Concurrent query execution and priority enforcement
      • Access control, including column-level security
      • ACID improvements
      • Can be used externally via the API
      • Will be usable e.g. by Spark, Pig, Cascading, …
  11. Query fragment API
  12. Query Fragment API – overview
      • Hadoop RPC, protobuf are used to send fragments
      • Fragments are "physical algebra": operators, metadata, input sources and output channels
      • Results are returned asynchronously via output channels
      • Hive will produce fragments for LLAP as part of physical optimization
      • Other applications can compile their own physical algebra
  13. Query Fragment API – algebra
      • Operators: Scan, Filter, Group By, Hash/Merge join, etc.
      • Operators may include statistics for local optimization
      • Expressions: comparison, arithmetic, Hive built-in functions
      • All Hive datatypes
      • Complex types like map/list/etc. – WIP
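
To make the "physical algebra" idea from slides 12 and 13 concrete, here is a minimal Python sketch of a fragment as a bundle of operators, metadata, input sources and an output channel. Everything in it (`Operator`, `QueryFragment`, `run_fragment`, `drain`, the `predicate` parameter) is a hypothetical name chosen for illustration. The real API is protobuf over Hadoop RPC and is described on the slides as still being defined, so this only illustrates the shape of the idea.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Operator:
    """One node of the (hypothetical) physical algebra."""
    kind: str                                   # e.g. "Scan", "Filter", "GroupBy"
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class QueryFragment:
    """Illustrative stand-in for a fragment: operators + metadata + IO."""
    operators: List[Operator]
    metadata: Dict[str, Any]
    input_sources: List[str]
    output_channel: Queue = field(default_factory=Queue)


def run_fragment(fragment: QueryFragment, rows: Iterable[dict]) -> None:
    """Toy executor: applies Filter operators, pushes rows to the channel."""
    predicates: List[Callable[[dict], bool]] = [
        op.params["predicate"] for op in fragment.operators if op.kind == "Filter"
    ]
    for row in rows:
        if all(pred(row) for pred in predicates):
            fragment.output_channel.put(row)
    fragment.output_channel.put(None)           # end-of-stream marker


def drain(channel: Queue) -> List[dict]:
    """Client side: read results from the output channel until end-of-stream."""
    results = []
    while True:
        item = channel.get()
        if item is None:
            return results
        results.append(item)
```

In this toy version the caller runs `run_fragment` and then calls `drain`; in a real asynchronous setup the two would run on different threads or processes, which is why results travel through a channel rather than a return value.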
  14. Query Fragment API – client API
      • Encapsulates creation, submission of query fragments
      • Also helps with IO from LLAP
      • Getting vectorized record readers, batches, etc.
      • Working with output channels (cancellation, availability of records, failure)
  15. Query execution
  16. LLAP: Query Execution
      • Overview of Query Execution
      • Scheduling
      • Coordination via Tez
      • What Fragments run in LLAP vs Containers
      • Future work
  17. Tez + LLAP – overview
      • Hive on Tez already proven to perform well
      • Tez being enhanced to allow it to coordinate work to external systems (TEZ-2003)
      • Pluggable Scheduling
      • Pluggable communication – custom execution specifications, protocols
      • DAG coordination remains unchanged
      • Hive Operators / Tez Runtime components used for Processing and data transfer
  18. Deciding on where query components run
      • Fragments can run in LLAP, regular containers, AM (as threads)
      • Decision made by the Hive Client
      • Configurable – all in LLAP, none in LLAP, intelligent mix
      • Criteria for running in LLAP (in auto mode)
      • No user code (or only blessed user code)
      • Data source – HDFS
      • ORC and vectorized execution (for now)
      • Others can still run in LLAP in "all" mode, w/o IO elevator and cache
      • Data size limitations (avoid heavy / long running processing within LLAP)
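
The placement criteria on slide 18 can be read as a small decision function. The sketch below is one possible reading, not Hive's actual logic: the `FragmentInfo` fields, the function name `runs_in_llap` and the 1 GiB size limit are assumptions made for illustration. Only the three modes and the list of criteria come from the slide.

```python
from dataclasses import dataclass


@dataclass
class FragmentInfo:
    """Hypothetical summary of a fragment, as the Hive client might see it."""
    has_user_code: bool
    user_code_blessed: bool
    data_source: str        # e.g. "HDFS"
    file_format: str        # e.g. "ORC"
    vectorized: bool
    input_bytes: int


def runs_in_llap(frag: FragmentInfo, mode: str = "auto",
                 max_input_bytes: int = 1 << 30) -> bool:
    """Return True if the fragment should run inside an LLAP daemon.

    max_input_bytes is an assumed threshold; the slide only says that
    heavy / long-running processing should be kept out of LLAP.
    """
    if mode == "all":
        return True
    if mode == "none":
        return False
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    # auto mode: apply the criteria from the slide in order
    if frag.has_user_code and not frag.user_code_blessed:
        return False
    if frag.data_source != "HDFS":
        return False
    if frag.file_format != "ORC" or not frag.vectorized:
        return False
    return frag.input_bytes <= max_input_bytes
```

Note that "all" mode bypasses the checks entirely, matching the slide's remark that other fragments can still run in LLAP in that mode, only without the IO elevator and cache.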
  19. So…
      [Diagram: Tez task graph]
  20. So…
      [Diagram: task graphs for Tez and for Tez with LLAP (auto), each with its AM]
  21. So…
      [Diagram: task graphs for Tez, Tez with LLAP (auto) and Tez with LLAP (all), with their AMs]
  22. Scheduling for LLAP in Tez AM
      • Greedy scheduling per query – assumes entire cluster available
      • Schedule work to preferred location (HDFS locality)
      • Multiple independent queries set the same preferred location if accessing the same data (improves cache locality)
      • LLAP Daemons schedule fragments independently – across multiple queries
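
Slide 22 says that independent queries pick the same preferred location when they read the same data, which is what keeps the cache warm. One simple way to get that property without any coordination is to choose deterministically among the hosts that hold a replica. The sketch below assumes this approach; the function name `preferred_node`, the `split_id` string and the use of a hash are illustrative choices, not something the slides specify.

```python
import hashlib
from typing import Sequence


def preferred_node(split_id: str, replica_hosts: Sequence[str]) -> str:
    """Pick one replica host for a data split, deterministically.

    Every query that calls this with the same split and the same set of
    replica hosts gets the same answer, regardless of the order in which
    the hosts are listed.
    """
    if not replica_hosts:
        raise ValueError("split has no replica hosts")
    hosts = sorted(replica_hosts)                       # order-independent
    digest = hashlib.sha256(split_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]
```

Because the choice is restricted to replica hosts, HDFS locality is preserved, and because it depends only on the split and the host set, two schedulers that never talk to each other still agree.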
  23. Queuing fragments
      • LLAP daemon has a number of executors (think containers)
      • Wait queue with pluggable priority
      • Geared towards low latency queries (default)
      • Models estimated work left in query
      • Sequencing within a query handled via topological order
      • Fragment start time factors into scheduling decision
      [Diagram: LLAP queue and executors holding fragments such as Q1 Map 1, Q1 Reducer 2 and Q3 Map 19]
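
A wait queue with pluggable priority, as described on slide 23, can be sketched with a heap and a caller-supplied key function. The class name `WaitQueue`, the dictionary fields `work_left` and `start_time`, and the particular default ordering are assumptions for illustration. The slide only states that the default favors low-latency queries, models the work left in a query, and factors in fragment start time.

```python
import heapq
import itertools
from typing import Any, Callable, Dict, Optional

Fragment = Dict[str, Any]


class WaitQueue:
    """Priority queue of fragments; lower priority value runs first."""

    def __init__(self, priority_fn: Callable[[Fragment], Any]):
        self._priority_fn = priority_fn          # the pluggable part
        self._heap = []
        self._tie_breaker = itertools.count()    # keeps insertion order stable

    def offer(self, fragment: Fragment) -> None:
        entry = (self._priority_fn(fragment), next(self._tie_breaker), fragment)
        heapq.heappush(self._heap, entry)

    def poll(self) -> Optional[Fragment]:
        """Remove and return the highest-priority fragment, or None if empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


def least_work_left_first(fragment: Fragment) -> tuple:
    """Assumed default: queries with little work left go first, then by start time."""
    return (fragment["work_left"], fragment["start_time"])
```

Swapping in a different `priority_fn` changes the policy without touching the queue itself, which is the point of making the priority pluggable.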
  24. LLAP Scheduling – pipelining and preemption
      • A fragment can run when inputs are not yet available (for pipelining)
      • A fragment is "finishable" if all the source data is ready
      [Diagram: LLAP queue and executors; interactive query maps 1/3, 2/3 and 3/3 alongside a wide query reduce that reports "Well, 10 mappers out of 100 are done!"]
  25. LLAP Scheduling – pipelining and preemption
      • A fragment can run when inputs are not yet available (for pipelining)
      • A fragment is "finishable" if all the source data is ready
      • If the data is not ready, may never free the executor
      • Non-finishable fragments can be preempted
      • Improves throughput, prevents deadlocks
      [Diagram: LLAP queue and executors; interactive query maps 1/3, 2/3 and 3/3 and the wide query reduce]
  26. LLAP Scheduling – pipelining and preemption
      • A fragment can run when inputs are not yet available (for pipelining)
      • A fragment is "finishable" if all the source data is ready
      • If the data is not ready, may never free the executor
      • Non-finishable fragments can be preempted
      • Improves throughput, prevents deadlocks
      [Diagram: LLAP queue and executors; interactive query maps 1/3, 2/3 and 3/3 only]
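
The preemption rule on slides 24 to 26 can be sketched as follows: when all executors are busy and a finishable fragment arrives, a running non-finishable fragment gives up its executor. The function names, the dictionary fields and the choice of victim (the most recently started non-finishable fragment, on the assumption that it loses the least work) are illustrative; the slides do not say how a victim is chosen.

```python
from typing import Any, Dict, List, Optional, Tuple

Fragment = Dict[str, Any]


def pick_victim(running: List[Fragment]) -> Optional[Fragment]:
    """Choose a non-finishable fragment to preempt, or None if there is none."""
    candidates = [f for f in running if not f["finishable"]]
    if not candidates:
        return None
    # Assumption: preempt the one that started last, so less work is discarded.
    return max(candidates, key=lambda f: f["start_time"])


def admit(running: List[Fragment], capacity: int,
          incoming: Fragment) -> Tuple[List[Fragment], Optional[Fragment]]:
    """Try to start `incoming`. Returns (new running list, preempted fragment)."""
    if len(running) < capacity:
        return running + [incoming], None          # free executor available
    if not incoming["finishable"]:
        return running, None                       # never preempt for a non-finishable one
    victim = pick_victim(running)
    if victim is None:
        return running, None                       # everything running can finish; wait
    return [f for f in running if f is not victim] + [incoming], victim
```

In the scenario from the diagrams, a wide query reduce that is still waiting on 90 of its 100 mappers would be the victim when an interactive map fragment needs an executor.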
  27. IO elevator and other internals
  28. LLAP: IO elevator and other internals
      • Asynchronous IO and decompression
      • Off-heap data caching
      • File metadata caching
      • Map join table sharing
      • Better JIT usage thanks to persistent daemon
  29. Asynchronous IO
      • Currently, Hive IO and input decoding is interleaved with processing
      • Remote HDFS reads are expensive
      • Even local disk might be
      • Data decompression and decoding is expensive
  30. Asynchronous IO
      • With IO elevator, reading, decoding and processing are parallel
      • IO threads can be spindle aware (WIP)
      • Depending on workload, IO and processing threads can balance resource usage (throttle IO, etc.) (WIP)
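
The difference between slides 29 and 30 is essentially a producer/consumer split: a separate IO thread reads and decodes while the processing thread consumes already-decoded data. The sketch below shows that pattern in Python with `zlib` standing in for decompression and decoding. The function names and the `queue_depth` parameter are assumptions, and the bounded queue is just one simple way to throttle IO when processing falls behind.

```python
import queue
import threading
import zlib
from typing import Iterable

_DONE = object()   # end-of-stream sentinel


def io_thread(compressed_chunks: Iterable[bytes], out_q: queue.Queue) -> None:
    """Producer: 'read' and decompress chunks, hand them to the processor."""
    for chunk in compressed_chunks:
        out_q.put(zlib.decompress(chunk))   # blocks when the queue is full
    out_q.put(_DONE)


def process(compressed_chunks: Iterable[bytes], queue_depth: int = 4) -> int:
    """Consumer: counts decoded bytes while the IO thread keeps reading ahead."""
    decoded = queue.Queue(maxsize=queue_depth)   # bounded: throttles the IO thread
    reader = threading.Thread(target=io_thread,
                              args=(compressed_chunks, decoded), daemon=True)
    reader.start()
    total = 0
    while True:
        item = decoded.get()
        if item is _DONE:
            break
        total += len(item)                   # stand-in for real operator work
    reader.join()
    return total
```

With the interleaved approach of slide 29, the same loop would call `zlib.decompress` inline, so the processing thread would stall on every read.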
  31. Caching and off-heap data
      • Decompressed data is cached off-heap
      • Simplifies memory management, mitigates some GC problems
      • Saves HDFS and decompression costs, esp. on dimension tables
      • In future, processing cache data directly possible to avoid copies
      • Replacement policy is pluggable
      • Currently, simple local policies are used e.g. FIFO, LRFU
      • Other policies possible (e.g. workflow-adaptable, or lazily coordinated for better cache affinity)
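
A pluggable replacement policy, as mentioned on slide 31, can be sketched as a cache that delegates eviction decisions to a policy object. The classes below are illustrative and not LLAP's implementation: the method names `on_insert`, `on_access` and `evict` are made up, the cache holds ordinary Python objects rather than off-heap buffers, and the LRFU variant uses the textbook formulation in which each block keeps a combined recency and frequency score that decays by a factor of 0.5 ** (lam * elapsed ticks).

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable, Optional, Tuple


class FifoPolicy:
    """Evicts the key that was inserted first; accesses do not matter."""

    def __init__(self) -> None:
        self._order: "OrderedDict[Hashable, None]" = OrderedDict()

    def on_insert(self, key: Hashable) -> None:
        self._order[key] = None

    def on_access(self, key: Hashable) -> None:
        pass

    def evict(self) -> Hashable:
        key, _ = self._order.popitem(last=False)
        return key


class LrfuPolicy:
    """Evicts the key with the lowest decayed recency/frequency score."""

    def __init__(self, lam: float = 0.1) -> None:
        self._lam = lam
        self._clock = 0                                   # logical time in ticks
        self._state: Dict[Hashable, Tuple[float, int]] = {}  # key -> (score, last tick)

    def _decayed(self, score: float, last_tick: int) -> float:
        return score * 0.5 ** (self._lam * (self._clock - last_tick))

    def on_insert(self, key: Hashable) -> None:
        self._clock += 1
        self._state[key] = (1.0, self._clock)

    def on_access(self, key: Hashable) -> None:
        self._clock += 1
        score, last_tick = self._state[key]
        self._state[key] = (1.0 + self._decayed(score, last_tick), self._clock)

    def evict(self) -> Hashable:
        victim = min(self._state, key=lambda k: self._decayed(*self._state[k]))
        del self._state[victim]
        return victim


class Cache:
    """Fixed-capacity cache; the replacement policy is supplied by the caller."""

    def __init__(self, capacity: int, policy: Any) -> None:
        self._capacity = capacity
        self._policy = policy
        self._data: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._policy.on_access(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data[key] = value
            self._policy.on_access(key)
            return
        if len(self._data) >= self._capacity:
            del self._data[self._policy.evict()]
        self._data[key] = value
        self._policy.on_insert(key)
```

With the same access pattern (insert a and b, read a twice, insert c into a two-entry cache), FIFO evicts a while LRFU evicts b, which shows why the choice of policy matters for frequently reused data such as dimension tables.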
  32. Cache size vs operator memory requirement
      • Cache space takes away from operator space
      • Sort buffers, hash join tables, GBY buffers take space
      • Tradeoff between HDFS reads and operator speed
      • Depends on workflow, dataset size, etc.
      • New vectorization changes in Hive will speed up operators and allow for larger cache
  33. Other benefits
      • File metadata and indexes are cached
      • Much faster PPD application for selective queries – no HDFS reads
      • Same replacement as data cache (but higher priority)
      • Map join hash tables, fragment plans are shared
      • Multiple tasks do not all generate the table or deserialize the plans
      • Better use of JIT optimizer
      • Because the daemons are persistent, JIT has more time to kick in
      • Especially good with vectorization!
  34. Performance
  35. Setup
      • 13 physical machines (12 cores, 40 GB RAM each)
      • Note – smaller cluster than previous Tez perf runs
      • TPCDS 200, interactive queries
      • Both – ORC, vectorized, Hadoop 2.8, queries via HS2 w/JMeter
      • TEZ: Hive 1.2 + Tez 0.8 (snapshot)
      • Pre-warm and container reuse enabled
      • LLAP: Branch in pre-alpha stage + Tez 0.8 (snapshot)
      • Bias towards executors – small cache
      • Otherwise no tuning
  36. Summary
      • NOTE - in early stage – pre-alpha-release perf results
      • Still, interactive queries are already 1.5-4 times faster
      • First query result after launching CLI significantly improved
      • In real life, LLAP daemons would also already be warm
      • Parallel queries are already better
      • Lots of work still ahead – epic locks in Kryo, Log4j, HDFS, HiveServer2; better object sharing, better priority enforcement
      • Should be much faster in short order
  37. Query execution time
      [Chart: execution time in seconds (axis 0 to 35) for query55, query42, query52, query3, query12, query27, query26, query7, query19, query96, query43, query15, query82 and query13, comparing Hive (1.2.0) with Hive (LLAP)]
  38. Parallel query execution
      • 8 users, 4 parallel executors on HS
      • Tez: 50% of serial time; LLAP alpha: 41% of serial time
      [Chart: total execution time in seconds (axis 0 to 300) for 13 queries, serial vs parallel, comparing Hive (1.2.0) with Hive (LLAP)]
  39. Current status and future directions
  40. Current status
      • Putting the finishing touches on the CTP (alpha release)
      • Watch Hortonworks blog, and Apache Hive mailing lists, for details!
      • The basic features are functional
      • Currently only on Tez; IO only on vectorized and ORC
      • AKA the fastest Hive setup possible
      • Lots of performance improvement not yet realized
      • Lots of advanced features are WIP or planned
  41. Work in progress
      • Further performance improvement
      • Concurrent query execution improvements
      • Better vectorized operators (join, group by, …)
      • Defining the API
  42. Future work
      • Security, including column level security
      • Tighter integration with YARN, e.g. resource delegation
      • Guaranteed Capacities for better SLA guarantee, maybe with central scheduler
      • Dynamic daemon sizing with off-heap storage
      • ACID support
      • Better (maybe centrally coordinated) locality and caching
      • Temp tables, intermediate query results in LLAP
      • Interleaving of Fragment Execution
      • Past processing is not lost (as opposed to preemption)
      • A rogue / badly scheduled query will not hog the system
  43. Questions?
      Interested? Stop by the Hortonworks booth to learn more