Apache Tez – Present and Future
- 1. © Hortonworks Inc. 2015 Page 1
Apache Tez – Present and Future
Jeff Zhang
Rajesh Balamohan
- 2. © Hortonworks Inc. 2015
Outline
•Tez Introduction
•Tez Feature Deep Dive
•Tez Improvement & Debuggability
•Tez Status & Roadmap
- 3. © Hortonworks Inc. 2015
Example of MR versus Tez
[Figure: Hive on MR runs the query as separate jobs (Job 1: Join a & b; Job 2: Group by of a Join b; Job 3: Group by of c; Job 4: Join of S & R), separated by I/O synchronization barriers. Hive on Tez runs the same steps (Join a & b; Group by of a Join b; Group by of c; Join of S & R) as a single job.]
- 4. © Hortonworks Inc. 2015
Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph (DAG).
• Highly customizable to meet a broad spectrum of use cases.
• Built on top of YARN, the resource management framework for Hadoop.
• Open source Apache project and Apache licensed.
- 5. © Hortonworks Inc. 2015
What is DAG & Why DAG
• Directed Acyclic Graph
• Any complicated DAG can be composed of the following 3 basic DAG paradigms
– Sequential (Projection, Filter, GroupBy, …)
– Merge (Join, Union, Intersect, …)
– Divide (Split, …)
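The three paradigms can be sketched as small dependency graphs and then combined into one larger DAG. This is only an illustration using Python's standard `graphlib` module, not the Tez DAG API; all vertex names are made up for the example.

```python
from graphlib import TopologicalSorter

# Each DAG maps a vertex to the set of vertices it depends on.
# All vertex names are illustrative and do not come from Tez.
sequential = {"filter": {"projection"}, "group_by": {"filter"}}
merge = {"join": {"left_input", "right_input"}}
divide = {"split_a": {"source"}, "split_b": {"source"}}


def execution_order(dag):
    """Return one valid order in which the vertices could run."""
    return list(TopologicalSorter(dag).static_order())


# A larger DAG built from the three patterns:
# sequential (projection -> filter), merge (join), divide (two splits).
combined = {
    "filter": {"projection"},
    "join": {"filter", "other_input"},
    "split_a": {"join"},
    "split_b": {"join"},
}
order = execution_order(combined)
```

Any order returned for `combined` places `projection` before `filter`, `filter` before `join`, and `join` before both splits.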
- 6. © Hortonworks Inc. 2015
Anatomy of DAG
[Figure: in the Logic View, a Vertex comprises Task_1, Task_2, Task_3; in the Runtime View, a Task comprises TaskAttempt_1, TaskAttempt_2.]
- 7. © Hortonworks Inc. 2015
Expressing DAG in Tez
• Logic View (DAG API)
– Allows the user to express a computation as a DAG
– Describes the topological structure of the data computation flow
• Runtime API (Input/Processor/Output)
– Application logic of each computation unit
– How to move/read/write data between vertices
- 8. © Hortonworks Inc. 2015
Logic View (DAG API)
• Vertex (Processor, Parallelism, Resource, etc.)
• Edge (EdgeProperty)
– DataMovement
  – ScatterGather (Join, GroupBy, …)
  – Broadcast (Pig Replicated Join / Hive Broadcast Join)
  – OneToOne (Pig Order by)
  – Custom
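The difference between the three built-in data movement types can be sketched as routing functions. This is a simplified model, not Tez code: the function names are hypothetical, and the CRC32-based partitioner is an assumption chosen only to make the example deterministic.

```python
import zlib


def scatter_gather(records, num_consumers):
    """Route each (key, value) record to one consumer chosen by key hash,
    so that all records sharing a key meet in the same consumer task."""
    outputs = [[] for _ in range(num_consumers)]
    for key, value in records:
        index = zlib.crc32(key.encode("utf-8")) % num_consumers
        outputs[index].append((key, value))
    return outputs


def broadcast(records, num_consumers):
    """Give every consumer task a full copy of the producer's output."""
    return [list(records) for _ in range(num_consumers)]


def one_to_one(producer_outputs):
    """Consumer task i reads exactly the output of producer task i."""
    return [list(output) for output in producer_outputs]


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
partitions = scatter_gather(records, 3)
copies = broadcast(records, 2)
```

Scatter-gather fits joins and group-bys because equal keys must be co-located, while broadcast fits the case where one small input is needed in full by every task.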
- 9. © Hortonworks Inc. 2015
Runtime View (Runtime API)
Input → Processor → Output
• Input
– Through which the processor receives data on an edge
– One vertex can have multiple inputs
• Processor
– Application logic (one processor per vertex)
– Consumes the inputs and produces the outputs
• Output
– Through which the processor writes data to an edge
– One vertex can have multiple outputs
• Examples of Input/Output/Processor
– MRInput & MROutput (InputFormat/OutputFormat)
– OrderedGroupedKVInput & OrderedPartitionedKVOutput (ScatterGather)
– UnorderedKVInput & UnorderedKVOutput (Broadcast & 1-1)
– PigProcessor/HiveProcessor
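The division of labour between Input, Processor, and Output can be sketched with three small classes. The class and method names below (`ListInput`, `ListOutput`, `WordCountProcessor`, `read`, `write`, `run`) are hypothetical stand-ins and do not mirror the Tez Runtime API.

```python
class ListInput:
    """Hypothetical input: hands the processor records that arrived on an edge."""

    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return iter(self._records)


class ListOutput:
    """Hypothetical output: collects records the processor writes to an edge."""

    def __init__(self):
        self.records = []

    def write(self, key, value):
        self.records.append((key, value))


class WordCountProcessor:
    """Application logic: consumes every input and writes to every output."""

    def run(self, inputs, outputs):
        counts = {}
        for source in inputs:
            for line in source.read():
                for word in line.split():
                    counts[word] = counts.get(word, 0) + 1
        for word, count in sorted(counts.items()):
            for sink in outputs:
                sink.write(word, count)


# One vertex with two inputs and one output.
result = ListOutput()
WordCountProcessor().run(
    [ListInput(["tez dag", "tez"]), ListInput(["dag edge"])], [result]
)
```

The processor never needs to know how data reached it or where it goes next; swapping the input or output class changes the data movement without touching the application logic.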
- 10. © Hortonworks Inc. 2015
Benefit of DAG
• Easier to express computation in DAG
• No intermediate data written to HDFS
• Less pressure on NameNode
• No resource queuing effort & less resource contention
• More optimization opportunity with more global context
- 12. © Hortonworks Inc. 2015
Container-Reuse
• Reuse the same container across DAGs/Vertices/Tasks
• Benefits of Container-Reuse
– Reduces the overhead of launching JVMs
– Reduces the overhead of negotiating with the Resource Manager
– Reduces the overhead of resource localization
– Reduces network IO
– Consumes fewer resources
– Object caching
- 13. © Hortonworks Inc. 2015
Tez Session
• Multiple Jobs/DAGs in one AM
• Container-reuse across Jobs/DAGs
• Share data between Jobs/DAGs
- 14. © Hortonworks Inc. 2015
Dynamic Parallelism Estimation
• VertexManager
– Listens to the status of the other vertices
– Coordinates and schedules its tasks
– Handles communication between vertices
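One way a vertex manager could turn upstream status into a task count is to size the downstream vertex from the bytes its producers actually emitted. The sketch below is an assumption about how such an estimate might look, not the Tez implementation; the function name and all parameters are hypothetical.

```python
import math


def estimate_parallelism(upstream_output_bytes, desired_bytes_per_task,
                         min_tasks, max_tasks):
    """Pick a task count so each task reads roughly desired_bytes_per_task,
    clamped to the range [min_tasks, max_tasks]."""
    total = sum(upstream_output_bytes)
    wanted = math.ceil(total / desired_bytes_per_task) if total else min_tasks
    return max(min_tasks, min(max_tasks, wanted))


# Three upstream tasks reported 100, 200 and 300 bytes of output.
tasks = estimate_parallelism([100, 200, 300], 256, 1, 10)
```

Because the estimate is made from observed output sizes at runtime, the downstream vertex does not have to commit to a task count before the job starts.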
- 15. © Hortonworks Inc. 2015
ATS Integration
• Tez is fully integrated with YARN ATS (Application Timeline Server)
–DAG Status, DAG Metrics, Task Status, Task Metrics are captured
• Diagnostics & Performance analysis
–Data Source for monitoring & diagnostics
–Data Source for performance analysis
- 16. © Hortonworks Inc. 2015
Recovery
• AM can crash in corner cases
–OOM
–Node failure
–…
• Continue from the last checkpoint
• Transparent to end users
- 17. © Hortonworks Inc. 2015
Order By of Pig
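f = LOAD 'foo' AS (x, y);
o = ORDER f BY x;
[Figure: two execution plans for this query. One passes through HDFS between stages: Sample, Aggregate (Calculate Histogram), HDFS, Partition, Sort. The other is a single DAG with the vertices Load & Sample, Aggregate (Calculate Histogram), Partition, and Sort, connected by 1-1, ScatterGather, and Broadcast edges.]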
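The sample, histogram, partition, and sort stages of a distributed order-by can be sketched in a single process. This is a simplified model under stated assumptions: the function names and the `sample_step` parameter are hypothetical, and the "histogram" is reduced to picking evenly spaced boundary keys from the sorted sample.

```python
import bisect


def sample_keys(split, sample_step):
    """Load & Sample: keep the key of every sample_step-th record."""
    return [key for key, _ in split[::sample_step]]


def range_boundaries(samples, num_partitions):
    """Aggregate: derive partition boundary keys from the sorted sample."""
    ordered = sorted(samples)
    if not ordered:
        return []
    return [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]


def order_by(input_splits, num_partitions, sample_step=1):
    """Partition records by key range, then sort each partition locally."""
    samples = [k for split in input_splits for k in sample_keys(split, sample_step)]
    bounds = range_boundaries(samples, num_partitions)
    partitions = [[] for _ in range(len(bounds) + 1)]
    for split in input_splits:
        for record in split:
            partitions[bisect.bisect_right(bounds, record[0])].append(record)
    return [sorted(p, key=lambda record: record[0]) for p in partitions]


splits = [[(5, "e"), (1, "a"), (9, "i")], [(3, "c"), (7, "g"), (2, "b")]]
sorted_partitions = order_by(splits, num_partitions=3)
```

Since partition i only holds keys smaller than those in partition i+1, concatenating the locally sorted partitions yields a globally sorted result without a single-task final sort.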
- 19. © Hortonworks Inc. 2015
• Performance
–Speculation
–Better use of JVM Memory
–Intermediate File Improvements
–Shuffle Improvements
• Debuggability
–Job Analysis Tools
–Shuffle Performance Analysis Tool
–Local Mode
–Tez UI
- 20. © Hortonworks Inc. 2015
Speculation
• Maintains periodic runtime statistics of tasks
• Similar to legacy MR speculation
– Triggers a speculative attempt when a task's estimated runtime exceeds the mean runtime
• Good for clusters with a mix of fast and slow nodes
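The trigger rule can be sketched as follows. How Tez actually estimates a task's runtime is not stated on the slide; projecting it from elapsed time and reported progress is an assumption, and both function names are hypothetical.

```python
from statistics import mean


def estimated_runtime(elapsed_seconds, progress):
    """Project total runtime from elapsed time and progress in (0, 1]."""
    if progress <= 0:
        return float("inf")
    return elapsed_seconds / progress


def should_speculate(elapsed_seconds, progress, completed_runtimes):
    """Launch a second attempt if the projected runtime exceeds the mean
    runtime of tasks that have already completed."""
    if not completed_runtimes:
        return False  # no statistics to compare against yet
    return estimated_runtime(elapsed_seconds, progress) > mean(completed_runtimes)


# A task at 25% after 60 s projects to 240 s; completed tasks averaged 110 s.
slow = should_speculate(60, 0.25, [100, 120, 110])
```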
- 21. © Hortonworks Inc. 2015
Intermediate File Format Improvements
• Key-value format used for storing intermediate data in Tez
• Drawbacks of the earlier format
– Needs a larger buffer in memory (due to duplicate keys)
– Unwanted comparisons of identical keys during merge sort
– Bigger file size on disk
– Not ideal for all use cases
• New Intermediate File Format
– Works based on (K, List<V>)
– Fewer key comparisons during merge sort
– Provides 57% memory efficiency and a 23% improvement in disk storage
[Figure: a Task produces Spill 1, Spill 2, and Spill 3, which are combined into a Merged Spill. Old IFile Format: a sequence of records laid out as Key Len, Value Len, Key Bytes, Value Bytes, followed by an EOF Marker. New IFile Format: a sequence of records laid out as Key Len, Key Bytes, Value Len, Value Bytes, V_END, with RLE, followed by an EOF Marker.]
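The space saving of a (K, List<V>) layout can be illustrated by encoding the same sorted pairs both ways. This is not the real IFile byte layout: fixed 4-byte lengths and a `V_END` value of -1 are assumptions made to keep the sketch short.

```python
import struct
from itertools import groupby

V_END = -1  # hypothetical end-of-values marker, written as a length field


def _length(n):
    """Encode a length (or marker) as a 4-byte big-endian signed integer."""
    return struct.pack(">i", n)


def encode_flat(sorted_pairs):
    """Old-style layout: every record repeats its key."""
    out = bytearray()
    for key, value in sorted_pairs:
        out += _length(len(key)) + _length(len(value)) + key + value
    return bytes(out)


def encode_grouped(sorted_pairs):
    """New-style layout: each key is written once, followed by its values."""
    out = bytearray()
    for key, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        out += _length(len(key)) + key
        for _, value in group:
            out += _length(len(value)) + value
        out += _length(V_END)
    return bytes(out)


pairs = [(b"apple", b"1")] * 3 + [(b"pear", b"1")]
flat_size = len(encode_flat(pairs))
grouped_size = len(encode_grouped(pairs))
```

In this sketch the grouped layout only wins when keys repeat; with all-unique keys the extra marker per key makes it slightly larger.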
- 22. © Hortonworks Inc. 2015
Better use of JVM Memory
• PipelinedSorter can support sort buffers larger than 2 GB
– Containers with more RAM are no longer limited to a 2 GB sort buffer
– Avoids unnecessary spills
• Reduced key comparison costs in PipelinedSorter
- 23. © Hortonworks Inc. 2015
Better use of JVM Memory - Contd
• BytesWritable Improvements
– Provides FastByteSerialization
– Saves 8 bytes per key-value pair
– Reduces IFile size by 25%
– Reduces SerDe costs
• WeightedScalingMemoryDistributor for better memory management in tasks
– Observed 26% runtime improvements
• Enabled RLE in the reducer code path
– Improved job runtime
– Reduced IO cost
- 24. © Hortonworks Inc. 2015
Broadcast Shuffle Improvements
[Figure: "Before Fix" and "After Fix" diagrams of a Source Task broadcasting to several groups of tasks (Task 1, Task 2, …, Task N). One of the diagrams labels some of the reads "From local disk".]
- 25. © Hortonworks Inc. 2015
PipelinedShuffle Improvements
• Final merge in the source task is avoided
– Less IO
• Consumers are informed about spill events in advance
– Better usage of network bandwidth
– Overlaps CPU with network
– For sorted/unsorted outputs, sends data to consumers in chunks
• Observed 20% runtime improvement in queries involving heavy skew
[Figure: Normal Shuffle Path versus Pipelined Shuffle Path. In both, source tasks (Task 1, Task 2) produce spills (Spill 1, Spill 2, Spill 3) that are consumed by Reduce Task 1 … Reduce Task N. One path first combines the spills into a Merged Spill; the other does not.]
- 26. © Hortonworks Inc. 2015
Job Analysis Tools
• DAG Swimlane
– In $TEZ_HOME/tez-tools/swimlanes, run "sh yarn-swimlanes.sh <app_id>"
[Figure: swimlane chart annotated with Prewarm, Container Reuse, and Remote Reads.]
- 27. © Hortonworks Inc. 2015
Shuffle Performance Analysis Tools
• Analyze shuffle performance between source / destination nodes
- 29. © Hortonworks Inc. 2015
Better Debuggability – Local Mode
• Test Tez jobs without a Hadoop cluster
• Enables fast prototyping
• Fast unit testing
• Runs in a single JVM (easy to debug)
• Scheduling / RPC invocations are skipped
- 34. © Hortonworks Inc. 2015
Roadmap
• Shared Output Edges
• Multiple Edges between Vertices
• Local Mode Stabilization
• Optimizing (include/exclude) Vertices at Runtime
- 35. © Hortonworks Inc. 2015
Tez – Adoption
• Apache Hive
– Starting from Hive 0.13
– set hive.execution.engine=tez;
• Apache Pig
– Starting from Pig 0.14
– pig -x tez
• Cascading
– Cascading 3.0 WIP
Editor's notes
- Hive has written its own processor