1. Apache Tez: Accelerating Hadoop Query Processing
Hortonworks. We do Hadoop.
2. Who am I?
Olivier Renault – orenault@hortonworks.com
Solution engineer – Hortonworks EMEA
Hadoop specialist:
- platform
- security
- tuning
Trying to tame the elephant!
3. Tez – Introduction
Distributed execution framework targeted towards data-processing applications.
Based on expressing a computation as a dataflow graph.
Highly customizable to meet a broad spectrum of use cases.
Built on top of YARN – the resource management framework for Hadoop.
Open source Apache project and Apache licensed.
4. Hadoop 1 -> Hadoop 2
HADOOP 1.0
[Diagram: Hive (SQL), Pig (data flow) and others (Cascading) all sit on MapReduce, which handles both cluster resource management and data processing, on top of HDFS (redundant, reliable storage).]
HADOOP 2.0
[Diagram: Hive (SQL), Pig (data flow) and others (Cascading) sit on Tez (execution engine); batch MapReduce, real-time stream processing (Storm) and online data processing (HBase, Accumulo) run alongside on YARN (cluster resource management), on top of HDFS2 (redundant, reliable storage).]
Monolithic
- Resource management
- Execution engine
- User API
Layered
- Resource management – YARN
- Execution engine – Tez
- User API – Hive, Pig, Cascading, …
5. Tez – Empowering Applications
Tez solves the hard problems of running in a distributed Hadoop environment
Apps can focus on solving their domain-specific problems
This division of labour is what makes Tez a platform for a variety of applications
App:
- Custom application logic
- Custom data format
- Custom data transfer technology
Tez:
- Distributed parallel execution
- Negotiating resources from the Hadoop framework
- Fault tolerance and recovery
- Horizontal scalability
- Resource elasticity
- Shared library of ready-to-use components
- Built-in performance optimizations
- Security
6. Tez – End User Benefits
Better application performance
- Built-in optimizations plus application-defined optimizations
Better predictability of results
- Minimization of overheads and queuing delays
Better utilization of compute capacity
- Efficient use of allocated resources
Reduced load on distributed filesystem (HDFS)
- Reduce unnecessary replicated writes
Reduced network usage
- Better locality and data transfer using new data patterns
Higher application developer productivity
- Focus on application business logic rather than Hadoop internals
7. Tez – Design considerations
Leverage the discrete task-based compute model for elasticity, scalability and fault tolerance
Leverage several man-years of work in Hadoop MapReduce data shuffle operations
Leverage the proven resource sharing and multi-tenancy model of Hadoop and YARN
Leverage the built-in security mechanisms in Hadoop for privacy and isolation
Look to the Future with an eye on the Past
8. Tez – Problems that it addresses
Expressing the computation
- Direct and elegant representation of the data processing flow
- Interfacing with application code and new technologies
Performance
- Late binding: Make decisions as late as possible
- Leverage the resources of the cluster efficiently
- Just works out of the box
- Customizable engine to let applications tailor the job to meet their specific requirements
Operation simplicity
- Painless to operate, experiment and upgrade
9. Tez – Simplifying Operations
No deployments to do. No side effects. Easy and safe to try it out!
- Tez is a completely client side application.
- Simply upload Tez to any accessible FileSystem and change the local Tez configuration to point to it.
- Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
- Leverages YARN local resources.
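As a sketch of the upload-and-point step (Tez 0.5-era client API; the HDFS path is a hypothetical example), pointing the client at an uploaded Tez tarball is a one-line configuration change:

import org.apache.tez.dag.api.TezConfiguration;

// Minimal sketch, assuming a Tez 0.5-era client.
TezConfiguration tezConf = new TezConfiguration();
// tez.lib.uris tells the client where the Tez jars live on the FileSystem;
// two jobs can point at two different versions side by side.
tezConf.set(TezConfiguration.TEZ_LIB_URIS,
    "hdfs:///apps/tez-0.5.2/tez-0.5.2.tar.gz");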
[Diagram: the TezClient on the client machine submits work to the cluster; TezTasks run inside NodeManagers, and two versions of the Tez libraries (Tez Lib 1, Tez Lib 2) sit side by side on HDFS.]
10. Tez – Expressing the computation
Distributed data processing jobs typically look like DAGs (Directed Acyclic Graphs)
- Vertices in the graph represent data transformations
- Edges represent data movement from producers to consumers
[Diagram: distributed sort as a DAG – tasks in the Preprocessor Stage send Samples to a Sampler, which sends partition Ranges to the Partition Stage; Partition Stage tasks scatter data to the Aggregate Stage tasks, which gather it.]
11. Tez – Expressing the computation
Tez provides the following APIs to define the processing
DAG API
- Defines the structure of the data processing and the relationship between producers and consumers
- Enables definition of complex data flow pipelines using simple graph connection APIs; Tez expands the logical DAG at runtime
- Specifies all the tasks in the job
Runtime API
- Defines how the framework and app code interact with each other
- App code transforms data and moves it between tasks
- Specifies what actually executes in each task on the cluster nodes
12. Tez – Deep Dive – API
// Define DAG
DAG dag = new DAG();
// Define Vertices
Vertex map1 = new Vertex(MapProcessor.class);
Vertex reduce1 = new Vertex(ReduceProcessor.class);
// Define Edge: scatter-gather movement, persisted output, sequential scheduling
Edge edge1 = new Edge(map1, reduce1, SCATTER_GATHER,
PERSISTED, SEQUENTIAL, MOutput.class, RInput.class);
// Connect them
dag.addVertex(map1).addVertex(map2).addEdge(edge1)…
[Diagram: map1 and map2 feed reduce1 and reduce2 over Scatter_Gather (bipartite, sequential) edges, and reduce1 and reduce2 feed join1 the same way.]
Simple DAG definition API
13. Tez – Deep Dive – API
Edge properties define the connection between producer and consumer vertices in the DAG
• Data movement – defines routing of data between tasks
– One-To-One: data from the i-th producer task routes to the i-th consumer task.
– Broadcast: data from a producer task routes to all consumer tasks.
– Scatter-Gather: producer tasks scatter data into shards and consumer tasks gather the data; the i-th shard from all producer tasks routes to the i-th consumer task.
– Custom: define your own.
• Scheduling – defines when a consumer task is scheduled
– Sequential: a consumer task may be scheduled after a producer task completes.
– Concurrent: a consumer task must be co-scheduled with a producer task.
• Data source – defines the lifetime/reliability of a task output
– Persisted: output will be available after the task exits, but may be lost later on.
– Persisted-Reliable: output is reliably stored and will always be available.
– Ephemeral: output is available only while the producer task is running.
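As a rough illustration of how the three edge properties combine in code – a sketch against the Tez 0.5-era Java API, with MOutput and RInput standing in for the application's own classes:

import org.apache.tez.dag.api.EdgeProperty;
import org.apache.tez.dag.api.EdgeProperty.DataMovementType;
import org.apache.tez.dag.api.EdgeProperty.DataSourceType;
import org.apache.tez.dag.api.EdgeProperty.SchedulingType;
import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.OutputDescriptor;

// A classic shuffle edge = Scatter-Gather movement + Persisted output
// + Sequential scheduling.
EdgeProperty shuffle = EdgeProperty.create(
    DataMovementType.SCATTER_GATHER,  // i-th shard of every producer -> i-th consumer
    DataSourceType.PERSISTED,         // output survives after the task exits
    SchedulingType.SEQUENTIAL,        // consumers start after producers complete
    OutputDescriptor.create(MOutput.class.getName()),
    InputDescriptor.create(RInput.class.getName()));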
14. Tez – Logical DAG expansion at Runtime
[Diagram: the logical vertices map1, map2, reduce1, reduce2 and join1 expand at runtime into multiple parallel tasks per vertex (e.g. reduce1 becomes Red1 and Red2).]
15. Tez – Runtime API
Flexible Inputs-Processors-Outputs Model
- Thin API to wrap around arbitrary application code
- Compose inputs, processors and outputs to execute arbitrary processing
- Event-routing based control plane architecture
- Applications decide the logical data format and data transfer technology
- Customize for performance
- Built-in implementations for Hadoop 2.0 data services – HDFS and the YARN ShuffleService
Input: initialize(TezInputContext ctxt); Reader getReader(); handleEvents(List<Event> evts); close()
Processor: initialize(TezProcessorContext ctxt); run(List<Input> inputs, List<Output> outputs); handleEvents(List<Event> evts); close()
Output: initialize(TezOutputContext ctxt); Writer getWriter(); handleEvents(List<Event> evts); close()
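A minimal sketch of the Processor side, using the SimpleProcessor convenience base class from the Tez runtime library (class and method names follow the 0.5-era API; the transformation body is left as a placeholder):

import org.apache.tez.runtime.api.ProcessorContext;
import org.apache.tez.runtime.library.processor.SimpleProcessor;

// The framework drives initialize/handleEvents/close; the app supplies run().
public class MyProcessor extends SimpleProcessor {
  public MyProcessor(ProcessorContext context) {
    super(context);
  }

  @Override
  public void run() throws Exception {
    // getInputs() / getOutputs() are provided by the base class:
    // read via each Input's Reader, apply the business logic,
    // write via each Output's Writer.
  }
}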
16. Tez: Library of Inputs and Outputs
[Diagram: a classical ‘Map’ is HDFS Input → Map Processor → Sorted Output; a classical ‘Reduce’ is Shuffle Input → Reduce Processor → HDFS Output; the intermediate ‘Reduce’ for Map-Reduce-Reduce is Shuffle Input → Reduce Processor → Sorted Output.]
What is built in?
– Hadoop InputFormat / OutputFormat
– Sorted Grouped Partitioned Key-Value Input / Output
– Unsorted Grouped Partitioned Key-Value Input / Output
– Key-Value Input / Output
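For instance, the sorted grouped-partitioned pair is what a classic shuffle edge uses; a sketch of building the matching Output/Input pair with the runtime library's helper (Tez 0.5-era names; Text keys and IntWritable values assumed for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.tez.runtime.library.conf.OrderedPartitionedKVEdgeConfig;
import org.apache.tez.runtime.library.partitioner.HashPartitioner;

// Produces a matched Sorted/Grouped/Partitioned Output-Input pair so the
// application never has to write shuffle plumbing itself.
OrderedPartitionedKVEdgeConfig edgeConf = OrderedPartitionedKVEdgeConfig
    .newBuilder(Text.class.getName(), IntWritable.class.getName(),
        HashPartitioner.class.getName())
    .build();
// edgeConf.createDefaultEdgeProperty() then yields the EdgeProperty for the DAG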
17. Tez - Performance
Benefits of expressing the data processing as a DAG
- Reducing overheads and queuing effects
- Gives the system the global picture for better planning
Efficient use of resources
- Re-use resources to maximize utilisation
- Pre-Launch, pre-warm and cache
- Locality & resource aware scheduling
Support for application defined DAG modification at runtime
- Change task concurrency
- Change task scheduling
- Change DAG edges
- Change DAG Vertices
18. Tez – Benefits of DAG execution
Faster execution and higher predictability
– Eliminate the replicated write barrier between successive computations.
– Eliminate the job launch overhead of workflow jobs.
– Eliminate the extra stage of map reads in every workflow job.
– Eliminate the queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
– Better locality, because the engine has the overall picture.
[Diagram: the same Pig/Hive workflow on MR vs. on Tez – the MR version is a chain of jobs writing to HDFS in between, the Tez version is a single DAG.]
19. Hive-on-MR vs. Hive-on-Tez
SELECT a.x, AVG(b.y) AS avg
FROM a JOIN b ON (a.id = b.id) GROUP BY a.x
UNION SELECT x, AVG(y) AS avg
FROM c GROUP BY x
ORDER BY avg;
Hive – MR vs. Hive – Tez
[Diagram: on MR the plan runs as a chain of separate map/reduce jobs – JOIN(a, b) with SELECT b.id, GROUP BY a.state with COUNT(*) and AVERAGE(c.price), JOIN(a, c) with SELECT a.state and c.price – writing to HDFS between every job; on Tez the same plan runs as one connected DAG of map and reduce stages. Tez avoids unneeded writes to HDFS.]
20. Tez – Container Re-Use
- Reuse YARN containers/JVMs to launch new tasks
- Reduce scheduling and launching delays
- Shared JVM objects across tasks.
- JVM JIT-friendly execution
[Diagram: the Tez Application Master sends Start Task / Task Done messages to a TezTask host; TezTask1 and TezTask2 run one after the other in the same YARN container, with Shared Objects available across tasks.]
21. Tez - Sessions
Sessions
- Standard concepts of pre-launch and pre-warm applied
- Key for interactive queries
- Represents a connection between the user and the cluster
- Multiple DAGs executed in the same session
- Containers re-used across queries
- Takes care of data locality and releasing resources when idle
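A sketch of that flow with the client API (Tez 0.5-era names; dag1 and dag2 stand for DAGs built as on slide 12):

import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.TezConfiguration;

// One session = one Application Master plus its container pool.
TezConfiguration tezConf = new TezConfiguration();
TezClient tezClient = TezClient.create("interactive-session", tezConf,
    true /* session mode */);
tezClient.start();          // pre-launches the Application Master
tezClient.submitDAG(dag1);  // containers warmed by dag1 ...
tezClient.submitDAG(dag2);  // ... are re-used for dag2
tezClient.stop();           // releases the pooled resources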
[Diagram: the Client starts a session and submits DAGs to the Application Master; its Task Scheduler draws on a Container Pool of pre-warmed JVMs with a Shared Object Registry.]
22. Tez – Deep Dive – Scheduling
[Diagram: for each of Vertex-1 and Vertex-2, a Vertex Manager decides when to start the vertex and its tasks, the DAG Scheduler supplies task priorities, and the Task Scheduler supplies containers.]
Vertex Manager
• Determines task parallelism
• Determines when tasks in a vertex can start
DAG Scheduler
• Determines the priority of each task
Task Scheduler
• Allocates containers from YARN and assigns them to tasks
23. Tez – Event Based Control Plane
Events are used to communicate between tasks, and between tasks and the framework
A Data Movement Event is used by a producer task to inform the consumer tasks about data location, size, etc.
An Input Error event is sent by a task to the engine to report errors in reading its input; the engine then takes action by regenerating the input
Other events carry task completion notifications, data statistics and other control plane information
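On the consumer side, a custom Input's handleEvents might look roughly like this (a sketch; fetchShard is a hypothetical application helper):

import java.util.List;
import org.apache.tez.runtime.api.Event;
import org.apache.tez.runtime.api.events.DataMovementEvent;

// The framework routes producer events to the consumer's Input.
public void handleEvents(List<Event> events) {
  for (Event event : events) {
    if (event instanceof DataMovementEvent) {
      DataMovementEvent dme = (DataMovementEvent) event;
      // getSourceIndex() identifies the producer task; the payload carries
      // app-defined location/size details.
      fetchShard(dme.getSourceIndex(), dme.getUserPayload());
    }
  }
}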
24. Tez – Automatic Reduce Parallelism
[Diagram: tasks in the Map Vertex send Data Size Statistics events to the Reduce Vertex's Vertex Manager in the App Master; via the Vertex State Machine it sets the vertex parallelism, cancelling surplus tasks and re-routing their input.]
Event Model
- Map tasks send data statistics events to the Reduce Vertex Manager.
Vertex Manager
- Pluggable user logic that understands the data statistics and can formulate the correct parallelism; advises the vertex controller on parallelism.
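For the MapReduce-style shuffle, this pluggable logic ships with Tez as the ShuffleVertexManager; attaching it to the reduce vertex looks roughly like this (a sketch against 0.5-era names – reduceVertex is assumed to exist, and the auto-parallelism thresholds are governed by ShuffleVertexManager configuration, whose exact property names vary by version):

import org.apache.tez.dag.api.VertexManagerPluginDescriptor;
import org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager;

// The manager watches the maps' data size statistics and can shrink the
// reduce vertex's parallelism before its tasks start.
reduceVertex.setVertexManagerPlugin(
    VertexManagerPluginDescriptor.create(ShuffleVertexManager.class.getName()));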
26. Tez – Performance
30 TB scale factor – Hive 0.10 with RCFile vs. Hive 0.13 with ORC [chart]
27. Tez – Observations on Performance
Number of stages in the DAG
- A high number of stages in the DAG benefits most from Tez
Cluster / queue capacity
- On a congested queue, container re-use pays off
Size of intermediate output
- Large intermediate output means less HDFS usage
Size of data in the job
- Small data and many stages mean less overhead than MR
Offload work to the cluster
- Use the DAG to utilize the parallelism and resources of the cluster
Vertex caching
- Reduces re-computation
28. Tez – Adoption Path
Pre-requisite: Hadoop 2 with YARN
Simple client-side install (no admin support needed)
- Needs a folder with write permission on HDFS
- No side effects or traces left behind on your cluster
Apache Hive – available in 0.13
- Set “hive.execution.engine” to “tez”
Apache Pig – Available in 0.14
Cascading – Version 3.0
Run your MapReduce jobs using Tez runtime
- Set “mapreduce.framework.name” to “yarn-tez”
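As a sketch of that last point, the change for an existing MapReduce job is a single configuration setting (standard Hadoop client API; the job name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// The MR code itself is untouched; only the framework name changes.
Configuration conf = new Configuration();
conf.set("mapreduce.framework.name", "yarn-tez");
Job job = Job.getInstance(conf, "existing-mr-job");
// ... set mapper/reducer/paths as usual, then job.waitForCompletion(true)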
29. Tez - Roadmap
Richer DAG support
- Addition of vertices at runtime
- Shared edges for shared outputs
- Enhance Input / Output library
Performance optimizations
- Improve support for high concurrency
- Improve locality aware scheduling
- Add framework level data statistics
- HDFS memory storage integration
Usability
- Tez UI
- API ease of use
Speaker notes: Look to the future with an eye on the past – re-use our learning.
Expressing the computation: MapReduce is sometimes hard to express an algorithm in. Being able to change the source / sink of data enables advanced sources – RDBMS, RMA, …
Performance: late binding means using cluster information and real data at runtime. Build into the framework a way for users to change the mechanism.
Really simple / really safe to deploy.
User: define a workflow as a DAG – the movement of data from source to sink via a set of consumers and producers.
Vertex – transformation of data: transform, filter, compute, …
Edge – movement of data: could be writing to local disk, to HDFS, streaming from one place to another, to a DB, …
Preprocessor stage – e.g. text data – sends to a sampler, which calculates ranges for splitting the data into partitions a–c, d–f, …
Partition to aggregate stage = Scatter / Gather.
DAG: lets you define the graph. Runtime: runs the customer's code. Logical plan → physical DAG. A task is a triplet – Input / Processor / Output.
Input – reads data: transforms data from the source into an input the Processor can understand.
Processor – business logic.
Output – writes data.
You can simply swap any part: you could switch the input or output – e.g. swap the HDFS output of a reducer for a Kafka queue or an RDBMS. The input could be an in-memory DB for performance.
Built-in inputs: HDFS / YARN shuffle service – read data from local disk.
Queuing effect on a busy cluster: if you've defined 100 reducers, the count can be shrunk down automatically (reducing resources used).
This enables the deep-dive scheduling. Statistics are sent about where tasks launch – e.g. for colocation.