February 2014 HUG : Pig On Tez

Pig on Tez
PRESENTED BY

Cheolsoo Park, Netflix
Rohini Palaniswamy, Yahoo!

The Apache Software Foundation

Apache Pig on Tez Team

Name                 Role                                          Company
Alex Bain            Apache Pig Contributor                        LinkedIn
Cheolsoo Park        VP, Apache Pig                                Netflix
Daniel Dai           Apache Pig PMC                                Hortonworks
Mark Wagner          Apache Pig Committer                          LinkedIn
Olga Natkovich       Apache Pig PMC, Pig on Tez Project Manager    Yahoo!
Rohini Palaniswamy   Apache Pig PMC                                Yahoo!

Agenda
• Overview
• Pig and Hive
• Pig on Tez
  - Why Tez?
  - Benefits of Tez
  - Design
  - Operator DAGs
  - Performance
  - Known Issues
  - Where are we?
  - What next?

Pig Overview
• Apache top-level project for ETL on Hadoop.
• Pig Latin: a procedural scripting language that translates a sequence of data processing steps into MapReduce jobs.
• Easy to write, read and reuse, and very extensible.
• Feature parity with SQL (FILTER BY, CROSS, JOIN (OUTER, INNER), ORDER BY, LIMIT, RANK, ROLLUP, CUBE), custom Loader and Storer, user-defined functions (Java and non-Java), nested FOREACH, streaming, macros and much more.

PAGEVIEWS = LOAD '/data/pageviews' AS (user, url);
GRP = GROUP PAGEVIEWS BY user;
CNT = FOREACH GRP GENERATE group, COUNT(PAGEVIEWS.url) AS numvisits;
STORE CNT INTO '/data/visited' USING PigStorage(',');
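For readers who do not know Pig Latin, the script above groups page views by user and counts them. A rough Python equivalent of that logic is sketched below; the sample rows are made up for illustration and are not data from the talk.

```python
from collections import Counter


def count_visits(pageviews):
    """Count page views per user, mirroring GROUP ... BY user plus COUNT."""
    # Each record is a (user, url) pair, like a row of the PAGEVIEWS relation.
    return Counter(user for user, _url in pageviews)


# Hypothetical sample rows, for illustration only.
sample = [("alice", "/a"), ("bob", "/b"), ("alice", "/c")]
visits = count_visits(sample)
for user, numvisits in sorted(visits.items()):
    # Comma-delimited output, similar in spirit to PigStorage(',').
    print(f"{user},{numvisits}")
```

The point of Pig is that the same four-line script runs unchanged over data far larger than one machine can hold, with the engine deciding how to split the grouping across tasks.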

Pig and Hive

Language
  Pig:  Pig Latin (procedural)
  Hive: SQL (declarative)

Features
  Pig:  Feature rich. Can easily add new operators and constructs, e.g. nested FOREACH, switch case, macros, scalars.
  Hive: Limited to SQL operators.

Developer code
  Pig:  Load/StoreFunc, Algebraic and Accumulator Java UDFs, non-Java UDFs (Jython, Python, JavaScript, Groovy, JRuby), custom partitioners.
  Hive: StorageHandler, Java UDFs.

Complex processing
  Pig:  Well suited. Multi-query works well with Pig scripts that run to thousands of lines.
  Hive: Not a good fit.

Server
  Pig:  Client only. Can work with the Hive Metastore using HCatLoader/HCatStorer.
  Hive: Requires a Metastore server, and data has to be registered in it. HiveServer2 for JDBC.

Pig and Hive - Continued

Tez as execution engine
  Pig:  Planned for 0.14.
  Hive: Planned for Hive 0.13.

ORCFile support
  Pig:  Patch available. Currently through HCatLoader.
  Hive: From Hive 0.12 onwards. Huge performance gains.

Vectorization
  Pig:  No. Maybe in the future.
  Hive: Yes. Huge performance gains.

Transactions
  Pig:  No.
  Hive: Yes. In the works.

Cost-based optimizer
  Pig:  No.
  Hive: Yes. In the works.

JDBC support, integration with BI tools
  Pig:  No.
  Hive: Yes. HiveServer2 with MicroStrategy/Tableau.

Area of application
  Pig:  Pipeline processing language standard.
  Hive: Interactive analytics/reporting platform.

Why Tez?
• Built on top of YARN
  - Multi-tenancy (queues, capacity management)
  - Resource allocation
• DAG execution framework
  - A more natural fit for Pig and Hive than MR, as their execution plans are DAGs.
  - Better than running a DAG of MR jobs that pass data between jobs using HDFS as the intermediate store.
• Different types of edges
  - ONE_ONE, BROADCAST, SHUFFLE
• Flexible Input-Processor-Output runtime model
  - Custom vertex processors, e.g. Map Processor, Reduce Processor, Pig Processor
  - Custom inputs, e.g. MRFileInput (input to map), ShuffledMergedInput (input to reduce)
  - Custom outputs, e.g. OnFileSortedOutput (output of map), MRFileOutput (output of reduce)
• Multiple inputs and outputs
• Highly extensible
• Security
• Support from Tez Community and Hive Community

Why Tez? – As an end user
• Better performance
• Reduced resource usage (containers/memory/CPU)
• Reduced network I/O
• Reduced Namenode and Datanode load

Benefits of Tez

Feature: No intermediate data storage
Benefits:
• Less pressure on Namenode
  - Fewer calls for listing and getting block locations
  - Smaller namespace usage
  - Cuts down on GC
• Less pressure on Datanode
  - Cuts down on network IO for both writing and reading.
  - Saves space, as intermediate data is not stored as 3 replicas.
• Eliminates the extra step of maps reading from HDFS in every intermediate job in the DAG
  - Saves on capacity by eliminating the need for map task containers

Feature: Single AM for whole DAG
Benefits:
• Saves on capacity. For a 5 stage MR job, there would be 5 AM containers launched.
• Eliminates the queue and resource contention faced in MR by jobs that start only after the previous job in the DAG completes.
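The "single AM" saving is simple arithmetic: a chain of MR jobs launches one Application Master per job, while Tez runs the whole DAG under one. A toy sketch of that count follows; the function name and the engine labels are illustrative assumptions, not part of Pig or Tez.

```python
def am_containers(num_stages, engine):
    """Application Master containers needed to run a multi-stage pipeline."""
    if engine == "mr":
        # One MR job per stage, each with its own AM container.
        return num_stages
    if engine == "tez":
        # The whole DAG runs under a single AM.
        return 1
    raise ValueError(f"unknown engine: {engine}")


stages = 5  # the 5 stage example from the slide
saved = am_containers(stages, "mr") - am_containers(stages, "tez")
print(f"AM containers saved: {saved}")
```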

Benefits of Tez - Continued

Feature: Container reuse
Benefits:
• Reduced launch overhead
  - Container request and release overhead
  - Resource localization overhead
  - JVM launch time overhead
• Reduced network IO
  - Reduce tasks can be launched on the same node as the map
  - 1-1 edge tasks can be launched on the same node

Feature: Vertex caching
Benefits:
• In-memory structures, such as small tables used for joins, can be cached in the JVM and reused by the next task when the container is reused. Provides a significant performance speedup.

Feature: Custom inputs and outputs
Benefits:
• Using unsorted input and output where possible saves a lot of CPU and increases performance.

Feature: Dynamic reducer estimation
Benefits:
• Saves on capacity. The number of reducers can be based on data size instead of being fixed.
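Sizing reducers by data volume can be pictured as a ceiling division with a cap. The sketch below is only an illustration of that idea; the parameter names and default values are assumptions chosen for the example, and the slides do not state the formula Pig on Tez actually uses.

```python
import math


def estimate_reducers(data_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
    """Pick a reducer count proportional to the data size, within [1, max_reducers].

    bytes_per_reducer and max_reducers are hypothetical tuning knobs.
    """
    if data_bytes < 0 or bytes_per_reducer <= 0 or max_reducers < 1:
        raise ValueError("sizes must be positive")
    wanted = math.ceil(data_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


for size in (0, 2_500_000_000, 10**13):
    print(size, estimate_reducers(size))
```

With a fixed reducer count, small intermediate outputs waste containers and large ones overload them; estimating from the observed data size avoids both.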

Pig on Tez - Design

Logical Plan
  -> LogToPhyTranslationVisitor
Physical Plan
  -> TezCompiler -> Tez Plan -> Tez Execution Engine
  -> MRCompiler  -> MR Plan  -> MR Execution Engine

Pig on Tez – Join

l = LOAD 'left' AS (x, y);
r = LOAD 'right' AS (x, z);
j = JOIN l BY x, r BY x;

[Diagram: two plans for the join above.
 MR: a single "Load L and R" stage reads both the left and right splits and feeds Join. Configuration is per job.
 Tez: separate "Load L" and "Load R" vertices read the left and right splits and feed Join. Configuration is per input.]

Pig on Tez – Split + Group-by

f = LOAD 'foo' AS (x, y, z);
g1 = GROUP f BY y;
g2 = GROUP f BY z;
j = JOIN g1 BY group, g2 BY group;

[Diagram: two plans for the script above.
 MR: Load foo -> Split multiplex -> de-multiplex -> Group by y, Group by z -> HDFS -> Load g1 and Load g2 -> Join.
 Tez: Load foo (multiple outputs) -> Group by y, Group by z -> Join. Reduce follows reduce, with no intermediate HDFS write between the group-bys and the join.]

Pig on Tez – Order-by

f = LOAD 'foo' AS (x, y);
o = ORDER f BY x;

[Diagram: two plans for the script above.
 MR: Load & Sample -> Aggregate -> HDFS -> stage sample map on distributed cache -> Partition -> Sort.
 Tez: Load & Sample -> Sample Aggregate -> broadcast sample map to Partition; the input is passed through to Partition via a 1-1 edge -> Sort.]
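The order-by plan above relies on a sample of the keys to choose partition boundaries, so that each sort task receives a contiguous and roughly equal key range. A minimal sketch of that idea follows; the helper names are hypothetical and this is not Pig's actual partitioner.

```python
import bisect


def build_boundaries(sample_keys, num_partitions):
    """Pick num_partitions - 1 split points from a sample of the sort keys."""
    ordered = sorted(sample_keys)
    if not ordered or num_partitions < 2:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Index of the partition whose key range contains key."""
    return bisect.bisect_right(boundaries, key)


# Hypothetical sample of keys; in the plan this plays the role of the "sample map".
sample = [5, 1, 9, 3, 7, 2, 8, 4, 6]
bounds = build_boundaries(sample, 3)

partitions = [[] for _ in range(3)]
for key in [6, 2, 9, 4, 1, 7]:
    partitions[partition_for(key, bounds)].append(key)

# Sorting each partition independently and concatenating gives a total order.
result = [k for part in partitions for k in sorted(part)]
print(bounds, result)
```

Because the boundaries are small, they are cheap to ship to every partition task, which is why the Tez plan can broadcast them instead of staging them through HDFS.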

Pig on Tez – Skewed join

l = LOAD 'left' AS (x, y);
r = LOAD 'right' AS (x, z);
j = JOIN l BY x, r BY x USING 'skewed';

[Diagram: two plans for the script above.
 MR: Load & Sample L -> Aggregate -> HDFS -> stage sample map on distributed cache -> Partition L and Partition R -> Join.
 Tez: Sample L -> Aggregate -> broadcast sample map to Partition L and Partition R; the input is passed through via a 1-1 edge -> Join.]
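One way to picture what the sample is used for in a skewed join: keys that appear very often on the left side are spread over several reducers, and matching right-side rows are copied to each of them. The sketch below illustrates that routing under assumed names and thresholds; it is not Pig's implementation.

```python
import zlib
from collections import Counter


def stable_hash(key):
    """Deterministic hash, so routing is the same in every process."""
    return zlib.crc32(repr(key).encode("utf-8"))


def plan_skew(sample_keys, threshold, num_reducers):
    """Map each hot key (seen more than threshold times) to the reducers sharing it."""
    plan = {}
    for key, seen in Counter(sample_keys).items():
        if seen > threshold:
            width = min(num_reducers, -(-seen // threshold))  # ceiling division
            start = stable_hash(key) % num_reducers
            plan[key] = [(start + i) % num_reducers for i in range(width)]
    return plan


def route_left(key, seq, plan, num_reducers):
    """Skewed side: spread rows of a hot key round-robin over its reducers."""
    targets = plan.get(key)
    if targets is None:
        return stable_hash(key) % num_reducers
    return targets[seq % len(targets)]


def route_right(key, plan, num_reducers):
    """Other side: copy rows of a hot key to every reducer that holds part of it."""
    targets = plan.get(key)
    if targets is None:
        return [stable_hash(key) % num_reducers]
    return list(targets)


# Hypothetical sample: key "a" is hot, key "b" is not.
plan = plan_skew(["a"] * 10 + ["b"], threshold=4, num_reducers=4)
print(plan)
```

Every left row still meets every matching right row on exactly one reducer, so the join result is unchanged while the hot key's work is shared.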

Performance numbers

[Bar chart: time in seconds (axis 0 to 5000), MR vs Tez. Speedup of Tez over MR by query:]
• Replicated Join: 2.8x
• Join + Groupby: 1.5x
• Join + Groupby + Orderby: 1.5x
• 3 way Split + Join + Groupby + Orderby: 2.6x

Factors affecting performance
• Number of stages in the DAG
  - The more stages in the DAG, the larger the advantage of Tez over MR.
• Cluster/queue capacity
  - The more congested a queue is, the larger the advantage of Tez over MR, thanks to container reuse.
• Size of intermediate output
  - The larger the intermediate output, the larger the advantage of Tez over MR, thanks to reduced HDFS usage.
• Size of data in the job
  - For smaller data and more stages, the advantage of Tez over MR is larger, because launch overhead is a higher percentage of total time for smaller jobs.
• Vertex caching
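The data-size factor can be illustrated with a toy model in which every MR stage pays a fixed launch cost on top of its processing time. All numbers below are invented for illustration and are not measurements from the talk.

```python
def overhead_share(stages, launch_secs, work_secs_per_stage):
    """Fraction of total runtime spent on per-stage launch overhead."""
    overhead = stages * launch_secs
    total = overhead + stages * work_secs_per_stage
    return overhead / total


# Hypothetical 4 stage pipeline with a 30 second launch cost per stage.
small_job = overhead_share(stages=4, launch_secs=30, work_secs_per_stage=30)
large_job = overhead_share(stages=4, launch_secs=30, work_secs_per_stage=570)
print(f"small job: {small_job:.0%} overhead, large job: {large_job:.0%} overhead")
```

When most of the runtime is launch overhead, removing it (one AM, reused containers) helps far more than when processing dominates.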

Container usage

Query                                     MR     Tez    Savings   Tez with container reuse
Replicated Join                           7563   7562   1         180
Join + Groupby + Orderby                  7655   7603   52        180
Join + Groupby + Orderby                  7663   7609   54        180
3 way Split + Join + Groupby + Order by   622    563    59        180

Note: The cluster had 25 nodes with 180 containers (1.5 GB each), and Tez reused those containers repeatedly across tasks.
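The Savings column is the MR count minus the Tez count, which can be checked directly. The sketch below uses the figures from the table as read from the slide. The final ratio assumes that each container in the Tez column corresponds to one task, which the slide does not state.

```python
# (query, MR containers, Tez containers) as listed in the table above.
USAGE = [
    ("Replicated Join", 7563, 7562),
    ("Join + Groupby + Orderby", 7655, 7603),
    ("Join + Groupby + Orderby", 7663, 7609),
    ("3 way Split + Join + Groupby + Order by", 622, 563),
]
REUSED_CONTAINERS = 180  # containers available on the 25 node cluster


def savings(mr, tez):
    """Containers saved by Tez without container reuse."""
    return mr - tez


for query, mr, tez in USAGE:
    # Assumption: one Tez container request per task, so this ratio
    # approximates how many tasks each reused container ran.
    per_container = tez / REUSED_CONTAINERS
    print(f"{query}: saved {savings(mr, tez)}, ~{per_container:.1f} per reused container")
```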

Known issues
• Container reuse causes problems when there are
  - Static variables in LoadFunc, StoreFunc, UDFs
  - Memory leaks in LoadFunc, StoreFunc, UDFs
• Because the whole script runs as a single DAG, AM retries can be very costly until Tez supports checkpointing and resuming.
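The static-variable issue arises because a reused container keeps its process alive between tasks, so process-wide state set by one task is still there for the next. The Python analogy below uses a class attribute in place of a Java static field; the class and function names are hypothetical.

```python
class LeakyUdf:
    """Toy UDF that keeps its counter in class-level state (like a Java static)."""
    seen = 0  # shared by every instance in the process

    def evaluate(self, value):
        LeakyUdf.seen += 1
        return LeakyUdf.seen


class SafeUdf:
    """Same UDF with per-instance state, which is reset for each task."""

    def __init__(self):
        self.seen = 0

    def evaluate(self, value):
        self.seen += 1
        return self.seen


def run_task(udf_class, values):
    """One task: build a fresh UDF instance and apply it to each value."""
    udf = udf_class()
    return [udf.evaluate(v) for v in values]


# Two tasks run back to back in the same process, as with container reuse.
first = run_task(LeakyUdf, ["a", "b"])
second = run_task(LeakyUdf, ["c"])    # starts from state left by the first task
isolated = run_task(SafeUdf, ["c"])   # starts from a clean counter
print(first, second, isolated)
```

Under MR each task got a fresh JVM, so code like the leaky version happened to work; with container reuse it silently produces different results.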

Where are we?
• Major operators
  - Load, Store, Filter-by, Foreach
  - Split, Union
  - Group-by, Distinct, Limit
  - Order-by
  - Hash join, Replicated join, Skewed join, Merge join
• UDFs (Java and non-Java)
• Streaming
• Multi-query on and off
• Macros
• Scalars
• 95% of e2e tests pass for finished features.

What next?
• Feature parity with MR
  - Local mode
  - Port all unit and e2e tests
  - Support for remaining operators: CROSS, RANK, CUBE, ROLLUP
  - Support for native MapReduce (low priority)
• Merge tez branch with trunk
• Stability
  - Handling failures
  - Testing and tuning for large data and DAGs with > 10 stages
• Usability
  - Counters
  - Progress information
  - Log information and debuggability

What next? – Performance Improvements
› Dynamic reducer estimation
› Better memory management
› Calculate input splits in the AM and let Tez combine input splits for pig.maxCombinedSplitSize
› Vertex grouping to write data directly into one output directory from multiple vertices in case of union
› Using unsorted shuffle in Union, Orderby, Skewed Join, etc. to improve performance
› Shared edges for multiple outputs if the same data has to go to multiple downstream vertices, e.g. multi-query off, skewed join sample aggregation output
› HDFS caching
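Combining input splits means packing several small splits into one task's input until a size limit is reached. The greedy sketch below shows the basic idea only: the function name is hypothetical, and it ignores data locality, which a real combiner would take into account.

```python
def combine_splits(split_sizes, max_combined_size):
    """Greedily pack consecutive splits into groups of at most max_combined_size.

    A single split larger than the limit is kept as its own group.
    """
    combined, current, current_size = [], [], 0
    for size in split_sizes:
        if current and current_size + size > max_combined_size:
            combined.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        combined.append(current)
    return combined


# Hypothetical split sizes (in MB) and a 100 MB limit.
groups = combine_splits([40, 30, 50, 10, 120, 20], 100)
print(groups)
```

Fewer, larger inputs mean fewer tasks to schedule, which matters most for inputs made of many small files.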

Contributors Welcome

Pig User Group Meetup at LinkedIn
14th March 2014

Questions?

