April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster

Stinger Tech Deep
Dive
Page 1
Alan Gates, Arun Murthy, Owen O’Malley

Hadoop Summit
Page 2Architecting the Future of Big Data
•  June 26-27, 2013- San Jose Convention Cntr
•  Co-hosted by Hortonworks & Yahoo!
•  Theme: Enabling the Next Generation
Enterprise Data Platform
•  90+ Sessions and 7 Tracks:
•  Community Focused Event
–  Sessions selected by a Conference Committee
–  Community Choice allowed public to vote for
sessions they want to see
•  Training classes offered pre event
–  Apache Hadoop Essentials: A Technical
Understanding for Business Users
–  Understanding Microsoft HDInsight and Apache
Hadoop
–  Developing Solutions with Apache Hadoop –
HDFS and MapReduce
–  Applying Data Science using Apache Hadoop
•  HUG member discount 10%
Use code: 13DiscHUG10
hadoopsummit.org

Stinger Deep Dive Overview
Page 3
© 2013 Hortonworks, DO NOT SHARE, HORTONWORKS
CONFIDENTIAL
•  Hive Performance Near Term
•  Support for Analytics
•  ORCFile
•  Tez
•  Hive Performance Longer Term

Hive Performance Near Term
Page 4
CONFIDENTIAL
•  Enable star joins
–  Where possible do in single map only task
–  When not possible push larger tables to separate tasks
•  Collapse adjacent jobs where possible
–  Hive has lots of M->MR type plans, collapse these to MR
–  Collapse adjacent jobs on sufficiently similar keys when feasible
–  join followed by group
–  join followed by order
–  group followed by order
–  Much of this ongoing in the community, we are shepherding
•  Remove client side hash table generation
–  Need to prove this is worth the effort short term

© Hortonworks Inc. 2013
TL;DR
• TPC-DS Query 27, Scale=200, 10 EC2 nodes (40 disks)
1501.895
1176.479
631.027
39.985
0
200
400
600
800
1000
1200
1400
1600
Query 27
Text
RCFile
Partitioned RCFile
Partitioned RCFile + Optimizations

TL;DR - II
• TPC-DS Query 82, Scale=200, 10 EC2 nodes (40 disks)
3257.692
2862.669
255.641
71.114
0
500
1000
1500
2000
2500
3000
3500
Query 82
Text
RCFile
Partitioned RCFile
Partitioned RCFile + Optimizations

Before

After

Support for Analytics - OVER
Page 9
CONFIDENTIAL
•  OVER clause
–  PARTITION BY, ORDER BY, ROWS BETWEEN/FOLLOWING/PRECEDING
–  Will work with current aggregate functions
–  Adding new aggregates/window functions we believe are useful
–  Condor: RANK, LEAD, ROW_NUMBER, LAG, LEAD, FIRST_VALUE, LAST_VALUE
–  Ducati: NTILE, DENSE_RANK, CUME_DIST, PERCENT_RANK, PERCENT_CONT,
PERCENT_DISC
–  Ongoing work in the community, will be shepherding it
SELECT salesperson, AVG(salesprice) OVER
(PARTITION BY region, ORDER BY date
RANGE 10 ROWS PRECEEDING)
FROM sales;

Support for Analytics Continued
Page 10
CONFIDENTIAL
•  CUBE BY, ROLLUP in Hive 0.10
•  Subqueries in WHERE
–  Non-correlated only
–  [NOT] IN supported
–  Plan to optimize to fit in memory as hash table when feasible, join when not
•  Datatypes
–  datetime
–  char() and varchar()
–  add precision and scale to decimal and float
–  aliases for standard SQL types (BLOB = binary, CLOB = string, integer = int, real/
number = decimal)

ORCFile – Optimized RCFile
Page 11
CONFIDENTIAL

Top Level
Page 12

File Structure
Page 13

Stripe Structure
Page 14

File Layout
Page 15

Integer Column Serialization
Page 16

String Column Serialization
Page 17

Compression
Page 18

Projection and Predicate Filtering
Page 19

Example File Sizes
Page 20

Final notes
Page 21

Comparison
Page 22
RC File Trevni ORC File
Hive Type Model N N Y
Separate complex columns N Y Y
Splits found quickly N Y Y
Default column group size 4MB 64MB* 250MB
Files per a bucket 1 > 1 1
Store min, max, sum, count N N Y
Versioned metadata N Y Y
Run length data encoding N N Y
Store strings in dictionary N N Y
Store row count N Y Y
Skip compressed blocks N N Y
Store internal indexes N N Y

Page 23
Tez

© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Tez (fka MRX)
• Low level data-processing execution engine
• Use it for the base of MapReduce, Hive, and Pig
• Enables pipelining of jobs
• Removes task and job launch times
• Hive and Pig jobs no longer need to move to the end
of the queue between steps in the pipeline
• Does not write intermediate output to HDFS
– Much lighter disk and network usage
• Built on YARN
Page 24

Tez- Core Idea
Task with pluggable Input, Processor & Output
Page 25
YARN ApplicationMaster to run DAG of Tez Tasks
Input Processor
Tez Task
Output

Tez – Blocks for building tasks
MapReduce ‘Map’
Page 26
MapReduce ‘Reduce’
HDFS
Input
Map
Processor
Tez Task
Sorted
Output
Shuffle
Input
Reduce
Processor
Tez Task
HDFS
Output
Intermediate ‘Reduce’
for
Map-Reduce-Reduce
Shuffle
Input
Reduce
Processor
Tez Task
Sorted
Output

Tez – More tasks
Special Pig/Hive ‘Map’
Page 27
In-memory Map
HDFS
Input
Map
Processor
Tez Task
Pipeline
Sorter
Output
HDFSIn
put
Map
Processor
Tez Task
In-
memory
Sorted
Output
Special Pig/Hive
‘Reduce’
Shuffle
Skip-
merge
Input
Reduce
Processor
Tez Task
Sorted
Output

Hive/MR versus Hive/Tez
Page 28
I/O Synchronization
Barrier
I/O Pipelining
Hive - MR Hive - Tez
SELECT a.state, COUNT(*)
FROM a JOIN b ON (a.id = b.id)
GROUP BY a.state

Hive/MR versus Hive/Tez
Page 29
Hive - MR Hive - Tez
SELECT a.state, COUNT(*), AVERAGE(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

FastQuery: Beyond Batch with YARN
Page 30
Tez Generalizes Map-Reduce
Simplified execution plans process
data more efficiently
Always-On Tez Service
Low latency processing for
all Hadoop data processing

Tez Service
• Hive Query Startup Expensive
– Job launch & task-launch latencies are fatal for short queries (in
order of 5s to 30s)
• Solution
– Tez Service
– Removes task-launch overhead
– Removes job-launch overhead
– Hive
– Submit query-plan to Tez Service
– Native Hadoop service, not ad-hoc
Page 31

More details
• Tez initiation
– MRX runtime for running existing MR applications
• Tez Improvements
– In-memory sort/shuffle
– Specialized sorts for Hive (integer sorts etc.)
– Specialized merges for Hive (shallow key-space merges)
• Tez on MRX
– Change Hive to use MR+ from Tez
– Focus on map-side joins
• Initiated Apache Incubator Project
Page 32

Hive Performance Longer Term -
Vectorization
Page 33
CONFIDENTIAL
•  Rewrite operators to work on arrays of Java scalars
•  aka vectorization – ala MonetDB
•  Operates on blocks of 1K or more records
•  Each block contains an array of Java scalars, one for each column
•  Avoids many function calls
•  Size to fit in L1 cache, avoid cache misses
•  Generate code for operators on the fly to avoid branches in code, maximize
deep pipelines of modern processers
•  Will require complete rewrite of operators and execution pipeline
–  Need to figure out how to integrate this into Hive, separate branch, separate layer
in another project, or integrate in existing pipeline
•  Requires conversion of all column values to Java scalars – no objects allowed
–  Integrates nicely with up coming ORC work, other input types will need conversion
on reading
•  Want to write this in a way it can be shared by Pig, Cascading, MR
programmers

CPU intensive code

April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster

Ähnlich wie April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster (20)

Mehr von Yahoo Developer Network

Mehr von Yahoo Developer Network (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

April 2013 HUG: The Stinger Initiative - Making Apache Hive 100 Times Faster