February 2014 HUG: Hive on Tez

2. Batch AND Interactive SQL-IN-Hadoop
Stinger Initiative
A broad, community-based effort to drive the next generation of Hive.

Stinger Project (announced February 2013)

Goals:
• Speed: Improve Hive query performance by 100x to allow for interactive query times (seconds).
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB.
• SQL: Support the broadest range of SQL semantics for analytic applications running against Hadoop… all IN Hadoop.

Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format

Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN

Coming Soon:
• Hive on Apache Tez
• Query Service
• Buffer Cache
• Cost Based Optimizer (Optiq)
• Vectorized Processing

© Hortonworks Inc. 2013.
3. SQL: Enhancing SQL Semantics

Hive 0.12 provides a wide array of SQL data types and semantics so your existing tools integrate more seamlessly with Hadoop.

Hive SQL Datatypes (available in Hive 0.12):
INT, TINYINT/SMALLINT/BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP, BINARY, DECIMAL, ARRAY, MAP, STRUCT, UNION, DATE, VARCHAR
Roadmap: CHAR

Hive SQL Semantics (available in Hive 0.12):
• SELECT, INSERT
• GROUP BY, ORDER BY, SORT BY
• JOIN on explicit join key
• Inner, outer, cross and semi joins
• Sub-queries in FROM clause
• ROLLUP and CUBE
• UNION
• Windowing Functions (OVER, RANK, etc.)
• Custom Java UDFs
• Standard Aggregation (SUM, AVG, etc.)
• Advanced UDFs (ngram, XPath, URL)
Roadmap:
• Sub-queries in WHERE, HAVING
• Expanded JOIN Syntax
• SQL Compliant Security (GRANT, etc.)
• INSERT/UPDATE/DELETE (ACID)
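As a concrete taste of the windowing semantics listed above, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Hive (SQLite 3.25+ supports the same OVER/RANK syntax; the `sales` table and its columns are made up for illustration):

```python
import sqlite3

# Stand-in for a Hive session: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# RANK() over a per-region window, as in Hive 0.11+ analytics.
rows = conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
""").fetchall()
print(rows)
```

The same query text runs unchanged in Hive, which is the point of the compliance push: existing SQL tools can emit standard analytic SQL.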
4. Stinger: Hive Performance

Feature | Description | Benefit
Tez Integration | Tez is a significantly better engine than MapReduce | Latency
Vectorized Query | Take advantage of modern hardware by processing thousand-row blocks rather than row-at-a-time | Throughput
Query Planner | Uses the extensive statistics now available in the Metastore to better plan and optimize queries, including predicate pushdown during compilation to eliminate portions of the input (beyond partition pruning) | Latency
ORC File | Columnar, type-aware format with indices | Latency
Cost Based Optimizer (Optiq) | Join re-ordering and other optimizations based on column statistics, including histograms, etc. (future) | Latency
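The vectorized-query row of the table can be sketched in a few lines. This is an illustrative toy, not Hive's actual implementation: the idea is simply to evaluate a predicate over fixed-size blocks of values rather than one row at a time, keeping the inner loop tight and cache-friendly.

```python
# Toy sketch of block-at-a-time (vectorized) processing.
BLOCK_SIZE = 1024  # Hive's vectorized reader uses ~1024-row batches

def filter_rows(rows, threshold):
    # Row-at-a-time: one iterator step and branch per row.
    return [v for v in rows if v > threshold]

def filter_blocks(rows, threshold):
    # Block-at-a-time: a real engine runs a type-specialized loop
    # over a primitive array for each block.
    out = []
    for start in range(0, len(rows), BLOCK_SIZE):
        block = rows[start:start + BLOCK_SIZE]
        out.extend(v for v in block if v > threshold)
    return out

data = list(range(5000))
assert filter_blocks(data, 4000) == filter_rows(data, 4000)
```

In a JVM engine the win comes from tight loops over primitive arrays and better branch prediction, which Python cannot show; the sketch only illustrates the batching structure.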
5. Hive on Tez – Basics
• Hive/Tez integration in Hive 0.13 (Tez 0.2.0)
• Hive/Tez 0.3.0 available in branch
• Hive on MR works unchanged on Hadoop 1 and 2
• Hive on Tez is only available on Hadoop 2
• Turn on: hive.execution.engine=tez
• Hive looks and feels the same on both MR and Tez
  – CLI/JDBC/UDFs/SQL/Metastore
• Explain/reporting is similar to MR, but reflects differences in plan/execution
• Deployment is simple (Tez comes as a client-side library)
• Job client can submit MR via Tez
  – And emulate NES too (duck hunt anyone?)
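The engine switch mentioned above is an ordinary session-level Hive setting, so it can be toggled per query session:

```sql
-- Switch the current session to the Tez engine.
set hive.execution.engine=tez;

-- Fall back to classic MapReduce (works on Hadoop 1 and 2).
set hive.execution.engine=mr;
```

Because everything else (CLI, JDBC, UDFs, Metastore) is shared, this one property is the only change needed to move a workload between engines.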
6. Query 88
select *
from
(select count(*) h8_30_to_9
from store_sales
JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
JOIN time_dim ON store_sales.ss_sold_time_sk = time_dim.t_time_sk
JOIN store ON store_sales.ss_store_sk = store.s_store_sk
where
time_dim.t_hour = 8
and time_dim.t_minute >= 30
and ((household_demographics.hd_dep_count = 3 and
household_demographics.hd_vehicle_count<=3+2) or
(household_demographics.hd_dep_count = 0 and
household_demographics.hd_vehicle_count<=0+2) or
(household_demographics.hd_dep_count = 1 and
household_demographics.hd_vehicle_count<=1+2))
and store.s_store_name = 'ese') s1 JOIN
(select count(*) h9_to_9_30
from store_sales
...
• 8 full table scans
7. Query 88: M/R
Total MapReduce jobs = 29
...
Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 39 seconds 380 msec
OK
345617 687625 686131 1032842 1030364 606859 604232 692428
Time taken: 403.28 seconds, Fetched: 1 row(s)
8. Query 88: Tez
Map 1: 1/1
Map 11: 1/1
Map 12: 1/1
Map 14: 1/1
Map 15: 1/1
Map 16: 241/241
Map 19: 1/1
Map 2: 1/1
Map 20: 1/1
Map 22: 1/1
Map 23: 1/1
Map 24: 241/241
Map 27: 1/1
Map 28: 1/1
Map 29: 1/1
Map 30: 240/240 Map 32: 241/241 Map 34: 1/1
Map 36: 1/1
Map 37: 1/1
Map 38: 1/1
Map 42: 1/1
Map 43: 1/1
Map 44: 240/240
Reducer 10: 1/1 Reducer 17: 1/1 Reducer 25: 1/1
Reducer 33: 1/1 Reducer 4: 1/1 Reducer 40: 1/1
Reducer 45: 1/1 Reducer 47: 1/1 Reducer 5: 1/1
Reducer 7: 1/1 Reducer 8: 1/1 Reducer 9: 1/1
Status: Finished successfully
OK
345617 687625 686131 1032842 1030364 606859
Time taken: 90.233 seconds, Fetched: 1 row(s)
© Hortonworks Inc. 2013.
Map 13: 1/1
Map 18: 1/1
Map 21: 1/1
Map 26: 1/1
Map 3: 241/241
Map 35: 1/1
Map 39: 241/241
Map 46: 241/241
Reducer 31: 1/1
Reducer 41: 1/1
Reducer 6: 1/1
604232
692428
Page 8
9. HIVE-4660
Hive-on-MR vs. Hive-on-Tez

SELECT g1.x, g1.avg, g2.cnt
FROM (SELECT a.x, AVG(a.y) AS avg FROM a GROUP BY a.x) g1
JOIN (SELECT b.x, COUNT(b.y) AS cnt FROM b GROUP BY b.x) g2
ON (g1.x = g2.x)
ORDER BY avg;

[Diagram: the Hive-on-MR plan runs each step (GROUP a BY a.x, GROUP b BY b.x, JOIN (a,b), ORDER BY) as a separate MapReduce job and writes every intermediate result to HDFS; the Hive-on-Tez plan runs the same operators as a single DAG, streaming data between vertices. Tez avoids unnecessary writes to HDFS.]
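To make the query shape concrete (two group-bys feeding a join, then a global sort), here is the HIVE-4660 example run on Python's sqlite3 as a stand-in; the tables `a` and `b` and their contents are invented for illustration:

```python
import sqlite3

# Two tiny relations with the same shape as the slide's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER, y REAL)")
conn.execute("CREATE TABLE b (x INTEGER, y REAL)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, 2.0), (1, 4.0), (2, 10.0)])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, 7.0), (2, 8.0), (2, 9.0)])

# Same plan shape: GROUP BY on each side, JOIN, then ORDER BY.
rows = conn.execute("""
    SELECT g1.x, g1.avg, g2.cnt
    FROM (SELECT a.x, AVG(a.y) AS avg FROM a GROUP BY a.x) g1
    JOIN (SELECT b.x, COUNT(b.y) AS cnt FROM b GROUP BY b.x) g2
      ON (g1.x = g2.x)
    ORDER BY avg
""").fetchall()
print(rows)
```

On MR each of those sub-results would hit HDFS before the next job; on Tez the group-by vertices feed the join vertex directly, which is the whole point of the diagram.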
10. Hive on Tez – Execution

Feature | Description | Benefit
Tez Session | Overcomes MapReduce job-launch latency by pre-launching the Tez AppMaster | Latency
Tez Container Pre-Launch | Overcomes MapReduce latency by pre-launching hot containers ready to serve queries | Latency
Tez Container Re-Use | Finished maps and reduces pick up more work rather than exiting; reduces latency and eliminates difficult split-size tuning. Out-of-the-box performance! | Latency
Runtime re-configuration of DAG | Runtime query tuning by picking aggregation parallelism using online query statistics | Throughput
Tez In-Memory Cache | Hot data kept in RAM for fast access | Latency
Complex DAGs | Tez Broadcast Edge and the Map-Reduce-Reduce pattern improve query scale and throughput | Throughput
11. Pipelined Splits
Reduce start-up latency by launching tasks as soon as they are ready.
hive.orc.compute.splits.num.threads=10
tez.am.grouping.split-waves=1.4
[Diagram: the Tez AM hands splits to map tasks as they are computed, instead of waiting for all split computation to finish.]
12. Pipelined Early Exit (Limit)
hive.orc.compute.splits.num.threads=10
tez.am.grouping.split-waves=1.4
[Diagram: with a LIMIT, a map task reports "Done!" and the query exits early once enough rows have been produced.]
13. HIVE-5775
Statistics and Cost-based Optimization
• Statistics:
  – Hive has table- and column-level statistics
  – Used to determine parallelism, join selection
• Optiq: open source, Apache-licensed query execution framework in Java
  – Used by Apache Drill, Apache Cascade, Lucene DB, …
  – Based on the Volcano paper
  – 20 man-years of development, more than 50 optimization rules
• Goals for Hive:
  – Ease of use: no manual tuning for queries; make choices automatically based on cost
  – View chaining / ad hoc queries involving multiple views
  – Help enable BI tools front-ending Hive
  – Emphasis on latency reduction
• Cost computation will be used for:
  – Join ordering
  – Join algorithm selection
  – Tez vertex boundary selection
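A toy illustration of the join-ordering item: pick the join order whose estimated intermediate results are smallest, given row counts and a join selectivity. The table names echo the TPC-DS query earlier in the deck, but the row counts and the single flat selectivity are entirely made up; a real optimizer like Optiq uses per-join selectivities derived from column statistics.

```python
from itertools import permutations

# Made-up cardinalities and one made-up flat join selectivity.
ROWS = {"store_sales": 10_000_000, "time_dim": 86_400, "store": 1_000}
SELECTIVITY = 1e-6

def plan_cost(order):
    # Cost model: total rows materialized by intermediate joins.
    cost, current = 0.0, ROWS[order[0]]
    for table in order[1:]:
        current = current * ROWS[table] * SELECTIVITY  # est. join output
        cost += current
    return cost

# Exhaustive search is fine for 3 tables; real optimizers prune.
best = min(permutations(ROWS), key=plan_cost)
print(best)
```

The cheapest plans join the small dimensions first and bring in the huge fact table last, which is exactly the behavior cost-based join re-ordering is meant to produce automatically.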
14. Broadcast Join
• Similar to map-join, without the need to build a hash table on the client
• Will work with any level of sub-query nesting
• Uses stats to determine if applicable
• How it works:
  – Broadcast result set is computed in parallel on the cluster
  – Join processors are spun up in parallel
  – Broadcast set is streamed to the join processors
  – Join processors build hash tables
  – The other relation is joined with the hash table
• Tez handles:
  – Best parallelism
  – Best data transfer of the hashed relation
  – Best scheduling to avoid latencies
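The "How it works" steps can be sketched as plain Python (illustrative only; Tez distributes this across many join processors, and the table/column choices below are invented):

```python
from collections import defaultdict

def broadcast_hash_join(small, large, key_small, key_large):
    # Each join processor builds a hash table from the broadcast set.
    table = defaultdict(list)
    for row in small:
        table[row[key_small]].append(row)
    # The other (large) relation streams through and probes the table.
    for row in large:
        for match in table.get(row[key_large], ()):
            yield match + row

dims = [(1, "store_a"), (2, "store_b")]    # small, broadcast side
facts = [(1, 9.99), (2, 4.50), (1, 1.25)]  # large, streamed side
result = list(broadcast_hash_join(dims, facts, 0, 0))
print(result)
```

Note the large side is never materialized: it is consumed as a stream, which is why the broadcast side must be the one small enough to hash.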
15. 1-1 Edge
• Typical star-schema queries involve joins between a large number of tables
• Dimensions aren’t always tiny (e.g. the Customer dimension)
• Might not be able to handle all dimensions in a single vertex as broadcast joins
• Tez allows streaming records from one processor to the next via a 1-1 edge
  – Transfer details (streaming, files, etc.) are handled transparently
  – Scheduling/cluster capacity is worked out by Tez
• Allows Hive to build a pipeline of in-memory joins which we can stream records through
16. Dynamically Partitioned Hash Join
• Kicks in when the large table is bucketed
  – Bucketed table
  – Dynamic as part of query processing
• Uses a custom edge to match the partitioning on the smaller table
• Allows hash joins in cases where the broadcast set would be too large
• Tez gives us the option of building custom edges and vertex managers
  – Fine-grained control over how the data is replicated and partitioned
  – Scheduling and actual data transfer are handled by Tez
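A minimal sketch of the idea, with invented data: the large table is already bucketed by join key, the smaller table is partitioned the same way at query time, and each task then hash-joins one bucket pair instead of broadcasting the whole small table everywhere.

```python
from collections import defaultdict

def partition(rows, key, n_buckets):
    # Same bucketing function on both sides so matching keys collide.
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % n_buckets].append(row)
    return buckets

N = 4
big_buckets = partition([(1, "a"), (2, "b"), (5, "c")], 0, N)  # pre-bucketed table
small_buckets = partition([(1, 10), (5, 50)], 0, N)            # partitioned at runtime

result = []
for b in range(N):  # in Tez, each bucket pair would be one task
    table = {row[0]: row for row in small_buckets.get(b, [])}
    for row in big_buckets.get(b, []):
        if row[0] in table:
            result.append(row + table[row[0]])
print(sorted(result))
```

Each task only needs the hash table for its own bucket of the smaller side, so the join stays in memory even when the full small table would not fit.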
17. Shuffle Join and Grouping Without Sort
• In the MR model, joins, grouping and window functions are typically mapped onto a shuffle phase
• That means sorting will be used to implement the operator
• With Tez it’s easy to switch out the implementation
  – Decouples partitioning, algorithm and transfer of the data
  – The nitty-gritty details are still abstracted away (data movement, scheduling, parallelism)
• With the right statistics, this lets Hive pick the right implementation for the query at hand
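The point is that grouping does not inherently require a sort. A hash-based aggregation keeps one accumulator per key in a single pass, where MR's shuffle would first sort all rows by key; sketched in Python (illustrative, not Hive code):

```python
from collections import defaultdict

def hash_group_sum(rows):
    # One accumulator per key; the input is never sorted.
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)

rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
print(hash_group_sum(rows))
```

Sort-based grouping still wins when the data is pre-sorted or spills to disk; the Tez advantage is that the engine can choose either, per query, instead of always paying for the sort.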
18. Union All
• Common operation in decision-support queries
• Caused additional no-op stages in MR plans
  – The last stage spins up a multi-input mapper to write the result
  – Intermediate unions have to be materialized before additional processing
• Tez has a union that handles these cases transparently, without any intermediate steps
19. Multi-Insert Queries
• Allow the same input to be split and written to different tables or partitions
  – Avoids duplicate scans/processing
  – Useful for ETL
  – Similar to “splits” in Pig
• In MR, a “split” in the operator pipeline has to be written to HDFS and processed by multiple additional MR jobs
• Tez allows sending the multiple outputs directly to downstream processors
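For reference, Hive's multi-insert form puts the shared FROM clause first so the source is scanned once; the table and column names below are illustrative:

```sql
-- One scan of page_views feeds both outputs.
FROM page_views pv
INSERT OVERWRITE TABLE errors_by_url
  SELECT pv.url, pv.ts WHERE pv.status >= 500
INSERT OVERWRITE TABLE views_by_user
  SELECT pv.userid, pv.url WHERE pv.userid IS NOT NULL;
```

On Tez the two INSERT branches become downstream vertices of the single scan; on MR each branch beyond the first costs an HDFS materialization plus extra jobs.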
20. Putting It All Together
TPC-DS Query 27 with Hive on Tez. Callouts from the diagram:
• Tez Session populates the container pool
• Dimension table calculation and HDFS split generation run in parallel
• Dimension tables are broadcast to Hive MapJoin tasks
• The final reducer is pre-launched and fetches completed inputs
21. TPC-DS 10 TB Query Times (Tuned Queries)
Data: 10 TB data loaded into ORCFile using defaults.
Hardware: 20 nodes: 24 CPU threads, 64 GB RAM, 5 local SATA drives each.
Queries: TPC-DS queries with partition filters added.
22. TPC-DS 30 TB Query Times (Tuned Queries)
Data: 30 TB data loaded into ORCFile using defaults.
Hardware: 14 nodes: 2x Intel E5-2660 v2, 256GB RAM, 23x 7.2kRPM drives each.
Queries: TPC-DS queries with partition filters added.
23. Tez Concurrency Over 30 TB Dataset
• Test:
– Launch 20 queries concurrently and measure total time to completion.
– Used the 20 tuned TPC-DS queries from previous slide.
– Each query used once.
– Queries hit multiple different fact tables with different access patterns.
• Results:
– The 20 queries finished within 27.5 minutes.
– Average 82.65 seconds per query.
• Data and Hardware details:
– 30 Terabytes of data loaded into ORCFile using all defaults.
– Hardware: 14 nodes: 2x Intel E5-2660 v2, 256GB RAM, 23x 7.2kRPM drives each.
Editor's Notes: With Hive and Stinger we are focused on enabling the SQL ecosystem, and to do that we’ve put Hive on a clear roadmap to SQL compliance. That includes adding critical datatypes like character and date types, as well as implementing common SQL semantics seen in most databases.