3. ROLLUP, CUBE (Hive 0.10)

select state, year, sum(amt_paid)
from sales
group by state, year with rollup

State  Year  Sum
CA     2011  20000
CA     2012  25000
CA     *     45000
NY     2012  15000
NY     *     15000
*      *     60000

select state, year, sum(amt_paid)
from sales
group by state, year with cube

State  Year  Sum
CA     2011  20000
CA     2012  25000
CA     *     45000
NY     2012  15000
NY     *     15000
*      *     60000
*      2011  20000
*      2012  40000

HIVE-3433
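The ROLLUP result above is equivalent to a UNION ALL over its grouping sets. A runnable sketch using SQLite (which lacks WITH ROLLUP) and the slide's sample data:

```python
import sqlite3

# The slide's sample sales rows.
con = sqlite3.connect(":memory:")
con.execute("create table sales (state text, year int, amt_paid int)")
con.executemany("insert into sales values (?, ?, ?)",
                [("CA", 2011, 20000), ("CA", 2012, 25000), ("NY", 2012, 15000)])

# WITH ROLLUP over (state, year) means three grouping sets:
# (state, year), (state), and the grand total.
rollup = con.execute("""
    select state, year, sum(amt_paid) from sales group by state, year
    union all
    select state, null, sum(amt_paid) from sales group by state
    union all
    select null, null, sum(amt_paid) from sales
""").fetchall()
for row in rollup:
    print(row)
```

NULL plays the role of the `*` in the slide's result table.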
4. Support for Analytics
• Simple analytical tasks can turn into unintuitive and inefficient queries
select
count(*) as rk,
s2.state as state,
s2.product as product,
avg(s2.amt_paid),
sum(s1.amt_paid)
from
sales s1
join sales s2
on (s1.product = s2.product and s1.state = s2.state)
where s1.year <= s2.year
group by s2.state, s2.product, s2.year
order by state, product, rk;
5. Support for Analytics
• Simple numbering + running total
Number State Product Amount Total
1 CA A 1000 1000
2 CA A 500 1500
3 CA A 700 2200
4 CA A 300 2500
1 CA B 500 500
2 CA B 500 1000
6. Support for Analytics
• Faster, but still not very intuitive
select
state,
product,
amt_paid,
rsum(hash(state, product), amt_paid)
from
(
select state, product, amt_paid
from sales
distribute by hash(state, product)
sort by state, product
) t;
7. Support for Analytics – OVER clause
• Now that’s more like it
select
rank() over state_and_product,
state,
product,
amt_paid,
sum(amt_paid) over state_and_product
from sales
window state_and_product
as (partition by state, product order by year);
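SQLite (3.25+) accepts the same windowing syntax, so the query above can be tried outside Hive; the sample rows here are invented to mirror the earlier running-total table:

```python
import sqlite3

# Invented sample data: one state, two products, yearly payments.
con = sqlite3.connect(":memory:")
con.execute("create table sales (state text, product text, year int, amt_paid int)")
con.executemany("insert into sales values (?, ?, ?, ?)", [
    ("CA", "A", 2010, 1000), ("CA", "A", 2011, 500),
    ("CA", "A", 2012, 700), ("CA", "B", 2010, 500),
])

# Same shape as the slide's query: a named window reused by both
# rank() (the row numbering) and sum() (the running total).
rows = con.execute("""
    select rank() over w, state, product, amt_paid,
           sum(amt_paid) over w
    from sales
    window w as (partition by state, product order by year)
""").fetchall()
for r in rows:
    print(r)
```

The ORDER BY inside the window gives sum() its default running frame, which is exactly the Number/Total pairing shown on the earlier slide.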
8. Support for Analytics – OVER clause
State (partition by)  Year (order by)  Amount (rows)
AL                    2012             1000.00
CA                    2010             2000.00
CA                    2011             2000.00
CA                    2012             4000.00
CA                    2013             1000.00
NY                    2012              500.00
• OVER clause
– PARTITION BY, ORDER BY, ROWS BETWEEN/FOLLOWING/PRECEDING
– Works with current aggregate functions
– New aggregates/window functions
– RANK, ROW_NUMBER, LEAD, LAG, FIRST_VALUE, LAST_VALUE
– NTILE, DENSE_RANK, CUME_DIST, PERCENT_RANK, PERCENTILE_CONT,
PERCENTILE_DISC
HIVE-896
9. Support for Analytics Continued
• Sub-queries in WHERE
– Non-correlated only
– [NOT] IN supported
– Plan to optimize to fit in memory as hash table when feasible, join when not
• Standard SQL data types
– datetime
– char() and varchar()
– add precision and scale to decimal and float
– aliases for standard SQL types (BLOB = binary, CLOB = string, integer = int,
real/number = decimal)
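The non-correlated [NOT] IN form can be illustrated in any SQL engine; a small SQLite sketch with invented tables:

```python
import sqlite3

# Invented tables: sales plus a one-column table to drive the IN list.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table sales (state text, amt_paid int);
    create table big_states (state text);
    insert into sales values ('CA', 100), ('NY', 50), ('VT', 10);
    insert into big_states values ('CA'), ('NY');
""")

# A non-correlated IN sub-query in WHERE: the inner select does not
# reference the outer row, so it can be evaluated once (e.g. as an
# in-memory hash table, as the slide's optimization plan describes).
rows = con.execute("""
    select state, amt_paid from sales
    where state in (select state from big_states)
""").fetchall()
print(rows)
```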
10. Automatic join conversion
[Diagram: at runtime Hive checks whether the join tables are bucketed and sorted; if so it picks a sort-merge bucket join, otherwise a regular join.]

• When enabled, Hive will automatically pick the join implementation
• Query hints are no longer needed
• Can be configured to run without conditional tasks

HIVE-3784
11. Merging join tasks

select …
from sales
join date_dim on (…)
join time_dim on (…)
join state on (…)

[Diagram: three map-only tasks, each running one map join, merged into a single task running Mapjoin -> Mapjoin -> Mapjoin.]

• Used to generate a sequence of map-only jobs
• Hive will now do as many map joins as fit in memory in a single map-only job
• The memory limit is configurable
• Memory size is estimated from file size

HIVE-3784
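A map join is essentially a hash join with the small table held in memory; the merged task chains several such lookups over one scan of the fact table. A toy Python sketch with invented dimension data:

```python
# Small dimension tables loaded into in-memory hash maps, standing in
# for the hash tables a map join builds from its small side.
date_dim = {1: "2012-01-01", 2: "2012-01-02"}
state_dim = {"CA": "California", "NY": "New York"}

# Invented fact rows.
sales = [
    {"dateid": 1, "stateid": "CA", "amt": 100},
    {"dateid": 2, "stateid": "NY", "amt": 50},
]

# One scan of the fact table applies both joins back to back,
# instead of one map-only job per join.
joined = []
for row in sales:
    row = {**row, "date": date_dim[row["dateid"]]}
    row = {**row, "state": state_dim[row["stateid"]]}
    joined.append(row)
print(joined)
```

This is only feasible while all the dimension hash tables fit in memory at once, which is exactly the limit the slide says is configurable.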
12. M-MR to MR

select
sum(…)
…
from sales
join date_dim on (…)
group by …

[Diagram: a map-only task running the map join, followed by a separate map-reduce job doing map-side aggregation and a reduce-side group by/aggregate, merged into one map-reduce job whose map task runs Mapjoin -> map-side aggregation.]

• Used to run as a map-only job followed by a map-reduce job
• Hive will now merge the two map tasks

HIVE-3952
13. Group by/Order by (ReduceSinkDeDup)

select …
from sales
group by store, item
order by store

[Diagram: a map-reduce job for the group by (map-side aggregation, reduce-side group by/aggregate) followed by a second, no-op map-reduce job for the order by, collapsed into a single map-reduce job.]

• Used to generate a map-reduce job for the group by followed by a map-reduce job for the order by
• Hive will now do both in the same job
• More general: will search for reduce sinks on the same keys and combine them
• Caution: might degrade performance if the difference in the number of reducers is big

HIVE-2340
14. Upcoming: Limit pushdown

select …
from sales
group by store, item
order by store
limit 20

[Diagram: map task (map-side aggregation) and reduce task (group by/aggregate + limit); with the pushdown, each map task also keeps a top-k list, so far fewer rows are shuffled.]

• Used to output all pre-aggregated data from the map tasks and apply the limit only in the reducer
• Hive will keep a top-k list of elements in each map task, reducing the amount of data to be shuffled

HIVE-3562
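The per-map top-k idea can be sketched with a bounded heap (illustrative code, not Hive's implementation); shown here keeping the k largest keys, as for an ORDER BY ... DESC LIMIT k:

```python
import heapq
from itertools import count

def make_topk(k):
    # A min-heap of at most k items; heap[0] is the smallest key that
    # is still in the running. 'tie' breaks comparisons between rows
    # with equal keys.
    heap, tie = [], count()
    def push(key, row):
        item = (key, next(tie), row)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif key > heap[0][0]:
            # Row beats the current k-th best: swap it in. Everything
            # else is dropped before it would be shuffled to a reducer.
            heapq.heapreplace(heap, item)
    return heap, push

heap, push = make_topk(3)
for amt in [5, 1, 9, 3, 7, 2, 8]:
    push(amt, {"amt_paid": amt})
top = sorted((key for key, _, _ in heap), reverse=True)
print(top)  # the three largest keys survive: [9, 8, 7]
```

Each push is O(log k), so a map task emits at most k rows regardless of input size; for an ascending ORDER BY the comparison simply flips.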
15. Upcoming: Total order sort
• Order by queries no longer result in single reducer
• Makes it easier to apply optimizations such as group by/order by
• Requires knowledge of the key distribution (sampling)
HIVE-1402
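Sampling-based range partitioning (the idea behind Hadoop's TotalOrderPartitioner) can be sketched as follows; the function names and data are invented:

```python
import random

def range_boundaries(sample, num_reducers):
    # Pick num_reducers - 1 cut points from a sorted sample so each
    # reducer receives a roughly equal share of the keys.
    s = sorted(sample)
    step = len(s) / num_reducers
    return [s[int(step * i)] for i in range(1, num_reducers)]

def reducer_for(key, boundaries):
    # Keys below the first boundary go to reducer 0, and so on; the
    # concatenated reducer outputs are then globally sorted.
    for i, b in enumerate(boundaries):
        if key < b:
            return i
    return len(boundaries)

random.seed(0)
keys = [random.randrange(1000) for _ in range(10000)]
bounds = range_boundaries(random.sample(keys, 500), 4)
counts = [0, 0, 0, 0]
for k in keys:
    counts[reducer_for(k, bounds)] += 1
print(counts)  # roughly 2500 keys per reducer
```

The sample quality determines the balance: a skewed or too-small sample leaves some reducers overloaded, which is why the slide calls out the need for knowledge of the key distribution.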
19. Beyond Batch with YARN & Tez

• Tez generalizes Map-Reduce: simplified execution plans process data more efficiently
• Always-on Tez service: low-latency processing for all Hadoop data
20. Tez Service

• Hive query startup is expensive
– Job-launch and task-launch latencies are fatal for short queries (on the order of 5s to 30s)
• Solution: Tez Service
– Removes task-launch overhead
– Removes job-launch overhead
– Hive submits the query plan to the Tez Service
– A native Hadoop service, not ad-hoc
21. Tez - Core Idea

Task with pluggable Input, Processor & Output

[Diagram: a Tez task composed of Input -> Processor -> Output.]

YARN ApplicationMaster to run DAG of Tez Tasks
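The pluggable model can be sketched as three small pieces wired into a task; the class and method names here are made up, not the real Tez Java API:

```python
# Illustrative sketch of the pluggable-task idea: a task is just the
# composition of whatever Input, Processor, and Output it is given.
class ListInput:
    def __init__(self, rows):
        self._rows = rows
    def read(self):
        return iter(self._rows)

class SumProcessor:
    def run(self, inp, out):
        # The processor only sees the Input/Output contracts, so it can
        # be reused with any source or sink.
        out.write(sum(inp.read()))

class ListOutput:
    def __init__(self):
        self.written = []
    def write(self, value):
        self.written.append(value)

class TezTask:
    def __init__(self, inp, proc, out):
        self.inp, self.proc, self.out = inp, proc, out
    def execute(self):
        self.proc.run(self.inp, self.out)

out = ListOutput()
TezTask(ListInput([1, 2, 3]), SumProcessor(), out).execute()
print(out.written)  # [6]
```

An ApplicationMaster then only needs to schedule a DAG of such tasks and connect one task's Output to the next task's Input.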
22. Hive/MR versus Hive/Tez
select a.state, count(*)
from a join b on (a.id = b.id)
group by a.state
[Diagram: Hive on MR inserts an I/O synchronization barrier between the two jobs; Hive on Tez pipelines I/O between the tasks.]

Hive - MR vs. Hive - Tez
23. Hive/MR versus Hive/Tez
select store, state, total from
(select storeid, sum(sales_price) total
from sales s join date_dim d on (s.dateid = d.dateid)
where d.year = 2012
group by storeid ) ss
join
(select storeid, store, state from store
join state on (store.stateid = state.stateid) ) sd
on (sd.storeid = ss.storeid)
[Diagram: the same query as a chain of map-reduce jobs on Hive - MR versus a single DAG on Hive - Tez.]
24. Hive Performance Longer Term - Caching
• Need to be able to keep hot data sets in memory
• Could be done via pinning files in OS buffer cache
• Could be done with separate process running its own buffer cache
• Need to evaluate best plan
• Would like to pin dimension tables in memory
• Latest partitions of large tables also a candidate
• Ideally will include changes to the scheduler to understand which nodes have
which partitions/tables cached
25. Hive Performance Longer Term - Vectorization

• Rewrite operators to work on arrays of Java scalars
• Based on the MonetDB paper
• Operates on blocks of 1K or more records
• Each block contains an array of Java scalars, one for each column
• Avoids many function calls
• Blocks sized to fit in L1 cache, avoiding cache misses
• Generate code for operators on the fly to avoid branches and maximize the deep pipelines of modern processors
• Requires conversion of all column values to Java scalars – no objects allowed
– Integrates nicely with the ORC work; other input types will need conversion on reading
• Want to write this in a way that can be shared by Pig, Cascading, and MR programmers
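The column-batch idea can be sketched in Python (Hive's real implementation is Java; the names and sizes here are illustrative):

```python
from array import array

BLOCK_SIZE = 1024  # rows per batch, sized to stay cache-friendly

class LongColumnBatch:
    # A batch holds one primitive array per column instead of one
    # object per row, so operators avoid per-row allocation entirely.
    def __init__(self, values):
        self.vector = array("q", values)  # 64-bit signed scalars

def vectorized_add_scalar(batch, k):
    # One tight loop over a primitive array: no per-row method calls,
    # no branching on row types inside the loop.
    out = array("q", batch.vector)
    for i in range(len(out)):
        out[i] += k
    return out

batch = LongColumnBatch(range(BLOCK_SIZE))
result = vectorized_add_scalar(batch, 10)
print(result[0], result[-1])  # 10 1033
```

The per-column loop is what makes the later tricks on the slide possible: it is the unit that generated code replaces, and the flat scalar array is what ORC can hand over without conversion.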