Hive at Yahoo: Letters from the trenches
Presented by Mithun Radhakrishnan, Chris Drome ⎪ June 10, 2015
2015 Hadoop Summit, San Jose, California
About myself
2 2015 Hadoop Summit, San Jose, California
 Mithun Radhakrishnan
 Hive Engineer at Yahoo!
 Hive Committer and long-time
contributor
› Metastore-scaling
› Integration
› HCatalog
 mithun@apache.org
 @mithunrk
About myself
3 2015 Hadoop Summit, San Jose, California
 Chris Drome
 Hive Engineer at Yahoo!
 Hive contributor
 cdrome@yahoo-inc.com
Recap
5 2015 Hadoop Summit, San Jose, California
6 2015 Hadoop Summit, San Jose, California
[Chart: TPC-H at 1 TB. Per-query runtimes in seconds (0-2500) for all 22 TPC-H queries, q1_pricing_summary_report.hive through q22_global_sales_opportunity.hive, comparing Hive 0.10 (RCFile), Hive 0.11 (ORC), Hive 0.12 (ORC), Hive 0.13 (ORC, M/R), and Hive 0.13 (ORC, Tez).]
1 TB
7 2015 Hadoop Summit, San Jose, California
› 6.2x speedup over Hive 0.10 (RCFile)
• Between 2.5-17x
› Average query time: 172 seconds
• Between 5-947 seconds
• Down from 729 seconds (Hive 0.10 RCFile)
› 61% queries completed in under 2 minutes
› 81% queries completed in under 4 minutes
Explaining the speed-ups
8 2015 Hadoop Summit, San Jose, California
 Hadoop 2.x, et al.
 Apache Tez
› (Arbitrary DAG)-based Execution Engine
› “Playing the gaps” between M&R
• Intermediate data and the HDFS
› Smart scheduling
› Container re-use
› Pipelined job start-up
 Hive
› Statistics
› Vectorized Execution
 ORC
› PPD
Expectations with Hive 0.13 in production
9 2015 Hadoop Summit, San Jose, California
 Tez would outperform M/R by miles
 Tez would enable better cluster utilization
› Use less resources
 Tez (and dependencies) would be “production ready”
› GUI for task logs, DAG overviews, swim-lanes
› Speculative execution
 Similarly, ORC and Vectorization
› Support evolving schemas
The Y!Grid
10 2015 Hadoop Summit, San Jose, California
 18 Hadoop Clusters in YGrid
› 41565 Nodes
› Biggest cluster: 5728 Nodes
› 1M jobs a day
 Hadoop 2.6+
 Large Datasets
› Daily, hourly, minute-level frequencies
› Thousands of partitions, 100s of 1000s of files, TBs of data per partition
› 580 PB of data, total
 Pig 0.14 on Tez, Pig 0.11
 Hive 0.13 on Tez
 HCatalog for interoperability
 Oozie for scheduling
 GDM for data-loading
 Spark, HBase, Storm, etc…
Data processing use cases
11 2015 Hadoop Summit, San Jose, California
 Grid usage
› 30+ million jobs per month
› 12+ million Oozie launcher jobs
 Pig usage
› Handles majority of data pipelines/ETL (~43% of jobs)
 Hive usage
› Relatively smaller niche
› 632,000 queries per month (35% Tez)
 HCatalog for Inter-operability
› Metadata storage for all Hadoop data
› Yahoo-scale
› Pig pipelines with Hive analytics
Business Intelligence Tools
12 2015 Hadoop Summit, San Jose, California
 Tableau, MicroStrategy
 Power users
› Tableau Server for scheduled reports
 Challenges:
› Security
• ACLs, Authentication, Encryption over the wire
› Bandwidth
• Transporting results over ODBC
• Limit result-set to 1000s-10000s of rows
• Aggregations
› Query Latency
• Metadata queries
• Partition/Table scans
• Materialized views
 Data producer owns the data
› Unlike traditional DBs
 Multi-paradigm data access/generation
› Pig/Hive/MapReduce using HCatalog
 Highly available metadata service
 UI for tracking/debugging jobs
 Execution engine should ideally support speculative execution
13 2015 Hadoop Summit, San Jose, California
Non-negotiables for Hive upgrade at Yahoo!
Yahoo! Hive-0.13
14 2015 Hadoop Summit, San Jose, California
 Based on Apache Hive-0.13.1
 Internal Yahoo! Patches (admin web-services, data discovery, etc.)
 Community patches to stabilize Apache Hive-0.13.1
› Tez
• HIVE-7544, HIVE-6748, HIVE-7112, …
› Vectorization
• HIVE-8163, HIVE-8092, HIVE-7188, HIVE-7105, HIVE-7514, …
› Failures
• HIVE-7851, HIVE-7459, HIVE-7771, HIVE-7396, …
› Optimizations
• HIVE-7231, HIVE-7219, HIVE-7203, HIVE-7052, …
› Data integrity
• HIVE-7694, HIVE-7494, HIVE-7045, HIVE-7346, HIVE-7232, …
 Phased upgrades
› Phase 1: 285 JIRAs
› Phase 2: 23 JIRAs (HIVE-8781 and related dependencies)
› Phase 3: 46 JIRAs (HIVE-10114 and related dependencies)
 One remote Hive Metastore “instance”
› 4 HCatalog Servers behind a hardware VIP
• L3DSR load balancer
• 96GB-128GB RAM, 16 core boxes
› Backed by Oracle RAC
 About 10 Gateways
› Interactive use of Hive (and Pig, Oozie, M/R)
› hive.metastore.uris -> HCatalog
 About 4 HiveServer2 instances
› Ad Hoc queries, aggregation
15 2015 Hadoop Summit, San Jose, California
Hive deployment (per cluster)
Evolution of grid services at Yahoo!
16 Yahoo Confidential & Proprietary
[Diagram: Gateway Machines and the Grid; Browser, HUE, Hive Server 2, and BI Tools; HCatalog servers backed by Oracle RAC.]
 Query performance on very large data sets
› HIVE-8292: Reading … has high overhead in MapOperator.cleanUpInputFileChangedOp
 Split-generation on very large data sets
› Tends to generate more splits (map tasks) compared to M/R
› Long split generation times
› Hogging the Hadoop queues
• Wave factor vs multi-tenancy requirements
› HIVE-10114: Split strategies for ORC
 Scaling problems with ATS
› More of a problem with Pig workflows
› 10K+ tasks/job are routine
› AM progress reporting, heart-beating, memory usage
› Hadoop 2.6.0.10+
17 2015 Hadoop Summit, San Jose, California
Challenges experienced with Hive on Tez
18 Yahoo Confidential & Proprietary
 At Yahoo! Scale,
› 100s of Databases per cluster
› 100s of Tables per database
› 100s of columns per Table
› 1000s of Partitions per Table
• Larger tables: Thousands of partitions, per hour
• Millions of partitions every few days
• 10s of millions of partitions, over dataset retention period
 Problems:
› Metadata volume
• Database/Table/Partition IO Formats
• Record serialization details
• HDFS paths
• Statistics
– Per partition
– Per column
19 2015 Hadoop Summit, San Jose, California
Fast execution engines aren’t the whole picture
Letters from the trenches
21 2015 Hadoop Summit, San Jose, California
From: Another ETL pipeline.
To: The Yahoo Hive Team
Subject: Slow queries
YHive team,
My query fails with OutOfMemoryError. I tried increasing
container size, but it still fails. Please help!
Here are my settings:
set mapreduce.input.fileinputformat.split.maxsize=16777216;
set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=4096;
set mapred.child.java.opts=-Xmx1024m;
...
INSERT OVERWRITE TABLE my_table PARTITION( foo, bar, goo )
SELECT * FROM {
...
}
...
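A plausible reading of these settings (the speaker notes point at the small split size): split.maxsize=16777216 forces 16MB splits, i.e. a flood of tiny tasks, while mapred.child.java.opts caps the JVM heap at 1GB inside a 4GB container. A minimal sketch of saner values, illustrative rather than prescriptive:

set mapreduce.input.fileinputformat.split.maxsize=268435456; -- 256MB splits: fewer, larger tasks
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3276m; -- heap sized to ~80% of the container, not 1GB
set mapreduce.reduce.memory.mb=4096;
set mapreduce.reduce.java.opts=-Xmx3276m;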
22 2015 Hadoop Summit, San Jose, California
From: YET another ETL pipeline.
To: The Yahoo Hive Team
Subject: Slow UDF performance
YHive team,
Why does using a simple custom UDF cause queries to
time out?
SELECT foo, bar, my_function( goo )
FROM my_large_table
WHERE ...
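Per the speaker notes, my_function() made a webservice call per row. Assuming the function is deterministic per input, a sketch of one workaround: evaluate it once per distinct value into a staging table (goo_lookup is hypothetical), then join the result back instead of calling it per row.

CREATE TABLE goo_lookup STORED AS ORC AS
SELECT DISTINCT goo, my_function(goo) AS goo_value
FROM my_large_table;

SELECT t.foo, t.bar, l.goo_value
FROM my_large_table t
JOIN goo_lookup l ON (t.goo = l.goo)
WHERE ...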
23 2015 Hadoop Summit, San Jose, California
From: The ETL team
To: The Yahoo Hive Team
Subject: A small matter of size...
Dear YHive team,
We have partitioned our table using the following
6 partition keys: {hourly-timestamp, name, property,
geo-location, shoe-size, and so on…}.
For a given timestamp, the combined cardinality of the
remaining partition-keys is about 10000/hr.
If queries on partitioned tables are supposed to
be faster, how come queries on our table take forever
just to get off the ground?
Yours gigantically,
Project Grape Ape
24 2015 Hadoop Summit, San Jose, California
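The letter's own numbers show the scale: at roughly 10,000 partitions per hour, a query spanning a week touches 10,000 × 24 × 7 ≈ 1.7 million partitions, which squares with the '2 million partitions' figure a few slides down.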
25 2015 Hadoop Summit, San Jose, California
Metadata volume and Query Execution time
26 2015 Hadoop Summit, San Jose, California
 Anatomy of a Hive query
1. Compile query to AST
2. Thrift-call to Metastore, for partition list
3. Examine partitions, data-paths, etc. Construct physical query plan.
4. Run optimizers on the plan
5. Execute plan. (M/R, Tez).
 Partition pruner:
› Removes partitions that shouldn’t participate in the query.
› In effect, remove input-directories from the Hadoop job.
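A minimal illustration of pruning, assuming a hypothetical table page_views partitioned by dt: only the dt=20150610 directory stays in the job's input; every other partition is dropped before execution.

SELECT COUNT(*) FROM page_views WHERE dt = '20150610';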
The problems of large-scale metadata
27 2015 Hadoop Summit, San Jose, California
 Partition pruner is single-threaded
› Query spans a day
› Query spanning a week? 2 million partitions
 Partition objects are huge:
› HDFS Paths
› IO Formats
› Record Deserializer info
› Data column schema
 Datanucleus:
› 1 Partition: Join 6 Oracle tables in the backend.
 Thrift serialization/deserialization takes minutes.
› *Minutes*.
Immediate workarounds
28 2015 Hadoop Summit, San Jose, California
 “Hive wasn’t originally designed for more than 10000s of partitions, total…”
 Throw hardware at it
› 4 HCatalog servers behind a hardware VIP
› High-RAM boxes:
• 96GB-128GB metastore processes
• Tune each to use 100 connections to the Oracle RAC
 Client-side tuning
› Increase hive.metastore.client.socket.timeout
› Increase heap size as needed (container size)
› Multi-threaded fstat operations
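A sketch of those client-side knobs, with illustrative values (the timeout is in seconds; the heap is raised on the client JVM, e.g. via HADOOP_CLIENT_OPTS):

set hive.metastore.client.socket.timeout=600; -- bulk partition fetches can outlive small defaults
-- export HADOOP_CLIENT_OPTS="-Xmx4096m" before launching the CLI, if the plan itself is huge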
Fix the leaky/noisy bits
29 2015 Hadoop Summit, San Jose, California
 Metastore frequently ran out of memory:
› Disable Hadoop FileSystem cache
• HIVE-3098, HDFS-3545
• FileSystem.CACHE used UGI.hashcode()
– Compared Subjects for equality, not equivalence.
› Fixed Thrift 0.9
• TSaslServerTransport had circular references
• JVM couldn’t detect these for cleanup
– WeakReferences are your friend
• Fix incompatibility with L3DSR pings
 Data discovery from Oozie:
› Use JMS notifications, on publication
› Oozie Coordinators wake up on ActiveMQ notification, kick off dependent workflows
› Reduced polling frequency
More fixes
30 2015 Hadoop Summit, San Jose, California
 Metadata-only queries:
› SELECT DISTINCT tstamp FROM my_purple_table ORDER BY tstamp DESC LIMIT 1000;
› Replace HiveMetaStoreClient::getPartitions() with getPartitionNames().
› Local job, versus cluster.
 Optimize the optimizer:
› The first step in some optimizers:
• List<Partition> partitions = hiveMetaStoreClient.getPartitions( db, table, (short)-1 );
• Pray that the client and/or the metastore don’t run out of memory.
• Take a nap.
› Fixed PartitionPruner, MetadataOnlyOptimizer.
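For reference, stock Hive gates this behavior behind a flag; with the optimizer fixed, a partition-key-only query like the one above runs as a local, metadata-only job:

set hive.optimize.metadataonly=true;
SELECT DISTINCT tstamp FROM my_purple_table ORDER BY tstamp DESC LIMIT 1000;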
Long-term fixes:
31 2015 Hadoop Summit, San Jose, California
 DirectSQL short-circuits:
› Datanucleus problems at scale
• (Yes, we are aware of the irony that might result from extrapolation.)
› Specific to the backing DB.
 Compaction of Partition info:
› HIVE-7223, HIVE-7576, HIVE-9845, etc.
› Schema evolves infrequently
› Partition-info rarely differs from table-info
– Except HDFS paths (which are super-strings)
› List<Partition> vs Iterator<Partition>
• PartitionSet abstraction
– The delight of Inheritance in Thrift
• Reduced memory foot-prints
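In stock Hive, the DirectSQL short-circuit is controlled by a metastore-side setting; it bypasses Datanucleus for common partition-pruning calls and falls back to the ORM path if the direct SQL fails:

set hive.metastore.try.direct.sql=true;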
32 2015 Hadoop Summit, San Jose, California
“The finest trick of The Devil was to
persuade you that he does not exist.”
-- ???
33 2015 Hadoop Summit, San Jose, California
34 2015 Hadoop Summit, San Jose, California
35 2015 Hadoop Summit, San Jose, California
From: A major reporting team
To: The Yahoo Hive Team
Subject: Urgent! Customer reports are borking.
Dear YHive team,
When we connect Tableau Server 8.3 to Y!Hive
0.12/0.13, it is unusably slow. Queries take too long
to run, and time out.
We’d prefer not to change our query-code too
much. How soon can Hive accommodate our simple queries?
Yours hysterically,
Project Zodiac
36 2015 Hadoop Summit, San Jose, California
Analysis: The query
37 2015 Hadoop Summit, San Jose, California
 Non-const partition key predicates:
› E.g.
WHERE utc_time <= from_unixtime(unix_timestamp() - 2*24*60*60, 'yyyyMMdd')
AND utc_time >= from_unixtime(unix_timestamp() - 32*24*60*60, 'yyyyMMdd')
› Solution: Use constant expressions where possible.
› Fix: Hive 1.x supports dynamic partition pruning, and constant folding.
 Costly joins with partitioned dimension tables:
› E.g.
› SELECT … FROM fact_table JOIN (SELECT * FROM dimension_table WHERE dt IN (SELECT MAX(dt) FROM dimension_table));
› Workaround: External “pointer” tables.
› Fix: Dynamic partition pruning.
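A sketch of the 'constant expressions' workaround for pre-1.x Hive: compute the date window in the submitting script and inline the literals, so the partition pruner sees constants rather than unevaluated function calls (dates illustrative):

WHERE utc_time <= '20150608'
AND utc_time >= '20150509'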
Analysis: The data
38 2015 Hadoop Summit, San Jose, California
 Data stored in TEXTFILE
› Solution: Switch to columnar storage
• ORC, dictionary encoding, vectorization, predicate pushdown
 Over-partitioning:
› Too many partition keys
› Diminishing returns with partition pruning
› Solution: Eliminate partition keys, consider sorting
 Small Part files
› Hard-coded nReducers
› E.g.
hive> dfs -count /projects/foo.db/foo_stats;
9081 682735 1876847648672 /projects/foo.db/foo_stats
(682,735 files across 9,081 directories, ~1.9 TB: an average file of roughly 2.7 MB)
› Solution:
• set hive.merge.mapfiles=true;
• set hive.merge.mapredfiles=true;
• set hive.merge.tezfiles=true;
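Two related knobs in stock Hive control when and how far the merge goes; the values below are illustrative:

set hive.merge.size.per.task=256000000; -- target size of a merged file
set hive.merge.smallfiles.avgsize=16000000; -- merge triggers when the average output file falls below this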
We’re not done yet
39 2015 Hadoop Summit, San Jose, California
 Tez/ATS scaling
 Speed up split calculation
 Auto/Offline compaction
 Abuse detection
 Better handling of schema evolution
 Skew Joins in Hive
 UDFs with JNI and configuring LD_LIBRARY_PATH
Questions?
Backup
YHive configuration settings:
42 2015 Hadoop Summit, San Jose, California
set hive.merge.mapfiles=false; -- Except when producing data.
set hive.merge.mapredfiles=false; -- Except when producing data.
set tez.merge.files=false; -- Except when producing data.
-- For ORC files.
-- dfs.blocksize=134217728; -- hdfs-site.xml
set orc.stripe.size=67108864; -- 64MB stripes.
set orc.compress.size=262144; -- 256KB compress buffer.
set orc.compress=ZLIB; -- Override to NONE, per table.
set orc.create.index=true; -- ORC indexes.
set orc.optimize.index.filter=true; -- Predicate pushdown with ORC index
set orc.row.index.stride=10000;
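The per-table override mentioned above takes the form of table properties; a hypothetical example that disables compression for one table while the cluster default stays ZLIB:

CREATE TABLE click_events (user_id STRING, url STRING)
PARTITIONED BY (dt STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='NONE');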
YHive configuration settings: (contd)
43 2015 Hadoop Summit, San Jose, California
-- Delegation Token Store settings:
set hive.cluster.delegation.token.store.class=ZooKeeperTokenStore;
set hive.cluster.delegation.token.renew-interval=172800000;
(Start HCat Server with -Djute.maxbuffer=25165824, i.e. 24MB; the property takes bytes -> 190K+ tokens.)
-- Data Nucleus settings:
set datanucleus.connectionPoolingType=DBCP; -- i.e., not BoneCP.
set datanucleus.cache.level1.type=none;
set datanucleus.cache.level2.type=none;
set datanucleus.connectionPool.maxWait=200000;
set datanucleus.connectionPool.minIdle=0;
-- Misc.
set hive.metastore.event.listeners=com.yahoo.custom.JMSListener;
Zookeeper Token Storage performance
44 2015 Hadoop Summit, San Jose, California
Jute Buffer Size (in MB) Max delegation token count
4MB 30K
8MB 60K
12MB 90K
16MB 130K
20MB 160K
24MB 190K
Why Hive on Tez?
45 2015 Hadoop Summit, San Jose, California
 Shark, Impala
› Pre-emption for in-memory systems
› Multi-tenant, shared clusters
› Heterogeneous nodes
› Existing ecosystem
› Community-driven development
 Shark
› Good proof of concept, but was not production ready
› Shuffle performance (at the time)
› Hive on Spark – under active development
Analysis: Tableau/ODBC driver
46 2015 Hadoop Summit, San Jose, California
 Tableau has come a long way, but
› Schema discovery
• SELECT * FROM my_large_table LIMIT 0;
• SELECT DISTINCT part_key FROM my_large_table;
› SQL dialect
• Depends on vendor-specific driver-name
› Schema metadata-scans
• 3 partition listings per query
› Miscellaneous problems:
• “Custom SQL” rewrites
• Trouble with quoting
 tl;dr : Try to transition to Simba’s 2.0.x Drivers with Tableau 8.3.x
47 2015 Hadoop Summit, San Jose, California

Editor's Notes

  1. TODO: Update latest profile pic
  2. TODO: Update latest profile pic
  3. At last year’s talk, which was received so enthusiastically.
  4. Tez : Scheduling. Playing the gaps, like Beethoven’s Fifth.
  5. Why 13? Why move from 12?
  6. Talk up the work from Gemini. Power-users of Tableau Server. People with RDBMS expertise might think Partitions are analogous to Indexes. The more you have, the faster the query should run.
  7. Add diagram for deployment of Hive, and its evolution.
  8. Last year saw a tonne of benchmarketing. Tez vs Spark (vs Impala). We’ve had several choices of execution engines. But we seem to have forgotten to scale a crucial part of the system. The metastore.
  9. Talk about the kinds of metadata: Input/Output formats, per table, per partition. Record format information. SerDe classes. Data paths Table/Partition level statistics: Also mention the Hundreds of columns per table.
  10. Small split-size.
  11. My_function() is a webservice call. hive.log.incremental.plan.progress.
  12. This table is our largest. We use this to test and break our system.
  13. Focus on data-paths.
  14. Interesting segue: HiveMetaStoreClient.getPartitions() takes an argument to limit the number of partitions returned. That limit is a Short. (?!)
  15. Elaborate the problems with datanucleus at scale: Thread safety Memory usage Performance Schema evolution can happen both at a geological pace, as well as a tectonic scale. Inheritance in Thrift is like implementing it in C. Mention that similar changes were made in Pig/HCatalog, for compressing Partition info. 26x storage saving (for split meta-info), + 10x faster for the query to start.
  16. The Java anecdote.
  17. Verbal Kint, from Usual Suspects.
  18. Moriarty, the “consulting criminal”, from Sherlock.
  19. Charles Baudelaire. The Java anecdote.
  20. Introduce the beast that is Tableau. Flash the “simple” query?
  21. Bucky Lasek at the X-Games in 2001. Notice where he’s looking… Not at the camera, but setting up his next trick.
  22. Talk about distcp –pgrub, for ORC files.
  23. Scale-test results for ZookeeperTokenStore, with HCatalog. With 24MB of jute-buffer size, you should be able to accommodate around 200K delegation tokens at the same time. On our busier clusters, we probably have about 20-30K simultaneously. This should help increase token-lifetimes, for long-running jobs.
  24. Praise the work from Simba, Tableau. Much has been achieved. Rework slide. Too much info. Just put the TLDR. SQL dialect Depends on vendor-specific driver-name Schema metadata-scans 3 partition listings per query Miscellaneous problems: “Custom SQL” rewrites Trouble with quoting