September 6, 2016
Achieving 100k Queries per
Hour with Hive on Tez
About Yahoo! JAPAN
The Largest Portal Site in Japan
65 billion pageviews / month
2.1 billion pageviews / day
YDN Report
What is YDN Report?
• Reporting for the Yahoo! Display Ad Network (YDN)
Batch Reporting over Massive Dataset
• 13 months, 800B+ rows of data
• Adding 3.3B+ rows of data per day
Highly Parallel Workload
• 100K reports per hour
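As a quick sanity check on what these figures mean as sustained rates, the sketch below converts the hourly report target and the daily ingest volume into per-second numbers. It is only arithmetic on the values from this slide; the function name is illustrative and not part of any tool mentioned in the talk.

```python
def per_second(per_hour: float) -> float:
    """Convert an hourly rate into a per-second rate."""
    return per_hour / 3600.0

reports_per_hour = 100_000      # target from the slide
rows_added_per_day = 3.3e9      # daily ingest from the slide

required_qps = per_second(reports_per_hour)
rows_per_second = rows_added_per_day / 86_400   # seconds in a day

print(f"required throughput: {required_qps:.1f} queries/second")
print(f"average ingest rate: {rows_per_second:,.0f} rows/second")
```

In other words, the cluster has to complete roughly 28 report queries every second, around the clock, while the table keeps growing.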
YDN Report Query
Typical Query
• Queries are relatively simple
• They answer questions such as “How many clicks did I get last week?”
SELECT account, yyyymmdd,
sum(total_imps),
sum(total_click),
...
FROM table_x
WHERE yyyymmdd >= xxx
AND yyyymmdd < xxx
AND account = xxx
...
GROUP BY account, yyyymmdd, ...;
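To make the shape of this query concrete, here is a minimal sketch that runs the same filter and aggregation against a tiny in-memory table. It uses SQLite from the Python standard library purely as a stand-in, not Hive; the sample rows, the account names, and the `weekly_report` helper are all hypothetical, and the `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_x ("
    "account TEXT, yyyymmdd TEXT, total_imps INTEGER, total_click INTEGER)"
)
# Hypothetical sample rows; the real table holds hundreds of billions.
rows = [
    ("acct_a", "20160801", 1000, 12),
    ("acct_a", "20160801", 500, 3),
    ("acct_a", "20160802", 800, 9),
    ("acct_b", "20160801", 700, 5),    # different account, filtered out
    ("acct_a", "20160808", 900, 7),    # outside the date range, filtered out
]
conn.executemany("INSERT INTO table_x VALUES (?, ?, ?, ?)", rows)

def weekly_report(conn, account, start, end):
    """Per-day impression and click totals for one account in [start, end)."""
    query = (
        "SELECT account, yyyymmdd, sum(total_imps), sum(total_click) "
        "FROM table_x "
        "WHERE yyyymmdd >= ? AND yyyymmdd < ? AND account = ? "
        "GROUP BY account, yyyymmdd "
        "ORDER BY yyyymmdd"
    )
    return conn.execute(query, (start, end, account)).fetchall()

report = weekly_report(conn, "acct_a", "20160801", "20160808")
for row in report:
    print(row)
```

Each report touches one account and a short date range, which is why the workload is many small queries rather than a few large ones.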
Test Setup
Hive Performance Recap
Hive is fast: interactive response
• ORC columnar file format
• Cost based optimizer (CBO)
• Vectorized SQL engine
• Tez execution engine (replacing MapReduce)
• Hive 0.10 (batch processing) -> Hive 1.2 (human interactive, 5 seconds): 100-150x query speedup
Hive on Tez Query Execution
Query execution is essentially made up of:
• Client execution [0s if done correctly]
• Optimization [HiveServer2] [~0.1s]
• Metadata lookups [HCatalog, Metastore] [very fast in Hive 0.14]
• Application Master creation [4-5s]
• Container allocation [3-5s]
• Tez task execution on YARN
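The timings in the list above can be added up to see how much of a query's latency is fixed startup cost. The sketch below sums only the steps the slide quantifies; the "warm" case is a hypothetical scenario in which an Application Master and its containers are already running, so that only the optimization step remains.

```python
# Per-query overheads in seconds as (low, high), copied from the list above.
# Steps the slide does not quantify are omitted rather than guessed.
OVERHEADS = {
    "optimization": (0.1, 0.1),
    "application_master_creation": (4.0, 5.0),
    "container_allocation": (3.0, 5.0),
}

def overhead_range(steps):
    """Sum the low and high ends of the selected steps."""
    low = sum(OVERHEADS[s][0] for s in steps)
    high = sum(OVERHEADS[s][1] for s in steps)
    return low, high

cold = overhead_range(OVERHEADS)          # everything started from scratch
warm = overhead_range(["optimization"])   # hypothetical: AM and containers already running

print(f"cold start overhead: {cold[0]:.1f}-{cold[1]:.1f} s")
print(f"warm start overhead: {warm[0]:.1f}-{warm[1]:.1f} s")
```

With several seconds of startup cost per cold query, avoiding or amortizing Application Master and container startup matters more than raw task speed for short report queries.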
Diagram: a client running the testing tool opens N connections to each of two HiveServer2 servers (Server #1 and Server #2). The HiveServer2 servers use the Metastore and Metastore DB, and run queries as Tez AMs and Tez containers on YARN and HDFS.
Mini Test
Mini Setup Tested
• 50 nodes
• 450B rows dataset
• Achieved 15K queries per hour
So, can we get 100K qph on 700 nodes?
We thought it should be easy, but…
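The expectation that 700 nodes "should be easy" follows from simple linear extrapolation of the mini test. The sketch below spells out that arithmetic; it assumes perfectly linear scaling, which is exactly the assumption the rest of the talk shows to be too optimistic, and the function names are illustrative.

```python
import math

def linear_qph(measured_qph, measured_nodes, target_nodes):
    """Throughput expected on target_nodes if throughput scaled linearly."""
    return measured_qph / measured_nodes * target_nodes

def nodes_needed(measured_qph, measured_nodes, target_qph):
    """Nodes needed for target_qph under the same linear assumption."""
    per_node = measured_qph / measured_nodes
    return math.ceil(target_qph / per_node)

expected = linear_qph(15_000, 50, 700)
needed = nodes_needed(15_000, 50, 100_000)

print(f"linear estimate for 700 nodes: {expected:,.0f} qph")
print(f"nodes needed for 100K qph under linear scaling: {needed}")
```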
The Bottlenecks at Scale
Challenges at Scale
• Hive Metastore Server
• YARN Resource Manager
• Datanode Hotspot
• YARN ATS
Hive Metastore Server
Use Local Metastore
• Before: HS2 -> Metastore Server -> Metastore DB
• After: HS2 (local metastore) -> Metastore DB
• Throughput: 7.6K -> 22K qph
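One way to picture the "local metastore" change is as a `hive-site.xml` fragment. The sketch below renders such a fragment; it relies on the standard Hive behaviour that an empty `hive.metastore.uris` makes HiveServer2 use a local metastore talking directly to the database named by `javax.jdo.option.ConnectionURL`. The talk does not show its actual settings, so the JDBC URL is a hypothetical placeholder, the helper `render_properties` is illustrative, and the driver and credential properties a real setup needs are omitted.

```python
from xml.sax.saxutils import escape

def render_properties(props):
    """Render a dict as a Hadoop-style <configuration> XML fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)

local_metastore = {
    # Empty value: no remote Metastore Server, HS2 uses a local metastore.
    "hive.metastore.uris": "",
    # Hypothetical database location, for illustration only.
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://metastore-db.example.com/hive",
}

fragment = render_properties(local_metastore)
print(fragment)
```

The trade-off of this layout is that every HiveServer2 instance opens its own connections to the metastore database, so the database has to cope with that load directly.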
YARN ResourceManager Scalability
Pending Apps
• Too many pending apps
• Resolution: increase yarn.resourcemanager.amlauncher.thread-count
• Throughput: 22K -> 26K qph
YARN ResourceManager Scalability
Pending Containers
• Too many pending containers
• Resolution: increase tez.am.am-rm.heartbeat.interval-ms.max
• Throughput: 26K -> 72.5K qph
Datanode Hotspot
Last Hour Problem
• Connection timeouts and disk access errors
• Many queries access recently added data
• Resolution: increase the HDFS replication factor
• Throughput: 72.5K -> 95K qph
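To compare the individual fixes, the throughput figures quoted on these slides can be turned into per-step gain factors. The sketch below does only that arithmetic; the step labels are shorthand for the resolutions named on the slides.

```python
# (label, throughput in queries per hour) in the order the fixes were applied.
steps = [
    ("starting point", 7_600),
    ("local metastore", 22_000),
    ("more AM launcher threads", 26_000),
    ("longer Tez AM-RM heartbeat interval", 72_500),
    ("higher HDFS replication factor", 95_000),
]

def step_gains(steps):
    """Return (label, gain factor) for each fix relative to the step before it."""
    return [
        (name, after / before)
        for (_, before), (name, after) in zip(steps, steps[1:])
    ]

gains = step_gains(steps)
overall = steps[-1][1] / steps[0][1]

for name, factor in gains:
    print(f"{name}: x{factor:.2f}")
print(f"overall: x{overall:.1f}")
```

Read this way, the local metastore and the heartbeat interval change account for most of the improvement, each nearly tripling throughput on its own.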
Other Tunings We Did
• Container reuse timeout
• YARN capacity scheduler node locality delay
• Tez shuffle keep alive
• TCP fin_wait
Notes on YARN ATS
• Disabling YARN ATS gives higher throughput
• Trade-off: YARN log aggregation is lost
End of first half
Yohei Abe
@Yahoo! JAPAN
Real-life Hive LLAP at
Yahoo! JAPAN
Aug 2016
Agenda
• Hive LLAP at Yahoo! JAPAN
• Tuning
• Performance Result
• Future Work
Hive LLAP at Yahoo! JAPAN
Hive on Tez
Hive on Tez is able to produce 100K reports/hour
Hive on Tez+LLAP
How does Hive on Tez+LLAP handle 100K reports?
• How many servers?
• What tuning?
What is LLAP?
LLAP is for sub-second query processing
• Persistent daemons
• Caching data
Diagram: with Tez alone, the Tez AppMaster runs work in Tez containers that are created dynamically. With Tez+LLAP, the Tez AppMaster runs work in LLAP daemons, which are persistent.
Basic Tuning
LLAP test cluster
• Server node: Xeon E5-2660v2 2.2GHz / 2 CPUs / 128GB memory / 10GBase-T, 2 ports
• Slave nodes: 45 nodes
• HiveServer2 nodes: 10 nodes
• Hadoop: 2.7.1
• Hive: 2.1.0-snapshot
• Tez: 0.8.3
Parameters
• Some basic parameters need to be changed
• Performance is very slow with the default values
Threading model
Diagram: hive.llap.daemon.num.executors sets the number of executor threads, and hive.llap.io.threadpool.size sets the number of I/O threads that pass data to the executor threads.
Executor thread pool
hive.llap.daemon.num.executors (default 4)
• The number of JVM threads for query execution
• Set this to the number of vCPUs
• 40 in our case
Performance: executor thread
Executor threads vs. QPS (higher is better)
• 4 (default): 5.49 QPS
• 40 (our CPU): 23.68 QPS
• 72: 23.42 QPS
I/O thread pool
hive.llap.io.threadpool.size (default 10)
• The number of I/O threads
• Set this to the number of vCPUs
• 40 in our case
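The sizing rule for both thread pools (one thread per vCPU) can be written down as a small helper. This is a minimal sketch: the function name is illustrative, and the breakdown of 40 vCPUs into 2 sockets, 10 cores per socket, and 2 hardware threads per core is an assumption based on the Xeon E5-2660v2 listed for the test cluster, not something the slides spell out.

```python
def llap_thread_settings(sockets, cores_per_socket, threads_per_core):
    """Size both LLAP thread pools to the number of vCPUs on a node."""
    vcpus = sockets * cores_per_socket * threads_per_core
    return {
        "hive.llap.daemon.num.executors": vcpus,
        "hive.llap.io.threadpool.size": vcpus,
    }

# Assumed topology of one node in the test cluster (see the note above).
settings = llap_thread_settings(sockets=2, cores_per_socket=10, threads_per_core=2)

for name, value in settings.items():
    print(f"{name}={value}")
```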
Performance: I/O thread
I/O threads vs. QPS (higher is better)
• 10 (default): 12.82 QPS
• 40 (our CPU): 23.42 QPS
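The two thread-pool measurements are easier to compare as speedups over the default setting. The sketch below computes those ratios from the QPS values reported on the executor-thread and I/O-thread slides; nothing here is measured anew, and the helper name is illustrative.

```python
# QPS reported on the slides, keyed by parameter and then by setting.
measurements = {
    "hive.llap.daemon.num.executors": {4: 5.49, 40: 23.68, 72: 23.42},
    "hive.llap.io.threadpool.size": {10: 12.82, 40: 23.42},
}
defaults = {
    "hive.llap.daemon.num.executors": 4,
    "hive.llap.io.threadpool.size": 10,
}

def speedup_over_default(qps_by_setting, default):
    """Return each setting's QPS divided by the QPS at the default setting."""
    base = qps_by_setting[default]
    return {setting: qps / base for setting, qps in qps_by_setting.items()}

speedups = {
    name: speedup_over_default(values, defaults[name])
    for name, values in measurements.items()
}

for name, by_setting in speedups.items():
    for setting, factor in by_setting.items():
        print(f"{name}={setting}: x{factor:.2f}")
```

The executor setting has the larger effect, and going from 40 to 72 executors brings no further gain, which is consistent with sizing the pools to the vCPU count.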
Memory
• hive.llap.daemon.memory.per.instance.mb -> java -Xmx …: executor memory (JVM on-heap)
• hive.llap.io.memory.size: I/O memory (JVM off-heap)
Performance (compared to Tez)
Performance: QPS
Figure: QPS versus number of clients (32 to 384), LLAP compared with Tez
100K / hour?
LLAP, 45 nodes (test cluster)
Max: 24 qps ≈ 87K queries/hour
70 nodes for 100K (if it scales linearly)
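The conversion from queries per second to queries per hour, and a linear extrapolation to the 100K target, can be sketched as follows. This is only arithmetic under a strict proportional-scaling assumption, with illustrative function names. Note that starting from the rounded 24 qps figure it yields a smaller node count than the 70 quoted on the slide; the slide does not say what margin or other assumptions its figure includes.

```python
import math

def qps_to_qph(qps):
    """Convert queries per second to queries per hour."""
    return qps * 3600

def nodes_for_target(measured_qps, measured_nodes, target_qph):
    """Nodes needed for target_qph if throughput scaled in strict proportion."""
    per_node_qph = qps_to_qph(measured_qps) / measured_nodes
    return math.ceil(target_qph / per_node_qph)

measured_qph = qps_to_qph(24)
estimate = nodes_for_target(24, 45, 100_000)

print(f"24 qps = {measured_qph:,} queries/hour")
print(f"strictly proportional estimate for 100K/hour: {estimate} nodes")
```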
Advanced Tuning
hive.llap.client.consistent.splits
• false (default) => use file locality to select the LLAP daemon
• true => LLAP daemons are selected evenly (by hash distribution)
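The difference between the two selection strategies can be illustrated with a toy model. This is a simplified sketch, not Hive's actual implementation: the node names, the split paths, and both `pick_*` functions are hypothetical. It models "hot" data whose replicas all sit on three hosts and counts which daemon each strategy would pick.

```python
import hashlib
from collections import Counter

DAEMONS = [f"node{i:02d}" for i in range(1, 11)]   # hypothetical 10-node cluster

def pick_by_locality(replica_hosts, daemons):
    """Choose a daemon on a host that holds a replica, if there is one."""
    for host in replica_hosts:
        if host in daemons:
            return host
    return daemons[0]

def pick_by_hash(split_key, daemons):
    """Choose a daemon from a hash of the split identity, ignoring block location."""
    digest = hashlib.sha256(split_key.encode("utf-8")).digest()
    return daemons[int.from_bytes(digest[:8], "big") % len(daemons)]

# Hypothetical hot data: 1000 splits whose replicas all live on three hosts.
hot_hosts = ["node01", "node02", "node03"]
splits = [
    (f"/data/latest/part-{i:05d}", hot_hosts[i % 3:] + hot_hosts[:i % 3])
    for i in range(1000)
]

by_locality = Counter(pick_by_locality(hosts, DAEMONS) for _, hosts in splits)
by_hash = Counter(pick_by_hash(path, DAEMONS) for path, _ in splits)

print("locality:", dict(by_locality))
print("hash:    ", dict(by_hash))
```

In this toy model the locality strategy sends all work to the three hosts that store the data, while the hash strategy spreads it over the whole cluster. Because the hash depends only on the split identity, the same split also lands on the same daemon every time.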
Recap: LLAP
Each node runs an LLAP daemon and also a DataNode
hive.llap.client.consistent.splits vs. QPS (higher is better)
• false (default, use file locality): 17.5 QPS
• true (no locality): 23.42 QPS
Future Work
Web UI
Web UI (HIVE-11526)
• Each LLAP daemon exposes basic metrics on port 15002 (default)
• Included in Hive 2.1
• Contributed by Yahoo! JAPAN
Web UI (HIVE-14030)
• HIVE-11526 covers only a single daemon
• HIVE-14030 provides an aggregated view of an LLAP cluster (not yet in master)
• Contributed by Yahoo! JAPAN
ACL
Hive Column-level ACL
Diagram: SQL goes through HS2 to LLAP, on top of YARN and HDFS
GOAL: Column-level ACL
ANSWER(?): HiveServer2 can do it
Direct Access to HDFS
breaks everything
Diagram: SQL goes through HS2 and LLAP, while M/R, Pig, and Spark access HDFS directly under Storage Based Authorization
• Direct access to HDFS (not through Hive) breaks SQL Standard Based ACLs, and with them the security model
• Other solutions (not only Hive) are necessary
Future Directions
Diagram: M/R, Pig, and Spark read through LlapInputFormat, so that SQL Based ACLs are checked
• LlapInputFormat checks ACLs with HS2 on behalf of other applications
• HIVE-13441
• HIVE-12991
• See LlapDump.java
Summary
• Throughput is greatly improved by LLAP
• Some tuning is necessary
• LLAP is also effective for batch processing
Q & A

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Kürzlich hochgeladen (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Editor's notes

  1. Thank you for coming today. My name is Yohei Abe, from Yahoo! JAPAN. This presentation is given by two people, so the talk consists of two parts, both related to Yahoo! JAPAN. The first part is from Mr. Jiang, about the Hive on Tez use case at Yahoo! JAPAN. Mr. Jiang …. The last part is from me, about the LLAP use case, for the same dataset and queries. First, allow me to introduce myself. I'm an engineer at Yahoo! JAPAN, working on Hadoop infrastructure systems and supporting Hive and Hadoop.
  2. Yahoo! JAPAN is the largest portal site in Japan, providing many services such as weather, auctions, news and so on. Our site reaches 81% of all Japanese internet users, so it provides advertising space for advertisers. We offer a variety of advertising solutions.
  3. YDN, the Yahoo Display Network, is one of these solutions. It uses Hive to generate the YDN report, which shows how many impressions, clicks and views there were over a certain period of time. It contains useful information for advertisers. The point is that the data is massive. The source table from which the report is generated has 800 billion rows covering a 13-month period. The report-generating job is a parallel batch workload, not interactive queries. We need to generate 100,000 reports per hour; this is our business and customer requirement.
  4. From a single client machine, we run 60K queries and calculate queries per hour (qph) from the result. Throughput = 60,000 queries * (60 / minutes taken to process 60K queries). For each cluster configuration change, we make several attempts with different concurrencies: 32, 64, 96, 128, 256 and higher.
  5. Our queries were already highly optimized, so we focused on other parts. A query execution is essentially put together from – Client execution [0s if done correctly] – Optimization [HiveServer2] [~0.1s] – HCatalog lookups [HCatalog, Metastore] [very fast in Hive 0.14] – Application Master creation [4-5s] – Container allocation [3-5s] – Query execution
  6. Pending apps decreased, but we didn't gain much throughput.
  7. Increased tez.am-rm.heartbeat.interval-ms.max from 250ms to 1000ms
  8. Increased the replication factor for a specific directory from 3 to 10.
  9. OK, so LLAP. We are going to use LLAP for the YDN report.
  10. As Jiang said in his talk, Hive on Tez can produce 100K reports per hour. Our engineers found some bottlenecks and fixed them to meet the requirement, basically by tuning some parameters.
  11. The next step is LLAP. LLAP is a new Hive feature as of Hive 2.0, so we did some technical investigation. Mostly we needed to know how LLAP can process the YDN report. How many servers are necessary? What parameters need to be changed? Is it possible to generate 100K reports per hour?
  12. What is LLAP
  13. I think LLAP has already been introduced in detail in other sessions and at previous Hadoop Summits. So here I'm going to talk just briefly about what LLAP does. LLAP is for sub-second query processing, and its main component is the persistent daemon.
  14. Let me compare it with the Tez processing model; I think that is an easy way to understand what LLAP does differently. In the case of Tez, when the client submits a SQL query, an application master is created. This behavior is the same with LLAP. Then the application master creates child Tez containers for computation. These are created dynamically and are not persistent. LLAP, on the other hand, is a persistent daemon. “Persistent” means it can be shared by multiple queries and multiple users, as long as the query does not use private data. Persistence provides benefits such as avoiding startup cost, an intelligent cache, effective JIT compilation by the JVM, and so on. Again, I won't go through the internals. If you're interested in them, please be sure to catch the talks by the core engineers at Hortonworks.
  15. From here, I’m going to talk about some tuning points and performance results.
  16. This is our LLAP cluster, set up just for evaluation purposes. The important point here is that there are 45 nodes for LLAP, which means 45 daemons are running. We also prepared HiveServer2 so that it does not become the bottleneck.
  17. LLAP is configured through XML files, just as Hive is. These XML files have many parameters; some of them are basic ones that you need to change for performance. Some default values are not suitable for your system, so you need to change them.
  18. These basic parameters are related to thread counts. This is a very simplified threading model of LLAP. LLAP has two main components, the executor and the IO layer. The IO layer reads data from HDFS, decodes the ORC data, converts it to vectorized data, and passes it to the executor. The executor gets data from the IO layer, then computes and generates results. This data passing is completely asynchronous.
  19. The number of executors is specified by hive.llap.daemon.num.executors. The default is 4. You need to set this value according to your CPU vcores. In our case, it's 40.
  20. This chart shows the performance result. The vertical axis is the number of queries per second; higher is better. The horizontal axis is the number of executor threads. The leftmost bar, the default of 4, is very slow, and the CPU is almost idle. The second bar is 40, which matches our CPU vcore count. No further improvement is seen when the value is larger than the CPU vcore count. So it's good to set this value to the CPU vcore count.
  21. Next, the IO thread pool size can be specified by hive.llap.io.threadpool.size. The default is 10, which is also too small in our case.
  22. This chart shows the performance result when changing the IO thread pool size. The default does not perform well; it's not suitable for our CPU. Performance is better when the size equals the vCPU count. Following these executor and IO results, I set both values to the CPU vcore count in the performance tests on later slides.
  23. Memory. When it comes to memory, there are two parameters. One is for the executor, the other is for the IO layer. One thing to note is that the executor uses JVM on-heap memory, but the IO layer uses off-heap memory. The executor memory value is adjusted slightly by an internal process and passed as the Java command-line parameter Xmx. There is no clear guideline on what values are effective; in our case, we split the physical memory size equally between them. If the LLAP daemon runs out of memory, you can see it in the LLAP Web UI. I'm going to talk about that later in these slides.
  24. Performance
  25. In this chart, the blue line is LLAP (Tez + LLAP) and the red line is Tez. The vertical axis is queries per second; higher is better. The horizontal axis is the number of clients; more clients means more concurrent queries. This chart indicates that LLAP always performs better than Tez, even for batch processing rather than interactive queries. (Note: match the scales of the graphs.)
  26. Does the previous chart mean 100K per hour? We need 100K queries per hour for our ad report. From the chart, the maximum qps is about 24, which is about 87,000 queries per hour using 45 LLAP daemons. Almost there; it was so close, but the 45 nodes in our test environment are not enough. We calculated that, if LLAP scales almost linearly, 70 nodes are enough for 100K. That's far fewer than the Tez system. LLAP provides really good performance.
  27. More tuning. We found one more parameter that can be effective in our case.
  28. The parameter is client consistent splits. It takes a boolean value, and the default is false. The difference is whether or not the assignment of work to LLAP daemons follows data locality, that is, whether the data is on the same machine as the LLAP daemon. Computation may be faster when an LLAP daemon uses local data instead of remote data. With the default, false, the Tez application master distributes computation based on file locality. With true, the Tez application master uses a kind of hash distribution to select the LLAP daemon. This means file locality is ignored and computation is distributed evenly across the LLAP cluster.
  29. Recap: each node runs an LLAP daemon and also a DataNode daemon.
  30. Here is the result. It's somewhat the opposite of what I expected. Ignoring file locality is faster than the default setting. However, this depends on data size, table size, and so on. We don't think this result can be generalized, but in our case it was faster when I changed the value from the default.
  31. We have two items of future work. We are currently investigating and verifying these LLAP features.
  32. The first is the Web UI.
  33. The LLAP daemon exposes some basic metrics, such as memory footprint, CPU usage and cache hit ratio, on a specific port. This feature is in Hive 2.1 and was contributed by Yahoo! JAPAN. Thank you to my co-worker. This feature is really useful for cluster administrators. For example, when you cannot get good performance even on modern machines, LLAP may be misconfigured. In that case, you can use this UI to see how the daemon is working and what the cache hit rate is. In my case, I found through the UI that the number of executors was too small; the CPU was almost idle.
  34. Again, this UI will be improved in another JIRA ticket. I don't think this is included in the master branch yet. This ticket provides an aggregated view of the previous UI, so you can easily check the status of all cluster machines and all daemons.
  35. Column-level ACL is really important for us, and I think for other companies as well. Of course, Hive can do this through HiveServer2; HiveServer2 and the metastore know which data should be exposed to which user.
  36. But in our environment we use not only Hive but also other products, such as MapReduce. They break the ACL, because they can read HDFS directly, without HiveServer2. When you need column-level ACL, you have to use only Hive. But we need other solutions as well; this is a must.
  37. LLAP provides a solution for this issue. It exposes LLAP as a storage layer, so products other than Hive can access data through it while keeping the ACL. If you are interested, please see the JIRA ticket and LlapDump.java in the Hive repository on GitHub.
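
The throughput arithmetic in notes 4 and 26 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, the 36-minute run time is a hypothetical input (the notes do not give measured run times), and the node estimate assumes strictly linear scaling. Under that assumption the estimate comes out at 53 nodes, lower than the 70 nodes quoted in note 26; the notes do not explain the difference, which may simply reflect headroom for less-than-linear scaling.

```python
import math


def queries_per_hour(total_queries: int, minutes_taken: float) -> float:
    """Throughput formula from note 4: queries * (60 / minutes taken)."""
    if minutes_taken <= 0:
        raise ValueError("minutes_taken must be positive")
    return total_queries * 60 / minutes_taken


def qps_to_qph(queries_per_second: float) -> float:
    """Convert queries per second to queries per hour (3600 s per hour)."""
    return queries_per_second * 3600


def nodes_for_target(target_qph: float, measured_qph: float, measured_nodes: int) -> int:
    """Estimate the node count for a target throughput.

    Assumes throughput scales strictly linearly with the number of nodes,
    which is an idealization; real clusters usually scale a bit worse.
    """
    per_node_qph = measured_qph / measured_nodes
    return math.ceil(target_qph / per_node_qph)


if __name__ == "__main__":
    # Hypothetical run: the 60K-query benchmark finishing in 36 minutes.
    print(queries_per_hour(60_000, 36))

    # Note 26: about 24 qps measured on 45 LLAP daemons.
    measured_qph = qps_to_qph(24)
    print(measured_qph)

    # Linear extrapolation to the 100K qph requirement.
    print(nodes_for_target(100_000, measured_qph, 45))
```

Note that 24 qps converts to 86,400 queries per hour, which matches the "about 87,000" figure in note 26 once the "about 24" is taken as rounded.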