SlideShare a Scribd company logo
1 of 41
Inside Hive (for beginners) 1 Takeshi NAKANO / Recruit Co. Ltd.
Why? Hive is good tool for non-specialist! The number of M/R controls the Hive processing time. ↓ How can we reduce the number? What can we do for this on writing HiveQL? ↓ How does Hive convert HiveQLto M/R jobs? On this, what optimizing processes are adopted? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
Don’t you have.. This fb’s paper has a lot of information! But this is a little old.. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 3
Component Level Analysis 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 4
Hive Architecture / Exec Flow 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 5 Client Hadoop Metastore Driver Compiler
Client Hadoop Driver Compiler Hive Workflow Hive has the operators which are minimum processing units. The process of each operator is done with HDFS operation or M/R jobs. The compiler converts HiveQL to the sets of operators. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 6 Metastore
Hive Workflow Operators 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 7
Client Hadoop Metastore Driver Compiler Hive Workflow For M/R processing, Hiveuses ExecMaper and ExecReducer. On processing, we have 2 modes. Local processing mode Distributed processing mode 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 8
Client Hadoop Metastore Driver Compiler Hive Workflow On 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this. On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 9
Compiler : How to Process HiveQL 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 10 Client Hadoop Metastore Driver Compiler
“Plumbing” of HIVE compiler 7/6/2011 11 HIVE - A warehouse solution over Map Reduce Framework
“Plumbing” of HIVE compiler 7/6/2011 12 HIVE - A warehouse solution over Map Reduce Framework
Compiler Overview 13 Parser Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer
Compiler Overview 14 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator  Tree Logical Optimizer Operator  Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
Parser Hive QL AST INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Hive QL TOK_QUERY   + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ AST   + TOK_INSERT       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2"       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Parser SQL AST INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); SQL TOK_QUERY   + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“   + TOK_INSERT       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2"       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" AST 1 2 3 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
17 Semantic Analyzer (1/2) AST QB + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ AST 1 QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) ParseInfo Join Node + TOK_JOIN     + TOK_TABREF         …     + TOK_TABREF         …     + “=”         … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 17
18 Semantic Analyzer (2/2) AST QB       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2” AST 2 QB ParseInfo NameTo Destination Node + TOK_TAB     + TOK_TABNAME         +"access_log_temp2” Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 18 18
19 Semantic Analyzer (2/2) AST QB       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" AST QB ParseInfo 3 Name To Select Node + TOK_SELECT     + TOK_SELEXPR         …      + TOK_SELEXPR         …     + TOK_SELEXPR         …     + TOK_SELEXPR         … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 19 19
20 Logical Plan Generator (1/4) QB OP Tree QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) OP Tree TableScanOperator(“access_log_hbase”) TableScanOperator(“product_hbase”) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 20 20
21 Logical Plan Generator (2/4) QB OP Tree QB ParseInfo  + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ ReduceSinkOperator(“access_log_hbase”) ReduceSinkOperator(“product_hbase”) OP Tree JoinOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
22 Logical Plan Generator (3/4) QB OP Tree QB ParseInfo Name To Select Node + TOK_SELECT     + TOK_SELEXPR         + "."              + TOK_TABLE_OR_COL                  + "a"              + "user"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "a"              + "prono"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "p"              + "maker"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "p"              + "price" OP Tree SelectOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
23 Logical Plan Generator (4/4) QB OP Tree QB MetaData Name To Destination Table Info “insclause-0”=     Table Info(“access_log_temp2”) OP Tree FileSinkOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Logical Plan Generator (result) 24 LCF  OP Tree TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Logical Optimizer Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 25 25 25
Logical Optimizer (Predicate Push Down) INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 26 26
Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 27 27
INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 28 28
Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); FilterOperator FIL_8 (maker = 'honda') ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 29 29
30 Physical Plan Generator OP Tree Task Tree MoveTask(Stage-0) Ope Tree LoadTableDesc TableScanOperator(TS_0) TableScanOperator(TS_1) ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) JoinOperator(JOIN_4) SelectOperator(SEL_5) FileSinkOperator(FS_6)  StatsTask(Stage-2) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 30 30
OP Tree Task Tree MapRedTask (Stage-1/root) TableScanOperator(TS_0) Physical Plan Generator (result) 31 LCF  Mapper TableScanOperator TS_1 TableScanOperator TS_0 TableScanOperator(TS_1) ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) Reducer JoinOperator JOIN_4 JoinOperator(JOIN_4) SelectOperator SEL_5 SelectOperator(SEL_5) FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 31 31 31
32 Physical Optimizer Task Tree Task Tree java/org/apache/hadoop/hive/ql/optimizer/physical/以下 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
33 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapRedTask (Stage-1) Mapper TableScanOperator TS_1 TableScanOperator TS_0 MapJoinOperator MAPJOIN_7 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 33
34 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapredLocalTask(Stage-7) MapRedTask (Stage-1) TableScanOperator TS_0 Mapper TableScanOperator TS_1 TableScanOperator TS_0 HashTableSinkOperator HASHTABLESINK_11 MapJoinOperator MAPJOIN_7 MapRedTask (Stage-1) SelectOperator SEL_8 Mapper TableScanOperator TS_1 SelectOperator SEL_5 MapJoinOperator MAPJOIN_7 FileSinkOperator FS_6 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 34
In the end 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 35 Client Hadoop Metastore Driver Compiler
In the end 36 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator  Tree Logical Optimizer Operator  Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
End 7/6/2011 37
Appendix: What does Explain show? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 38
Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2     >  SELECT a.user, a.prono, p.maker, p.price     >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE:   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Alias -> Map Operator Tree:         a TableScan             alias: a             Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 0               value expressions: expr: user                     type: string expr: prono                     type: int         p TableScan             alias: p             Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 1               value expressions: expr: maker                     type: string expr: price                     type: int Reduce Operator Tree:         Join Operator           condition map:                Inner Join 0 to 1           condition expressions:             0 {VALUE._col0} {VALUE._col2}             1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7           Select Operator             expressions: expr: _col0                   type: string expr: _col2                   type: int expr: _col6                   type: string expr: _col7                   type: int outputColumnNames: _col0, _col1, _col2, _col3             File Output Operator               compressed: false GlobalTableId: 1               table:                   input format: org.apache.hadoop.mapred.TextInputFormat                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   name: default.access_log_temp2   Stage: Stage-0     Move Operator       tables:           replace: true           table:               input format: org.apache.hadoop.mapred.TextInputFormat               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               name: default.access_log_temp2   Stage: Stage-2     Stats-Aggr Operator Time taken: 0.1 seconds hive>
Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2     >  SELECT a.user, a.prono, p.maker, p.price     >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE:   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Alias -> Map Operator Tree:         a TableScan             alias: a Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 0               value expressions: expr: user                     type: string expr: prono                     type: int         p TableScan             alias: p Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 1               value expressions: expr: maker                     type: string expr: price                     type: int ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Map Operator Tree: TableScan             Reduce Output Operator TableScan             Reduce Output Operator       Reduce Operator Tree:         Join Operator           Select Operator             File Output Operator   Stage: Stage-0     Move Operator   Stage: Stage-2     Stats-Aggr Operator Reduce Operator Tree:         Join Operator           condition map:                Inner Join 0 to 1           condition expressions:             0 {VALUE._col0} {VALUE._col2}             1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7           Select Operator             expressions: expr: _col0                   type: string expr: _col2                   type: int expr: _col6                   type: string expr: _col7                   type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator               compressed: false GlobalTableId: 1               table:                   input format: org.apache.hadoop.mapred.TextInputFormat                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   name: default.access_log_temp2   Stage: Stage-0     Move Operator       tables:           replace: true           table:               input format: org.apache.hadoop.mapred.TextInputFormat               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               name: default.access_log_temp2   Stage: Stage-2     Stats-Aggr Operator Time taken: 0.1 seconds hive>
Appendix: What does Explain show? ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Map Operator Tree: TableScan             Reduce Output Operator TableScan             Reduce Output Operator       Reduce Operator Tree:         Join Operator           Select Operator             File Output Operator   Stage: Stage-0     Move Operator   Stage: Stage-2     Stats-Aggr Operator MapRedTask (Stage-1/root) Mapper TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 Reducer JoinOperator JOIN_4 ≒ SelectOperator SEL_5 FileSinkOperator FS_6 MoveTask (Stage-0) Stats Task (Stage-2)

More Related Content

What's hot

How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsAlluxio, Inc.
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...HostedbyConfluent
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceDataWorks Summit
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive HookMinwoo Kim
 

What's hot (20)

How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query PerformanceORC File & Vectorization - Improving Hive Data Storage and Query Performance
ORC File & Vectorization - Improving Hive Data Storage and Query Performance
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Presto overview
Presto overviewPresto overview
Presto overview
 

Similar to Internal Hive

Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinChad Cooper
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Ilya Grigorik
 
Python 3000
Python 3000Python 3000
Python 3000Bob Chao
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Roel Hartman
 
Computer science project work
Computer science project workComputer science project work
Computer science project workrahulchamp2345
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing frameworkIndicThreads
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеYandex
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerConnor McDonald
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly LanguageMotaz Saad
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database ConnectivityRanjan Kumar
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparqlkwangsub kim
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?brynary
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkChristian Trabold
 

Similar to Internal Hive (20)

Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Hive_p
Hive_pHive_p
Hive_p
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And Pythonwin
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
 
Python 3000
Python 3000Python 3000
Python 3000
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
 
Computer science project work
Computer science project workComputer science project work
Computer science project work
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing framework
 
Code Management
Code ManagementCode Management
Code Management
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
 
Jquery mobile
Jquery mobileJquery mobile
Jquery mobile
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly Language
 
CloudKit
CloudKitCloudKit
CloudKit
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database Connectivity
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparql
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?
 
Html5
Html5Html5
Html5
 
Php
PhpPhp
Php
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase framework
 

More from Recruit Technologies

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場Recruit Technologies
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びRecruit Technologies
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Recruit Technologies
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話Recruit Technologies
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所Recruit Technologies
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Recruit Technologies
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例Recruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントRecruit Technologies
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後Recruit Technologies
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Recruit Technologies
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するRecruit Technologies
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントRecruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントRecruit Technologies
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルRecruit Technologies
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~Recruit Technologies
 

More from Recruit Technologies (20)

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
 
Tableau活用4年の軌跡
Tableau活用4年の軌跡Tableau活用4年の軌跡
Tableau活用4年の軌跡
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話
 
LT(自由)
LT(自由)LT(自由)
LT(自由)
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
 
リクルート式AIの活用法
リクルート式AIの活用法リクルート式AIの活用法
リクルート式AIの活用法
 
銀行ロビーアシスタント
銀行ロビーアシスタント銀行ロビーアシスタント
銀行ロビーアシスタント
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成する
 
RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Internal Hive

  • 1. Inside Hive (for beginners) 1 Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why? Hive is good tool for non-specialist! The number of M/R controls the Hive processing time. ↓ How can we reduce the number? What can we do for this on writing HiveQL? ↓ How does Hive convert HiveQLto M/R jobs? On this, what optimizing processes are adopted? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
  • 3. Don’t you have.. This fb’s paper has a lot of information! But this is a little old.. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 3
  • 4. Component Level Analysis 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 4
  • 5. Hive Architecture / Exec Flow 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 5 Client Hadoop Metastore Driver Compiler
  • 6. Client Hadoop Driver Compiler Hive Workflow Hive has the operators which are minimum processing units. The process of each operator is done with HDFS operation or M/R jobs. The compiler converts HiveQL to the sets of operators. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 6 Metastore
  • 7. Hive Workflow Operators 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 7
  • 8. Client Hadoop Metastore Driver Compiler Hive Workflow For M/R processing, Hiveuses ExecMaper and ExecReducer. On processing, we have 2 modes. Local processing mode Distributed processing mode 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 8
  • 9. Client Hadoop Metastore Driver Compiler Hive Workflow On 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this. On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 9
  • 10. Compiler : How to Process HiveQL 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 10 Client Hadoop Metastore Driver Compiler
  • 11. “Plumbing” of HIVE compiler 7/6/2011 11 HIVE - A warehouse solution over Map Reduce Framework
  • 12. “Plumbing” of HIVE compiler 7/6/2011 12 HIVE - A warehouse solution over Map Reduce Framework
  • 13. Compiler Overview 13 Parser Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer
  • 14. Compiler Overview 14 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator Tree Logical Optimizer Operator Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
  • 15. Parser Hive QL AST INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Hive QL TOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ AST + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 16. Parser SQL AST INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); SQL TOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" AST 1 2 3 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 17. 17 Semantic Analyzer (1/2) AST QB + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ AST 1 QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) ParseInfo Join Node + TOK_JOIN + TOK_TABREF … + TOK_TABREF … + “=” … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 17
  • 18. 18 Semantic Analyzer (2/2) AST QB + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2” AST 2 QB ParseInfo NameTo Destination Node + TOK_TAB + TOK_TABNAME +"access_log_temp2” Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 18 18
  • 19. 19 Semantic Analyzer (2/2) AST QB + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" AST QB ParseInfo 3 Name To Select Node + TOK_SELECT + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 19 19
  • 20. 20 Logical Plan Generator (1/4) QB OP Tree QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) OP Tree TableScanOperator(“access_log_hbase”) TableScanOperator(“product_hbase”) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 20 20
  • 21. 21 Logical Plan Generator (2/4) QB OP Tree QB ParseInfo + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ ReduceSinkOperator(“access_log_hbase”) ReduceSinkOperator(“product_hbase”) OP Tree JoinOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 22. 22 Logical Plan Generator (3/4) QB OP Tree QB ParseInfo Name To Select Node + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" OP Tree SelectOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 23. 23 Logical Plan Generator (4/4) QB OP Tree QB MetaData Name To Destination Table Info “insclause-0”= Table Info(“access_log_temp2”) OP Tree FileSinkOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 24. Logical Plan Generator (result) 24 LCF OP Tree TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 25. Logical Optimizer Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 25 25 25
  • 26. Logical Optimizer (Predicate Push Down) INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 26 26
  • 27. Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 27 27
  • 28. INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 28 28
  • 29. Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); FilterOperator FIL_8 (maker = 'honda') ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 29 29
  • 30. 30 Physical Plan Generator OP Tree Task Tree MoveTask(Stage-0) Ope Tree LoadTableDesc TableScanOperator(TS_0) TableScanOperator(TS_1) ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) JoinOperator(JOIN_4) SelectOperator(SEL_5) FileSinkOperator(FS_6) StatsTask(Stage-2) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 30 30
  • 31. OP Tree Task Tree MapRedTask (Stage-1/root) TableScanOperator(TS_0) Physical Plan Generator (result) 31 LCF Mapper TableScanOperator TS_1 TableScanOperator TS_0 TableScanOperator(TS_1) ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) Reducer JoinOperator JOIN_4 JoinOperator(JOIN_4) SelectOperator SEL_5 SelectOperator(SEL_5) FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 31 31 31
  • 32. 32 Physical Optimizer Task Tree Task Tree java/org/apache/hadoop/hive/ql/optimizer/physical/以下 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 33. 33 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapRedTask (Stage-1) Mapper TableScanOperator TS_1 TableScanOperator TS_0 MapJoinOperator MAPJOIN_7 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 33
  • 34. 34 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapredLocalTask(Stage-7) MapRedTask (Stage-1) TableScanOperator TS_0 Mapper TableScanOperator TS_1 TableScanOperator TS_0 HashTableSinkOperator HASHTABLESINK_11 MapJoinOperator MAPJOIN_7 MapRedTask (Stage-1) SelectOperator SEL_8 Mapper TableScanOperator TS_1 SelectOperator SEL_5 MapJoinOperator MAPJOIN_7 FileSinkOperator FS_6 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 34
  • 35. In the end 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 35 Client Hadoop Metastore Driver Compiler
  • 36. In the end 36 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator Tree Logical Optimizer Operator Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
  • 38. Appendix: What does Explain show? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 38
  • 39. Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 0 value expressions: expr: user type: string expr: prono type: int p TableScan alias: p Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 1 value expressions: expr: maker type: string expr: price type: int Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions: expr: _col0 type: string expr: _col2 type: int expr: _col6 type: string expr: _col7 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr Operator Time taken: 0.1 seconds hive>
  • 40. Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 0 value expressions: expr: user type: string expr: prono type: int p TableScan alias: p Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 1 value expressions: expr: maker type: string expr: price type: int ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan Reduce Output Operator TableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr Operator Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions: expr: _col0 type: string expr: _col2 type: int expr: _col6 type: string expr: _col7 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr Operator Time taken: 0.1 seconds hive>
  • 41. Appendix: What does Explain show? ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan Reduce Output Operator TableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr Operator MapRedTask (Stage-1/root) Mapper TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 Reducer JoinOperator JOIN_4 ≒ SelectOperator SEL_5 FileSinkOperator FS_6 MoveTask (Stage-0) Stats Task (Stage-2)