SlideShare a Scribd company logo
1 of 41
Inside Hive (for beginners) 1 Takeshi NAKANO / Recruit Co. Ltd.
Why? Hive is good tool for non-specialist! The number of M/R controls the Hive processing time. ↓ How can we reduce the number? What can we do for this on writing HiveQL? ↓ How does Hive convert HiveQLto M/R jobs? On this, what optimizing processes are adopted? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
Don’t you have.. This fb’s paper has a lot of information! But this is a little old.. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 3
Component Level Analysis 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 4
Hive Architecture / Exec Flow 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 5 Client Hadoop Metastore Driver Compiler
Client Hadoop Driver Compiler Hive Workflow Hive has the operators which are minimum processing units. The process of each operator is done with HDFS operation or M/R jobs. The compiler converts HiveQL to the sets of operators. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 6 Metastore
Hive Workflow Operators 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 7
Client Hadoop Metastore Driver Compiler Hive Workflow For M/R processing, Hiveuses ExecMaper and ExecReducer. On processing, we have 2 modes. Local processing mode Distributed processing mode 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 8
Client Hadoop Metastore Driver Compiler Hive Workflow On 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this. On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 9
Compiler : How to Process HiveQL 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 10 Client Hadoop Metastore Driver Compiler
“Plumbing” of HIVE compiler 7/6/2011 11 HIVE - A warehouse solution over Map Reduce Framework
“Plumbing” of HIVE compiler 7/6/2011 12 HIVE - A warehouse solution over Map Reduce Framework
Compiler Overview 13 Parser Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer
Compiler Overview 14 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator  Tree Logical Optimizer Operator  Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
Parser Hive QL AST INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Hive QL TOK_QUERY   + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ AST   + TOK_INSERT       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2"       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Parser SQL AST INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); SQL TOK_QUERY   + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“   + TOK_INSERT       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2"       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" AST 1 2 3 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
17 Semantic Analyzer (1/2) AST QB + TOK_FROM       + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ AST 1 QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) ParseInfo Join Node + TOK_JOIN     + TOK_TABREF         …     + TOK_TABREF         …     + “=”         … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 17
18 Semantic Analyzer (2/2) AST QB       + TOK_DESTINATION           + TOK_TAB               + TOK_TABNAME                   + "access_log_temp2” AST 2 QB ParseInfo NameTo Destination Node + TOK_TAB     + TOK_TABNAME         +"access_log_temp2” Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 18 18
19 Semantic Analyzer (2/2) AST QB       + TOK_SELECT           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "user"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "prono"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "maker"           + TOK_SELEXPR               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "price" AST QB ParseInfo 3 Name To Select Node + TOK_SELECT     + TOK_SELEXPR         …      + TOK_SELEXPR         …     + TOK_SELEXPR         …     + TOK_SELEXPR         … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 19 19
20 Logical Plan Generator (1/4) QB OP Tree QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) OP Tree TableScanOperator(“access_log_hbase”) TableScanOperator(“product_hbase”) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 20 20
21 Logical Plan Generator (2/4) QB OP Tree QB ParseInfo  + TOK_JOIN           + TOK_TABREF               + TOK_TABNAME                   + "access_log_hbase"               + a           + TOK_TABREF               + TOK_TABNAME                   + "product_hbase"               + "p"           + "="               + "."                   + TOK_TABLE_OR_COL                       + "a"                   + "access_log_hbase"               + "."                   + TOK_TABLE_OR_COL                       + "p"                   + "prono“ ReduceSinkOperator(“access_log_hbase”) ReduceSinkOperator(“product_hbase”) OP Tree JoinOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
22 Logical Plan Generator (3/4) QB OP Tree QB ParseInfo Name To Select Node + TOK_SELECT     + TOK_SELEXPR         + "."              + TOK_TABLE_OR_COL                  + "a"              + "user"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "a"              + "prono"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "p"              + "maker"     + TOK_SELEXPR          + "."              + TOK_TABLE_OR_COL                  + "p"              + "price" OP Tree SelectOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
23 Logical Plan Generator (4/4) QB OP Tree QB MetaData Name To Destination Table Info “insclause-0”=     Table Info(“access_log_temp2”) OP Tree FileSinkOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Logical Plan Generator (result) 24 LCF  OP Tree TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
Logical Optimizer Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 25 25 25
Logical Optimizer (Predicate Push Down) INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 26 26
Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 27 27
INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 28 28
Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); FilterOperator FIL_8 (maker = 'honda') ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2  SELECT a.user, a.prono, p.maker, p.price  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono)  WHERE p.maker = 'honda'; FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 29 29
30 Physical Plan Generator OP Tree Task Tree MoveTask(Stage-0) Ope Tree LoadTableDesc TableScanOperator(TS_0) TableScanOperator(TS_1) ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) JoinOperator(JOIN_4) SelectOperator(SEL_5) FileSinkOperator(FS_6)  StatsTask(Stage-2) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 30 30
OP Tree Task Tree MapRedTask (Stage-1/root) TableScanOperator(TS_0) Physical Plan Generator (result) 31 LCF  Mapper TableScanOperator TS_1 TableScanOperator TS_0 TableScanOperator(TS_1) ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) Reducer JoinOperator JOIN_4 JoinOperator(JOIN_4) SelectOperator SEL_5 SelectOperator(SEL_5) FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 31 31 31
32 Physical Optimizer Task Tree Task Tree java/org/apache/hadoop/hive/ql/optimizer/physical/以下 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
33 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapRedTask (Stage-1) Mapper TableScanOperator TS_1 TableScanOperator TS_0 MapJoinOperator MAPJOIN_7 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 33
34 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapredLocalTask(Stage-7) MapRedTask (Stage-1) TableScanOperator TS_0 Mapper TableScanOperator TS_1 TableScanOperator TS_0 HashTableSinkOperator HASHTABLESINK_11 MapJoinOperator MAPJOIN_7 MapRedTask (Stage-1) SelectOperator SEL_8 Mapper TableScanOperator TS_1 SelectOperator SEL_5 MapJoinOperator MAPJOIN_7 FileSinkOperator FS_6 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 34
In the end 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 35 Client Hadoop Metastore Driver Compiler
In the end 36 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator  Tree Logical Optimizer Operator  Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
End 7/6/2011 37
Appendix: What does Explain show? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 38
Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2     >  SELECT a.user, a.prono, p.maker, p.price     >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE:   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Alias -> Map Operator Tree:         a TableScan             alias: a             Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 0               value expressions: expr: user                     type: string expr: prono                     type: int         p TableScan             alias: p             Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 1               value expressions: expr: maker                     type: string expr: price                     type: int Reduce Operator Tree:         Join Operator           condition map:                Inner Join 0 to 1           condition expressions:             0 {VALUE._col0} {VALUE._col2}             1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7           Select Operator             expressions: expr: _col0                   type: string expr: _col2                   type: int expr: _col6                   type: string expr: _col7                   type: int outputColumnNames: _col0, _col1, _col2, _col3             File Output Operator               compressed: false GlobalTableId: 1               table:                   input format: org.apache.hadoop.mapred.TextInputFormat                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   name: default.access_log_temp2   Stage: Stage-0     Move Operator       tables:           replace: true           table:               input format: org.apache.hadoop.mapred.TextInputFormat               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               name: default.access_log_temp2   Stage: Stage-2     Stats-Aggr Operator Time taken: 0.1 seconds hive>
Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2     >  SELECT a.user, a.prono, p.maker, p.price     >  FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE:   (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Alias -> Map Operator Tree:         a TableScan             alias: a Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 0               value expressions: expr: user                     type: string expr: prono                     type: int         p TableScan             alias: p Reduce Output Operator               key expressions: expr: prono                     type: int               sort order: +               Map-reduce partition columns: expr: prono                     type: int               tag: 1               value expressions: expr: maker                     type: string expr: price                     type: int ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Map Operator Tree: TableScan             Reduce Output Operator TableScan             Reduce Output Operator       Reduce Operator Tree:         Join Operator           Select Operator             File Output Operator   Stage: Stage-0     Move Operator   Stage: Stage-2     Stats-Aggr Operator Reduce Operator Tree:         Join Operator           condition map:                Inner Join 0 to 1           condition expressions:             0 {VALUE._col0} {VALUE._col2}             1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7           Select Operator             expressions: expr: _col0                   type: string expr: _col2                   type: int expr: _col6                   type: string expr: _col7                   type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator               compressed: false GlobalTableId: 1               table:                   input format: org.apache.hadoop.mapred.TextInputFormat                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                   name: default.access_log_temp2   Stage: Stage-0     Move Operator       tables:           replace: true           table:               input format: org.apache.hadoop.mapred.TextInputFormat               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe               name: default.access_log_temp2   Stage: Stage-2     Stats-Aggr Operator Time taken: 0.1 seconds hive>
Appendix: What does Explain show? ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES:   Stage-1 is a root stage   Stage-0 depends on stages: Stage-1   Stage-2 depends on stages: Stage-0 STAGE PLANS:   Stage: Stage-1     Map Reduce       Map Operator Tree: TableScan             Reduce Output Operator TableScan             Reduce Output Operator       Reduce Operator Tree:         Join Operator           Select Operator             File Output Operator   Stage: Stage-0     Move Operator   Stage: Stage-2     Stats-Aggr Operator MapRedTask (Stage-1/root) Mapper TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 Reducer JoinOperator JOIN_4 ≒ SelectOperator SEL_5 FileSinkOperator FS_6 MoveTask (Stage-0) Stats Task (Stage-2)

More Related Content

What's hot

Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveJulian Hyde
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllMichael Mior
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep divet3rmin4t0r
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Databricks
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can helpChristian Tzolov
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySparkRussell Jurney
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQLkristinferrier
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 

What's hot (20)

Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0Deep Dive into the New Features of Apache Spark 3.0
Deep Dive into the New Features of Apache Spark 3.0
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
SQL for NoSQL and how Apache Calcite can help
SQL for NoSQL and how  Apache Calcite can helpSQL for NoSQL and how  Apache Calcite can help
SQL for NoSQL and how Apache Calcite can help
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 

Similar to Internal Hive

Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinChad Cooper
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Ilya Grigorik
 
Python 3000
Python 3000Python 3000
Python 3000Bob Chao
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Roel Hartman
 
Computer science project work
Computer science project workComputer science project work
Computer science project workrahulchamp2345
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing frameworkIndicThreads
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеYandex
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerConnor McDonald
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly LanguageMotaz Saad
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database ConnectivityRanjan Kumar
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparqlkwangsub kim
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?brynary
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkChristian Trabold
 

Similar to Internal Hive (20)

Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Hive_p
Hive_pHive_p
Hive_p
 
Python And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And PythonwinPython And GIS - Beyond Modelbuilder And Pythonwin
Python And GIS - Beyond Modelbuilder And Pythonwin
 
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
Lean & Mean Tokyo Cabinet Recipes (with Lua) - FutureRuby '09
 
Python 3000
Python 3000Python 3000
Python 3000
 
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...Developing A Real World Logistic Application With Oracle Application - UKOUG ...
Developing A Real World Logistic Application With Oracle Application - UKOUG ...
 
Computer science project work
Computer science project workComputer science project work
Computer science project work
 
Migration testing framework
Migration testing frameworkMigration testing framework
Migration testing framework
 
Code Management
Code ManagementCode Management
Code Management
 
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_ЯндексеТанки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
Танки_в_Лунапарке: нагрузочное_тестирование_в_Яндексе
 
Jquery mobile
Jquery mobileJquery mobile
Jquery mobile
 
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c OptimizerWellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
Wellington APAC Groundbreakers tour - Upgrading to the 12c Optimizer
 
Introduction to Assembly Language
Introduction to Assembly LanguageIntroduction to Assembly Language
Introduction to Assembly Language
 
CloudKit
CloudKitCloudKit
CloudKit
 
JDBC Java Database Connectivity
JDBC Java Database ConnectivityJDBC Java Database Connectivity
JDBC Java Database Connectivity
 
VoCamp Seoul2009 Sparql
VoCamp Seoul2009 SparqlVoCamp Seoul2009 Sparql
VoCamp Seoul2009 Sparql
 
What's new in Rails 2?
What's new in Rails 2?What's new in Rails 2?
What's new in Rails 2?
 
Html5
Html5Html5
Html5
 
Php
PhpPhp
Php
 
TYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase frameworkTYPO3 Extension development using new Extbase framework
TYPO3 Extension development using new Extbase framework
 

More from Recruit Technologies

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場Recruit Technologies
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びRecruit Technologies
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Recruit Technologies
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話Recruit Technologies
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所Recruit Technologies
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Recruit Technologies
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例Recruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントRecruit Technologies
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後Recruit Technologies
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Recruit Technologies
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するRecruit Technologies
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントRecruit Technologies
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントRecruit Technologies
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルRecruit Technologies
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~Recruit Technologies
 

More from Recruit Technologies (20)

新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場新卒2年目が鍛えられたコードレビュー道場
新卒2年目が鍛えられたコードレビュー道場
 
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学びカーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
カーセンサーで深層学習を使ってUX改善を行った事例とそこからの学び
 
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
Rancherを活用した開発事例の紹介 ~Rancherのメリットと辛いところ~
 
Tableau活用4年の軌跡
Tableau活用4年の軌跡Tableau活用4年の軌跡
Tableau活用4年の軌跡
 
HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話HadoopをBQにマイグレしようとしてる話
HadoopをBQにマイグレしようとしてる話
 
LT(自由)
LT(自由)LT(自由)
LT(自由)
 
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
リクルートグループの現場事例から見る AI/ディープラーニング ビジネス活用の勘所
 
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
Company Recommendation for New Graduates via Implicit Feedback Multiple Matri...
 
リクルート式AIの活用法
リクルート式AIの活用法リクルート式AIの活用法
リクルート式AIの活用法
 
銀行ロビーアシスタント
銀行ロビーアシスタント銀行ロビーアシスタント
銀行ロビーアシスタント
 
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
リクルートにおけるマルチモーダル Deep Learning Web API 開発事例
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後ユーザーからみたre:Inventのこれまでと今後
ユーザーからみたre:Inventのこれまでと今後
 
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
Struggling with BIGDATA -リクルートおけるデータサイエンス/エンジニアリング-
 
EMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成するEMRでスポットインスタンスの自動入札ツールを作成する
EMRでスポットインスタンスの自動入札ツールを作成する
 
RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)RANCHERを使ったDev(Ops)
RANCHERを使ったDev(Ops)
 
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイントリクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
リクルートにおけるセキュリティ施策方針とCSIRT組織運営のポイント
 
ユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイントユーザー企業内製CSIRTにおける対応のポイント
ユーザー企業内製CSIRTにおける対応のポイント
 
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアルリクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
リクルートテクノロジーズが語る 企業における、「AI/ディープラーニング」活用のリアル
 
「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~「リクルートデータセット」 ~公開までの道のりとこれから~
「リクルートデータセット」 ~公開までの道のりとこれから~
 

Recently uploaded

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Recently uploaded (20)

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Internal Hive

  • 1. Inside Hive (for beginners) 1 Takeshi NAKANO / Recruit Co. Ltd.
  • 2. Why? Hive is good tool for non-specialist! The number of M/R controls the Hive processing time. ↓ How can we reduce the number? What can we do for this on writing HiveQL? ↓ How does Hive convert HiveQLto M/R jobs? On this, what optimizing processes are adopted? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 2
  • 3. Don’t you have.. This fb’s paper has a lot of information! But this is a little old.. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 3
  • 4. Component Level Analysis 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 4
  • 5. Hive Architecture / Exec Flow 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 5 Client Hadoop Metastore Driver Compiler
  • 6. Client Hadoop Driver Compiler Hive Workflow Hive has the operators which are minimum processing units. The process of each operator is done with HDFS operation or M/R jobs. The compiler converts HiveQL to the sets of operators. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 6 Metastore
  • 7. Hive Workflow Operators 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 7
  • 8. Client Hadoop Metastore Driver Compiler Hive Workflow For M/R processing, Hiveuses ExecMaper and ExecReducer. On processing, we have 2 modes. Local processing mode Distributed processing mode 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 8
  • 9. Client Hadoop Metastore Driver Compiler Hive Workflow On 1(Local mode)Hive fork the process with hadoop command.The plan.xml is made just on 1 and the single node processes this. On 2(Distributed mode).Hive send the process to exsistingJobTracker.The information is housed on DistributedCacheand processed on multi nodes. 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 9
  • 10. Compiler : How to Process HiveQL 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 10 Client Hadoop Metastore Driver Compiler
  • 11. “Plumbing” of HIVE compiler 7/6/2011 11 HIVE - A warehouse solution over Map Reduce Framework
  • 12. “Plumbing” of HIVE compiler 7/6/2011 12 HIVE - A warehouse solution over Map Reduce Framework
  • 13. Compiler Overview 13 Parser Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer
  • 14. Compiler Overview 14 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator Tree Logical Optimizer Operator Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
  • 15. Parser Hive QL AST INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); Hive QL TOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ AST + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 16. Parser SQL AST INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); SQL TOK_QUERY + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ + TOK_INSERT + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2" + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" AST 1 2 3 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 17. 17 Semantic Analyzer (1/2) AST QB + TOK_FROM + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ AST 1 QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) ParseInfo Join Node + TOK_JOIN + TOK_TABREF … + TOK_TABREF … + “=” … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 17
  • 18. 18 Semantic Analyzer (2/2) AST QB + TOK_DESTINATION + TOK_TAB + TOK_TABNAME + "access_log_temp2” AST 2 QB ParseInfo NameTo Destination Node + TOK_TAB + TOK_TABNAME +"access_log_temp2” Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 18 18
  • 19. 19 Semantic Analyzer (2/2) AST QB + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" AST QB ParseInfo 3 Name To Select Node + TOK_SELECT + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … + TOK_SELEXPR … Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 19 19
  • 20. 20 Logical Plan Generator (1/4) QB OP Tree QB MetaData AliasTo Table Info “a”=Table Info(“access_log_hbase”) “p”=Table Info(“product_hbase”) OP Tree TableScanOperator(“access_log_hbase”) TableScanOperator(“product_hbase”) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 20 20
  • 21. 21 Logical Plan Generator (2/4) QB OP Tree QB ParseInfo + TOK_JOIN + TOK_TABREF + TOK_TABNAME + "access_log_hbase" + a + TOK_TABREF + TOK_TABNAME + "product_hbase" + "p" + "=" + "." + TOK_TABLE_OR_COL + "a" + "access_log_hbase" + "." + TOK_TABLE_OR_COL + "p" + "prono“ ReduceSinkOperator(“access_log_hbase”) ReduceSinkOperator(“product_hbase”) OP Tree JoinOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 22. 22 Logical Plan Generator (3/4) QB OP Tree QB ParseInfo Name To Select Node + TOK_SELECT + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "user" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "a" + "prono" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "maker" + TOK_SELEXPR + "." + TOK_TABLE_OR_COL + "p" + "price" OP Tree SelectOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 23. 23 Logical Plan Generator (4/4) QB OP Tree QB MetaData Name To Destination Table Info “insclause-0”= Table Info(“access_log_temp2”) OP Tree FileSinkOperator Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 24. Logical Plan Generator (result) 24 LCF OP Tree TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 25. Logical Optimizer Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 25 25 25
  • 26. Logical Optimizer (Predicate Push Down) INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 26 26
  • 27. Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 27 27
  • 28. INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_3 ReduceSinkOperator RS_2 JoinOperator JOIN_4 FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 28 28
  • 29. Logical Optimizer (Predicate Push Down) TableScanOperator TS_1 TableScanOperator TS_0 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); FilterOperator FIL_8 (maker = 'honda') ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 JoinOperator JOIN_4 INSERT OVERWRITE TABLE access_log_temp2 SELECT a.user, a.prono, p.maker, p.price FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono) WHERE p.maker = 'honda'; FilterOperator FIL_5 (_col8 = 'honda') SelectOperator SEL_6 FileSinkOperator FS_7 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 29 29
  • 30. 30 Physical Plan Generator OP Tree Task Tree MoveTask(Stage-0) Ope Tree LoadTableDesc TableScanOperator(TS_0) TableScanOperator(TS_1) ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) JoinOperator(JOIN_4) SelectOperator(SEL_5) FileSinkOperator(FS_6) StatsTask(Stage-2) Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 30 30
  • 31. OP Tree Task Tree MapRedTask (Stage-1/root) TableScanOperator(TS_0) Physical Plan Generator (result) 31 LCF Mapper TableScanOperator TS_1 TableScanOperator TS_0 TableScanOperator(TS_1) ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 ReduceSinkOperator(RS_2) MapRedTask(Stage-1/root) ReduceSinkOperator(RS_3) Reducer JoinOperator JOIN_4 JoinOperator(JOIN_4) SelectOperator SEL_5 SelectOperator(SEL_5) FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 31 31 31
  • 32. 32 Physical Optimizer Task Tree Task Tree java/org/apache/hadoop/hive/ql/optimizer/physical/以下 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser
  • 33. 33 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapRedTask (Stage-1) Mapper TableScanOperator TS_1 TableScanOperator TS_0 MapJoinOperator MAPJOIN_7 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 33
  • 34. 34 Physical Optimizer (MapJoinResolver) Task Tree Task Tree MapredLocalTask(Stage-7) MapRedTask (Stage-1) TableScanOperator TS_0 Mapper TableScanOperator TS_1 TableScanOperator TS_0 HashTableSinkOperator HASHTABLESINK_11 MapJoinOperator MAPJOIN_7 MapRedTask (Stage-1) SelectOperator SEL_8 Mapper TableScanOperator TS_1 SelectOperator SEL_5 MapJoinOperator MAPJOIN_7 FileSinkOperator FS_6 SelectOperator SEL_8 SelectOperator SEL_5 FileSinkOperator FS_6 Semantic Analyzer Logical Plan Gen. Logical Optimizer Physical Plan Gen. Physical Optimizer Parser 34
  • 35. In the end 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 35 Client Hadoop Metastore Driver Compiler
  • 36. In the end 36 Hive QL Parser AST Semantic Analyzer QB Logical Plan Gen. Operator Tree Logical Optimizer Operator Tree Physical Plan Gen. Task Tree Physical Optimizer Task Tree
  • 38. Appendix: What does Explain show? 7/6/2011 HIVE - A warehouse solution over Map Reduce Framework 38
  • 39. Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 0 value expressions: expr: user type: string expr: prono type: int p TableScan alias: p Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 1 value expressions: expr: maker type: string expr: price type: int Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions: expr: _col0 type: string expr: _col2 type: int expr: _col6 type: string expr: _col7 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr Operator Time taken: 0.1 seconds hive>
  • 40. Appendix: What does Explain show? hive> explain INSERT OVERWRITE TABLE access_log_temp2 > SELECT a.user, a.prono, p.maker, p.price > FROM access_log_hbase a JOIN product_hbase p ON (a.prono = p.prono); OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF (TOK_TABNAME access_log_hbase) a) (TOK_TABREF (TOK_TABNAME product_hbase) p) (= (. (TOK_TABLE_OR_COL a) prono) (. (TOK_TABLE_OR_COL p) prono)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB (TOK_TABNAME access_log_temp2))) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) user)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) prono)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) maker)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL p) price))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: a TableScan alias: a Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 0 value expressions: expr: user type: string expr: prono type: int p TableScan alias: p Reduce Output Operator key expressions: expr: prono type: int sort order: + Map-reduce partition columns: expr: prono type: int tag: 1 value expressions: expr: maker type: string expr: price type: int ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan Reduce Output Operator TableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr Operator Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col2} 1 {VALUE._col1} {VALUE._col2} handleSkewJoin: false outputColumnNames: _col0, _col2, _col6, _col7 Select Operator expressions: expr: _col0 type: string expr: _col2 type: int expr: _col6 type: string expr: _col7 type: int outputColumnNames: _col0, _col1, _col2, _col3 File Output Operator compressed: false GlobalTableId: 1 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-0 Move Operator tables: replace: true table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe name: default.access_log_temp2 Stage: Stage-2 Stats-Aggr Operator Time taken: 0.1 seconds hive>
  • 41. Appendix: What does Explain show? ABSTRACT SYNTAX TREE: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 Stage-2 depends on stages: Stage-0 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan Reduce Output Operator TableScan Reduce Output Operator Reduce Operator Tree: Join Operator Select Operator File Output Operator Stage: Stage-0 Move Operator Stage: Stage-2 Stats-Aggr Operator MapRedTask (Stage-1/root) Mapper TableScanOperator TS_1 TableScanOperator TS_0 ReduceSinkOperator RS_2 ReduceSinkOperator RS_3 Reducer JoinOperator JOIN_4 ≒ SelectOperator SEL_5 FileSinkOperator FS_6 MoveTask (Stage-0) Stats Task (Stage-2)