SDEC 2011
               Seoul Data Engineering Camp
                                             June 27-28
                                             Seoul, South Korea
Replacing Legacy Telco DB/DW
                    to Hadoop and Hive


                        JunHo Cho
                           NexR
Agenda

           • Motivation for Hive and Hadoop
           • Hive Internal
           • Oracle Migration Use Case
           • Hive Optimization
           • Future Work
Telco Data
               Divide & Conquer
OpenSource
           • Storage & Computing
           • Collection
           • Search
           • Analysis
           • Coordination
What is HIVE ?
          •    A system for managing and querying structured data
               built on top of Hadoop
               •   Map-Reduce for execution
               •   HDFS for storage
               •   Metadata in an RDBMS



          •    Key Building Principles
               •   SQL is a familiar language
               •   Extensibility - Types, Functions, Formats, Scripts
               •   Performance
Why Hive?
How do we count call records per phone number?
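import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Mapper: emits (phone number, 1) for every call record.
public class CallCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        if (itr.hasMoreTokens()) { // skip empty records instead of failing on nextToken()
            word.set(itr.nextToken()); // first token is taken as the phone number
            output.collect(word, one);
        }
    }
}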
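import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Reducer: sums the counts emitted for each phone number.
public class CallCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get(); // add up the partial counts
        }

        output.collect(key, new IntWritable(sum));
    }
}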
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Driver: configures the job and submits it.
public class CallCount {

    public static void main(String[] args) {
        JobConf conf = new JobConf(CallCount.class);

        // specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // specify input and output dirs
        FileInputFormat.addInputPath(conf, new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        // specify a mapper
        conf.setMapperClass(CallCountMapper.class);

        // specify a reducer (also used as the combiner)
        conf.setReducerClass(CallCountReducer.class);
        conf.setCombinerClass(CallCountReducer.class);

        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The same count in HiveQL:
               SELECT pnum, count(pnum)
               FROM cdr
               GROUP BY pnum;
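To make the comparison concrete, here is a minimal plain-Java sketch of what both the MapReduce job and the GROUP BY query compute: one count per phone number. It uses no Hadoop or Hive classes. The class name `CallCountSketch`, its methods, and the sample records are assumptions made for illustration, as is the record layout (phone number as the first whitespace-separated field).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop or Hive classes are used.
class CallCountSketch {

    // "Map" step: take the first whitespace-separated field as the phone number.
    static String phoneOf(String record) {
        return record.trim().split("\\s+")[0];
    }

    // "Reduce" step as a grouped count, like GROUP BY pnum with count(pnum).
    static Map<String, Integer> countByPhone(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            if (record.trim().isEmpty()) {
                continue; // skip blank lines
            }
            counts.merge(phoneOf(record), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical call records: phone number followed by other fields.
        List<String> cdr = Arrays.asList(
            "010-1111 2011-06-27 30s",
            "010-2222 2011-06-27 12s",
            "010-1111 2011-06-28 45s");
        System.out.println(countByPhone(cdr)); // prints {010-1111=2, 010-2222=1}
    }
}
```

On a cluster the grouping is done by the shuffle between mappers and reducers; the sketch folds it into a single map for brevity.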
History of Hive
          •     Hive development cycle is fast and the developer
                community is growing rapidly
               •    Product release cycle is accelerating

               Release   Project started   0.3.0   0.4.0   0.5.0   0.6.0   0.7.0   0.7.1
               Date      03/08             04/09   12/09   02/10   10/10   03/11   06/11
Who uses Hive?
               http://wiki.apache.org/hadoop/Hive/PoweredBy
Hive Use Cases
               • Reporting and ad hoc queries
               • Log analysis
               • Social graph analysis
               • Data mining and analysis
               • Machine learning
               • Dataset cleaning
               • Data warehouse
Hive Architecture
[Diagram. Components: UI, Driver, Compiler, MetaStore, Execution Engine, Hadoop. Arrow labels: DDL, HQL, ORM, Works, Result. The build steps follow the example query "select col1 from tab1 where ..." through these components until result rows (a 123344, b 121211, c 342434) appear at the UI.]
Hive Internal
[Diagram. Clients: Web UI, Hive CLI, JDBC (Browse, Query, DDL). MetaStore, reached through the Thrift API. Hive QL pipeline: Parser, Plan, Optimizer, Task. Map Reduce side: TSOperator, SELOperator and FSOperator running in ExecMapper/ExecReducer, with User Script, UDF/UDAF (substr, sum, average), SerDe and Input/OutputFormat. Storage: HDFS (RCFile) and StorageHandler (DB, ..., HBase).]
Parser
                 Select col1,col2 From tab1 Where col3 > 5

                 TOK_QUERY
                   TOK_FROM
                     TOK_TABNAME
                       tab1
                   TOK_INSERT
                     TOK_DESTINATION
                       TOK_DIR
                         TOK_TMP_FILE
                     TOK_SELECT
                       TOK_SELEXPR
                         TOK_TABLE_OR_COL
                           col1
                       TOK_SELEXPR
                         TOK_TABLE_OR_COL
                           col2
                     TOK_WHERE
                       >
                         TOK_TABLE_OR_COL
                         5

                 [Build steps: a QB marker moves across the tree, appearing at TOK_QUERY, tab1, insclause-0, col1, col2 and finally TOK_WHERE.]
Plan
                 Select col1,col2 From tab1 Where col3 > 5

     QB clause                Operator
     TOK_FROM                 TableScanOperator
     TOK_WHERE                FilterOperator
     TOK_SELECT               SelectOperator
     TOK_DESTINATION          FileSinkOperator
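The clause-to-operator mapping on this slide can be sketched as a small lookup. This is a toy model under stated assumptions: `PlanSketch`, `CLAUSE_TO_OPERATOR` and `plan` are hypothetical names, and only the token and operator names come from the slide. It is not Hive's planner code or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names mirror the slide, not Hive's real planner classes.
class PlanSketch {

    // Clause token -> operator, in the order the slide adds them to the plan.
    static final Map<String, String> CLAUSE_TO_OPERATOR = new LinkedHashMap<>();
    static {
        CLAUSE_TO_OPERATOR.put("TOK_FROM", "TableScanOperator");
        CLAUSE_TO_OPERATOR.put("TOK_WHERE", "FilterOperator");
        CLAUSE_TO_OPERATOR.put("TOK_SELECT", "SelectOperator");
        CLAUSE_TO_OPERATOR.put("TOK_DESTINATION", "FileSinkOperator");
    }

    // Builds the operator chain for the clauses present in a query block.
    static List<String> plan(List<String> clausesInQueryBlock) {
        List<String> operators = new ArrayList<>();
        for (Map.Entry<String, String> e : CLAUSE_TO_OPERATOR.entrySet()) {
            if (clausesInQueryBlock.contains(e.getKey())) {
                operators.add(e.getValue());
            }
        }
        return operators;
    }

    public static void main(String[] args) {
        // Clauses of: Select col1,col2 From tab1 Where col3 > 5
        List<String> qb = Arrays.asList("TOK_FROM", "TOK_SELECT", "TOK_WHERE", "TOK_DESTINATION");
        // prints TableScanOperator -> FilterOperator -> SelectOperator -> FileSinkOperator
        System.out.println(String.join(" -> ", plan(qb)));
    }
}
```

The fixed iteration order of the lookup stands in for the order in which the slide attaches operators: scan first, then filter, then select, then the file sink.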
Optimizer
     Select col1,col2 From tab1 Where col3 > 5

     tab1 {col1, col2, col3, col4, col5, col6, col7}

     Operator tree
          TableScanOperator
          FilterOperator
          SelectOperator
          FileSinkOperator

     ColumnPruner walks the tree bottom-up, from FileSinkOperator to
     TableScanOperator, and records in the Context the columns each operator needs
          SEL   col1, col2
          FIL   col1, col2, col3
          TS    col1, col2, col3

     Operator tree after optimization
          TableScanOperator
          FilterOperator
          FilterOperator
          SelectOperator
          FileSinkOperator
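The ColumnPruner walk can be sketched in a few lines of Python. This is only an illustration of the bottom-up idea for a linear operator chain, not Hive's implementation: the `Operator` class, its fields, and `prune_columns` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One node of a linear operator chain (hypothetical, not Hive's classes)."""
    name: str
    uses: set = field(default_factory=set)     # columns this operator reads itself
    passes_through: bool = True                # False if it defines its own output columns
    needed: set = field(default_factory=set)   # filled in by prune_columns


def prune_columns(chain):
    """Walk the chain from the sink back to the scan, accumulating needed columns."""
    required = set()
    for op in reversed(chain):
        if not op.passes_through:
            # A projection fixes its own output, so only its inputs matter upstream.
            required = set(op.uses)
        else:
            required = required | op.uses
        op.needed = set(required)
    return chain


# Select col1,col2 From tab1 Where col3 > 5
chain = prune_columns([
    Operator("TS"),
    Operator("FIL", uses={"col3"}),
    Operator("SEL", uses={"col1", "col2"}, passes_through=False),
    Operator("FS"),
])
for op in chain:
    print(op.name, sorted(op.needed))
```

With the example query, the scan ends up needing only col1, col2, and col3 out of the seven columns of tab1, which is what lets a columnar format such as RCFile skip the rest.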
Task
     Select col1,col2 From tab1 Where col3 > 5

     QB -> TaskFactory
     Rules applied while walking the operator tree
          TS - GenMRTableScan1
          FS - GenMRFileSink1

     Resulting tasks
          MapRedTask
               TableScanOperator
               FilterOperator
               FilterOperator
               SelectOperator
               FileSinkOperator
          FetchTask
Hive Internal (architecture diagram)
     • Clients: Web UI, Hive CLI, JDBC (Browse, Query, DDL)
     • MetaStore: Thrift API
     • Hive QL: Parser, Plan, Optimizer, Task
     • Map Reduce: TSOperator, FILOperator, FILOperator, SELOperator, FSOperator; User Script; UDF
     • ExecMapper/ExecReducer, SerDe, Input/OutputFormat
     • Storage: HDFS (RCFile), StorageHandler (DB, ..., HBase)
Oracle Migration to Hive
Oracle SQL
Data Model
     Hive Entity        Sample        HDFS Location
     Table              Log           /hive/Log
     Partition          time=hour     /hive/Log/time=1h
     Bucket             phone-num     /hive/Log/time=1h/part-$hash(phone-num)
     External Table     customer      /app/meta/dir (arbitrary location)
Data Model
     MetaStore (MetaStore DB)
          Table: Data Location, Bucketing Info, Partitioning Info

     HDFS
          Table         /hive/Log
          Partition     /hive/Log/time=1h
          Bucket        /hive/Log/time=1h/part-0001
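How a row finds its bucket file can be sketched as below. This follows the slide's part-$hash(phone-num) notation rather than Hive's actual file naming or hash function: `bucket_file` is a hypothetical helper, `zlib.crc32` is a stand-in hash, and the phone number and bucket count are made-up sample values.

```python
import zlib


def bucket_file(table_dir, partition, bucket_key, num_buckets):
    """Return the HDFS-style path of the bucket file that would hold bucket_key."""
    # crc32 is a stand-in: it is stable across runs, unlike Python's built-in str hash.
    bucket = zlib.crc32(bucket_key.encode("utf-8")) % num_buckets
    return f"{table_dir}/{partition}/part-{bucket:04d}"


path = bucket_file("/hive/Log", "time=1h", "010-1234-5678", num_buckets=32)
print(path)
```

Because the bucket is a pure function of the key, all log rows for one phone number within a partition land in the same file, which is what makes sampling and bucketed joins on that key cheap.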
Column Data Types
               • Primitive Types
                • int type : tinyint, smallint, int, bigint
                • boolean, float, double, string
               • Nest-able Collections
                • array : value(any-type)
                • map : key(primitive) and value(any-type)
               • User-defined types
                • structures with attributes
DataType Convert
          Oracle           Hive
          NUMBER(n)        TINYINT / INT / BIGINT
          NUMBER(n,m)      FLOAT / DOUBLE
          VARCHAR2         STRING
          DATE             STRING ("yyyy-MM-dd HH:mm:ss" format)
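The conversion table could be automated along these lines. This is a rough sketch: `oracle_to_hive_type` is a hypothetical helper, and the digit thresholds that pick between TINYINT, INT, BIGINT, FLOAT, and DOUBLE are assumptions chosen so that every value of the given precision fits; the slides do not state them.

```python
import re


def oracle_to_hive_type(oracle_type):
    """Map an Oracle column type to a Hive type, following the conversion table."""
    t = oracle_type.strip().upper()
    m = re.fullmatch(r"NUMBER\((\d+)\)", t)
    if m:
        digits = int(m.group(1))
        # Assumed thresholds: the largest digit count that always fits the type.
        if digits <= 2:
            return "TINYINT"
        if digits <= 9:
            return "INT"
        if digits <= 18:
            return "BIGINT"
        raise ValueError(f"{oracle_type} does not fit in BIGINT")
    m = re.fullmatch(r"NUMBER\((\d+),\s*(\d+)\)", t)
    if m:
        # FLOAT keeps about 7 significant digits, DOUBLE about 15.
        return "FLOAT" if int(m.group(1)) <= 7 else "DOUBLE"
    if t.startswith("VARCHAR2"):
        return "STRING"
    if t == "DATE":
        return "STRING"  # stored as "yyyy-MM-dd HH:mm:ss"
    raise ValueError(f"no mapping for {oracle_type}")


print(oracle_to_hive_type("NUMBER(10,2)"))
```

Note that FLOAT and DOUBLE are binary floating-point types, so a NUMBER(n,m) column holding exact decimals (money, for example) may not round-trip exactly.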
Oracle DML


               • Hive supports a subset of ANSI SQL
               • Sub-queries are supported in the FROM clause
               • Join queries: equi-join/inner join, outer join
Range Operator
     BETWEEN ~ AND ~
          Oracle:
          SELECT * FROM Employee WHERE
          salary BETWEEN 100 AND 500;

          Hive:
          SELECT * FROM Employee WHERE
          salary >= 100 AND salary <= 500;

          SELECT * FROM Employee WHERE
          BETWEEN(salary, 100, 500);
IN / EXISTS Clause
     IN / EXISTS SubQuery
          Oracle:
          SELECT * FROM Employee e WHERE e.DeptNo
          IN (SELECT d.DeptNo FROM Dept d)

          SELECT * FROM Employee e WHERE
          EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo)

          Hive:
          SELECT * FROM Employee e
          LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)
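Why the rewrite uses LEFT SEMI JOIN and not a plain JOIN can be checked with a small experiment. SQLite stands in for Oracle here and has no LEFT SEMI JOIN, so the semi join is emulated by the hypothetical `left_semi_join` helper; the table contents are made-up sample data.

```python
import sqlite3


def left_semi_join(left_rows, right_rows, left_key, right_key):
    """Keep each left row once if any right row has a matching key."""
    right_keys = {r[right_key] for r in right_rows if r[right_key] is not None}
    return [row for row in left_rows if row[left_key] in right_keys]


employees = [("ann", 10), ("bob", 20), ("cho", 30)]
depts = [(10,), (10,), (20,)]  # dept 10 appears twice on purpose

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, DeptNo INTEGER)")
conn.execute("CREATE TABLE Dept (DeptNo INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", employees)
conn.executemany("INSERT INTO Dept VALUES (?)", depts)

in_rows = conn.execute(
    "SELECT * FROM Employee e WHERE e.DeptNo IN "
    "(SELECT d.DeptNo FROM Dept d) ORDER BY e.Name"
).fetchall()
exists_rows = conn.execute(
    "SELECT * FROM Employee e WHERE EXISTS "
    "(SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo) ORDER BY e.Name"
).fetchall()
inner_rows = conn.execute(
    "SELECT e.* FROM Employee e JOIN Dept d ON e.DeptNo = d.DeptNo ORDER BY e.Name"
).fetchall()
semi_rows = left_semi_join(employees, depts, left_key=1, right_key=0)

print(in_rows, exists_rows, semi_rows, inner_rows)
```

IN, EXISTS, and the semi join all return each matching employee once, while the inner join repeats an employee for every duplicate key on the Dept side.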
NOT IN Clause
     NOT IN SubQuery
          Oracle:
          SELECT * FROM Employee e WHERE e.DeptNo
          NOT IN (SELECT d.DeptNo FROM Dept d)

          Hive:
          SELECT e.* FROM Employee e
          LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo)
          WHERE d.DeptNo IS NULL
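One caveat worth testing before relying on this rewrite: the two forms agree only while the subquery returns no NULLs. The check below uses SQLite as a stand-in for both engines, with made-up sample rows; `names` is a small helper defined here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, DeptNo INTEGER);
    CREATE TABLE Dept (DeptNo INTEGER);
    INSERT INTO Employee VALUES ('ann', 10), ('bob', 20), ('cho', 30);
    INSERT INTO Dept VALUES (10), (20);
""")

NOT_IN = (
    "SELECT e.Name FROM Employee e "
    "WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)"
)
ANTI_JOIN = (
    "SELECT e.Name FROM Employee e "
    "LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) "
    "WHERE d.DeptNo IS NULL"
)


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


without_null = (names(NOT_IN), names(ANTI_JOIN))

# A single NULL in Dept.DeptNo makes every NOT IN comparison unknown.
conn.execute("INSERT INTO Dept VALUES (NULL)")
with_null = (names(NOT_IN), names(ANTI_JOIN))

print(without_null, with_null)
```

If Dept.DeptNo can be NULL in the source data, the outer-join form returns rows that the original NOT IN query would have dropped, so the migrated query may need an explicit NULL filter to match.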
NOT EXISTS Clause
     NOT EXISTS SubQuery
          Oracle:
          SELECT * FROM Employee e WHERE
          NOT EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo)

          Hive:
          SELECT e.* FROM Employee e
          LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo)
          WHERE d.DeptNo IS NULL
LIKE Clause
     LIKE / NOT LIKE
          Oracle:
          SELECT * FROM Employee e WHERE name LIKE '%steve'
          SELECT * FROM Employee e WHERE name NOT LIKE '%steve'

          Hive:
          SELECT e.* FROM Employee e WHERE name LIKE '%steve'
          SELECT e.* FROM Employee e WHERE NOT name LIKE '%steve'
JOIN Operator (1/4)
     SELF JOIN
          Oracle:
          SELECT *
          FROM Employee e1, Employee e2 WHERE e1.Id = e2.Id

          Hive:
          SELECT *
          FROM Employee e1 JOIN Employee e2 ON (e1.Id = e2.Id)
JOIN Operator (2/4)
     CROSS JOIN (Cartesian Product)
          Oracle:
          SELECT emp.Name, dep.Name FROM Employee emp, Dept dep

          Hive:
          SELECT emp.Name, dep.Name FROM Employee emp JOIN Dept dep
JOIN Operator (3/4)
     LEFT OUTER JOIN
          Oracle:
          SELECT * FROM Emp, Dept
          WHERE Emp.deptNo = Dept.deptNo(+)

          Hive:
          SELECT * FROM Emp
          LEFT OUTER JOIN Dept ON Emp.deptNo = Dept.deptNo
JOIN Operator (4/4)
     RIGHT OUTER JOIN
          Oracle:
          SELECT * FROM Emp, Dept
          WHERE Emp.deptNo(+) = Dept.deptNo

          Hive:
          SELECT * FROM Emp
          RIGHT OUTER JOIN Dept ON Emp.deptNo = Dept.deptNo
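The (+) rewrite in the two outer-join slides is mechanical enough to script for simple predicates. The sketch below handles only a single `A.x = B.y` equality and assumes the tables appear in the FROM clause in the same order as in the predicate; `convert_outer_join` is a hypothetical helper, not part of any migration tool mentioned in the talk.

```python
import re

# Matches one side of a predicate such as "Dept.deptNo(+)".
_MARKED = re.compile(r"^(?P<col>[\w.]+)\s*\(\+\)$")


def convert_outer_join(predicate):
    """Turn 'A.x = B.y(+)' into ('LEFT OUTER JOIN', 'A.x = B.y'), and the mirror case."""
    left, right = (side.strip() for side in predicate.split("="))
    left_mark, right_mark = _MARKED.match(left), _MARKED.match(right)
    if left_mark and right_mark:
        raise ValueError("(+) on both sides is not valid")
    if right_mark:
        # The right table is optional, so rows of the left table are preserved.
        return "LEFT OUTER JOIN", f"{left} = {right_mark['col']}"
    if left_mark:
        # The left table is optional, so rows of the right table are preserved.
        return "RIGHT OUTER JOIN", f"{left_mark['col']} = {right}"
    return "JOIN", f"{left} = {right}"


print(convert_outer_join("Emp.deptNo = Dept.deptNo(+)"))
print(convert_outer_join("Emp.deptNo(+) = Dept.deptNo"))
```

The rule of thumb it encodes: the side carrying (+) is the one that gets NULL-extended, so the join keyword names the opposite side.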
Oracle Function
Condition Function
     CASE
          Oracle:
          CASE expr WHEN cond1 THEN r1
          [WHEN cond2 THEN r2]* [ELSE r] END

          Hive:
          CASE expr WHEN cond1 THEN r1
          [WHEN cond2 THEN r2]* [ELSE r] END
Math Function
          Oracle         Hive
          ROUND          ROUND
          CEIL           CEIL/CEILING
          MOD            PMOD
          POWER          POW/POWER
          SQRT           SQRT
          SIN/COS        SIN/COS
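MOD to PMOD is the one row in this table that is not a pure rename: the two agree for non-negative arguments but differ when the dividend is negative. A minimal sketch of both behaviours, assuming integer arguments; `oracle_mod` and `hive_pmod` are illustrative Python stand-ins, not the engines' code.

```python
import math


def oracle_mod(n, m):
    """Remainder with the sign of the dividend; returns n when m is 0."""
    return n if m == 0 else int(math.fmod(n, m))


def hive_pmod(n, m):
    """Positive modulo: ((n % m) + m) % m with a truncating %, as in Java."""
    r = int(math.fmod(n, m))
    return int(math.fmod(r + m, m))


for n in (11, -11):
    print(n, oracle_mod(n, 4), hive_pmod(n, 4))
```

Queries that use MOD on values that can go negative (deltas or signed offsets, for example) should be reviewed after conversion, since the results will differ for those rows.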
Character Function
          Oracle           Hive
          SUBSTR           SUBSTR
          TRIM             TRIM
          LPAD/RPAD        LPAD/RPAD
          LTRIM/RTRIM      LTRIM/RTRIM
          REPLACE          REGEXP_REPLACE
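Mapping REPLACE to REGEXP_REPLACE changes the meaning of the search string from a literal to a pattern, so metacharacters need escaping. The Python sketch below shows the pitfall with the `re` module as a stand-in; Hive evaluates Java regular expressions, where `Pattern.quote` and `Matcher.quoteReplacement` play the corresponding roles. `replace_via_regex` is a hypothetical helper.

```python
import re


def replace_via_regex(text, search, replacement):
    """Literal REPLACE expressed through a regex replace, escaping both arguments."""
    # re.escape neutralises metacharacters such as '.', '(' and '+' in the search string.
    # A callable replacement keeps backslashes and group references literal.
    return re.sub(re.escape(search), lambda _match: replacement, text)


naive = re.sub("1.5", "2.0", "v1.5 and v175")   # '.' matches any character
safe = replace_via_regex("v1.5 and v175", "1.5", "2.0")
print(naive)
print(safe)
```

The unescaped pattern also rewrites "175", which a literal REPLACE would have left alone.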
NULL Function
          Oracle         Hive
          COALESCE       COALESCE
          NVL            Custom UDF
          NVL2           Custom UDF
Custom UDF Function
               • Condition Function
                • DECODE
               • Null Comparison Function
                • NVL / NVL2
               • Type Conversion
                • TO_NUMBER
                • TO_CHAR
                • TO_DATE
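The semantics a custom UDF has to reproduce for the condition and NULL functions are small. The sketch below states them in Python for clarity, with `None` standing in for SQL NULL; an actual Hive UDF would be written in Java against Hive's UDF interfaces, which are not shown here.

```python
def nvl(value, default):
    """NVL(a, b): b when a is NULL, otherwise a."""
    return default if value is None else value


def nvl2(value, if_not_null, if_null):
    """NVL2(a, b, c): b when a is not NULL, otherwise c."""
    return if_null if value is None else if_not_null


def decode(expr, *args):
    """DECODE(expr, s1, r1, ..., [default]): first result whose search value equals expr."""
    pairs, default = args, None
    if len(args) % 2 == 1:
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        # Python's == treats None as equal to None, matching DECODE's NULL handling.
        if expr == search:
            return result
    return default


print(nvl(None, 0), nvl2(None, "set", "unset"), decode(2, 1, "one", 2, "two", "other"))
```

DECODE is the one to be careful with: unlike an ordinary SQL comparison, it treats two NULLs as equal, so a naive translation to CASE expr WHEN ... can change results for NULL inputs.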
Oracle Analytic Function
Analytic Function
          1. Joins, WHERE, and GROUP BY clauses are performed
          2. The analytic functions are performed on the result set
          3. The ORDER BY clause is processed
Analytic Function
                 Rank salary in dept

                 name   dept         salary
                 ---------------------------
                 a      Research     100
                 b      Research     100
                 c      Sales        200
                 d      Sales        300
                 e      Research     50
                 f      Accounting   200
                 g      Accounting   300
                 h      Accounting   400
                 i      Research     10
Analytic Function


                                   Map
name	

 dept	

 	

       salary
---------------------
a	

      Research	

     100
b	

      Research	

     100
c	

      Sales	

 	

    200
d	

      Sales	

 	

    300
e	

      Research	

     50




                                   Map
f	

      Accounting	

   200
g	

      Accounting	

   300
h	

      Accounting	

   400
i	

      Research	

     10




                                   Map
Analytic Function

a	

             Research	

     100
b	

             Research	

     100
c	

             Sales	

 	

     Map
                                 200



d	

             Sales	

 	

    300
e	

             Research	

      Map
                                 50
f	

             Accounting	

   200



g	

             Accounting	

   300
h	

             Accounting	

    Map
                                 400
i	

             Research	

     10
Analytic Function
                                       DISTRIBUTE BY dept

Map        a      Research        100
           b      Research        100
           c      Sales           200
                                                             Reduce
Map        d      Sales           300
           e      Research         50
           f      Accounting      200
                                                             Reduce
Map        g      Accounting      300
           h      Accounting      400
           i      Research         10
Analytic Function
                DISTRIBUTE BY dept

Map   Map   Map

Reduce     c      Sales           200
           g      Accounting      300
           h      Accounting      400
           d      Sales           300
           f      Accounting      200

Reduce     g      Research        300
           h      Research        400
           e      Research        300
           i      Research         10
Analytic Function
                               SORT BY dept, salary

Map   Map   Map

Reduce     c      Sales           200
           d      Sales           300
           f      Accounting      200
           g      Accounting      300
           h      Accounting      400

Reduce     i      Research         10
           g      Research        300
           e      Research        300
           h      Research        400
Analytic Function
                 RANK(dept, salary)

Map   Map   Map

Reduce     c      Sales           200   1
           d      Sales           300   2
           f      Accounting      200   1
           g      Accounting      300   2
           h      Accounting      400   3

Reduce     i      Research         10   1
           g      Research        300   2
           e      Research        300   3
           h      Research        400   4
Analytic Function
RANK

Oracle:
SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
FROM emp

Hive, with RANK(arg1, arg2) - Custom UDF:
SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary)
FROM (SELECT name, dept, salary FROM emp
      DISTRIBUTE BY dept SORT BY dept, salary DESC) e
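
The rewrite above can be simulated in plain Java to see why a stateful UDF works once rows are distributed and sorted. This is a minimal sketch, not the deck's UDF: the slides do not show the implementation of the custom RANK(dept, salary), so the `Ranker` class, the hash-based routing in `rankAll`, and the tie handling (equal salaries share a rank, as Oracle's RANK() does) are all assumptions made for illustration. The input is the nine-row emp table from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RankSketch {
    // One row of the emp table from the slides.
    static final class Emp {
        final String name;
        final String dept;
        final int salary;

        Emp(String name, String dept, int salary) {
            this.name = name;
            this.dept = dept;
            this.salary = salary;
        }
    }

    // The nine sample rows shown on the slides.
    static List<Emp> sampleEmp() {
        List<Emp> rows = new ArrayList<>();
        rows.add(new Emp("a", "Research", 100));
        rows.add(new Emp("b", "Research", 100));
        rows.add(new Emp("c", "Sales", 200));
        rows.add(new Emp("d", "Sales", 300));
        rows.add(new Emp("e", "Research", 50));
        rows.add(new Emp("f", "Accounting", 200));
        rows.add(new Emp("g", "Accounting", 300));
        rows.add(new Emp("h", "Accounting", 400));
        rows.add(new Emp("i", "Research", 10));
        return rows;
    }

    // Hypothetical stand-in for the custom RANK(dept, salary) UDF.
    // It only remembers the previous row, so it relies on the input
    // arriving grouped by dept and ordered by salary.
    static final class Ranker {
        private String lastDept;
        private int lastSalary;
        private int rowsInDept;
        private int rank;

        int rank(String dept, int salary) {
            if (!dept.equals(lastDept)) {
                rowsInDept = 0;               // new partition: restart counting
            }
            rowsInDept++;
            if (rowsInDept == 1 || salary != lastSalary) {
                rank = rowsInDept;            // ties keep the previous rank
            }
            lastDept = dept;
            lastSalary = salary;
            return rank;
        }
    }

    // Simulates DISTRIBUTE BY dept SORT BY dept, salary DESC followed by RANK.
    static Map<String, Integer> rankAll(List<Emp> emp, int reducers) {
        // DISTRIBUTE BY dept: every row of a dept lands on the same reducer.
        List<List<Emp>> partitions = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Emp e : emp) {
            partitions.get(Math.floorMod(e.dept.hashCode(), reducers)).add(e);
        }

        Map<String, Integer> ranks = new LinkedHashMap<>();
        for (List<Emp> rows : partitions) {
            // SORT BY dept, salary DESC: ordering is local to each reducer.
            rows.sort((x, y) -> {
                int byDept = x.dept.compareTo(y.dept);
                return byDept != 0 ? byDept : Integer.compare(y.salary, x.salary);
            });
            Ranker ranker = new Ranker();     // one UDF instance per reducer
            for (Emp e : rows) {
                ranks.put(e.name, ranker.rank(e.dept, e.salary));
            }
        }
        return ranks;
    }
}
```

Because all rows of a department reach the same reducer and arrive in sorted order, the result does not depend on the number of reducers. Without the DISTRIBUTE BY step, rows of one department could be split across reducers and each reducer would restart its ranks.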
Hive Optimization & Future Work
Tuning Parameter

          • Hadoop Tuning
               •   mapred.job.reuse.jvm.num.tasks

               •   mapred.child.java.opts

               •   mapred.min.split.size / mapred.max.split.size

               •   dfs.block.size

          • Hive Tuning
               •   hive.input.format = CombineHiveInputFormat
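
The split-size and block-size settings above mainly control how many map tasks a job starts. The following is a rough sketch of that arithmetic, assuming the rule splitSize = max(minSize, min(maxSize, blockSize)) that Hadoop's FileInputFormat applies in the newer `mapreduce` API. The class and method names (`SplitSketch`, `splitSize`, `estimateMapTasks`) are illustrative only, and the real input format can fold a small trailing remainder into the last split, so the task count is an estimate.

```java
final class SplitSketch {
    static final long MB = 1024L * 1024L;

    // Effective split size for given dfs.block.size, mapred.min.split.size
    // and mapred.max.split.size values (all in bytes).
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Estimated number of map tasks for one file: ceiling of size / split.
    static long estimateMapTasks(long fileBytes, long splitSize) {
        return (fileBytes + splitSize - 1) / splitSize;
    }
}
```

For a 1 GB file with a 64 MB block size and default limits, this gives 16 map tasks. Raising the minimum split size to 128 MB halves that to 8, and capping the maximum split size at 32 MB doubles it to 32.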
UDF/UDAF

          • Develop UDFs to reduce the number of MR jobs
          • Extend GenericUDF to avoid Java reflection
          • Avoid creating new objects in UDFs
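
The last point is worth a small illustration, since a UDF is called once per row. This is a sketch under stated assumptions, not Hive code: `MutableText` is a hypothetical stand-in for a mutable Hadoop type such as `Text`, and `UpperUdf` is an invented UDF-style class with no Hive dependency. The idea is to allocate the output holder once per instance and overwrite it on every call.

```java
final class ReuseSketch {
    // Hypothetical stand-in for a mutable Writable such as Hadoop's Text.
    static final class MutableText {
        private final StringBuilder buf = new StringBuilder();

        // Overwrites the content with the upper-cased input, reusing the buffer.
        void setUpper(CharSequence s) {
            buf.setLength(0);
            for (int i = 0; i < s.length(); i++) {
                buf.append(Character.toUpperCase(s.charAt(i)));
            }
        }

        @Override
        public String toString() {
            return buf.toString();
        }
    }

    // UDF-style class: one output object per instance, reused for every row.
    static final class UpperUdf {
        private final MutableText result = new MutableText();

        MutableText evaluate(String input) {
            if (input == null) {
                return null;              // SQL-style NULL in, NULL out
            }
            result.setUpper(input);       // no new output object per row
            return result;
        }
    }
}
```

The trade-off is that the returned object is only valid until the next call, so a caller that needs to keep a value must copy it first.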
Future Work
      • HiveQL SQL Compliance
               •   HIVE-282 - IN statement for WHERE clauses

               •   HIVE-192 - Add TIMESTAMP column type

               •   HIVE-1269 - Support Date/Datetime/Time/Timestamp Primitive Types


      • Analytic Function
               •   HIVE-896 - Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive

               •   HIVE-952 - Support analytic NTILE function


      • Optimization
               •   HIVE-1694 - Accelerate GROUP BY execution using indexes

               •   HIVE-482 - Optimize Group By + Order By with the same keys
Hive
               A system for managing and querying
               structured data built on top of Hadoop


          Oracle 2 Hive
               data model
               ANSI-SQL
               built-in function / custom UDF
               analytic function
Questions?

SDEC2011 Replacing legacy Telco DB/DW to Hadoop and Hive

  • 1. SDEC 2011 Seoul Data Engineering Camp June 27-28 Seoul, South Korea
  • 2. Replacing Legacy Telco DB/DW to Hadoop and Hive JunHo Cho NexR
  • 8. Agenda • Motivation for Hive and Hadoop • Hive Internal • Oracle Migration UseCase • Hive Optimization • Future Work
  • 27. Divide & Conquer
  • 29. OpenSource Storage & Computing
  • 31. OpenSource Collection
  • 33. OpenSource Search
  • 35. OpenSource Analysis
  • 37. OpenSource Coordination
  • 44. What is HIVE ? • A system for managing and querying structured data built on top of Hadoop • Map-Reduce for execution • HDFS for storage • Metadata in an RDBMS • Key Building Principles • SQL is a familiar language • Extensibility - Types, Functions, Formats, Scripts • Performance
  • 54. Call count written directly against MapReduce (Mapper, Reducer, Driver):
      import java.io.IOException;
      import java.util.Iterator;
      import java.util.StringTokenizer;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.*;

      // CallCountMapper.java (Mapper): emits (first token of the line, 1)
      public class CallCountMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
          String line = value.toString();
          StringTokenizer itr = new StringTokenizer(line.toLowerCase());
          if (itr.hasMoreTokens()) {  // skip empty lines
            word.set(itr.nextToken());
            output.collect(word, one);
          }
        }
      }

      // CallCountReducer.java (Reducer): sums the counts for each key
      public class CallCountReducer extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));
        }
      }

      // CallCount.java (Driver): configures and submits the job
      public class CallCount {
        public static void main(String[] args) {
          JobConf conf = new JobConf(CallCount.class);
          // specify output types
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(IntWritable.class);
          // specify input and output dirs
          FileInputFormat.addInputPath(conf, new Path("input"));
          FileOutputFormat.setOutputPath(conf, new Path("output"));
          // specify a mapper, a reducer, and a combiner
          conf.setMapperClass(CallCountMapper.class);
          conf.setReducerClass(CallCountReducer.class);
          conf.setCombinerClass(CallCountReducer.class);
          try {
            JobClient.runJob(conf);
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }
  • 56. SELECT pnum, count(pnum) FROM cdr GROUP BY pnum;
  • 59. History of Hive • The Hive development cycle is fast and the developer community is growing rapidly • The product release cycle is accelerating
      Timeline: project started 03/08; 0.3.0 04/09; 0.4.0 12/09; 0.5.0 02/10; 0.6.0 10/10; 0.7.0 03/11; 0.7.1 06/11
  • 60. Who uses Hive? http://wiki.apache.org/hadoop/Hive/PoweredBy
  • 68. Use cases for Hive • Report and ad hoc query • Log analysis • Social graph analysis • Data mining and analysis • Machine learning • Dataset cleaning • Data warehouse
  • 74. Hive Architecture (diagram): components UI, Driver, Compiler, MetaStore (ORM), Execution Engine, Hadoop; labelled flows DDL, HQL, Works, Result. Example query entering at the UI: select col1 from tab1 where ... Example result rows returned: a 123344, b 121211, c 342434
  • 76. Hive Internal (diagram): clients Web UI, Hive CLI, JDBC (Browse, Query, DDL); MetaStore (Thrift API); Hive QL stages Parser, Plan, Optimizer, Task; Map/Reduce execution via ExecMapper/ExecReducer running TSOperator, SELOperator, FSOperator, user scripts, and UDF/UDAF (substr, sum, average); SerDe and Input/OutputFormat; storage HDFS, StorageHandler, RCFile, DB, ..., HBase
  • 83. Parser: Select col1,col2 From tab1 Where col3 > 5 is parsed into an AST, and the query block (QB) is filled in from it
      TOK_QUERY
        TOK_FROM
          TOK_TABNAME tab1
        TOK_INSERT
          TOK_DESTINATION
            TOK_DIR
              TOK_TMP_FILE
          TOK_SELECT
            TOK_SELEXPR
              TOK_TABLE_OR_COL col1
            TOK_SELEXPR
              TOK_TABLE_OR_COL col2
          TOK_WHERE
            >
              TOK_TABLE_OR_COL col3
              5
      QB entries picked up along the way: tab1, insclause-0, col1, col2
  • 94. Plan: Select col1,col2 From tab1 Where col3 > 5. Each clause in the QB becomes an operator:
      TOK_FROM → TableScanOperator
      TOK_WHERE → FilterOperator
      TOK_SELECT → SelectOperator
      TOK_DESTINATION → FileSinkOperator
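The clause-to-operator mapping on the Plan slide can be illustrated with a small plain-Java sketch. It is only an analogy: `OperatorPipelineSketch`, `run`, `row`, and the map-based row representation are hypothetical names for this example and are not Hive's operator classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not Hive's real operator classes.
class OperatorPipelineSketch {

    // Evaluates: SELECT col1, col2 FROM tab1 WHERE col3 > 5
    static List<Map<String, Integer>> run(List<Map<String, Integer>> tab1) {
        List<Map<String, Integer>> sink = new ArrayList<>();   // stands in for FileSinkOperator
        for (Map<String, Integer> row : tab1) {                // TableScanOperator: read every row
            if (row.get("col3") > 5) {                         // FilterOperator: col3 > 5
                Map<String, Integer> out = new LinkedHashMap<>();
                out.put("col1", row.get("col1"));              // SelectOperator: keep col1, col2
                out.put("col2", row.get("col2"));
                sink.add(out);
            }
        }
        return sink;
    }

    // Helper that builds one three-column row.
    static Map<String, Integer> row(int c1, int c2, int c3) {
        Map<String, Integer> r = new LinkedHashMap<>();
        r.put("col1", c1);
        r.put("col2", c2);
        r.put("col3", c3);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> tab1 = new ArrayList<>();
        tab1.add(row(1, 10, 3));
        tab1.add(row(2, 20, 7));
        System.out.println(run(tab1));
    }
}
```

In Hive the operators are chained objects that pass rows to their children; the nested loop here only mirrors the order in which a row meets them.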
  • 109. Optimizer: ColumnPruner applied to Select col1,col2 From tab1 Where col3 > 5, with tab1 {col1, col2, col3, col4, col5, col6, col7}
      Operator tree: TableScanOperator (TS) → FilterOperator (FIL) → SelectOperator (SEL) → FileSinkOperator
      The ColumnPruner walks the tree with a shared Context and records the columns each operator needs:
      SEL: col1, col2
      FIL: col1, col2, col3
      TS: col1, col2, col3
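The pruning walk shown on the Optimizer slides can be sketched as follows. This is a simplified illustration under stated assumptions, not Hive's ColumnPruner: the class `ColumnPrunerSketch`, the `Op` type, and `columnsToScan` are hypothetical, and the operator tree is reduced to a linear chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of column pruning; not Hive's ColumnPruner implementation.
class ColumnPrunerSketch {

    // One operator in the chain plus the columns its own expressions reference.
    static final class Op {
        final String name;
        final List<String> referenced;

        Op(String name, String... referenced) {
            this.name = name;
            this.referenced = Arrays.asList(referenced);
        }
    }

    // Walks the chain from the last operator back to the table scan, accumulating
    // needed columns, and returns the columns the scan must read, in table order.
    static List<String> columnsToScan(List<Op> chain, List<String> tableColumns) {
        Set<String> needed = new LinkedHashSet<>();
        for (int i = chain.size() - 1; i >= 0; i--) {
            needed.addAll(chain.get(i).referenced);
        }
        List<String> result = new ArrayList<>(tableColumns);
        result.retainAll(needed);   // drop columns that no operator needs
        return result;
    }

    public static void main(String[] args) {
        List<Op> chain = Arrays.asList(
            new Op("TS"),
            new Op("FIL", "col3"),
            new Op("SEL", "col1", "col2"),
            new Op("FS"));
        List<String> tab1 = Arrays.asList(
            "col1", "col2", "col3", "col4", "col5", "col6", "col7");
        System.out.println(columnsToScan(chain, tab1));
    }
}
```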
  • 122. Task: Select col1,col2 From tab1 Where col3 > 5. The TaskFactory creates tasks from the QB and the operator tree; rules fire per operator type (TS: GenMRTableScan1, FS: GenMRFileSink1). Result on the slide: a MapRedTask holding TableScanOperator → FilterOperator → SelectOperator → FileSinkOperator, plus a FetchTask
  • 124. Hive Internal (diagram, shown again for the example query): the Map/Reduce box now holds TSOperator, FILOperator, SELOperator, FSOperator; the other components are unchanged (Web UI, Hive CLI, JDBC, MetaStore, Thrift API, Hive QL Parser/Plan/Optimizer/Task, UDF, User Script, ExecMapper/ExecReducer, SerDe, Input/OutputFormat, HDFS, StorageHandler, RCFile, DB, ..., HBase)
  • 125. Oracle Migration to Hive
  • 126. l l l l l
  • 127. l l l l l l l l l l
  • 128. l l l l l l l l l l
  • 138. Data Model
      Hive Entity    | Sample    | HDFS location
      Table          | Log       | /hive/Log
      Partition      | time=hour | /hive/Log/time=1h
      Bucket         | phone-num | /wh/Log/time=1h/part-$hash(phone-num)
      External Table | customer  | /app/meta/dir (arbitrary location)
  • 139. Data Model (diagram): the MetaStore DB holds the table data location, partitioning info, and bucketing info; HDFS holds the data itself: table /hive/Log, partition /hive/Log/time=1h, bucket /hive/Log/time=1h/part-0001
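How a row ends up under a table, partition, and bucket path can be sketched like this. The sketch assumes a simple hash-modulo bucketing rule and borrows the slide's part-$hash(...) naming; `BucketPathSketch`, `bucketFor`, and `pathFor` are hypothetical names, and the file names Hive actually writes may differ.

```java
// Hypothetical sketch; file naming follows the slide's part-$hash(...) notation,
// not necessarily the names Hive writes on disk.
class BucketPathSketch {

    // Maps a bucketing-column value to a bucket index in [0, numBuckets).
    static int bucketFor(String phoneNum, int numBuckets) {
        // Masking the sign bit keeps the index non-negative for any hash code.
        return (phoneNum.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Builds <table dir>/<partition spec>/part-<bucket index>.
    static String pathFor(String tableDir, String partitionSpec, String phoneNum, int numBuckets) {
        return String.format("%s/%s/part-%04d",
            tableDir, partitionSpec, bucketFor(phoneNum, numBuckets));
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/hive/Log", "time=1h", "01012345678", 32));
    }
}
```

Because the bucket index depends only on the phone number, all rows for one subscriber within a partition land in the same file.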
  • 143. Column Data Types • Primitive Types • int type : tinyint, smallint, int, bigint • boolean, float, double, string • Nest-able Collections • array : value(any-type) • map : key(primitive) and value(any-type) • User-defined types • structures with attributes
  • 152. DataType Convert (Oracle → Hive)
      NUMBER(n)   → TINYINT / INT / BIGINT
      NUMBER(n,m) → FLOAT / DOUBLE
      VARCHAR2    → STRING
      DATE        → STRING in "yyyy-MM-dd HH:mm:ss" format
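The conversion table could be applied mechanically when translating DDL. The sketch below is one possible reading of it: the precision cutoffs (2 digits for TINYINT, 9 for INT, 7 significant digits for FLOAT) are assumptions chosen so the values fit the target type and are not taken from the slides, and `OracleToHiveTypeSketch` and `toHiveType` are hypothetical names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the Oracle-to-Hive column type mapping.
class OracleToHiveTypeSketch {

    private static final Pattern NUMBER =
        Pattern.compile("NUMBER\\((\\d+)(?:,\\s*(\\d+))?\\)");

    static String toHiveType(String oracleType) {
        String t = oracleType.trim().toUpperCase(Locale.ROOT);
        if (t.startsWith("VARCHAR2")) {
            return "STRING";
        }
        if (t.equals("DATE")) {
            return "STRING";   // stored as text in "yyyy-MM-dd HH:mm:ss" format
        }
        Matcher m = NUMBER.matcher(t);
        if (m.matches()) {
            int precision = Integer.parseInt(m.group(1));
            boolean hasScale = m.group(2) != null && Integer.parseInt(m.group(2)) > 0;
            if (hasScale) {
                return precision <= 7 ? "FLOAT" : "DOUBLE";   // assumed cutoff
            }
            if (precision <= 2) return "TINYINT";             // assumed cutoff
            if (precision <= 9) return "INT";                 // assumed cutoff
            return "BIGINT";   // note: 19 or more digits can overflow BIGINT
        }
        throw new IllegalArgumentException("No mapping for " + oracleType);
    }

    public static void main(String[] args) {
        System.out.println(toHiveType("NUMBER(12)"));
        System.out.println(toHiveType("NUMBER(10,2)"));
        System.out.println(toHiveType("VARCHAR2(30)"));
    }
}
```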
  • 153. Oracle DML • Hive supports a subset of ANSI SQL • Sub-queries in the FROM clause • Join queries: equi-join/inner join, outer join
  • 158. Range Operator: BETWEEN ~ AND ~
      Oracle: SELECT * FROM Employee WHERE salary BETWEEN 100 AND 500;
      Hive: SELECT * FROM Employee WHERE salary >= 100 AND salary <= 500;
      Hive (function form): SELECT * FROM Employee WHERE BETWEEN(salary,100,500);
  • 163. IN / EXISTS Clause: IN / EXISTS subquery
      Oracle: SELECT * FROM Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d)
      Oracle: SELECT * FROM Employee e WHERE EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo)
      Hive: SELECT * FROM Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)
  • 167. NOT IN Clause: NOT IN subquery
      Oracle: SELECT * FROM Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)
      Hive: SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL
  • 171. NOT EXISTS Clause: NOT EXISTS subquery
      Oracle: SELECT * FROM Employee e WHERE NOT EXISTS (SELECT 1 FROM Dept d WHERE e.DeptNo = d.DeptNo)
      Hive: SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL
  • 178. LIKE Clause: LIKE / NOT LIKE
      Oracle: SELECT * FROM Employee e WHERE name LIKE '%steve'
      Oracle: SELECT * FROM Employee e WHERE name NOT LIKE '%steve'
      Hive: SELECT e.* FROM Employee e WHERE name LIKE '%steve'
      Hive: SELECT e.* FROM Employee e WHERE NOT name LIKE '%steve'
  • 182. JOIN Operator (1/4): SELF JOIN
      Oracle: SELECT * FROM Employee e1, Employee e2 WHERE e1.ID = e2.ID
      Hive: SELECT * FROM Employee e1 JOIN Employee e2 ON (e1.ID = e2.ID)
  • 186. JOIN Operator (2/4): CROSS JOIN (Cartesian product)
      Oracle: SELECT emp.Name, dept.Name FROM Employee emp, Dept dept
      Hive: SELECT emp.Name, dept.Name FROM Employee emp JOIN Dept dept
  • 190. JOIN Operator (3/4): LEFT OUTER JOIN
      Oracle: SELECT * FROM Emp, Dept WHERE Emp.deptNo = Dept.deptNo(+)
      Hive: SELECT * FROM Emp LEFT OUTER JOIN Dept ON Emp.deptNo = Dept.deptNo
  • 194. JOIN Operator (4/4): RIGHT OUTER JOIN
      Oracle: SELECT * FROM Emp, Dept WHERE Emp.deptNo(+) = Dept.deptNo
      Hive: SELECT * FROM Emp RIGHT OUTER JOIN Dept ON Emp.deptNo = Dept.deptNo
  • 199. Condition Function: CASE
      Oracle: CASE expr WHEN cond1 THEN r1 [WHEN cond2 THEN r2]* [ELSE r] END
      Hive: CASE expr WHEN cond1 THEN r1 [WHEN cond2 THEN r2]* [ELSE r] END
  • 212. Math Function (Oracle → Hive)
      ROUND   → ROUND
      CEIL    → CEIL/CEILING
      MOD     → PMOD
      POWER   → POW/POWER
      SQRT    → SQRT
      SIN/COS → SIN/COS
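One row in the math table is not a pure rename: MOD maps to PMOD, and the two can disagree on negative inputs. The sketch below illustrates the difference under the assumption that a truncating remainder keeps the sign of the dividend (as Java's `%` does) while a positive modulus is computed as `((a % b) + b) % b`. `ModVsPmodSketch`, `mod`, and `pmod` are hypothetical helper names, not Oracle or Hive APIs.

```java
// Hypothetical sketch contrasting a truncating remainder with a positive modulus.
class ModVsPmodSketch {

    // Truncating remainder: the sign of the result follows the dividend a.
    static int mod(int a, int b) {
        return a % b;
    }

    // Positive modulus: the result is non-negative whenever b is positive.
    static int pmod(int a, int b) {
        return ((a % b) + b) % b;
    }

    public static void main(String[] args) {
        System.out.println(mod(-7, 3));
        System.out.println(pmod(-7, 3));
    }
}
```

For non-negative inputs the two agree, so the rename is only a concern for columns that can hold negative values.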
  • 223. Character Function (Oracle → Hive)
      SUBSTR      → SUBSTR
      TRIM        → TRIM
      LPAD/RPAD   → LPAD/RPAD
      LTRIM/RTRIM → LTRIM/RTRIM
      REPLACE     → REGEXP_REPLACE
  • 230. NULL Function (Oracle → Hive)
      COALESCE → COALESCE
      NVL      → Custom UDF
      NVL2     → Custom UDF
  • 231. Custom UDF Function • Condition Function • DECODE • Null Comparison Function • NVL / NVL2 • Type Conversion • TO_NUMBER • TO_CHAR • TO_DATE
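The semantics that such custom UDFs would need to reproduce can be sketched in plain Java. This is not Hive UDF code (a real UDF would extend Hive's UDF or GenericUDF classes); `OracleCompatFunctionsSketch` and its methods are hypothetical and only mirror the usual Oracle behaviour of NVL, NVL2, and DECODE, including DECODE treating two NULLs as equal.

```java
import java.util.Objects;

// Hypothetical sketch of the semantics behind NVL, NVL2, and DECODE.
class OracleCompatFunctionsSketch {

    // NVL(value, fallback): fallback when value is null.
    static <T> T nvl(T value, T fallback) {
        return value != null ? value : fallback;
    }

    // NVL2(value, ifNotNull, ifNull): picks a branch on nullness.
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    // DECODE(expr, search1, result1, ..., [default]):
    // returns the result paired with the first matching search value,
    // else the trailing default if present, else null.
    static Object decode(Object expr, Object... args) {
        int i = 0;
        for (; i + 1 < args.length; i += 2) {
            if (Objects.equals(expr, args[i])) {   // null matches null
                return args[i + 1];
            }
        }
        return i < args.length ? args[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "unknown"));
        System.out.println(nvl2("x", "has value", "is null"));
        System.out.println(decode("B", "A", 1, "B", 2, 0));
    }
}
```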
  • 232. Oracle Analytic Function
  • 236. Analytic Function: processing order
      1. Joins, WHERE, and GROUP BY clauses are performed
      2. The analytic functions are performed on the result set
      3. The ORDER BY clause is processed
  • 246. Analytic Function: rank salary within each dept
      Input:
      name | dept       | salary
      a    | Research   | 100
      b    | Research   | 100
      c    | Sales      | 200
      d    | Sales      | 300
      e    | Research   | 50
      f    | Accounting | 200
      g    | Accounting | 300
      h    | Accounting | 400
      i    | Research   | 10
      Steps on MapReduce:
      1. Map tasks read the input rows
      2. DISTRIBUTE BY dept: all rows of one dept are sent to the same Reduce task
      3. SORT BY dept, salary: each Reduce task sorts the rows it received
      4. RANK(dept, salary): a rank is appended that restarts at 1 for each dept, e.g. Sales: c 200 → 1, d 300 → 2; Accounting: f 200 → 1, g 300 → 2, h 400 → 3
  • 251. Analytic Function: RANK
      Oracle: SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp
      Hive: SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e
      RANK(arg1, arg2) is a custom UDF
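The custom RANK(arg1, arg2) function relies on rows arriving grouped by dept and sorted by salary, so it only needs to remember the previous row. The slides do not show its code; the sketch below is one way such a stateful function could work, written as plain Java rather than a Hive UDF. `RankSketch` is a hypothetical name, and giving tied salaries the same rank (as Oracle's RANK does) is an assumption.

```java
import java.util.Objects;

// Hypothetical sketch of a stateful rank function over pre-sorted rows.
class RankSketch {

    private String lastDept;     // partition key of the previous row
    private Integer lastSalary;  // sort key of the previous row, null at partition start
    private int rowInDept;       // rows seen so far in the current dept
    private int rank;            // rank assigned to the previous row

    // Must be called in DISTRIBUTE BY dept / SORT BY dept, salary order.
    int rank(String dept, int salary) {
        if (!Objects.equals(dept, lastDept)) {   // a new dept starts: reset counters
            rowInDept = 0;
            lastSalary = null;
        }
        rowInDept++;
        if (lastSalary == null || lastSalary != salary) {
            rank = rowInDept;                    // new salary value: rank is the row position
        }                                        // tie: keep the previous rank
        lastDept = dept;
        lastSalary = salary;
        return rank;
    }

    public static void main(String[] args) {
        RankSketch r = new RankSketch();
        System.out.println(r.rank("Research", 100));
        System.out.println(r.rank("Research", 100));
        System.out.println(r.rank("Research", 50));
        System.out.println(r.rank("Sales", 300));
    }
}
```

The function is only correct because the inner query guarantees the row order; called on unsorted input it would restart the rank every time the dept changes.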
  • 252. Hive Optimization & Future Work
  • 260. Tuning Parameter
      • Hadoop Tuning
          • mapred.job.reuse.jvm.num.tasks
          • mapred.child.java.opts
          • mapred.min.split.size / mapred.max.split.size
          • dfs.block.size
      • Hive Tuning
          • hive.input.format = CombineHiveInputFormat
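To make the effect of the split-size and input-format settings concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the split-size rule used by Hadoop's newer FileInputFormat, max(minSize, min(maxSize, blockSize)), and it ignores details such as split slop and locality grouping, so the numbers are estimates only. The function names and the example workload (1000 files of 1 MB, 64 MB blocks, 256 MB maximum split) are hypothetical and not taken from the deck.

```python
import math

MB = 1024 * 1024


def compute_split_size(block_size, min_size, max_size):
    """Split size as a function of dfs.block.size and the min/max split settings."""
    return max(min_size, min(max_size, block_size))


def estimate_map_tasks(file_sizes, split_size):
    """Without combining, splits never span files: each non-empty file needs at least one."""
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


def estimate_combined_map_tasks(file_sizes, max_split_size):
    """Crude estimate for a combining input format: small files are packed together."""
    return math.ceil(sum(file_sizes) / max_split_size)


# Hypothetical workload: many small files, a common source of excess map tasks.
small_files = [1 * MB] * 1000
split_size = compute_split_size(block_size=64 * MB, min_size=1, max_size=256 * MB)

plain_tasks = estimate_map_tasks(small_files, split_size)
combined_tasks = estimate_combined_map_tasks(small_files, 256 * MB)
```

Under these assumptions the plain input format launches one map task per small file, while a combining input format needs only a handful, which is the motivation for setting hive.input.format to CombineHiveInputFormat.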
  • 261. UDF/UDAF
      • Develop UDFs to reduce the number of MR jobs
      • Extend GenericUDF to avoid Java reflection
      • Avoid creating new objects in UDFs
  • 265. Future Work
      • HiveQL SQL Compliance
          • HIVE-282 - IN statement for WHERE clauses
          • HIVE-192 - Add TIMESTAMP column type
          • HIVE-1269 - Support Date/Datetime/Time/Timestamp Primitive Types
      • Analytic Function
          • HIVE-896 - Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive
          • HIVE-952 - Support analytic NTILE function
      • Optimization
          • HIVE-1694 - Accelerate GROUP BY execution using indexes
          • HIVE-482 - Optimize Group By + Order By with the same keys
  • 270. Hive: a system for managing and querying structured data built on top of Hadoop
      • Oracle 2 Hive
          • data model
          • ANSI-SQL
          • built-in function / custom UDF
          • analytic function