SlideShare ist ein Scribd-Unternehmen logo
1 von 7
AdvancedTopics in Hive
Agenda
 HiveQL: Views
 HiveQL: Indexes
 Partitions
 Bucketing
 Hive UDF’s
HiveQL:Views
 A view is a sort of “virtual table” that is defined by a SELECT statement.
 A view allows a query to be saved and treated like a table. It is a logical construct, as it does not
store data like a table
 Eg:
 FROM ( SELECT * FROM people JOIN cart
ON (cart.people_id=people.id) WHERE firstname='john'
) a SELECT a.lastname WHERE a.id=3;
Create View : -
CREATE VIEW shorter_join AS
SELECT * FROM people JOIN cart
ON (cart.people_id=people.id) WHERE firstname='john';
Final require query: SELECT lastname FROM shorter_join WHERE id=3;
HiveQL: Indexes
 The goal of Hive indexing is to improve the speed of query lookup on certain columns of a table.
 Creating an Index:
 CREATE INDEX employees_index
ON TABLE employees (country)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator = 'me', 'created_at' = 'some_time')
IN TABLE employees_index_table
PARTITIONED BY (country, name)
COMMENT 'Employees indexed by country and name.';
Rebuilding the Index:
 ALTER INDEX employees_index ON TABLE employees PARTITION (country = 'US') REBUILD;
Partitions
 Hive organizes tables into partitions, a way of dividing a table into coarse-grained parts based on the value of a
partition column, such as a date, country, state.. Etc.
 Imagine logfiles where each record includes a timestamp. If we partitioned by date, then records for the same
date would be stored in the same partition.
 A table may be partitioned in multiple dimensions
 CREATE TABLE logs (ts BIGINT, line STRING) PARTITIONED BY (dt STRING, country STRING);
 Load data into a partitioned table:
 LOAD DATA LOCAL INPATH 'input/hive/partitions/file1‘ INTO TABLE logs PARTITION (dt='2001-01-01', country='GB');
 SHOW PARTITIONS logs;
Buckets
 Partitions offer a convenient way to segregate data and to optimize queries. However, not all data sets lead to
sensible partitioning.
 Bucketing is another technique for decomposing data sets into more manageable parts.
 For example, suppose a table using the date dt as the top-level partition and the user_id as the second-level
partition leads to too many small partitions
 CREATE TABLE weblog (user_id INT, url STRING, source_ip STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 10 BUCKETS;
Sorted bucketing:
CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;
Inserting data into bucket table:
INSERT OVERWRITE TABLE bucketed_users
SELECT * FROM users;
Hive UDF’s
 UDFs have to be written in Java, the language that Hive itself is written in. For other languages, consider using a
SELECT TRANSFORM query, which allows you to stream data through a user-defined script
 A UDF operates on a single row and produces a single row as its output. Most functions, such as mathematical
functions and string functions, are of this type.
 A UDAF works on multiple input rows and creates a single output row. Aggregate functions include such functions
as COUNT and MAX.
 A UDTF operates on a single row and produces multiple rows—a table—as output.
 A UDF must satisfy the following two properties:
 A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.
 A UDF must implement at least one evaluate() method.
 CREATE FUNCTION strip AS 'com.hadoopbook.hive.Strip‘ USING JAR '/path/to/hive-examples.jar';
 hive> SELECT strip(' bee ') FROM dummy;
 bee

Weitere ähnliche Inhalte

Was ist angesagt?

Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010
ragho
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
Subhas Kumar Ghosh
 

Was ist angesagt? (18)

Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
 
Hive and HiveQL - Module6
Hive and HiveQL - Module6Hive and HiveQL - Module6
Hive and HiveQL - Module6
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Hive Hadoop
Hive HadoopHive Hadoop
Hive Hadoop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Pig
PigPig
Pig
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Unit 5-apache hive
Unit 5-apache hiveUnit 5-apache hive
Unit 5-apache hive
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
20081030linkedin
20081030linkedin20081030linkedin
20081030linkedin
 
Apache hive
Apache hiveApache hive
Apache hive
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 

Andere mochten auch

Boredom and Eating
Boredom and EatingBoredom and Eating
Boredom and Eating
InnerHelper
 
User-Defined Table Generating Functions
User-Defined Table Generating FunctionsUser-Defined Table Generating Functions
User-Defined Table Generating Functions
pauly1
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
JunHo Cho
 

Andere mochten auch (17)

Introduction to Hive
Introduction to HiveIntroduction to Hive
Introduction to Hive
 
Hive tuning
Hive tuningHive tuning
Hive tuning
 
Boredom and Eating
Boredom and EatingBoredom and Eating
Boredom and Eating
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Hive introduction 介绍
Hive  introduction 介绍Hive  introduction 介绍
Hive introduction 介绍
 
User-Defined Table Generating Functions
User-Defined Table Generating FunctionsUser-Defined Table Generating Functions
User-Defined Table Generating Functions
 
Datacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheConDatacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheCon
 
Ten tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache HiveTen tools for ten big data areas 04_Apache Hive
Ten tools for ten big data areas 04_Apache Hive
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Apache Hive Hook
Apache Hive HookApache Hive Hook
Apache Hive Hook
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Cost-based query optimization in Apache Hive
Cost-based query optimization in Apache HiveCost-based query optimization in Apache Hive
Cost-based query optimization in Apache Hive
 
Indexed Hive
Indexed HiveIndexed Hive
Indexed Hive
 
Meet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
Meet Magento Sweden - Magento 2 Layout and Code Compilation for PerformanceMeet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
Meet Magento Sweden - Magento 2 Layout and Code Compilation for Performance
 
Replacing Telco DB/DW to Hadoop and Hive
Replacing Telco DB/DW to Hadoop and HiveReplacing Telco DB/DW to Hadoop and Hive
Replacing Telco DB/DW to Hadoop and Hive
 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
 

Ähnlich wie Advanced topics in hive

DB2UDB_the_Basics Day2
DB2UDB_the_Basics Day2DB2UDB_the_Basics Day2
DB2UDB_the_Basics Day2
Pranav Prakash
 
What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?
ukdpe
 

Ähnlich wie Advanced topics in hive (20)

HivePart1.pptx
HivePart1.pptxHivePart1.pptx
HivePart1.pptx
 
Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketing
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
Introduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big DataIntroduction to Apache Tajo: Data Warehouse for Big Data
Introduction to Apache Tajo: Data Warehouse for Big Data
 
vFabric SQLFire Introduction
vFabric SQLFire IntroductionvFabric SQLFire Introduction
vFabric SQLFire Introduction
 
DB2UDB_the_Basics Day2
DB2UDB_the_Basics Day2DB2UDB_the_Basics Day2
DB2UDB_the_Basics Day2
 
Module 3
Module 3Module 3
Module 3
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
 
SKILLWISE-DB2 DBA
SKILLWISE-DB2 DBASKILLWISE-DB2 DBA
SKILLWISE-DB2 DBA
 
Local storage in Web apps
Local storage in Web appsLocal storage in Web apps
Local storage in Web apps
 
Apache hive
Apache hiveApache hive
Apache hive
 
vFabric SQLFire for high performance data
vFabric SQLFire for high performance datavFabric SQLFire for high performance data
vFabric SQLFire for high performance data
 
Apache hive
Apache hiveApache hive
Apache hive
 
Relational Database Management System
Relational Database Management SystemRelational Database Management System
Relational Database Management System
 
Exadata - BULK DATA LOAD Testing on Database Machine
Exadata - BULK DATA LOAD Testing on Database Machine Exadata - BULK DATA LOAD Testing on Database Machine
Exadata - BULK DATA LOAD Testing on Database Machine
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
PPT SQL CLASS.pptx
PPT SQL CLASS.pptxPPT SQL CLASS.pptx
PPT SQL CLASS.pptx
 
Graph db as metastore
Graph db as metastoreGraph db as metastore
Graph db as metastore
 
Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005Optimizing Data Accessin Sq Lserver2005
Optimizing Data Accessin Sq Lserver2005
 
What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?What's New for Developers in SQL Server 2008?
What's New for Developers in SQL Server 2008?
 

Mehr von Uday Vakalapudi

Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting technique
Uday Vakalapudi
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integration
Uday Vakalapudi
 
How Hadoop Exploits Data Locality
How Hadoop Exploits Data LocalityHow Hadoop Exploits Data Locality
How Hadoop Exploits Data Locality
Uday Vakalapudi
 
Flume basic
Flume basicFlume basic
Flume basic
Uday Vakalapudi
 

Mehr von Uday Vakalapudi (11)

Introduction to pig
Introduction to pigIntroduction to pig
Introduction to pig
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
Introduction to hbase
Introduction to hbaseIntroduction to hbase
Introduction to hbase
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Mapreduce total order sorting technique
Mapreduce total order sorting techniqueMapreduce total order sorting technique
Mapreduce total order sorting technique
 
Repartition join in mapreduce
Repartition join in mapreduceRepartition join in mapreduce
Repartition join in mapreduce
 
Hadoop Mapreduce joins
Hadoop Mapreduce joinsHadoop Mapreduce joins
Hadoop Mapreduce joins
 
Oozie workflow using HUE 2.2
Oozie workflow using HUE 2.2Oozie workflow using HUE 2.2
Oozie workflow using HUE 2.2
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integration
 
How Hadoop Exploits Data Locality
How Hadoop Exploits Data LocalityHow Hadoop Exploits Data Locality
How Hadoop Exploits Data Locality
 
Flume basic
Flume basicFlume basic
Flume basic
 

Kürzlich hochgeladen

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 

Kürzlich hochgeladen (20)

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 

Advanced topics in hive

  • 2. Agenda  HiveQL: Views  HiveQL: Indexes  Partitions  Bucketing  Hive UDF’s
  • 3. HiveQL:Views  A view is a sort of “virtual table” that is defined by a SELECT statement.  A view allows a query to be saved and treated like a table. It is a logical construct, as it does not store data like a table  Eg:  FROM ( SELECT * FROM people JOIN cart ON (cart.people_id=people.id) WHERE firstname='john' ) a SELECT a.lastname WHERE a.id=3; Create View : - CREATE VIEW shorter_join AS SELECT * FROM people JOIN cart ON (cart.people_id=people.id) WHERE firstname='john'; Final require query: SELECT lastname FROM shorter_join WHERE id=3;
  • 4. HiveQL: Indexes  The goal of Hive indexing is to improve the speed of query lookup on certain columns of a table.  Creating an Index:  CREATE INDEX employees_index ON TABLE employees (country) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD IDXPROPERTIES ('creator = 'me', 'created_at' = 'some_time') IN TABLE employees_index_table PARTITIONED BY (country, name) COMMENT 'Employees indexed by country and name.'; Rebuilding the Index:  ALTER INDEX employees_index ON TABLE employees PARTITION (country = 'US') REBUILD;
  • 5. Partitions  Hive organizes tables into partitions, a way of dividing a table into coarse-grained parts based on the value of a partition column, such as a date, country, state.. Etc.  Imagine logfiles where each record includes a timestamp. If we partitioned by date, then records for the same date would be stored in the same partition.  A table may be partitioned in multiple dimensions  CREATE TABLE logs (ts BIGINT, line STRING) PARTITIONED BY (dt STRING, country STRING);  Load data into a partitioned table:  LOAD DATA LOCAL INPATH 'input/hive/partitions/file1‘ INTO TABLE logs PARTITION (dt='2001-01-01', country='GB');  SHOW PARTITIONS logs;
  • 6. Buckets  Partitions offer a convenient way to segregate data and to optimize queries. However, not all data sets lead to sensible partitioning.  Bucketing is another technique for decomposing data sets into more manageable parts.  For example, suppose a table using the date dt as the top-level partition and the user_id as the second-level partition leads to too many small partitions  CREATE TABLE weblog (user_id INT, url STRING, source_ip STRING) PARTITIONED BY (dt STRING) CLUSTERED BY (user_id) INTO 10 BUCKETS; Sorted bucketing: CREATE TABLE bucketed_users (id INT, name STRING) CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS; Inserting data into bucket table: INSERT OVERWRITE TABLE bucketed_users SELECT * FROM users;
  • 7. Hive UDF’s  UDFs have to be written in Java, the language that Hive itself is written in. For other languages, consider using a SELECT TRANSFORM query, which allows you to stream data through a user-defined script  A UDF operates on a single row and produces a single row as its output. Most functions, such as mathematical functions and string functions, are of this type.  A UDAF works on multiple input rows and creates a single output row. Aggregate functions include such functions as COUNT and MAX.  A UDTF operates on a single row and produces multiple rows—a table—as output.  A UDF must satisfy the following two properties:  A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.  A UDF must implement at least one evaluate() method.  CREATE FUNCTION strip AS 'com.hadoopbook.hive.Strip‘ USING JAR '/path/to/hive-examples.jar';  hive> SELECT strip(' bee ') FROM dummy;  bee