Hive - III
Table Partition, HQL
Why is partitioning a table important?
• Data is split into multiple partitions based on the values of one or more columns, such as date, city or department.
• Partitioning makes queries that filter on those columns more efficient.
• For example, our previous table tb_1 contains ID, name, location and year. If we want only the data for the year 2010, the query has to search the whole table for the matching rows.
However, if we partition the table by year, the data for each year is stored in its own directory. A query for the year 2010 then reads only the files of the 2010 partition and ignores the other partitions, which shortens the query processing time.
Rupak Roy
Create a partitioned table
hive> create table empPartitioned
(ID int, name string, location string)
partitioned by (year string)
row format delimited
fields terminated by '#'
lines terminated by '\n'
stored as textfile;
#note: the column used for partitioning must not be repeated in the regular column list; it is declared only in the partitioned by clause.
#Load the partitioned data
hive> load data inpath '/home/hduser/dataset/htable2008' overwrite
into table empPartitioned partition (year = '2008');
hive> load data inpath '/home/hduser/dataset/htable2005' overwrite
into table empPartitioned partition (year = '2005');
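Each partition ends up as its own subdirectory of the table's directory, named after the partition column and its value. A minimal sketch for checking this from the Hive shell follows. The path is an assumption: it presumes the default database and the default warehouse directory /user/hive/warehouse, and the real location depends on hive.metastore.warehouse.dir.

```sql
-- Assumption: default database, default warehouse directory.
-- Hive stores the table directory under the lower-cased table name.
dfs -ls /user/hive/warehouse/emppartitioned;

-- After the two loads above, one would expect one subdirectory per
-- partition, along the lines of .../emppartitioned/year=2005 and
-- .../emppartitioned/year=2008.
```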
hive> select * from empPartitioned;
hive> select * from empPartitioned
where year = '2005';
The second query reads only the partition for year 2005; all other partitions are ignored.
hive> show partitions empPartitioned;
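To check which partitions a query would actually touch, the plan can be inspected before running it. The sketch below uses EXPLAIN DEPENDENCY, which lists the input tables and partitions of a query, and then shows that the partition column can be used like any other column. It assumes the empPartitioned table and the two loaded partitions from the previous slides.

```sql
-- Reports the tables and partitions the query depends on,
-- without executing the query itself.
explain dependency
select * from empPartitioned where year = '2005';

-- The partition column behaves like an ordinary column in queries,
-- even though its value is not stored inside the data files.
select year, count(*) as employees
from empPartitioned
group by year;
```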
Partitioned External Table
• A partitioned table can also be an external table. Here we don't need to specify the LOCATION clause as we did for the earlier external tables.
hive> create external table empPartitioned
(ID int, name string, location string)
partitioned by (year string)
row format delimited
fields terminated by '#'
lines terminated by '\n'
stored as textfile;
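With an external table the data files usually already sit somewhere in HDFS, so Hive has to be told where each partition lives. A minimal sketch of how that could look is given below. The directory /data/emp/year=2008 is a hypothetical path chosen for illustration and does not come from the slides.

```sql
-- Hypothetical HDFS directory that already holds the 2008 data files.
alter table empPartitioned
add if not exists partition (year = '2008')
location '/data/emp/year=2008';

-- If the directories under the table's location already follow the
-- year=<value> naming scheme, MSCK REPAIR TABLE can register the
-- missing partitions in the metastore in one step.
msck repair table empPartitioned;
```

Dropping an external table removes only the metadata, so the files in these directories stay in place.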
Hive Query Language (HQL)
• HQL borrows most of its syntax from SQL (Structured Query Language), so most tables can be queried with familiar SQL statements.
Example 1:
select upper(name), TotalSales/100 as Average
from transactionaldata;
This returns two columns: the name in capital letters and TotalSales divided by 100, labelled Average.
Example 2:
select name, sellingprice - costprice as Profit
from transactionaldata
where year = 2010
and sellingprice > 100;
#this gives the profit for the year 2010 on items whose selling price is more than $100
We can also use the CAST() function to convert a value from one data type to another.
Example 3:
select name, sellingprice, cast(year as int)
from transactionaldata;
Example 4:
select concat(name, id), location
from transactionaldata
where year = 2005;
Hive also supports the common SQL joins, such as inner joins and outer joins.
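As an illustration of joins in HQL, the sketch below joins the empPartitioned table from the earlier slides with a lookup table. The table locationinfo and its columns are hypothetical and exist only for this example.

```sql
-- Hypothetical lookup table: locationinfo(location string, country string)

-- Inner join: only employees whose location has a match in locationinfo.
select e.name, e.location, l.country
from empPartitioned e
join locationinfo l on (e.location = l.location);

-- Left outer join: every employee is returned, with NULL in the country
-- column where locationinfo has no matching row.
select e.name, e.location, l.country
from empPartitioned e
left outer join locationinfo l on (e.location = l.location);
```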
Hive in RC File
• Hive can store data in different file formats. We are already familiar with text-based formats (stored as textfile), such as JSON, CSV and XML. Text format is convenient for sharing data with other applications, but it is not very efficient in terms of storage.
• A sequence file is another format. It stores data compactly as binary key-value pairs, but the drawback is that it saves a complete row as a single binary value. So whenever we query a single column, Hive has to read the full row even though only one column is requested.
• Let's understand this with the help of an example.
Create a table stored as a sequence file
create table emp
(ID int, name string, location string)
row format delimited
fields terminated by '#'
lines terminated by '\n'
stored as sequencefile;
------------------------------------------
Describe formatted emp;
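LOAD DATA only moves files into the table's directory and does not convert them, so a plain text file cannot be loaded directly into a sequence file table. One common way to fill such a table is an INSERT ... SELECT from a table that already holds the data. The sketch below assumes that the text-format table empPartitioned from the earlier slides is populated and is used as the source.

```sql
-- Hive rewrites the selected rows in the target table's storage format
-- (here: sequence file) while inserting them.
insert overwrite table emp
select ID, name, location
from empPartitioned;
```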
Row Vs Column Storage
• Row-oriented storage:
Row-oriented storage is efficient when a query retrieves all the columns of a row. For example, if a table has 50 columns and the query needs only 2 rows, only those 2 rows have to be scanned. But when a query reads only a few columns, it still has to read every row in full. It is best suited to row-wise access.
ID   Name   Location   Year
11   Bob    IN         2005
22   Fara   SG         2005
Row Vs Column Storage
• Column-oriented storage: the opposite of row-oriented storage. It is best suited to queries that read only a few columns.
ID   Name    Location   Year
11   Bob     IN         2005
22   Fara    SG         2005
33   Niki    JP         2005
44   Steve   NZ         2005
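The two access patterns can be pictured with a pair of queries. This is only an illustrative sketch that assumes the emp table created earlier; which layout is actually faster depends on the data volume and the storage format in use.

```sql
-- Suits row-oriented storage: all columns of a few rows are needed.
select * from emp where ID = 11;

-- Suits column-oriented storage: a single column across all rows is needed.
select location, count(*) as employees
from emp
group by location;
```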
Record Columnar File
• The RC (Record Columnar) file format was created to address this limitation of row-oriented storage.
• Like Hive, the RC file format was developed by Facebook.
• An RC file stores data on disk in a record columnar way: it first splits the rows horizontally into row groups and then stores the values of each row group column by column.
Row Group 1
ID   Name    Location   Year
11   Bob     IN         2005
22   Fara    SG         2005
33   Niki    JP         2005
Row Group 2
ID   Name    Location   Year
44   Steve   NZ         2005
55   Nina    RU         2009
66   Ryan    IN         2005
create table empRC
(ID int, name string, location string)
stored as rcfile;
----------------
describe formatted empRC;
-----------------
Load data into the table
insert overwrite table empRC select * from emp;
-------------------
Now query the tables empRC and emp to observe the difference in the time taken to process the request.
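A simple way to make the comparison is to run the same single-column query against both tables and compare the elapsed time that Hive prints after each result. The queries below are a sketch that assumes both tables hold the same rows. On a small data set the difference may be negligible, because job start-up time dominates.

```sql
-- Sequence file table: each row is stored as one value, so whole rows are read.
select count(distinct location) from emp;

-- RC file table: only the location column of each row group needs to be read.
select count(distinct location) from empRC;
```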
Next
• Apache HBase, a column-oriented, non-relational distributed database management system.
• Stay tuned.