Hive Hadoop

HIVE Introduction
•A cost-effective, data-warehouse-style solution
for Hadoop
•Hadoop base
–Cost-effective, very large-scale, and flexible data
management
•Familiar to the huge existing base of SQL users
•Easy to learn
•No need to write Java data access programs

HIVE Introduction
•SQL-like ad hoc querying, aggregation, and analysis
of huge volumes of data
–SQL-like query language called HiveQL
•Hadoop base for cost-effective data
management
–MapReduce for execution
–Hadoop Distributed File System (HDFS) for storage
•JDBC/ODBC access
•Extensible

HIVE Introduction
•Schema on read, rather than schema on write, improves
flexibility
–Traditional databases enforce the schema at load time
(schema on write)
–Hive enforces the schema when a query is issued
(schema on read)
•Not designed for online transaction processing
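
Schema on read can be sketched as follows. The table name events_raw, the path /data/raw/events and the sample rows are assumptions made up for illustration; they are not part of the original slides.

```sql
-- Assume a file already sits in HDFS under /data/raw/events with lines like:
--   1,click
--   oops,view
-- Creating the table does not read or validate the file; it only records metadata.
CREATE EXTERNAL TABLE events_raw (
  id     INT,
  action STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/raw/events';

-- The schema is applied only now, at query time.
-- A field that cannot be parsed as INT (such as 'oops') is returned as NULL
-- instead of causing the load to fail.
SELECT id, action FROM events_raw;
```

With schema on write, the second row would typically be rejected when the data is loaded; here it only surfaces as a NULL when queried.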

Hive architecture
•Components
–Driver
–Metastore
–Interfaces
–Thrift server

Data model – Database and table
•Location in HDFS
–Hive data is stored in HDFS under
/user/hive/warehouse (default)
•Database
–A namespace that groups together related tables and
other data units
–Each database is a parent folder in the Hive-specific
directory in HDFS

Data model – Database and table
•TABLE
–A collection of related columns
–Can be filtered, projected, joined, etc.
–Column types
–Primitives
•TINYINT, INT, BIGINT, BOOLEAN, DOUBLE, STRING
–Array of primitives
–Map of primitives (key-value pairs)
–Structure made up of elements of different data types
•Accessed using dot notation
•CREATE TABLE complex_data_type (
Fruits ARRAY<STRING>,
Pass_list MAP<STRING, STRING>,
Car STRUCT<color:STRING, Wheel_size:FLOAT>);
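
A minimal sketch of how the three complex column types in complex_data_type might be read. The map key 'alice' is a made-up value for illustration.

```sql
SELECT
  Fruits[0]            AS first_fruit,   -- array elements use a zero-based index
  size(Fruits)         AS fruit_count,   -- built-in size() returns the number of elements
  Pass_list['alice']   AS alice_entry,   -- map values are looked up by key ('alice' is hypothetical)
  Car.color            AS car_color,     -- struct fields use dot notation
  Car.Wheel_size       AS wheel_size
FROM complex_data_type;
```

An array index that is out of range, or a map key that is absent, yields NULL rather than an error.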

HiveQL is not SQL
•Not 100% ANSI-Compliant SQL
•Join predicates support only the equality operator
•No “insert into”
–Can’t insert into an existing table or data partition
–Only supports “insert overwrite”, so an insert will always
overwrite the existing data in the whole table or partition
•No “update” or “delete”
•No access control language supported
•Incomplete support for correlated subqueries
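
The join and insert restrictions above can be sketched like this. The tables reviews and good_movies and their columns are hypothetical; movies is the table used elsewhere in this deck. Depending on the Hive version in use, some of these limits may no longer apply.

```sql
-- Equality predicate in the ON clause: supported.
SELECT m.name, r.stars
FROM movies m
JOIN reviews r ON (m.id = r.movie_id);

-- A non-equality comparison is kept out of the ON clause
-- and applied as a filter after the equi-join.
SELECT m.name, r.stars
FROM movies m
JOIN reviews r ON (m.id = r.movie_id)
WHERE r.stars > m.rating;

-- INSERT OVERWRITE replaces everything currently in the target table.
INSERT OVERWRITE TABLE good_movies
SELECT id, name
FROM movies
WHERE rating >= 4.0;
```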

Hive Benefits
•Bridges the gap between low-level Java
programming for Hadoop and SQL
•ODBC/JDBC interfaces enable many commercial
business intelligence and ETL tools
•Leverages Hadoop and supports partitioning for
scalability and performance
•Extensible (UDFs, SerDes, etc.)

Datatypes in HIVE
Primitive data types
TINYINT
SMALLINT
INT
BIGINT
BOOLEAN
FLOAT
DOUBLE
STRING

Collection data types
STRUCT
struct('John', 'Doe')
MAP
map('first', 'John','last', 'Doe')
ARRAY
array('John', 'Doe')
CREATE TABLE employees (name STRING,
Salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
Address STRUCT<street:STRING, city:STRING, state:STRING,

DATABASES in HIVE
A database is a catalog or namespace of tables, used to avoid
table name collisions.
SYNTAX:
hive> CREATE DATABASE movies;
You can see the databases that already exist as follows:
hive> SHOW DATABASES;
Setting a database as your working database:
hive> USE movies;
If not specified, the default database is used.
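
A slightly fuller sketch of the database life cycle. The comment text and the table name some_table are assumptions for illustration only.

```sql
-- IF NOT EXISTS avoids an error when the database is already there.
CREATE DATABASE IF NOT EXISTS movies
COMMENT 'Hypothetical comment: movie catalogue tables';

-- Shows the comment and the HDFS directory backing the database
-- (by default a movies.db folder under the warehouse directory).
DESCRIBE DATABASE movies;

USE movies;

-- A table in another database can still be reached with a qualified name.
SELECT * FROM default.some_table LIMIT 5;

-- CASCADE also drops the tables inside the database; without it,
-- dropping a non-empty database fails.
DROP DATABASE IF EXISTS movies CASCADE;
```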

Creating Tables
Table creation SYNTAX:
hive>CREATE TABLE movies(id INT,name STRING,
year INT,rating FLOAT,duration FLOAT)
row format delimited fields terminated by '
Showing tables in a database SYNTAX:
hive> SHOW TABLES;
Showing details about the table SYNTAX:
hive> DESCRIBE movies;
Deleting a table SYNTAX:
hive> DROP TABLE movies;

Managed vs External table
The tables we have created so far are called managed
tables (internal tables); Hive controls the life cycle of their data.
Managed tables are less convenient for sharing with
other tools.
We can define an external table that points to existing data
but doesn’t take ownership of it.
SYNTAX:
CREATE EXTERNAL TABLE movies(......)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/movies';
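
The practical difference shows up when a table is dropped. In this sketch the table name movies_ext is hypothetical, and the column list is borrowed from the movies table created earlier, on the assumption that the files under /data/movies have that layout.

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS movies_ext (
  id INT, name STRING, year INT, rating FLOAT, duration FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/movies';

-- The "Table Type" field in the output distinguishes
-- EXTERNAL_TABLE from MANAGED_TABLE.
DESCRIBE FORMATTED movies_ext;

-- For an external table this removes only the metadata;
-- the files under /data/movies stay in HDFS for other tools.
-- For a managed table, DROP TABLE deletes the data as well.
DROP TABLE movies_ext;
```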

Alter Table
ALTER TABLE changes table properties; it modifies
metadata about the table but not the data itself.
Renaming:
ALTER TABLE movies RENAME TO cinemas;
Adding Columns:
ALTER TABLE movies ADD COLUMNS (language string);
Renaming a column and changing its position:
ALTER TABLE movies
CHANGE COLUMN name names string AFTER year;
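
Because ALTER TABLE touches only metadata, it is worth checking what queries return afterwards. A minimal sketch, assuming movies is a delimited text table; the property name 'notes' and its value are made up.

```sql
-- Arbitrary key/value properties can be attached to the table metadata.
ALTER TABLE movies SET TBLPROPERTIES ('notes' = 'hypothetical property value');

-- The new column appears in the schema right away...
ALTER TABLE movies ADD COLUMNS (language STRING);
DESCRIBE movies;

-- ...but the existing files were not rewritten, so rows that have no
-- such field return NULL for it.
SELECT name, language FROM movies LIMIT 5;
```

The same caution applies to changing a column's position: the fields in the underlying files keep their original order, so the schema and the data can end up out of step.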

Loading data
Hive has no row-level insert, update, or delete
operations, so the only way to put data into a table
is to use one of the “bulk” load operations.
From HDFS SYNTAX:
load data inpath '/user/divya/dataset/movie.csv'
into table movies;
From local system SYNTAX:
load data LOCAL inpath '/home/divya/dataset/movie.into table movies;
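
A few variations on bulk loading, as a sketch. The path /user/divya/dataset/more_movies.csv and the table top_movies are hypothetical.

```sql
-- Without LOCAL, the source file is moved (not copied) from its HDFS
-- location into the table's directory.
LOAD DATA INPATH '/user/divya/dataset/movie.csv'
INTO TABLE movies;

-- OVERWRITE replaces whatever the table already contains;
-- without it, the new file is added alongside the existing ones.
LOAD DATA INPATH '/user/divya/dataset/more_movies.csv'
OVERWRITE INTO TABLE movies;

-- Another bulk route: populate a table from a query.
-- top_movies is assumed to exist with matching columns.
INSERT OVERWRITE TABLE top_movies
SELECT id, name, rating
FROM movies
WHERE rating >= 8.0;
```

LOAD DATA does not parse or validate the file; problems with its format only show up later, when the table is queried.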
