A short introduction to Apache Hadoop Hive, what is it and what can it do. How could we use it to connect a Hadoop cluster to business intelligence tools. Then create management reports from our Hadoop cluster data.
The Codex of Business Writing Software for Real-World Solutions 2.pptx
An introduction to Apache Hadoop Hive
1. Apache Hadoop Hive
● What is it ?
● Architecture
● Related Projects
● Hive DDL
● Hive DML
● HiveQL Examples
● Business Intelligence
2. Hadoop – What is it ?
● A data warehouse for Hadoop
● Open source writen in Java
● Holds meta data in a relational database
● Allows SQL like queries
● Supports “big data” data sets
● Offers built in and user defined functions
● Has indexing
4. Hive – Architecture
● Given an existing HDFS and Hadoop cluster
● Then add Hive and the meta data structure
● Use Flume and Sqoop to move data
● Use Hive LOAD DATA command to load from flat files
● Use ODBC for connectivity to your BI layer
5. Hive – Related Projects
● Apache Flume – move large data sets to Hadoop
● Apache Sqoop – cmd line, move rdbms data to Hadoop
● Apache Hbase – Non relational database
● Apache Pig – analyse large data sets
● Apache Oozie – work flow scheduler
● Apache Mahout – machine learning and data mining
● Apache Hue – Hadoop user interface
● Apache Zoo Keeper – configuration / build
7. Hive - DDL
● Alter table
hive> ALTER TABLE customer ADD COLUMNS ( age INT) ;
● Drop table
hive> DROP TABLE customer;
8. Hive - DML
● Loading flat files into Hive
hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE
INTO TABLE customer;
●
No verification of incoming data
9. HiveQL Examples
● HiveQL, an SQL like language
hive> SELECT a.age FROM customer a WHERE a.sdate ='2008-
08-15';
selects all data from table for a partition but doesnt store it
hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file'
SELECT a.* FROM customer a WHERE a.sdate='2008-08-15';
writes all of customer table to an hdfs directory
10. Hive – Business Intelligence
● Use ODBC to connect Hive to your BI layer
● Now you can use BI tools like Business Objects
– Create a universe over the Hive instance
– Create reports against the universe
– Create add hoc queries against the universe
11. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You can just pay for those hours that you need
● To solve your problems