Final Project

Real-Time Analytics Using Cloudera Impala in a Manufacturing Use Case

Rapheephan Thongkham-uan (Nancy)
CSCI E-185 Big Data Analytics

@Rapheephan Thongkham-Uan

Friday, May 10, 13
Making Big Data Make Money
In manufacturing, ...

• We want to improve supply chain management by tracking defective parts, finding bottlenecks, etc.
• We analyze large amounts of data with traditional tools, which takes too much time.
• People in the factory are familiar with SQL queries.
• The faster we analyze the big data, the more we gain:
  - faster detection of defects and bottlenecks
  - near-real-time problem solving and decision-making
  - less time and money spent on defects

That’s why we need Cloudera Impala

Requirements

• Cloudera Manager 4.5.2 installation guide
  - http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Free/latest/ClouderaManager-Free-Edition-Installation-Guide/Cloudera-Manager-Free-Edition-Installation-Guide.html
• My VM
  - Ubuntu 12.04 (Precise) 64-bit
  - CDH 4.2
  - Cloudera Manager 4.5.2
• I installed Impala via Cloudera Manager

After finishing the Cloudera Manager installation

We will use the Hue Web UI to query Impala

From the Services menu bar, click HUE1 and choose Hue Web UI.

Create table in Hive
Create the Hive table as user impala, then load the data from the local file system into the table:
$ sudo -E -u impala hive -e "CREATE TABLE khsample (
    id INT, sdate STRING, seq INT, product STRING, ope STRING,
    resource_grp STRING, resource STRING, inflow FLOAT, proclot FLOAT,
    wip FLOAT, ope_rate FLOAT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"
$ sudo -E -u impala hive -e "LOAD DATA LOCAL INPATH 'KH_RESULT.csv' INTO TABLE khsample;"
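A side note on this load step: a table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' is split on every comma, and double quotes around a field in the CSV stay in the stored value. The Python sketch below only mimics that splitting on one made-up line (the real contents of KH_RESULT.csv are not shown in the slides, so the row and its quoting are assumptions). It illustrates why a string column such as sdate may later have to be matched with the quotes included.

```python
import csv

# Hypothetical row in the 11-column layout of khsample; not taken from KH_RESULT.csv.
line = '118,"2012/12/22",1,"P1","OP10","GRP_A","M1",10.0,5.0,3.0,0.8'

# Plain splitting on the delimiter, roughly what a delimited text table does.
naive_fields = line.split(",")

# A quote-aware CSV parser, for comparison.
parsed_fields = next(csv.reader([line]))

print(naive_fields[1])   # the surrounding double quotes are kept
print(parsed_fields[1])  # the surrounding double quotes are stripped
```

If the source file does quote its string fields, either the queries have to include the quotes in their literals or the file has to be cleaned before loading.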

Sample table in Hue Web UI
The table we just created from the Hive shell can be viewed in the Hue Web UI.
* The input data includes Japanese characters, which are not displayed correctly.

Create table in Hive
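Before querying Impala from the Hue Web UI, we first have to refresh Impala so that it picks up the table created in Hive. In impala-shell, enter the following command:
$ impala-shell
[impala-server:21000] > refresh;
(Later Impala releases require a table name for REFRESH and use INVALIDATE METADATA to pick up newly created tables.)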

Query in Impala
In the Hue Web UI, click the Impala icon to open the query editor page, then enter the query and execute it.

Bottlenecks query
- To find the groups of machines that are bottlenecks, we can use the WIP (work in process) per day. A group of machines whose WIP is higher than on the previous sampling date can be predicted to be a bottleneck.
- The simulation dates ran from 12/13 to 12/22. I sum the WIP values on the sampling dates (12/14, 12/16, 12/18, 12/20, 12/22).
- This requires 5 sub-queries in the FROM clause.
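The rule described above (summed WIP per machine group that never falls from one sampling date to the next) can be tried on a toy table without a cluster. The following is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for Impala and a handful of made-up rows. Note that it uses one conditional SUM per date instead of five joined sub-queries, so it is a different formulation of the same rule, not the query from the slides. The function name `find_bottlenecks` and the toy data are hypothetical.

```python
import sqlite3

SAMPLING_DATES = ["2012/12/14", "2012/12/16", "2012/12/18", "2012/12/20", "2012/12/22"]


def find_bottlenecks(conn, run_id, dates=SAMPLING_DATES, limit=20):
    """Return rows (resource_grp, summed WIP per date, oldest first) whose WIP never falls."""
    # One conditional SUM per sampling date; a date with no rows yields NULL.
    sums = ", ".join(
        f"SUM(CASE WHEN sdate = ? THEN wip END) AS w{i}" for i in range(len(dates))
    )
    # w0 <= w1 <= ...; a NULL sum fails the comparison, as a missing join partner would.
    non_decreasing = " AND ".join(
        f"w{i} <= w{i + 1}" for i in range(len(dates) - 1)
    )
    sql = (
        f"SELECT * FROM (SELECT resource_grp, {sums} FROM khsample "
        f"WHERE id = ? GROUP BY resource_grp) "
        f"WHERE {non_decreasing} ORDER BY w{len(dates) - 1} DESC LIMIT ?"
    )
    return conn.execute(sql, [*dates, run_id, limit]).fetchall()


# Toy data (made up): only GRP_A has WIP that never decreases on all five dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE khsample (id INTEGER, sdate TEXT, resource_grp TEXT, wip REAL)")
toy_rows = [
    (118, "2012/12/14", "GRP_A", 0.5), (118, "2012/12/14", "GRP_A", 0.5),
    (118, "2012/12/16", "GRP_A", 2.0), (118, "2012/12/18", "GRP_A", 3.0),
    (118, "2012/12/20", "GRP_A", 4.0), (118, "2012/12/22", "GRP_A", 5.0),
    # GRP_B dips between 12/14 and 12/16.
    (118, "2012/12/14", "GRP_B", 5.0), (118, "2012/12/16", "GRP_B", 4.0),
    (118, "2012/12/18", "GRP_B", 6.0), (118, "2012/12/20", "GRP_B", 7.0),
    (118, "2012/12/22", "GRP_B", 8.0),
    # GRP_C has no rows on 12/18.
    (118, "2012/12/14", "GRP_C", 1.0), (118, "2012/12/16", "GRP_C", 2.0),
    (118, "2012/12/20", "GRP_C", 4.0), (118, "2012/12/22", "GRP_C", 5.0),
]
conn.executemany("INSERT INTO khsample VALUES (?, ?, ?, ?)", toy_rows)

bottlenecks = find_bottlenecks(conn, 118)
print(bottlenecks)
```

The date strings here are stored without surrounding quotes, which is another assumption of the toy data.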

Bottlenecks query (2)
SELECT A.resource_grp,
       A.awip AS wip22,  -- 12/22 WIP
       B.bwip AS wip20,  -- 12/20 WIP
       C.cwip AS wip18,  -- 12/18 WIP
       D.dwip AS wip16,  -- 12/16 WIP
       E.ewip AS wip14   -- 12/14 WIP
FROM (SELECT resource_grp, sum(wip) AS awip
      FROM khsample
      WHERE id = 118 AND sdate = '"2012/12/22"'
      GROUP BY resource_grp) A JOIN
     (SELECT resource_grp, sum(wip) AS bwip
      FROM khsample
      WHERE id = 118 AND sdate = '"2012/12/20"'
      GROUP BY resource_grp) B JOIN
     (SELECT resource_grp, sum(wip) AS cwip
      FROM khsample
      WHERE id = 118 AND sdate = '"2012/12/18"'
      GROUP BY resource_grp) C JOIN
     (SELECT resource_grp, sum(wip) AS dwip
      FROM khsample
      WHERE id = 118 AND sdate = '"2012/12/16"'
      GROUP BY resource_grp) D JOIN
     (SELECT resource_grp, sum(wip) AS ewip
      FROM khsample
      WHERE id = 118 AND sdate = '"2012/12/14"'
      GROUP BY resource_grp) E
WHERE A.resource_grp = B.resource_grp
  AND A.resource_grp = C.resource_grp
  AND A.resource_grp = D.resource_grp
  AND A.resource_grp = E.resource_grp
  -- keep only groups whose WIP never decreased between sampling dates
  AND A.awip >= B.bwip AND B.bwip >= C.cwip
  AND C.cwip >= D.dwip AND D.dwip >= E.ewip
ORDER BY A.awip DESC
LIMIT 20;

Comparing the result of Impala with Oracle SQL

Results
• Joining 5 sub-queries in Oracle SQL took 50 s.
• Joining 5 sub-queries in Impala took 6.67 s.
• Impala ran the query roughly 7.5x faster (50 s / 6.67 s) and returned the same results.
• In real use, we could configure Impala to work with HBase and also move the Hive Metastore to an Oracle database.
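The speedup follows directly from the two timings reported above. A short Python check, using only those two numbers (the variable names are illustrative):

```python
# Timings reported in the results, in seconds.
oracle_seconds = 50.0
impala_seconds = 6.67

# Speedup factor and the share of the Oracle run time that was saved.
speedup = oracle_seconds / impala_seconds
time_saved_pct = (1 - impala_seconds / oracle_seconds) * 100

print(f"speedup: {speedup:.1f}x, time saved: {time_saved_pct:.1f}%")
```

This is a single measurement of one query on one VM, so the factor is best read as indicative.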

