CC 2.0 by Mr. T in DC | http://flic.kr/p/7khrin
CC 2.0 by Franck BLAIS | http://flic.kr/p/cwVnSy
CC 2.0 by John Steven Fernandez | http://flic.kr/p/a8uTzz
CC 2.0 by Ian Carroll | http://flic.kr/p/6NWoGm
CC 2.0 by Perry French | http://flic.kr/p/8wDMJS
CC 2.0 by John Mitchell | http://flic.kr/p/5UaPg8
How do we answer these questions?

Before designing a blueprint solution, we first asked ourselves:

1. Who would be asked to answer questions like this?
2. Who is this person?
3. What tools does this person expect to use?
4. What is this person's typical skill set?
5. How do they work?

Preparation, May 21, 2013

So, how do we answer these questions as a Data Scientist?

At a high level of abstraction the answer is simple: we need a data management system with three pieces: ingest, store and process.

Traditional Data Management System Approach

[Diagram: Data Source, Data Ingestion, Data Storage, Data Processing]
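
To make the three pieces concrete, here is a minimal sketch of an ingest, store and process chain in plain Python. The function names, the in-memory list used as storage and the sample records are all assumptions made for illustration; they are not part of the blueprint itself.

```python
from typing import Iterable


def ingest(source: Iterable[str]) -> list[str]:
    """Ingest: collect raw records from a source without changing them."""
    return [line.rstrip("\n") for line in source]


def store(records: list[str], storage: list[str]) -> None:
    """Store: append raw records to a storage layer (a plain list here)."""
    storage.extend(records)


def process(storage: list[str]) -> dict[str, int]:
    """Process: count stored records per first comma-separated field."""
    counts: dict[str, int] = {}
    for record in storage:
        key = record.split(",", 1)[0]
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical raw records of the form "store,visitor".
storage: list[str] = []
store(ingest(["store1,aa\n", "store1,bb\n", "store2,aa\n"]), storage)
result = process(storage)
print(result)
```

Keeping the stored records raw and deriving results only in the processing step mirrors the separation the diagram describes.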

So, how do we answer these questions as a Data Scientist?

We take this basic architecture and replace the generic terms by mapping it onto the Hadoop ecosystem.

With this Hadoop architecture, a Data Scientist should be able to answer the questions without any programming environment, and can keep using familiar BI, analysis and reporting tools.

Blueprint for a Data Management System with Hadoop

[Diagram: Data Source, Flume, HDFS, Hive and Impala, BI/Analysis/Reporting]

Ingredients

1. Two WiFi access points running OpenWRT, a Linux-based firmware for routers, to simulate two different stores
2. Flume to move all log messages to HDFS without any manual intervention (no transformation, no filtering)
3. A 4-node CDH4 cluster (2 GB RAM, 100 GB HDD)
4. Pentaho Data Integration's graphical designer for data transformation, parsing, filtering and loading into the warehouse
5. Hive as a data warehouse system on top of Hadoop to project structure onto the data
6. Impala for querying data from HDFS in real time
7. MS Excel to visualize the results

Setup

How it Works

[Diagram, labels in reading order: OpenWRT (MAC address 00:A0:C9:14:C8:28), UDP, Syslog Server as Flume source, Flume sinks raw logs to Hadoop/HDFS, Pentaho loads CSV via M/R, Hive, Impala]

Analytics System
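
The parsing step between the raw logs and the CSV can be sketched in a few lines of Python. The slides do not show the actual OpenWRT log format, so the sample line, the regular expression and the column order below are assumptions; in the setup described here this work is done in Pentaho, not in hand-written code.

```python
import csv
import io
import re
from typing import Optional

# Hypothetical syslog line; the real log format is not shown in the slides.
SAMPLE = ("May 21 10:15:02 store1 hostapd: wlan0: "
          "STA 00:a0:c9:14:c8:28 IEEE 802.11: associated")

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"  # timestamp, e.g. "May 21 10:15:02"
    r"(?P<host>\S+)\s"                   # access point name, used as store id
    r".*?STA\s(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s"  # device MAC address
    r".*?:\s(?P<event>\w+)$",            # trailing event word
    re.IGNORECASE,
)


def parse_line(line: str) -> Optional[dict]:
    """Return the parsed fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["mac"] = row["mac"].lower()
    return row


def to_csv(rows) -> str:
    """Serialize parsed rows as CSV: timestamp, store, MAC address, event."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row["ts"], row["host"], row["mac"], row["event"]])
    return buffer.getvalue()


parsed = parse_line(SAMPLE)
print(to_csv([parsed]), end="")
```

Lines that do not match are returned as None rather than raising, so unrelated syslog messages can simply be filtered out.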
CC 2.0 by Qi Wei Fong | http://flic.kr/p/7w8vfq

Visits for stores number one & two

The plot indicates that about 85% of the visits were detected in store number one and about 15% in store number two. One might conclude that store number one is in a much better location with more occasional customers.

But let's gain more insight by analysing the number of unique visitors.

Analysis Result
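
A share of visits per store like the one in this plot can be computed as follows. This is a sketch only: the visit records are invented, and the real analysis was run with Hive and Impala rather than Python.

```python
from collections import Counter

# Hypothetical visit records: (store, visitor MAC address).
visits = [
    ("store1", "aa"),
    ("store1", "aa"),
    ("store1", "bb"),
    ("store2", "cc"),
]


def visit_share(records):
    """Return the percentage of all visits detected in each store."""
    counts = Counter(store for store, _ in records)
    total = sum(counts.values())
    return {store: 100.0 * n / total for store, n in counts.items()}


shares = visit_share(visits)
print(shares)
```

Note that this counts visits, not people: the same MAC address contributes once per visit.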

Unique visitors

This plot gives us more detail about the customers. It turns out that the 135 visits in store number one were caused by just 9 unique visitors, while store number two saw 5 unique visitors.
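
The figures on the slides allow a quick back-of-the-envelope check. The visit and visitor counts come from the slides; the number of visits in store number two is not stated, so the estimate below assumes the rounded 85%/15% split is close to exact and should be read as a rough approximation.

```python
# Figures taken from the slides.
visits_store1 = 135
unique_store1 = 9
unique_store2 = 5
share_store1 = 0.85  # "about 85%"
share_store2 = 0.15  # "about 15%"

# Average visits per unique visitor in store number one.
visits_per_visitor_store1 = visits_store1 / unique_store1

# Assumption: treat the rounded shares as exact to estimate store two.
est_visits_store2 = visits_store1 * share_store2 / share_store1
est_visits_per_visitor_store2 = est_visits_store2 / unique_store2

print(visits_per_visitor_store1)
print(round(est_visits_store2, 1))
print(round(est_visits_per_visitor_store2, 1))
```

Under that assumption store number one averages 15 visits per visitor against roughly 5 in store number two, so the larger visit count reflects a few frequent visitors rather than a larger audience.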

New vs. returning users

This plot indicates that we have more returning than new users in both stores. In store number two we did not see a single new user over the past 4 days.

It is probably a good idea to start a marketing campaign aimed at new customers, e.g. giving out vouchers for the first purchase.
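
The slides do not define how a user is classified as new or returning. One possible definition, used in the sketch below, is that a device is new on the first day it is seen in a store and returning on every later day. The sightings are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sightings: (day number, store, visitor MAC address).
sightings = [
    (1, "store1", "aa"),
    (1, "store1", "bb"),
    (2, "store1", "aa"),
    (2, "store1", "aa"),  # second sighting on the same day, counted once
    (3, "store1", "aa"),
    (1, "store2", "cc"),
    (2, "store2", "cc"),
]


def new_vs_returning(records):
    """Count new and returning visitors per store, once per visitor and day."""
    seen = defaultdict(set)  # store -> MAC addresses already seen
    counted = set()          # (day, store, mac) combinations already counted
    totals = defaultdict(lambda: {"new": 0, "returning": 0})
    for day, store, mac in sorted(records):
        if (day, store, mac) in counted:
            continue
        counted.add((day, store, mac))
        kind = "returning" if mac in seen[store] else "new"
        totals[store][kind] += 1
        seen[store].add(mac)
    return dict(totals)


totals = new_vs_returning(sightings)
print(totals)
```

With this definition, a store that shows no new users over several days is one where every device had already been seen before that period.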

Visit duration over the past 4 days

The plot for the last 4 days shows that the visit duration in store number one was evenly distributed, while the distribution in store number two shows some peaks.

We can also see that visitors tend to stay in store number one much longer.
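
Visit durations can be derived by pairing each step-in with the next step-out of the same device in the same store. The event layout and the sample events below are assumptions; the slides do not show how the durations were actually computed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events: (timestamp in seconds, MAC address, store, "in"/"out").
events = [
    (0, "aa", "store1", "in"),
    (100, "bb", "store2", "in"),
    (160, "bb", "store2", "out"),
    (600, "aa", "store1", "out"),
    (700, "aa", "store1", "in"),
    (1000, "aa", "store1", "out"),
]


def visit_durations(records):
    """Pair step-ins with step-outs and return durations per store."""
    open_visits = {}               # (mac, store) -> step-in timestamp
    durations = defaultdict(list)  # store -> visit durations in seconds
    for ts, mac, store, kind in sorted(records):
        key = (mac, store)
        if kind == "in":
            open_visits[key] = ts
        elif kind == "out" and key in open_visits:
            durations[store].append(ts - open_visits.pop(key))
    return dict(durations)


durations = visit_durations(events)
averages = {store: mean(values) for store, values in durations.items()}
print(durations)
print(averages)
```

A step-out without a matching step-in is ignored, which keeps incomplete log data from producing negative or meaningless durations.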

Avg. Duration Between Visits of one particular user

A lot of useful information can be derived from this plot.

1. There is a repeating pattern of step-ins and step-outs within a short period of time.
2. There was a step-out of store number one and a step-in into store number two within just 28 seconds.
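
The time between visits of a single user is the gap between one step-out and the next step-in. The sketch below uses invented timestamps; the 28-second gap is chosen only to mirror the observation on the slide, and the tuple layout is an assumption.

```python
from statistics import mean

# Hypothetical visits of one user: (store, step-in, step-out) in seconds.
visits = [
    ("store1", 0, 120),
    ("store1", 150, 400),
    ("store2", 428, 900),
]


def gaps_between_visits(records):
    """Return (previous store, next store, gap in seconds) per visit pair."""
    ordered = sorted(records, key=lambda visit: visit[1])
    return [
        (prev[0], nxt[0], nxt[1] - prev[2])
        for prev, nxt in zip(ordered, ordered[1:])
    ]


gaps = gaps_between_visits(visits)
store_switches = [gap for gap in gaps if gap[0] != gap[1]]
average_gap = mean(gap[2] for gap in gaps)
print(gaps)
print(store_switches)
print(average_gap)
```

Filtering for pairs where the store changes isolates moves between the two stores, such as the quick switch noted in point 2.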
CC 2.0 by Aurelien Guichard | http://flic.kr/p/cjg9yw

CCAH Course in ZH

• Cloudera Administrator Training for Apache Hadoop (CCAH)
• June 26th – 28th 2013
• Limmatstrasse 50, Zurich
• More info: http://www.ymc.ch/training

Announcement

Links

1. Presentation, Video and Post Series
   • http://bitly.com/bundles/cguegi/1
2. http://www.bigdata-usergroup.ch
3. http://about.me/cguegi
4. http://www.ymc.ch/training

In-Store Analysis with Hadoop
