BIG DATA ANALYTICS
BUSINESS INTELLIGENCE
INFORMATION MANAGEMENT
PERFORMANCE MANAGEMENT
© Copyright 2015 – Keyrus
DIVING INTO WEBLOG DATA WITH SAS ON HADOOP
Lisa Truyers, Data Scientist Consultant at Keyrus
March 24, 2016
Project summary
WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
AGENDA
• What is Hadoop?
• Project summary
• Components of the Hadoop-SAS framework
• Setup to load data
• Benchmarks
• Lessons learned
PROS
What is Hadoop?
• Open-source software framework
• Storage and large-scale data processing
• Easy and economical scaling
• Handles both structured and unstructured data
• Runs on low-cost commodity hardware
• Starts multiple copies of the same task for the same block of data, so a slow or failed node does not stall the job
51% OF COMPANIES ARE THINKING ABOUT INTEGRATING HADOOP BY 2016
Source: Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business
CONS
What is Hadoop?
• Management and high-availability capabilities are just starting to emerge
• Data security is fragmented
• MapReduce is very batch-oriented
• No easy-to-use, full-featured tools for data integration, data cleansing, governance and metadata
• Shortage of skilled professionals
MANAGE THE DATA AND USE ANALYTICS TO QUICKLY IDENTIFY PREVIOUSLY UNKNOWN INSIGHTS: ACCESS THE DIFFERENT TOOLS OF SAS
WHAT ARE COMPANIES DOING WITH HADOOP?
What is Hadoop?
The percentages mentioned here cover the whole world, not only Europe.
Use case                                                    Share of companies
Data warehouse extensions                                   46 %
Data exploration and discovery                              46 %
Data staging for data warehousing and data integration      39 %
Data lake                                                   39 %
Queryable archive for non-traditional data                  36 %
Computational platform and sandbox for advanced analytics   33 %
WHY IS HADOOP (NOT) IMPORTANT?
What is Hadoop?
“Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.”
BI architect, telecom, Europe
“Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.”
Vice president, food and beverage, Asia
“Our existing infrastructure cannot handle the tenfold increase in data volumes.”
Data strategy manager, hospitality, US
“It’s important to realize the potential of big data and to explore new business opportunities.”
Data specialist, consulting, Asia
INTRODUCTION
Project summary
1. Discover web traffic data
• The sheer volume of data makes it impossible to analyse at the moment
• Prove the added value of a combined Hadoop-SAS environment
2. Lead generation
• More business-oriented: scoring a neural network model currently takes one hour every day
• Goal: reduce this scoring time
HADOOP COMPONENTS
Components of the Hadoop-SAS framework
[Diagram: Hadoop-SAS framework]
Hadoop: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN
SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®
SAS COMPONENTS
Components of the Hadoop-SAS framework
[Diagram: SAS layer of the Hadoop-SAS framework]
SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®
FULL PROCESS
Setup to load data
Process   Files        Layout
A         Day files    Partitioned, non-parsed
B         Hour files   Partitioned, non-parsed
C         Day files    Partitioned, parsed
D         Hour files   Partitioned, parsed
PROCESS C
Setup to load data
Delete Hive table → Transfer to Hadoop → Parse data → Merge → Loop
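The Process C flow can be sketched as a dry-run plan in Python. This is only an illustration under assumptions: the slide does not say how the steps were implemented, so the function name `process_c_plan`, the table name `weblog_parsed`, the file names, and the reading of "Loop" as one transfer, parse and merge pass per day file are all hypothetical.

```python
from typing import List


def process_c_plan(day_files: List[str], table: str = "weblog_parsed") -> List[str]:
    """Return the ordered steps of a Process C style load as plain descriptions.

    Nothing is executed here; the function only lists what would happen.
    """
    # Assumption: the Hive table is dropped once, before the loop starts.
    steps = [f"delete Hive table {table}"]
    # Assumption: "Loop" means repeating the remaining steps for each day file.
    for name in day_files:
        steps.append(f"transfer {name} to Hadoop")
        steps.append(f"parse {name}")
        steps.append(f"merge {name} into {table}")
    return steps


# Hypothetical file names, for illustration only.
plan = process_c_plan(["weblog_day1.log", "weblog_day2.log"])
for number, step in enumerate(plan, start=1):
    print(number, step)
```

Keeping the plan as data makes it easy to review the order of the steps before wiring each one to a real SAS or Hadoop call.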
HADOOP COMPARED TO SERVER
Benchmarks
Server
• Query test on one day: 35 seconds
• Parsing data for one day: 15 minutes
• Parsing data for one week: 4 hours 30 minutes
Hadoop
• Query test on one day: 35 seconds
• Parsing data for one day: 15 minutes
• Parsing data for one week: 53 minutes
MORE TIME NEEDED FOR EXTRA BENCHMARKS
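The one-week parsing figures can be turned into a speedup with a few lines of arithmetic. The sketch below uses only the two timings from the slide; the helper name `to_minutes` is an assumption introduced for illustration.

```python
def to_minutes(hours: int = 0, minutes: int = 0) -> int:
    """Convert a duration given in hours and minutes to whole minutes."""
    return hours * 60 + minutes


# Timings taken from the benchmark slide (parsing one week of weblog data).
server_week = to_minutes(hours=4, minutes=30)
hadoop_week = to_minutes(minutes=53)

speedup = server_week / hadoop_week   # how many times faster Hadoop was
saved = server_week - hadoop_week     # minutes saved per weekly run

print(f"Server: {server_week} min, Hadoop: {hadoop_week} min")
print(f"Speedup: {speedup:.2f}x, time saved: {saved} min")
```

On these numbers the weekly parse is roughly five times faster on Hadoop, while the single-day results are identical, which matches the lesson that the data must be large enough to see a difference.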
SAS ON HADOOP
Lessons learned
• In SAS, take your time to set the correct variable length
• Choose the strength of the cluster rationally
• Create benchmarks on both environments (server vs. Hadoop) early on, so that a good comparison can be made and the correct decision can be taken
• The data on Hadoop must be large enough to see a difference
Teamwork is key
• Set up the Hadoop cluster with Hadoop experts
• Install SAS with experts from the company
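Why variable length matters can be shown with rough arithmetic. SAS stores character variables at a fixed width, so in an uncompressed table every row pays for the declared length, whatever the actual value. The row count and the right-sized length below are hypothetical, not figures from the project; 32,767 is the maximum length of a SAS character variable.

```python
def column_bytes(rows: int, declared_length: int) -> int:
    """Bytes used by one fixed-width character column in an uncompressed table."""
    return rows * declared_length


ROWS = 10_000_000        # hypothetical number of weblog records
MAX_LENGTH = 32_767      # maximum length of a SAS character variable
RIGHT_SIZED = 200        # hypothetical length that fits the real values

oversized = column_bytes(ROWS, MAX_LENGTH)
right_sized = column_bytes(ROWS, RIGHT_SIZED)

print(f"Oversized column:   {oversized / 1e9:.1f} GB")
print(f"Right-sized column: {right_sized / 1e9:.1f} GB")
print(f"Ratio: {oversized / right_sized:.0f}x")
```

The same reasoning applies per column, so a table with several oversized text columns multiplies the cost accordingly.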
THANK YOU FOR YOUR ATTENTION
To contact us
www.keyrus.com
contact@keyrus.com

Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers - Keyrus

  • 1. BIG DATA ANALYTICS BUSINESS INTELLIGENCE INFORMATION MANAGEMENT PERFORMANCE MANAGEMENT
  • 2. © Copyright 2015 – Keyrus. DIVING INTO WEBLOG DATA WITH SAS ON HADOOP. Lisa Truyers, Data Scientist Consultant at Keyrus. March 24, 2016
  • 3. Project summary: WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
  • 4. AGENDA: What is Hadoop? / Project summary / Components of the Hadoop-SAS framework / Setup to load data / Benchmarks / Lessons learned
  • 5. What is Hadoop? PROS: open-source software framework; storage and large-scale data processing; easy and economical scaling; handles both structured and unstructured data; runs on low-cost commodity hardware; starts multiple copies of the same task for the same block of data. 51% OF COMPANIES ARE THINKING ABOUT INTEGRATING HADOOP BY 2016. Source: Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business
  • 6. What is Hadoop? CONS: management and high-availability capabilities are just starting to emerge; data security is fragmented; MapReduce is very batch-oriented; no easy-to-use, full-featured tools for data integration, data cleansing, governance and metadata; shortage of skilled professionals. MANAGE THE DATA AND USE ANALYTICS TO QUICKLY IDENTIFY PREVIOUSLY UNKNOWN INSIGHTS: ACCESS THE DIFFERENT TOOLS OF SAS
  • 7. What is Hadoop? WHAT ARE COMPANIES DOING WITH HADOOP? The percentages mentioned here cover the whole world, not only Europe.
      What? | Percentage
      Data warehouse extensions | 46 %
      Data exploration and discovery | 46 %
      Data staging for data warehousing and data integration | 39 %
      Data lake | 39 %
      Queryable archive for non-traditional data | 36 %
      Computational platform and sandbox for advanced analytics | 33 %
  • 8. What is Hadoop? WHY IS HADOOP (NOT) IMPORTANT?
      “Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.” (BI architect, telecom, Europe)
      “Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.” (Vice president, food and beverage, Asia)
      “Our existing infrastructure cannot handle the tenfold increase in data volumes.” (Data strategy manager, hospitality, US)
      “It’s important to realize the potential of big data and to explore new business opportunities.” (Data specialist, consulting, Asia)
  • 10. Project summary. INTRODUCTION. 1. Discover web traffic data: the sheer volume of data makes it impossible to analyse at the moment; prove the added value of a combined Hadoop-SAS environment. 2. Lead generation (more business-oriented): scoring a neural network model takes one hour on a daily basis; the aim is to reduce this time.
  • 12. Components of the Hadoop-SAS framework. HADOOP COMPONENTS: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN. SAS components: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®
  • 18. Components of the Hadoop-SAS framework. SAS COMPONENTS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®. Hadoop components: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN
  • 21. Setup to load data. FULL PROCESS
      Granularity | Non-parsed | Parsed
      Day | A: partitioned, non-parsed for day-files | C: partitioned, parsed for day-files
      Hour | B: partitioned, non-parsed for hour-files | D: partitioned, parsed for hour-files
  • 22. Setup to load data
  • 23. Setup to load data. PROCESS C: Delete HIVE table, Transfer to Hadoop, Parse data, Merge, Loop
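The Process C steps listed on this slide can be sketched roughly as follows. This is only an illustration under loud assumptions: the slide does not say whether the table is deleted once or on every pass, what the weblog layout looks like, or which tools perform each step. `FakeCluster`, `parse_line`, `run_process_c`, and the three-field line format are all hypothetical, and the cluster is simulated in memory rather than through SAS or Hive.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for HDFS plus a Hive table (hypothetical)."""
    staged: list = field(default_factory=list)  # raw lines of the file being loaded
    table: list = field(default_factory=list)   # parsed rows merged so far


def parse_line(line: str) -> dict:
    # Assumed toy layout "timestamp url status"; the real weblog format
    # is not given in the slides.
    timestamp, url, status = line.split()
    return {"timestamp": timestamp, "url": url, "status": int(status)}


def run_process_c(day_files: dict, cluster: FakeCluster) -> list:
    cluster.table.clear()                          # delete the existing table (assumed: once, up front)
    for _day, lines in sorted(day_files.items()):  # loop over the day-files
        cluster.staged = list(lines)               # transfer the raw file
        parsed = [parse_line(l) for l in cluster.staged]  # parse the data
        cluster.table.extend(parsed)               # merge into the target table
    return cluster.table


# Made-up sample data, two day-files.
day_files = {
    "2016-03-02": ["2016-03-02T09:15:00 /contact 404"],
    "2016-03-01": [
        "2016-03-01T08:00:00 /home 200",
        "2016-03-01T08:00:05 /products 200",
    ],
}
rows = run_process_c(day_files, FakeCluster())
print(len(rows), rows[0]["url"])
```

Processing one day-file per pass keeps each transfer and parse step bounded in size, which is one plausible reason for the loop on the slide.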
  • 24. AGENDA: Project summary / What is Hadoop? / Components of the Hadoop-SAS framework / SAS-tools used in this project / Setup to load data / Benchmarks / Lessons learned
  • 25. Benchmarks. HADOOP COMPARED TO SERVER
      Task | Server | Hadoop
      Query test on one day | 35 seconds | 35 seconds
      Parsing data for one day | 15 minutes | 15 minutes
      Parsing data for one week | 4 hours 30 minutes | 53 minutes
      MORE TIME NEEDED FOR EXTRA BENCHMARKS
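To put the benchmark figures side by side, the speedup of Hadoop over the server can be computed per task. The sketch below uses only the timings quoted on this slide; the dictionary keys and variable names are illustrative, not taken from the presentation.

```python
from datetime import timedelta

# Timings as reported on the benchmark slide.
server = {
    "query_one_day": timedelta(seconds=35),
    "parse_one_day": timedelta(minutes=15),
    "parse_one_week": timedelta(hours=4, minutes=30),
}
hadoop = {
    "query_one_day": timedelta(seconds=35),
    "parse_one_day": timedelta(minutes=15),
    "parse_one_week": timedelta(minutes=53),
}

# Dividing two timedeltas yields a plain float ratio.
speedups = {task: server[task] / hadoop[task] for task in server}

for task, ratio in speedups.items():
    print(f"{task}: {ratio:.2f}x")
```

On these numbers only the one-week parse differs: 270 minutes against 53 minutes, roughly a fivefold reduction, while the one-day tasks show no gain. That fits the lesson that data must be large enough on Hadoop to see a difference.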
  • 27. Lessons learned. SAS ON HADOOP. Teamwork is key: set up the Hadoop cluster with Hadoop experts; install SAS with experts from the company. In SAS, take your time to set the correct variable lengths. Choose the capacity of the cluster rationally. Create benchmarks on both environments (server vs. Hadoop) early on, so that a good comparison can be made and the correct decision taken. The data on Hadoop must be large enough to see a difference.
  • 28. THANK YOU FOR YOUR ATTENTION. To contact us: www.keyrus.com, contact@keyrus.com