Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Predictive Analytics and Machine Learning
…with SAS and Apache Hadoop
Spring 2014
Version 1.5
We do Hadoop.
Your speakers…
Ofer Mendelevitch, Director of Data Science
Hortonworks
Wayne Thompson, Chief Data Scientist
SAS
A data architecture under pressure from new data
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
Source: IDC
• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
• OLTP, ERP, CRM Systems
• Unstructured documents, emails
• Clickstream
• Server logs
• Sentiment, Web Data
• Sensor, Machine Data
• Geo-location
Hadoop within an emerging Modern Data Architecture
OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: Repositories (RDBMS, EDW, MPP)
Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES:
• OLTP, ERP, CRM Systems
• Documents, Emails
• Web Logs, Click Streams
• Social Networks
• Machine Generated
• Sensor Data
• Geolocation Data
Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
Hadoop unlocks a new approach: Iterative Analytics
Current Reality: SQL
• Single Query Engine
• Repeatable Linear Process: Determine list of questions, Design solutions, Collect structured data, Ask questions from list, Detect additional questions
• Apply schema on write
• Dependent on IT
✚
Augment w/ Hadoop
• Multiple Query Engines: Batch, Interactive, Real-time, Streaming
• Iterative Process: Explore, Transform, Analyze
• Apply schema on read
• Support range of access patterns to data stored in HDFS: polymorphic access
Apache Hadoop for Data Science
• Hadoop’s schema on read reduces cycle times
• Hadoop is ideal for pre-processing of raw data
• Improved models with larger datasets
Hadoop’s schema-on-read accelerates innovation
“Schema change” project (Start, 6 months, 9 months):
• “I need new data”
• “Finally, we start collecting”
• “Let me see… is it any good?”
Schema on read (3 months):
• “Let’s just put it in a folder on HDFS”
• “Let me see… is it any good?”
• “My model is awesome!”
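A minimal sketch of the schema-on-read idea in Python, assuming raw events are kept as untouched JSON lines (a stand-in for files dropped in a folder on HDFS). The sample data, the field names, and `read_with_schema` are hypothetical and do not come from the slides. The point is only that each analysis picks and casts its fields at read time, so a new question does not require a load-time schema change.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    '{"user": "U101", "page": "/home", "ms": "120"}',
    '{"user": "U102", "page": "/cart", "ms": "95", "referrer": "ad"}',
    'not json at all',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the requested fields and cast them.

    `schema` maps a field name to a casting function. Malformed lines are
    skipped and missing fields become None.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate bad rows instead of rejecting the whole load
        if not isinstance(record, dict):
            continue
        yield {
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        }


# Two different "schemas" over the same raw data, chosen by the analyst at read time.
latency_view = list(read_with_schema(RAW_LINES, {"user": str, "ms": int}))
referrer_view = list(read_with_schema(RAW_LINES, {"user": str, "referrer": str}))
```

Here `referrer_view` uses a field that only some records carry, which is the case where a schema-on-write store would first need a schema change.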
  
Hadoop ideal for large scale pre-processing
Raw Data → Join, Normalize, OCR, Sample, Aggregate, NLP, Transform → Feature Matrix
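As a rough illustration of the raw data to feature matrix step, the Python sketch below aggregates a handful of hypothetical events into one row per user and then min-max normalizes each column. The event tuples, feature choices, and function names are assumptions made for this example. A real pipeline on Hadoop would run the same kind of logic at scale with a tool such as Pig or Hive.

```python
from collections import defaultdict

# Hypothetical raw events: (user, event type, amount spent).
RAW_EVENTS = [
    ("U101", "view", 0.0),
    ("U101", "purchase", 20.0),
    ("U102", "view", 0.0),
    ("U102", "view", 0.0),
    ("U102", "purchase", 5.0),
    ("U102", "purchase", 15.0),
    ("U103", "view", 0.0),
]


def aggregate(events):
    """Aggregate events into per-user features: [views, purchases, total spend]."""
    rows = defaultdict(lambda: [0, 0, 0.0])
    for user, kind, amount in events:
        if kind == "view":
            rows[user][0] += 1
        elif kind == "purchase":
            rows[user][1] += 1
            rows[user][2] += amount
    return dict(rows)


def normalize(features):
    """Min-max scale every feature column to [0, 1]; constant columns map to 0.0."""
    users = sorted(features)
    columns = list(zip(*(features[u] for u in users)))
    scaled_columns = []
    for column in columns:
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return {u: [col[i] for col in scaled_columns] for i, u in enumerate(users)}


feature_matrix = normalize(aggregate(RAW_EVENTS))
```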
  
Why big data science?
Larger datasets → better outcomes
Banko & Brill, 2001
• More examples
• More features
A (partial) map of data science “tasks”
Discovery
• Clustering: Detect natural groupings
• Outlier detection: Detect anomalies
• Association rule mining: Co-occurrence patterns
Prediction
• Classification: Predict a category
• Regression: Predict a value
• Recommendation: Predict a preference
Big Data Science: High energy physics, Genomics, etc.
Typical iterative flow in data science
Acquire Data → Clean Data → Visualize, Explore → Hypothesize; Model → Measure/Evaluate → Deploy & Monitor
SAS in-memory and Visual Statistics
HDP 2.1: Hortonworks Data Platform
GOVERNANCE & INTEGRATION
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
DATA ACCESS
• Batch: MapReduce
• Script: Pig
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Search: Solr
• Others: In-Memory Analytics, ISV engines
DATA MANAGEMENT
• YARN: Data Operating System
• HDFS (Hadoop Distributed File System)
SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS
• Resources: YARN
• Access: Hive, …
• Pipeline: Falcon
• Cluster: Knox
OPERATIONS
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
Deployment Choice: Linux, Windows, On-Premise, Cloud
SAS® Visual Statistics and SAS® In-Memory Statistics for Hadoop
•  Provide powerful advanced analytics integrated directly on HDP
Copyright © 2012, SAS Institute Inc. All rights reserved.
BIG ANALYTICS + HORTONWORKS DATA PLATFORM (HDP) = BIG OPPORTUNITIES
WHAT IS IT?
Provides a single interactive analytical platform on Hadoop to perform
•  analytical data preparation
•  variable transformations
•  exploratory analysis
•  statistical modeling and machine learning
•  integrated modeling comparison and scoring
Takes advantage of distributed in-memory computing optimized for analytical workloads
SAS® IN-MEMORY ANALYTICS: TEXT, PREPARE DATA, EXPLORE DATA, DEVELOP MODELS, SCORE
HDP layers: Governance & Integration, Security, Operations, Data Access, Data Management
SAS® IN-MEMORY ANALYTICS: INTEGRATED USER EXPERIENCE
Data Preparation → Exploration/Visualization → Modeling → Deployment
• SAS® Visual Statistics: GUI (STATISTICIAN)
• SAS® In-Memory Statistics for Hadoop: PROGRAMMING (DATA SCIENTIST / PROGRAMMER)
SAS IN-MEMORY STATISTICS FOR HADOOP
Data Management
•  Aggregate
•  Compute
•  Update
•  Append
•  Set
•  Schema
•  DeleteRows
•  DropTables
•  PurgeTempTables
Data Exploration
• Boxplot
• Corr
• Crosstab
• Distinct
• Fetch
• Frequency
• Histogram
• KDE
• MDSummary
• Percentile
• Summary
• TopK
Descriptive Modeling
•  Association
•  Path Analysis
•  Clustering (k-means)
•  Clustering (DBSCAN)
Evaluation, Deployment
• Assess: Misclassification matrix, Lift, ROC, Concordance
• Score
• Training / Validation
ANALYTICAL LIFE CYCLE: Data Management & Exploration → Modeling → Model Evaluation & Deployment
Utilities
• Where
• GroupBy
• TableInfo, ColumnInfo, ServerInfo
• Partition, Balance
• Store, Replay, Free
• Table, Promote
Text Analytics
•  Parsing
•  SVD
•  Topic generation
•  Document projection
Recommendation Systems
•  Association
•  Clustering
•  kNN
•  SVD
•  Ensemble
Predictive Modeling
• Decision Tree
• Forecast
• Gen Linear Model
• Linear Regression
• Logistic Regression
• Random Forests
HDFS I/O
•  Sasiola
•  Sashdat
•  Anyfile Reader
SAS ON HADOOP: IN-MEMORY, CLIENT-SERVER, WEB-BASED
• Web Clients
• Edge Node: SAS® In-Memory Analytic Products (SAS® Visual Analytics, SAS® Visual Statistics, SAS® In-Memory Statistics)
• SAS® LASR™ Analytic Server: Head node and Data Nodes, with Data held in Memory
• Hortonworks Data Platform: Data
SAS ON HADOOP: IN-MEMORY, CLIENT-SERVER, WEB-BASED
(Architecture diagram repeated, now showing the “task” sent to the SAS® LASR™ Analytic Server and the “result” returned.)
SAS ON HADOOP: SUMMARY STATISTICS
Diagram: Web Clients, SAS® In-Memory Analytic Products on the Edge Node, and the SAS® LASR™ Analytic Server (Head node, Data Nodes, data in Memory). The Edge Node sends a task, the Head node broadcasts it to the Data Nodes, and the result is returned.
proc imstat;
table dat1;
summary X / mean;
run;
Edge Node:
• Send request SampleMean(X) to LASR
• Waiting..
• Receive $\bar{X}$
• OUTPUT
Head node and Data Nodes:
A) Request $S_X = \sum_i x_i$ from data nodes (Broadcast..)
B) Data node $j$ computes $S_{X,j} = \sum_i x_{i,j}$, $j = 1, 2, 3, 4$
C) Aggregate $\bar{X} = \sum_j S_{X,j} / N$
D) Send $\bar{X}$ back to Edge
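Steps A to D can be sketched in plain Python. This is an illustration of the partial-sum pattern only, not how the LASR server is implemented. The four partitions and the function names are hypothetical, and the sketch assumes each data node also returns its row count so that the head node can form $N$.

```python
def node_partial_sum(values):
    """Step B: a data node computes its local sum S_{X,j} (and its row count)."""
    return sum(values), len(values)


def head_node_mean(partitions):
    """Steps A, C, D: request partial sums, aggregate them, and return the mean."""
    partials = [node_partial_sum(p) for p in partitions]  # A) broadcast the request
    total = sum(s for s, _ in partials)                   # C) sum of S_{X,j} over j
    n = sum(count for _, count in partials)               # N, total number of rows
    if n == 0:
        raise ValueError("cannot compute the mean of an empty table")
    return total / n                                      # D) value sent back to the edge


# Hypothetical column X, spread over four data nodes (j = 1, 2, 3, 4).
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0]]
mean_x = head_node_mean(partitions)
```

Only one number per node crosses the network, which is why the mean can be computed without moving the data out of memory on the data nodes.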
SAS ON HADOOP: PRINCIPLES OF THE DESIGN
Web Clients
• Thin Clients
• Multi-user
• Interactive
• Real-time
• Point-and-click or programming
SAS® In-Memory Analytic Products (Edge Node)
• Receive requests from a UI or SAS program.
• Work on light computations (interactive trees)
SAS® LASR™ Analytic Server (Head node, Data Nodes, data in Memory)
•  NO MAP REDUCE
•  One data copy
•  Concurrency
•  Temporary tables or columns
•  MPP or SMP
Use Case #1: Recommendation systems
Why recommender systems?
•  5 – 20% increase in sales
•  60% use “recommendations” to determine suitable product
•  In 2011, 15% of customers admitted to buying recommended products; in 2013, nearly 30%
36 Million subscribers
60-70% view results from recommendation
Tens of Billions “Thumbs up”
60 Million active users
3.8 billion hours of music (last Qtr)
47% uptick in active users
67% increase in music served
25% YOY Growth
TripAdvisor collaborates with eBay, Orbitz and others.
Pre-processing raw data for recommendation
• Inputs:
• Explicit product ratings (when provided)
• Implicit information: purchase transactions, page views, comments
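Sources: Ratings, Page views, Forum Comments
Ratings matrix (one row per user: U101, U102, U103, U104, U105, …; one column per movie; “?” marks a missing rating):
Epic | X-Men | Hobbit | Argo | Pirates
5 | 2 | 4 | ? | ?
? | ? | 5 | 2 | ?
1 | 2 | ? | ? | 3
? | 2 | 3 | 1 | 5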
Goal: predict a preference
Observed ratings (one row per user: U101, U102, U103, U104, U105, …; “?” marks a missing rating):
Epic | X-Men | Hobbit | Argo | Pirates
5 | 2 | 4 | ? | ?
? | ? | 5 | 2 | ?
1 | 2 | ? | ? | 3
? | 2 | 3 | 1 | 5
Predicted ratings:
Epic | X-Men | Hobbit | Argo | Pirates
5 | 2 | 4 | 1 | 3
4 | 1 | 5 | 2 | 3
1 | 2 | 4 | 1 | 3
3 | 2 | 3 | 1 | 5
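To make "predict a preference" concrete, here is a deliberately simple baseline in Python. It fills each missing rating with the mean of the observed ratings for that movie. This is not the method used in the SAS demo, which lists association, clustering, kNN, SVD, and ensembles, and it does not reproduce the predicted matrix on the slide. It only shows the shape of the task, and the function names are hypothetical.

```python
# Observed ratings from the slide; None stands for "?".
# Columns: Epic, X-Men, Hobbit, Argo, Pirates.
RATINGS = [
    [5, 2, 4, None, None],
    [None, None, 5, 2, None],
    [1, 2, None, None, 3],
    [None, 2, 3, 1, 5],
]


def item_means(ratings):
    """Mean observed rating per column (assumes every column has at least one rating)."""
    means = []
    for j in range(len(ratings[0])):
        observed = [row[j] for row in ratings if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def fill_missing(ratings):
    """Keep observed ratings and replace each missing one with the item mean."""
    means = item_means(ratings)
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in ratings
    ]


completed = fill_missing(RATINGS)
```

A per-item mean ignores who the user is, so every user gets the same guess for a given movie. Personalized methods such as kNN or SVD exist to close that gap.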
MACHINE LEARNING INTEGRATION
PREDICTIVE ANALYTICS & MACHINE LEARNING
RECOMMENDATION SYSTEM DEMO
DATA WRANGLING: Data Director* (Convert Json Files, Load LASR, Standardize)
SAS In-Memory Statistics:
• Users: Tony, Patty, George
• Business: Tony’s Bar, Trees Lounge, The Tropicana, Blue Parrot
• Business: Beer & Wine, Chinese Food, Mexican Food
• REVIEWS
• Topics: LOUNGE, PUB, BEER, DRINK, GAME, MUSIC, PINT, BAND, PLAY, GLASS, VODKA, PATIO, KARAOKE, COCKTAIL, WINGS, LIQUOR, ALCOHOL, BARTENDER, DRAFT, TAP, FUN, LIVE, SCENE, POOL
SAS Visual Analytics: Deployment (Relevant, Real-time, Interactions)
* New SAS Product
PREDICTIVE ANALYTICS & MACHINE LEARNING
RECOMMENDATION SYSTEM DEMO
John Clark
Review History / Recommendation (Rank 1, 2, 3, …):
1.  Oyster Bar
2.  The Brick
3.  Trees Lounge
4.  Blue Parrot
5.  Winchester Club
6.  Starlight Lounge
7.  Tony’s Bar
8.  Lucy’s
9.  The Tropicana
Use Case # 2: Building a prediction model
Labeled Data → Model
Customer ID | Age | Gender | Loyalty Card | More features… | Buys organic
11001 | 45 | M | Yes | | Yes
11002 | 43 | M | No | | Yes
11003 | 65 | F | Yes | | No
… | … | … | … | |
Unseen data → Model → Buys organic
Customer ID | Age | Gender | Home Owner | More features…
11004 | 33 | M | No | …
11005 | 25 | F | No | …
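The labeled data to model to unseen data flow can be shown with a toy example in Python. The sketch fits a one-split "decision stump" on the Age column of the three labeled rows, then applies it to the two unseen customers. It is far too small to mean anything statistically and is not the model built in the demo. `fit_stump` and `predict` are hypothetical helper names.

```python
def fit_stump(ages, labels):
    """Find the age threshold and direction that misclassify the fewest training rows.

    Returns (threshold, below), meaning: predict `below` when age <= threshold,
    otherwise predict `not below`.
    """
    points = sorted(set(ages))
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in thresholds:
        for below in (True, False):
            errors = sum(
                (below if age <= threshold else not below) != label
                for age, label in zip(ages, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below)
    if best is None:
        raise ValueError("need at least two distinct ages to place a split")
    return best[1], best[2]


def predict(model, age):
    """Apply the fitted stump to one customer."""
    threshold, below = model
    return below if age <= threshold else not below


# Labeled rows from the slide: (Age, Buys organic).
train_ages = [45, 43, 65]
train_labels = [True, True, False]
model = fit_stump(train_ages, train_labels)

# Unseen customers 11004 and 11005 from the slide.
predictions = [predict(model, age) for age in (33, 25)]
```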
Demo #2: Predicting who buys organic products?
•  Dataset: grocery transaction and customer data
•  Goals:
•  Understand customer propensity to buy organic products
•  Develop segments using an interactive decision tree
•  Develop stratified models to predict organic purchases
•  Why is it useful?
•  Inventory strategy
•  Store layout planning
•  Provider management
SAS VISUAL STATISTICS 6.4 – ORGANICS PURCHASE DEMO
PREDICTIVE
ANALYTICS &
MACHINE LEARNING
Wrap up: SAS and Hortonworks Data Platform
•  Increase productivity for data scientists
•  Users can concurrently & interactively analyze traditional & new data sets in HDP to help businesses quickly discover and capitalize on new business insights from their data
•  Increase efficiency
•  Avoid unnecessary, multiple passes through the data
•  SAS in-memory infrastructure running on top of Hadoop eliminates costly data movement and persists data in-memory for the entire analytics session
•  Capture and analyze new data types
•  HDP + SAS enables data scientists to look at more of their enterprise data
•  Leverage 100 percent open-source Apache Hadoop
•  SAS customers can now embrace Hadoop as a core platform in their data architecture
How should you get started? Next steps…
•  Get the Data
•  Formulate a well-defined business objective
•  Data exploration: integrate and fuse heterogeneous data types
•  Pre-process: generate features from raw data
•  Manage the long-tail distribution and data imbalance
•  Modeling: remember model building is cyclical
•  Evaluate your results
•  Work with IT to move analytics from research and into operations
More details..
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about SAS Software & Hortonworks
http://hortonworks.com/partner/SAS/
Contact us: events@hortonworks.com

More Related Content

What's hot

A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun ConnollyHortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitDataWorks Summit
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Hortonworks
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Hortonworks
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
 

What's hot (20)

A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
State of the Union with Shaun Connolly
State of the Union with Shaun ConnollyState of the Union with Shaun Connolly
State of the Union with Shaun Connolly
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 

Viewers also liked

Machine learning overview (with SAS software)
Machine learning overview (with SAS software)Machine learning overview (with SAS software)
Machine learning overview (with SAS software)Longhow Lam
 
Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101Poya Manouchehri
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
深度學習(Deep learning)概論- 使用 SAS EM 實做
深度學習(Deep learning)概論- 使用 SAS EM 實做深度學習(Deep learning)概論- 使用 SAS EM 實做
深度學習(Deep learning)概論- 使用 SAS EM 實做SAS TW
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Machine learning and Big Data
Machine learning and Big DataMachine learning and Big Data
Machine learning and Big DataAmr Saleh
 
SAS and Cloudera – Analytics at Scale
SAS and Cloudera – Analytics at ScaleSAS and Cloudera – Analytics at Scale
SAS and Cloudera – Analytics at ScaleCloudera, Inc.
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...Vishal Chowdhary
 
Understanding SAS Data Step Processing
Understanding SAS Data Step ProcessingUnderstanding SAS Data Step Processing
Understanding SAS Data Step Processingguest2160992
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Languageguest2160992
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learningbutest
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Retail Business Software
Retail Business SoftwareRetail Business Software
Retail Business Softwarejsmith786
 

Viewers also liked (20)

Machine learning overview (with SAS software)
Machine learning overview (with SAS software)Machine learning overview (with SAS software)
Machine learning overview (with SAS software)
 
Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101Predictive Analytics and Machine Learning 101
Predictive Analytics and Machine Learning 101
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
SAS basics Step by step learning
SAS basics Step by step learningSAS basics Step by step learning
SAS basics Step by step learning
 
深度學習(Deep learning)概論- 使用 SAS EM 實做
深度學習(Deep learning)概論- 使用 SAS EM 實做深度學習(Deep learning)概論- 使用 SAS EM 實做
深度學習(Deep learning)概論- 使用 SAS EM 實做
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Machine learning and Big Data
Machine learning and Big DataMachine learning and Big Data
Machine learning and Big Data
 
SAS and Cloudera – Analytics at Scale
SAS and Cloudera – Analytics at ScaleSAS and Cloudera – Analytics at Scale
SAS and Cloudera – Analytics at Scale
 
SAS Macros
SAS MacrosSAS Macros
SAS Macros
 
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
Machine Learning system architecture – Microsoft Translator, a Case Study :  ...Machine Learning system architecture – Microsoft Translator, a Case Study :  ...
Machine Learning system architecture – Microsoft Translator, a Case Study : ...
 
Understanding SAS Data Step Processing
Understanding SAS Data Step ProcessingUnderstanding SAS Data Step Processing
Understanding SAS Data Step Processing
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Language
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
AI 3.0
AI 3.0AI 3.0
AI 3.0
 
Retail Business Software
Retail Business SoftwareRetail Business Software
Retail Business Software
 

Predictive Analytics and Machine Learning …with SAS and Apache Hadoop

  • 1. Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Predictive Analytics and Machine Learning …with SAS and Apache Hadoop Spring 2014 Version 1.5 We do Hadoop.
  • 2. Your speakers: Ofer Mendelevitch, Director of Data Science, Hortonworks; Wayne Thompson, Chief Data Scientist, SAS
  • 3. A data architecture under pressure from new data. Applications: Business Analytics, Custom Applications, Packaged Applications. Data system (repositories): RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP, Clickstream, Logs). Source: IDC: 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020. Data types shown: OLTP, ERP, CRM Systems; Unstructured documents, emails; Clickstream; Server logs; Sentiment, Web Data; Sensor, Machine Data; Geo-location
  • 4. Hadoop within an emerging Modern Data Architecture. Operations tools: Provision, Manage & Monitor. Dev & data tools: Build & Test. Data system: Repositories (RDBMS, EDW, MPP); Governance & Integration, Security, Operations, Data Access, Data Management. Sources: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data. Applications: Business Analytics, Custom Applications, Packaged Applications. Data Lake: an architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
  • 5. Hadoop unlocks a new approach: Iterative Analytics. Current reality (SQL): single query engine; repeatable linear process (determine list of questions, design solutions, collect structured data, ask questions from list, detect additional questions); apply schema on write; dependent on IT. Augment w/ Hadoop: multiple query engines (batch, interactive, real-time, streaming); iterative process (explore, transform, analyze); apply schema on read; support range of access patterns to data stored in HDFS: polymorphic access
  • 6. Apache Hadoop for Data Science • Hadoop’s schema on read reduces cycle times • Hadoop is ideal for pre-processing of raw data • Improved models with larger datasets
  • 7. Hadoop’s schema-on-read accelerates innovation. “Schema change” project (timeline: Start, 6 months, 9 months): “I need new data” → “Finally, we start collecting” → “Let me see… is it any good?” Schema-on-read (3 months): “Let’s just put it in a folder on HDFS” → “Let me see… is it any good?” → “My model is awesome!”
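The schema-on-read idea on this slide can be illustrated with a minimal Python sketch. It is not taken from the deck: the record fields, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical, and an in-memory list of JSON lines stands in for files dropped into an HDFS folder. The point it shows is that raw data is stored untouched and the schema is applied only when the data is read.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no schema declared at write time).
RAW_LINES = [
    '{"user": "a1", "clicks": "3", "country": "US"}',
    '{"user": "b2", "clicks": "5"}',
    '{"user": "c3", "clicks": "oops", "device": "mobile"}',
]

# The schema lives in the reader: field name -> (parser, default value).
SCHEMA = {"user": (str, None), "clicks": (int, 0), "country": (str, "unknown")}


def read_with_schema(lines, schema):
    """Parse raw lines and coerce each record to the schema at read time."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (parse, default) in schema.items():
            try:
                row[field] = parse(record[field])
            except (KeyError, ValueError, TypeError):
                # Missing or malformed values fall back to the default.
                row[field] = default
        yield row


rows = list(read_with_schema(RAW_LINES, SCHEMA))
for row in rows:
    print(row)
```

Changing the analysis (for example, starting to read the `device` field) only requires editing `SCHEMA` in the reader; the stored raw lines stay as they are.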
  • 8. Hadoop ideal for large scale pre-processing: Raw Data → Join, Normalize, OCR, Sample, Aggregate, NLP, Transform → Feature Matrix
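As a rough illustration of three of the pre-processing steps named on this slide (aggregate, join, normalize), the following Python sketch turns two small raw inputs into a feature matrix. The data, the variable names, and the `min_max` helper are assumptions made for the example; on a real cluster these steps would typically run as distributed jobs rather than in a single process.

```python
from collections import defaultdict

# Hypothetical raw inputs: a click log (user id, value) and a user table (user id -> age).
clicks = [("u1", 2.0), ("u2", 10.0), ("u1", 4.0), ("u3", 6.0)]
users = {"u1": 25, "u2": 40, "u3": 31}

# Aggregate: total click value per user.
totals = defaultdict(float)
for user_id, value in clicks:
    totals[user_id] += value

# Join: attach each user's age to the aggregated total.
joined = [(uid, users[uid], totals[uid]) for uid in sorted(totals) if uid in users]


def min_max(values):
    """Normalize: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


ages = min_max([row[1] for row in joined])
spend = min_max([row[2] for row in joined])

# Feature matrix: one row per user, one column per normalized feature.
feature_matrix = [list(pair) for pair in zip(ages, spend)]
print(feature_matrix)
```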
  • 9. Why big data science? Larger datasets → better outcomes (Banko & Brill, 2001) • More examples • More features
  • 10. A (partial) map of data science “tasks”. Discovery: Clustering (detect natural groupings); Outlier detection (detect anomalies); Association rule mining (co-occurrence patterns). Prediction: Classification (predict a category); Regression (predict a value); Recommendation (predict a preference). Big Data Science: High energy physics, Genomics, etc.
  • 11. Typical iterative flow in data science. Stages shown: Visualize, Explore; Hypothesize, Model; Measure/Evaluate; Acquire Data; Clean Data; Deploy & Monitor
  • 12. SAS in-memory and Visual Statistics. HDP 2.1 (Hortonworks Data Platform). Governance & Integration: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS). Data Access: Batch (MapReduce); Script (Pig); SQL (Hive/Tez, HCatalog); NoSQL (HBase, Accumulo); Stream (Storm); Search (Solr); Others (In-Memory Analytics, ISV engines). Data Management: YARN: Data Operating System; HDFS (Hadoop Distributed File System), nodes 1 … N. Security: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox. Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premise, Cloud. SAS® Visual Statistics, SAS® In-Memory Statistics for Hadoop • Provide powerful advanced analytics integrated directly on HDP
  • 13. Copyright © 2012, SAS Institute Inc. All rights reserved. BIG ANALYTICS + HORTONWORKS DATA PLATFORM (HDP) = BIG OPPORTUNITIES
  • 14. WHAT IS IT? Provides a single interactive analytical platform on Hadoop to perform • analytical data preparation • variable transformations • exploratory analysis • statistical modeling and machine learning • integrated modeling comparison and scoring • Takes advantage of distributed in-memory computing optimized for analytical workloads. SAS® IN-MEMORY ANALYTICS: PREPARE DATA, EXPLORE DATA, DEVELOP MODELS, SCORE, TEXT. HDP layers: Governance & Integration, Security, Operations, Data Access, Data Management
  • 15. SAS® IN-MEMORY ANALYTICS: INTEGRATED USER EXPERIENCE. Stages: Data Preparation, Exploration/Visualization, Modeling, Deployment. Users: DATA SCIENTIST/PROGRAMMER, STATISTICIAN. Products: SAS® Visual Statistics, SAS® In-Memory Statistics for Hadoop. Interfaces: GUI, PROGRAMMING
  • 16. SAS IN-MEMORY STATISTICS FOR HADOOP. ANALYTICAL LIFE CYCLE: Data Management & Exploration; Modeling; Model Evaluation & Deployment. Data Management • Aggregate • Compute • Update • Append • Set • Schema • DeleteRows • DropTables • PurgeTempTables. Data Exploration • Boxplot • Corr • Crosstab • Distinct • Fetch • Frequency • Histogram • KDE • MDSummary • Percentile • Summary • TopK. Descriptive Modeling • Association • Path Analysis • Clustering (k-means) • Clustering (DBSCAN). Predictive Modeling • Decision Tree • Forecast • Gen Linear Model • Linear Regression • Logistic Regression • Random Forests. Text Analytics • Parsing • SVD • Topic generation • Document projection. Recommendation Systems • Association • Clustering • kNN • SVD • Ensemble. Evaluation, Deployment • Assess (misclassification matrix; lift, ROC, concordance) • Score • Training / Validation. Utilities • Where • GroupBy • TableInfo, ColumnInfo, ServerInfo • Partition, Balance • Store, Replay, Free • Table, Promote. HDFS I/O • Sasiola • Sashdat • Anyfile Reader
  • 17. SAS ON HADOOP: IN-MEMORY, CLIENT-SERVER, WEB-BASED. Diagram labels: Web Clients; SAS® In-Memory Analytic Products (SAS® Visual Analytics, SAS® Visual Statistics, SAS® In-Memory Statistics); Edge Node; SAS® LASR™ Analytic Server (Head node, Data Nodes holding Data in Memory); Hortonworks Data Platform
  • 19. SAS ON HADOOP: IN-MEMORY, CLIENT-SERVER, WEB-BASED. Diagram labels: Web Clients; SAS® In-Memory Analytic Products (SAS® Visual Analytics, SAS® Visual Statistics, SAS® In-Memory Statistics); Edge Node; task and result arrows; SAS® LASR™ Analytic Server (Head node, Data Nodes holding Data in Memory); Hortonworks Data Platform
  • 20. SAS ON HADOOP: SUMMARY STATISTICS. Diagram labels: Web Clients; SAS® In-Memory Analytic Products; Edge Node; task and result arrows; SAS® LASR™ Analytic Server (Head node broadcasts to the Data Nodes, which hold Data in Memory). Program: proc imstat; table dat1; summary X / mean; run; Edge Node: Send request SampleMean(X) to LASR; Waiting..; Receive $\bar{X}$ (OUTPUT). Head node: A) Request $S_X = \sum_i x_i$ from data nodes (Broadcast..) B) Data node $j$ computes $S_{X,j} = \sum_i x_{i,j}$, $j = 1, 2, 3, 4$ C) Aggregate $\bar{X} = \sum_j S_{X,j} / N$ D) Send $\bar{X}$ back to Edge
  • 21. SAS ON HADOOP: PRINCIPLES OF THE DESIGN. Diagram labels: Web Clients; SAS® In-Memory Analytic Products; Edge Node; task and result arrows; SAS® LASR™ Analytic Server (Head node broadcasts to the Data Nodes, which hold Data in Memory). Thin Clients: multi-user, interactive, real-time, point-and-click or programming. Receive requests from a UI or SAS program. Work on light computations (interactive trees). • NO MAP REDUCE • One data copy • Concurrency • Temporary tables or columns • MPP or SMP
  • 22. Use Case #1: Recommendation systems. Why recommender systems? • 5 – 20% increase in sales • 60% use “recommendations” to determine suitable product • In 2011, 15% of customers admitted to buying recommended products; in 2013, nearly 30%. Examples: • 36 Million subscribers • 60-70% view results from recommendation • Tens of Billions “Thumbs up” • 60 Million active users • 3.8 billion hours of music (last Qtr) • 47% uptick in active users • 67% increase in music served • 25% YOY Growth • Trip Advisor collaborates with EBAY, ORBITZ and others.
  • 23. Pre-processing raw data for recommendation • Inputs: • Explicit product ratings (when provided) • Implicit information: purchase transactions, page views, comments. Sources: Ratings, Page views, Forum Comments. Ratings matrix (users U101, U102, U103, U104, U105, …; movies Epic, X-Men, Hobbit, Argo, Pirates; ? marks an unknown rating): 5 2 4 ? ? ? ? 5 2 ? 1 2 ? ? 3 ? 2 3 1 5
  • 24. Goal: predict a preference. Users U101, U102, U103, U104, U105, …; movies Epic, X-Men, Hobbit, Argo, Pirates. Observed ratings (? marks an unknown rating): 5 2 4 ? ? ? ? 5 2 ? 1 2 ? ? 3 ? 2 3 1 5. Ratings with the unknown entries predicted: 5 2 4 1 3 4 1 5 2 3 1 2 4 1 3 3 2 3 1 5
  • 25. PREDICTIVE ANALYTICS & MACHINE LEARNING: RECOMMENDATION SYSTEM DEMO. MACHINE LEARNING INTEGRATION. DATA WRANGLING (Data Director*): Convert JSON Files, Standardize, Load LASR. SAS In-Memory Statistics. SAS Visual Analytics. Deployment: Relevant, Real-time, Interactions. Users: Tony, Patty, George. Business: Tony’s Bar, Trees Lounge, The Tropicana, Blue Parrot; Beer & Wine, Chinese Food, Mexican Food. Business REVIEWS. Topics: LOUNGE, PUB, BEER, DRINK, GAME, MUSIC, PINT, BAND, PLAY, GLASS, VODKA, PATIO, KARAOKE, COCKTAIL, WINGS, LIQUOR, ALCOHOL, BARTENDER, DRAFT, TAP, FUN, LIVE, SCENE, POOL. * New SAS Product
  • 26. PREDICTIVE ANALYTICS & MACHINE LEARNING: RECOMMENDATION SYSTEM DEMO. User: John Clark. Review History: 1. Oyster Bar 2. The Brick 3. Trees Lounge 4. Blue Parrot 5. Winchester Club 6. Starlight Lounge 7. Tony’s Bar 8. Lucy’s 9. The Tropicana. Recommendation: Rank 1, 2, 3, …
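  • 27. Use Case #2: Building a prediction model. Labeled Data (columns: Customer ID, Age, Gender, Loyalty Card, More features…, Buys organic): 11001: Age 45, Gender M, Loyalty Card Yes, Buys organic Yes; 11002: Age 43, Gender M, Loyalty Card No, Buys organic Yes; 11003: Age 65, Gender F, Loyalty Card Yes, Buys organic No; … Labeled Data → Model. Unseen data (columns: Customer ID, Age, Gender, Home Owner, More features…): 11004: Age 33, Gender M, Home Owner No, …; 11005: Age 25, Gender F, Home Owner No, … Unseen data → Model → Buys organic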
  • 28. Demo #2: Predicting who buys organic products • Dataset: grocery transaction and customer data • Goals: • Understand customer propensity to buy organic products • Develop segments using an interactive decision tree • Develop stratified models to predict organic purchases • Why is it useful? • Inventory strategy • Store layout planning • Provider management
  • 29. SAS VISUAL STATISTICS 6.4 – ORGANICS PURCHASE DEMO PREDICTIVE ANALYTICS & MACHINE LEARNING
  • 30. Wrap up: SAS and Hortonworks Data Platform • Increase productivity for data scientists • Users can concurrently & interactively analyze traditional & new data sets in HDP to help businesses quickly discover and capitalize on new business insights from their data • Increase efficiency • Avoid unnecessary, multiple passes through the data • SAS in-memory infrastructure running on top of Hadoop eliminates costly data movement and persists data in-memory for the entire analytics session • Capture and analyze new data types • HDP + SAS enables data scientists to look at more of their enterprise data • Leverage 100 percent open-source Apache Hadoop • SAS customers can now embrace Hadoop as a core platform in their data architecture
  • 31. How should you get started? Next steps… • Get the data • Formulate a well-defined business objective • Data exploration: integrate and fuse heterogeneous data types • Pre-process: generate features from raw data • Manage the long-tail distribution and data imbalance • Modeling: remember model building is cyclical • Evaluate your results • Work with IT to move analytics from research into operations
  • 32. More details: • Download the Hortonworks Sandbox • Learn Hadoop • Build Your Analytic App • Try Hadoop 2. More about SAS Software & Hortonworks: http://hortonworks.com/partner/SAS/ Contact us: events@hortonworks.com
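
The summary-statistics flow on slide 20 (steps A to D) can be illustrated with a minimal Python sketch. This is not SAS or LASR code: the `DataNode` class, the `distributed_mean` function, and the sample values are hypothetical names and data introduced only to show how per-node partial sums combine into a global mean.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataNode:
    """Hypothetical stand-in for one data node holding its slice of X in memory."""
    values: List[float]

    def partial_sum(self) -> Tuple[float, int]:
        # Step B: node j computes its local sum S_{X,j} and its local row count.
        return sum(self.values), len(self.values)


def distributed_mean(nodes: List[DataNode]) -> float:
    # Step A: the head node asks every data node for its partial sum.
    partials = [node.partial_sum() for node in nodes]
    # Step C: aggregate the partial sums and divide by the total row count N.
    total = sum(s for s, _ in partials)
    n = sum(c for _, c in partials)
    if n == 0:
        raise ValueError("no rows loaded on any node")
    # Step D: the mean is returned to the caller (the edge node on the slide).
    return total / n


# Four nodes, as in j = 1, 2, 3, 4 on the slide; the values are made up.
nodes = [
    DataNode([1.0, 2.0]),
    DataNode([3.0]),
    DataNode([4.0, 5.0, 6.0]),
    DataNode([]),
]
print(distributed_mean(nodes))
```

Only one number per node (plus a row count) travels back to the head node, so the raw rows never leave the data nodes. An empty node contributes a zero sum and zero rows, which leaves the result unchanged.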