Predicting failure in power networks, detecting fraudulent activity in payment card transactions, and identifying the next logical product to offer the right customer at the right time all require machine learning on massive data sets. This form of artificial intelligence requires complex self-learning algorithms, rapid data iteration for advanced analytics, and a robust big data architecture that’s up to the task.
Learn how you can quickly exploit your existing IT infrastructure and scale operations in line with your budget to enjoy advanced data modeling, without having to invest in a large data science team.
25. THE MACHINE LEARNING COMPANY ®
SAME DATA.
BETTER RESULTS.
Jin H. Kim
VP of Marketing
jin@skytree.net
Machine learning:
The modern science of finding patterns and making predictions from data:
multivariate statistics, data mining, pattern recognition, advanced/predictive analytics
Our Vision
THE DATA DRIVEN ENTERPRISE
POWERED BY MACHINE LEARNING
Machine Learning has finally arrived
1st Wave (1950s-70s)
  Technology evolution: Artificial Intelligence, Pattern Recognition
  Application evolution: Universities
2nd Wave (1980s-90s)
  Technology evolution: Neural Networks, Data Mining
  Application evolution: Science, Credit scoring, OCR
3rd Wave (mid-1990s to today)
  Technology evolution: Machine Learning: Convergence
  Application evolution: Sales / Marketing, Finance, Biotech, Retail, Telco, Government
Now: Machine Learning on Big Data
Skytree: Machine Learning for High-Value, High-Complexity Problems
• Predictive optimal decision-making
  – High-frequency algorithmic trading
  – Online advertising exchanges
  – Fast customer targeting and churn analysis
• Predictive monitoring/discovery assistance
  – Point-of-compromise fraud tips/cues
  – Network fault monitoring/diagnosis
  – Predictive maintenance of network of devices
  – Fraud analysis in claims
  – Insider threat/DLP and cyber security
High-Value, High-Complexity Problems: Critical Elements in Common
1. High accuracy needed (needle-finding)
  – Small number of known examples
  – Identify anomalies with no prior examples
2. Complex data fusion needed (unified objects)
  – Spatial-temporal behavior/event pattern-finding and tracking
  – Inference of activities, entities/identities, relations
3. Automation needed (augment human analysts)
  – Value-based attention-focusing, recommendation of relevant content
  – Real-time interactivity without waiting
  – Fast construction of new reports for agility
Use Case Examples
Financial Services
Fraud Analysis
Credit Scoring
Pricing
Churn Analysis
SDN/SON
Government
Fraud Analysis
Scoring
Anomaly Detection
Fault Analysis
SDN/SON
Retail
Segmentation
Recommendation
Churn Analysis
Lead Scoring
Pricing
Asset Intensive
Preventative Maintenance
Defect/Fault Detection
Supply Chain Management
Cost Forecasting
Failure Analysis
Global Leaders Select Skytree
WORLD’S LEADING:
• Logistics & Shipping: anomaly detection
• Consumer Electronics: content recommendation
• Automobile: on-board destination recommendation
• Web Portal: ad targeting
• Financial Services & Credit Card: customer lead scoring, fraud, credit risk scoring
“10 Hot Big Data Startups to Watch”
“Skytree Looms in Big Data Forest with New Funding”
“Skytree Uses Machine Learning To Crunch Big Data”
Skytree named “Big Data Analytics Vendor to Watch”
“The Ten Coolest Big Data Startups in 2013”
“One giant leap for machinekind”
Skytree among “10 Emerging Technologies for Big Data”
“…could change the face of Big Data”
Who’s Who of Advanced Analytics
Insurance: Targeted Auto Policies with Telemetric Data
• Business challenge
  – Inaccurate policy pricing based on demographics and actuarial data
    • Example: many teens are good drivers but they often incur higher premiums
  – Availability of new data sources including telemetry data
• Machine learning solution
  – Use telematics to price insurance based on near-real-time driving habits
  – Base rates on an individual’s actual driving history
  – Data fusion to personalize and increase objectivity and accuracy in pricing and claims processing
• Business benefit
  – Targeted customer pricing and policies
  – Improved customer retention
  – Higher customer satisfaction and margins
Customers’ Use of Skytree
Targeting – Find New Customers
• Global 100 Financial Institution
• Major pain points: speed and accuracy of current approach
• Current solution: SAS, Hadoop, homegrown
“I want our analysts to create models rather than writing software” - Skytree Customer
CURRENT: 1,200 cores @ 100-node Hadoop cluster; runtime: 100 minutes; accuracy (Gini): 57%
SKYTREE SERVER: 12 cores @ 1 node; runtime: 8 minutes; accuracy (Gini): 60%; 1250x speedup
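The slide does not say how the 1250x figure was derived. One reading that matches the quoted numbers is a comparison of core-minutes (cores multiplied by runtime) rather than wall-clock time alone. The sketch below checks that arithmetic; it also assumes the Gini accuracy figure follows the common credit-scoring convention Gini = 2 x AUC - 1, which the slide does not confirm. The function names are illustrative only.

```python
def core_minutes(cores: int, runtime_minutes: float) -> float:
    """Total compute consumed: number of cores times wall-clock minutes."""
    return cores * runtime_minutes


def gini_from_auc(auc: float) -> float:
    """Gini coefficient under the common credit-scoring convention (assumed)."""
    return 2.0 * auc - 1.0


def auc_from_gini(gini: float) -> float:
    """Inverse of gini_from_auc."""
    return (gini + 1.0) / 2.0


# Figures quoted on the slide.
current = core_minutes(cores=1200, runtime_minutes=100)  # legacy Hadoop cluster
skytree = core_minutes(cores=12, runtime_minutes=8)      # single node

wall_clock_speedup = 100 / 8          # runtime only
resource_speedup = current / skytree  # cores x runtime

print(f"wall-clock speedup: {wall_clock_speedup}x")
print(f"core-minute speedup: {resource_speedup}x")
print(f"AUC implied by Gini 57%: {auc_from_gini(0.57):.3f}")
print(f"AUC implied by Gini 60%: {auc_from_gini(0.60):.3f}")
```

Under this reading the wall-clock gain is 12.5x, and the 1250x figure reflects the additional 100x reduction in cores.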
Asset Intensive: Predict Parts Failure through Telemetric Data
• Business challenge
  – Early infant mortality of parts due to rapid aging is not easily detectable during manufacturing and environmental acceptance tests
  – Utilize diagnostic data such as impedance, voltage, temperature (multidimensional data)
• Machine learning solution
  – Detect transient indicators of rapid aging through telemetric data
    • Time between Beginning of Life and first transient is random
    • Time between first transient and End of Life is deterministic
  – Automatic parameter tuning
  – Data fusion
• Business benefit
  – Efficient parts inventory management
  – Higher customer satisfaction
  – Optimize preventative maintenance scheduling based on predicted Time To Failure (TTF)
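The idea that the first transient is random but the remaining life after it is deterministic can be sketched as follows. This is a minimal illustration, not Skytree's method: the 3-sigma rule, the baseline window length, the sample readings and the 500-hour remaining-life constant are all hypothetical assumptions.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def first_transient_index(readings: Sequence[float],
                          baseline_len: int = 5,
                          k: float = 3.0) -> Optional[int]:
    """Index of the first reading more than k standard deviations from the
    mean of the initial baseline window, or None if no reading qualifies."""
    baseline = readings[:baseline_len]
    mu, sigma = mean(baseline), stdev(baseline)
    for i in range(baseline_len, len(readings)):
        if abs(readings[i] - mu) > k * sigma:
            return i
    return None


def predicted_time_to_failure(readings: Sequence[float],
                              timestamps_hours: Sequence[float],
                              transient_to_eol_hours: float,
                              now_hours: float) -> Optional[float]:
    """Remaining hours until predicted End of Life, assuming a fixed
    (deterministic) interval between first transient and End of Life."""
    idx = first_transient_index(readings)
    if idx is None:
        return None  # no transient seen yet, so no prediction
    return timestamps_hours[idx] + transient_to_eol_hours - now_hours


# Hypothetical impedance readings sampled once per hour.
impedance = [10.0, 10.1, 9.9, 10.0, 10.1, 10.0, 12.5, 10.1]
hours = list(range(len(impedance)))

ttf = predicted_time_to_failure(impedance, hours,
                                transient_to_eol_hours=500.0,  # assumed constant
                                now_hours=hours[-1])
print(ttf)
```

In practice the transient-to-End-of-Life interval would be estimated from the failure model built on manufacturing test data rather than fixed by hand.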
Predict Parts Failure through Telemetric Data
Data stored on Hadoop cluster: Manufacturing Data, Telemetric Data, Geo-location Data
1. Build failure model from manufacturing test data
2. Blend in data from telemetric and other big data sources
3. Real-time discovery of transient part behavior patterns to predict Time-To-Failure
Improve Customer Retention with Machine Learning
• Business challenge
  – Attracting new customers costs many times more than retaining existing ones
  – Greater customer sophistication and competition increase churn levels
• Machine learning solution
  – Identify events that predict customer needs
  – Isolate best targets and best offers for individual customers
    • Predict what offer or service would prevent a customer from switching
  – Discover purchase patterns and profiles of customers who leave for a deeper understanding
• Business benefit
  – Reduced churn and increased customer loyalty
  – Increased margins and marketing effectiveness
  – Improved up/cross sell opportunities
Performance Studies by Customers
Next Logical Product – Right Offer to Right Customer
• Global Fortune 20 Company
• Major pain points: speed and accuracy of legacy approach
• Current solution: homegrown
• 1M data points for a “pilot”
35% accurate
Results: 20% increase in recommendation relevance in a fraction of the time.
Runtime (mins): Legacy 97, Skytree 0.07
Precision@5 (%): Legacy 35%, Skytree 42%
“We are literally speechless” - Skytree Customer
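For readers unfamiliar with the metric, Precision@5 is the fraction of the top five recommendations that turn out to be relevant. The sketch below shows the standard definition and checks that moving from 35% to 42% is a 20% relative increase, which matches the slide's claim. The item names are made up for illustration.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Share of the first k recommendations that appear in the relevant set.
    Divides by k, following the usual definition of Precision@k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def relative_increase(before: float, after: float) -> float:
    """Relative change from before to after, e.g. 0.2 means +20%."""
    return (after - before) / before


# Hypothetical ranked offers for one customer and the offers they accepted.
ranked_offers = ["card", "loan", "savings", "mortgage", "insurance", "brokerage"]
accepted = {"loan", "insurance", "brokerage"}

print(precision_at_k(ranked_offers, accepted, k=5))
print(round(relative_increase(0.35, 0.42), 4))
```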
Real-Time Fraud Detection
• Business challenge
  – Growing complexity of fraud patterns
  – Increased frequency of fraud
  – Minimize false positives without compromising fraud accuracy
• Machine learning solution
  – Leverage diverse big data for better context
  – Real-time update of model parameters
  – Faster and more accurate model for better fraud detection
• Business benefit
  – More accurate and agile fraud detection system
  – Improved customer satisfaction
  – Improved financial results
Global 2000 Credit Card Network – Before
Workflow:
1. Transaction data transferred from database to Linux server
2. Modeling: fraud model created to detect fraud; model is exported
3. Model is re-coded by a new set of engineers for the mainframe
4. Real-time detection: new model is “loaded”; fraud could be detected in real time
• Customer wanted a more accurate model
• Current model in system was designed to be updated on a yearly basis
• Running a model on a large dataset took over 2 days
• Skytree’s goal is to move update of the model to daily or real time
Hardware: Linux x86 Server, Mainframe
Software: Internally developed random decision forests
SLA: Fraud scored in real-time. Fraud model updated yearly
Global 2000 Credit Card Network - Now
Modeling & Real-Time Score Environment
• Customer can use the same environment for modeling and for production
• Models can be updated on a daily or real-time basis depending on needs
• More frequent updates lead to a significant increase in lift
Hardware: Linux x86 Server
Software: MapR, Skytree fraud detection models
SLA: Fraud scored in real-time. Fraud model updated daily / real-time
Data Stored on MapR Hadoop Cluster
Fraud Model Created Using
Fraud Model updated Daily / real-time
Data Stored on MapR Hadoop Cluster
Unsupervised ML Models Created Using
Fraud Model updated Daily / real-time
Global 2000 Credit Card Network - Now
“Key to increasing fraud detection accuracy”
• Use all of the data: Sampling can decrease accuracy of results
• Semi-supervised learning: Combination of supervised and unsupervised learning can improve fraud detection rates
• Weight transactions based on date: Skytree Server allows each transaction to be weighted differently and allows fraud models to preferentially weight recent fraud over older fraud
• Use the most important variables:
  o Were the last few transactions at an un-manned location?
  o Is the transaction over the credit limit?
  o Which day of the week was the fraud committed?
  o Has the card been reported for fraud before?
  o And more…
• Weight based on transaction value: we should care more about larger transactions
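The two weighting ideas above (recency and transaction value) can be combined into a single per-transaction weight. The sketch below is one possible scheme, not Skytree Server's API: the exponential half-life decay, the 90-day half-life, the linear value scaling and the function name are all assumptions chosen for illustration.

```python
from datetime import date


def transaction_weight(txn_date: date,
                       amount: float,
                       as_of: date,
                       half_life_days: float = 90.0,
                       reference_amount: float = 100.0) -> float:
    """Per-transaction training weight.

    recency: halves every half_life_days, so recent fraud counts more.
    value:   proportional to the amount, so larger transactions count more.
    """
    age_days = (as_of - txn_date).days
    recency = 0.5 ** (age_days / half_life_days)
    value = max(amount, 0.0) / reference_amount
    return recency * value


as_of = date(2014, 6, 30)
transactions = [
    (date(2014, 6, 30), 100.0),  # today, reference-sized amount
    (date(2014, 4, 1), 200.0),   # 90 days old, twice the reference amount
    (date(2014, 1, 1), 100.0),   # 180 days old
]
weights = [transaction_weight(d, amt, as_of) for d, amt in transactions]
print(weights)
```

Weights like these would typically be passed to a learner as per-sample weights during training.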
Skytree Maximizes Predictive Accuracy
Advantages and Benefits
1. Advantage: Breadth of Advanced Methods: more powerful/advanced methods and options
   Benefit: Greater chance of having the best model for your data
2. Advantage: Speed & Scalability: use more data, test more parameters
   Benefit: Improved accuracy in the time available
3. Advantage: Automation / Ease of Use: shorter time to most accurate models
   Benefit: More productive modelers, more people in the company can use it
Skytree is designed from the ground up for these benefits.
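Sources of Generalization Error
Motivations: Sources of Generalization Error
\[
\begin{aligned}
\underbrace{\mathbb{E}_{\xi}\left[ f(x_t,\xi) - \inf_{x\in H^*} f(x,\xi) \right]}_{\text{Excess Error}}
={}& \underbrace{\mathbb{E}_{\xi}\left[ \inf_{x\in H} f(x,\xi) - \inf_{x\in H^*} f(x,\xi) \right]}_{\mathrm{Err}_{\text{Approximation}}\ \text{(Improper Model)}} \\
&+ \underbrace{\mathbb{E}_{\xi}\left[ f(x^*(N),\xi) - \inf_{x\in H} f(x,\xi) \right]}_{\mathrm{Err}_{\text{Estimation}}\ \text{(Finite Samples)}} \\
&+ \underbrace{\mathbb{E}_{\xi}\left[ f(x_t,\xi) - f(x^*(N),\xi) \right]}_{\mathrm{Err}_{\text{Expected-Optimization}}\ \text{(Algorithmic Accuracy)}}
\end{aligned}
\]
$\xi$: data sample;
$N$: number of data samples;
$H$: hypothesis space of the model;
$H^*$: “true” hypothesis space that contains the optimal $x^*$
Hua Ouyang, Optimal Stochastic & Distributed Algorithms for Machine Learning (8)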
First Principles: Sources of prediction error
• Improper Model: Use the right model: Try many
• Finite Samples: Use more data: All of it
• Algorithmic Accuracy: Use the right parameters: Try many
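As a rough illustration of "use the right parameters: try many", the sketch below scores several candidate values of one parameter on held-out data and keeps the best. It is a generic example, not Skytree's automatic tuning: the 1-D k-nearest-neighbor classifier, the synthetic data with 10% label noise, and the candidate list are all assumptions.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, int]  # (feature, label)


def knn_predict(train: List[Point], x: float, k: int) -> int:
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def holdout_accuracy(train: List[Point], holdout: List[Point], k: int) -> float:
    """Fraction of held-out points classified correctly with parameter k."""
    correct = sum(1 for x, y in holdout if knn_predict(train, x, k) == y)
    return correct / len(holdout)


def select_k(train: List[Point], holdout: List[Point],
             candidates: List[int]) -> Tuple[int, Dict[int, float]]:
    """Try every candidate k and return the best one with all scores."""
    scores = {k: holdout_accuracy(train, holdout, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


rng = random.Random(0)  # fixed seed so the run is repeatable


def make_point() -> Point:
    """Synthetic point: label is 1 when x > 5, flipped 10% of the time."""
    x = rng.uniform(0.0, 10.0)
    label = int(x > 5.0)
    if rng.random() < 0.1:
        label = 1 - label
    return x, label


data = [make_point() for _ in range(300)]
train, holdout = data[:200], data[200:]

best_k, scores = select_k(train, holdout, candidates=[1, 3, 5, 11, 21])
print("best k:", best_k)
print("holdout accuracy by k:", scores)
```

The same loop applies to the "use the right model" advice by iterating over model families instead of a single parameter.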
Skytree and Spark
[Architecture diagram; labels: MapR Data Platform, 1.x, 2.x/YARN, Spark, ZooKeeper, Web Services, Command Line Interface, Data Sources/Targets (OLTP / EDW)]
Why Skytree?
Why do companies pick us for Big Data analytics?
INVESTORS (22M+)
Built on Solid Foundation
SAME DATA.
BETTER RESULTS.
Thank You.
www.skytree.net
Q&A
Engage with us
1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox
2. Download machine learning e-books from Ted Dunning: http://www.mapr.com/resources/white-papers#e-books
3. Visit Skytree at www.skytree.net
4. Learn best practices for Hadoop ETL: www.mapr.com/EDH