© 2014 MapR Technologies
July 23, 2014
Our Speakers
Jin Kim
VP, Marketing
Skytree
Nitin Bandugula
Product Marketing
MapR
Agenda
•  Introduction to Hadoop
•  Machine Learning on Hadoop
•  Advanced Machine Learning
•  Customer Case Studies
Big Data is Overwhelming Traditional Systems
•  Mission-critical reliability
•  Transaction guarantees
•  Deep security
•  Real-time performance
•  Backup and recovery
•  Interactive SQL
•  Rich analytics
•  Workload management
•  Data governance
•  Backup and recovery
[Diagram: Enterprise Data Architecture, showing enterprise users, outside sources, and operational and analytical systems, each with its own production requirements]
Hadoop: The Disruptive Technology at the Core of Big Data
[Chart: job trends from Indeed.com, January 2006 to January 2014]
Hadoop Relieves the Pressure from Enterprise Systems
•  Data staging
•  Archive
•  Data transformation
•  Data exploration
•  Streaming, interactions
[Diagram labels: operational systems, analytical systems, enterprise users]
Keys for Production Success
1  Reliability and DR
2  Interoperability
3  High performance
4  Supports operations and analytics
MapR: Best Hadoop Distribution for Customer Success
Top Ranked
Exponential
Growth
500+
Customers
Premier
Investors
3X bookings Q1 ‘13 – Q1 ‘14
80% of accounts expand 3X
90% software licenses
<1% lifetime churn
>$1B
in incremental revenue
generated by 1 customer
The Power of the Open Source Community
Management
MapR Data Platform
APACHE HADOOP AND OSS ECOSYSTEM
Security
YARN
Pig
Cascading
Spark
Batch
Spark
Streaming
Storm*
Streaming
HBase
Solr
NoSQL &
Search
Juju
Provisioning
&
coordination
Savannah*
Mahout
MLLib
ML, Graph
GraphX
MapReduce
v1 & v2
EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS
Workflow
& Data
Governance
Tez*
Accumulo*
Hive
Impala
Shark
Drill*
SQL
Sentry* Oozie ZooKeeper Sqoop
Knox* Whirr Falcon* Flume
Data
Integration
& Access
HttpFS
Hue
* Certification/support planned for 2014
Machine Learning Stack
[Diagram: the same MapR Data Platform ecosystem diagram as on the previous slide, repeated verbatim; its ML and graph components are Mahout, MLLib and GraphX]
Machine Learning Cuts Across All Use Cases
ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration
MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization
RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis
OPERATIONS INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis
How Does Big Data Help Machine Learning?
Big Data => Better Models
•  A machine that has played 1 million games of checkers will be smarter than one that has played just 100 games
•  Improves the accuracy of the model, especially for unsupervised learning
•  Less likely to overfit because of the variety of data
[Diagram: Past Data → Model; New Data → Model → Results]
Common Machine Learning Use Cases on Hadoop
•  Linear/Polynomial Regression – fit to an equation - predict prices
•  Logistic Regression – probability of occurrence - classify spam
•  K-means Clustering – group things together - customer
segmentation
•  Recommender Systems and Collaborative Filtering – product
recommendation
•  Anomaly Detection – credit card fraud
The data scientist decides what works best
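To make one of the use cases above concrete, here is a minimal sketch of k-means clustering for customer segmentation. It uses only the Python standard library and is not how Mahout or MLLib implement it; the customer data, the feature choice and the function name `kmeans` are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of equal-length tuples.

    Centroids start at the first k points so the run is deterministic
    (real libraries use smarter initialisation).
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment

# Hypothetical customers: (annual spend in $1000s, store visits per month).
customers = [(1.0, 2.0), (9.0, 10.0), (1.5, 1.8), (8.5, 9.5), (1.2, 2.2), (9.2, 10.4)]
centroids, segments = kmeans(customers, k=2)
print(segments)  # cluster index for each customer
```

On a cluster the same two steps (assign, then average) would run over partitions of the data, which is why iterative algorithms like this benefit from keeping the data in memory between passes.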
Machine Learning on Hadoop
Modeling Process – Constant Iterations / Free to Fail
•  Modeling Data Set + Validation Data Set
•  Constant Iterations and plotting
–  Underfit vs. Overfit
–  Feature manipulation
–  Adjusting learning rates
–  False Positive vs. False Negatives – precision levels
–  Measuring Error etc
•  Legacy applications, libraries, code used to manipulate data
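The false positive versus false negative trade-off listed above can be sketched as follows. This is an illustrative example only: the validation scores and labels are made up, and the helper names are assumptions rather than part of any library mentioned in the deck.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a score threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(scores, labels, threshold):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up model scores on a validation set, with the true labels (1 = positive).
val_scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
val_labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall(val_scores, val_labels, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold trades false positives for false negatives, so each iteration of the modeling loop typically re-plots these numbers on the validation set rather than the modeling set.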
Development and Deployment Process
PLAYERS and ACTIVITY
Mathematicians: Need newer data sets from production for model building and validation; need complete autonomy for inventions
Developers: Develop the final solution based on models, then test and deploy working with Ops; need to coordinate heavily
Operations Staff: Need to provide data and deploy apps while ensuring data consistency, data compliance, HA, DR, etc.
Lots of Operational Issues
Volumes and Mirroring
The Conflict:
Experimental, Free to Fail Modeling Process Needs Production Data
Solutions:
1.  Same cluster: separate volumes, multi-tenancy, labels, queues, data placement control, etc.
2.  Different cluster for R&D purposes: mirroring (efficient, uses less network bandwidth, works across the globe, easy to deploy and maintain)
Snapshots
The Idea: Version control of data as well as models
Data Version Control:
How does my model work against new validation sets?
How did it change across many validation sets?
Model Version Control:
How can I go back and check my new model against old datasets?
How do I prove that what I came up with worked for the data we had at the time (replicate scenarios)?
Read/Write NFS Access
•  Existing applications, custom libraries all work out-of-the-box
•  Browsers, modeling languages, scripts work out-of-the-box
•  Data ingestion is easy
–  Quickly move data in and out without having to wait for developers and administrators to build and maintain a Flume cluster
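As a rough sketch of what "works out-of-the-box" means here: once the cluster is mounted over NFS, an ordinary script can read and write data with plain file I/O and no Hadoop client library. The directory below is a temporary folder standing in for the mount point; a path such as `/mapr/my.cluster.com/projects/churn` would be a hypothetical example, and the file name and helper functions are assumptions too.

```python
import csv
import tempfile
from pathlib import Path

def append_rows(data_dir, filename, rows):
    """Append CSV rows using ordinary file I/O (no Hadoop API involved)."""
    path = Path(data_dir) / filename
    with path.open("a", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path

def column_mean(path, column):
    """Read the file back and average one numeric column."""
    with Path(path).open(newline="") as handle:
        values = [float(row[column]) for row in csv.reader(handle)]
    return sum(values) / len(values)

# A temporary directory stands in for the NFS mount point in this sketch.
with tempfile.TemporaryDirectory() as data_dir:
    usage_file = append_rows(data_dir, "usage.csv", [["alice", "3.0"], ["bob", "5.0"]])
    mean_usage = column_mean(usage_file, 1)

print(mean_usage)
```

The point of the design is that nothing in the script is cluster-specific: pointing `data_dir` at the mounted path is the only change needed.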
Machine Learning Options
Apache Spark
•  Spark – In Memory Processing Framework
•  Works well with the iterative machine learning algorithms – the
matrices can be pulled into memory
•  Up to 100x better performance (in-memory) compared to MapReduce
MLLib
•  Inbuilt libraries for a variety of algorithms
•  Python and NumPy support
GraphX
•  Libraries to model relationships between entities – social media
Apache Mahout
•  In-built algorithms for popular techniques such as
Recommenders, Classification, Collaborative Filtering etc.
•  Moving towards running on Spark
Advanced Machine Learning with Skytree
[Diagram: data flow for advanced machine learning]
Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
MapR Distribution for Hadoop: Batch (MR, Spark, Hive, Pig, …); Interactive (Impala, Drill, …); Streaming (Spark Streaming, Storm…)
MapR Data Platform: MapR-DB, MapR-FS
Data marts and data warehouse: Offload, Re-Load
Adv. Modeling – Exploration – Analytics
Skytree
Q&A
Engage with us!
1.  Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox
2. Download machine learning e-books from Ted Dunning:
http://www.mapr.com/resources/white-papers#e-books
3. Visit Skytree at www.skytree.net
4. Learn best practices for Hadoop ETL: www.mapr.com/EDH
THE MACHINE LEARNING COMPANY ®
SAME DATA.
BETTER RESULTS.
Jin H. Kim
VP of Marketing
jin@skytree.net
Machine learning:
The modern science of finding patterns and making predictions from data:
multivariate statistics, data mining, pattern recognition, advanced/predictive analytics
Our Vision
THE DATA DRIVEN ENTERPRISE
POWERED BY MACHINE LEARNING
Machine Learning has finally arrived
1st Wave (1950s-70s)
Technology evolution: Artificial Intelligence, Pattern Recognition
Application evolution: Universities
2nd Wave (1980s-90s)
Technology evolution: Neural Networks, Data Mining
Application evolution: Science, Credit scoring, OCR
3rd Wave (mid-1990s to today)
Technology evolution: Machine Learning: Convergence
Application evolution: Sales / Marketing, Finance, Biotech, Retail, Telco, Government
Now: Machine Learning on Big Data
Skytree: Machine Learning for High-Value, High-Complexity Problems
•  Predictive optimal decision-making
–  High-frequency algorithmic trading
–  Online advertising exchanges
–  Fast customer targeting and churn analysis
•  Predictive monitoring/discovery assistance
–  Point-of-compromise fraud tips/cues
–  Network fault monitoring/diagnosis
–  Predictive maintenance of network of devices
–  Fraud analysis in claims
–  Insider threat/DLP and cyber security
High-Value, High-Complexity Problems: Critical Elements in Common
1.  High accuracy needed (needle-finding)
–  Small number of known examples
–  Identify anomalies with no prior examples
2.  Complex data fusion needed (unified objects)
–  Spatial-temporal behavior/event pattern-finding and tracking
–  Inference of activities, entities/identities, relations
3.  Automation needed (augment human analysts)
–  Value-based attention-focusing, recommendation of relevant content
–  Real-time interactivity without waiting
–  Fast construction of new reports for agility
Use Case Examples
Financial Services
Fraud Analysis
Credit Scoring
Pricing
Churn Analysis
SDN/SON
Government
Fraud Analysis
Scoring
Anomaly Detection
Fault Analysis
SDN/SON
Retail
Segmentation
Recommendation
Churn Analysis
Lead Scoring
Pricing
Asset Intensive
Preventative Maintenance
Defect/Fault Detection
Supply Chain Management
Cost Forecasting
Failure Analysis
Global Leaders Select Skytree
World’s leading:
Logistics & Shipping: Anomaly detection
Consumer Electronics: Content recommendation
Automobile: On-board destination recommendation
Web Portal: Ad targeting
Financial Services & Credit Card: Customer lead scoring, fraud, credit risk scoring
“10 Hot Big Data Startups to Watch”
“Skytree Looms in Big Data Forest with New Funding”
“Skytree Uses Machine Learning To Crunch Big Data”
Skytree named “Big Data Analytics Vendor to Watch”
“The Ten Coolest Big Data Startups in 2013”
“One giant leap for machinekind”
Skytree among “10 Emerging Technologies for Big Data”
“…could change the face of Big Data”
Who’s Who of Advanced Analytics
Insurance: Targeted Auto Policies with Telemetric Data
•  Business challenge
–  Inaccurate policy pricing based on demographics and actuarial data
•  Example: many teens are good drivers but they often incur higher premiums
–  Availability of new data sources including telemetry data
•  Machine learning solution
–  Use telematics to price insurance based on near-real-time driving habits
–  Base rates on an individual’s actual driving history
–  Data fusion to personalize and increase objectivity and accuracy in pricing and claims processing
•  Business benefit
–  Targeted customer pricing and policies
–  Improved customer retention
–  Higher customer satisfaction and margins
Customers’ Use of Skytree
Targeting – Find New Customers
•  Global 100 Financial Institution
•  Major pain points: speed & accuracy of current approach
•  Current solution: SAS, Hadoop, homegrown
“I want our analysts to create models rather than writing software” - Skytree Customer
CURRENT: 1,200 cores @ 100-node Hadoop cluster; runtime: 100 minutes; accuracy (Gini): 57%
SKYTREE SERVER: 12 cores @ 1 node; runtime: 8 minutes; accuracy (Gini): 60%; 1250x speedup
[Chart: runtime in minutes, 100 (current) vs. 8 (Skytree Server)]
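The slide does not say how the 1250x speedup was derived. One reading that reproduces the figure exactly is a comparison of core-minutes (cores multiplied by runtime) rather than wall-clock time; the sketch below works through that arithmetic under this assumption. The function name is illustrative.

```python
def core_minutes(cores, runtime_minutes):
    """Total compute consumed: number of cores times wall-clock minutes."""
    return cores * runtime_minutes

# Figures as stated on the slide.
current = core_minutes(cores=1200, runtime_minutes=100)  # Hadoop cluster
skytree = core_minutes(cores=12, runtime_minutes=8)      # single node

wall_clock_speedup = 100 / 8          # runtime ratio only
per_core_speedup = current / skytree  # ratio of core-minutes

print(current, skytree)       # 120000 96
print(wall_clock_speedup)     # 12.5
print(per_core_speedup)       # 1250.0
```

Under this reading the job finishes 12.5 times sooner while using 100 times fewer cores, which multiplies out to the quoted 1250x.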
Asset Intensive: Predict Parts Failure through Telemetric Data
•  Business challenge
–  Early infant mortality of parts due to rapid aging is not easily detectable during manufacturing and environmental acceptance tests
–  Utilize diagnostic data such as impedance, voltage, temperature (multidimensional data)
•  Machine learning solution
–  Detect transient indicators of rapid aging through telemetric data
•  Time between Beginning of Life and first transient is random
•  Time between first transient and End of Life is deterministic
–  Automatic parameter tuning
–  Data fusion
•  Business benefit
–  Efficient parts inventory management
–  Higher customer satisfaction
–  Optimize preventative maintenance scheduling based on predicted Time To Failure (TTF)
Predict Parts Failure through Telemetric Data
Data Stored on Hadoop Cluster: Manufacturing Data, Telemetric Data, Geo-location Data
1  Build failure model from manufacturing test data
2  Blend in data from telemetric and other big data sources
3  Real-time discovery of transient part behavior patterns to predict Time-To-Failure
Improve Customer Retention with Machine Learning
•  Business challenge
–  Cost of attracting new customers is many times more than retaining customers
–  Greater customer sophistication and competition increase churn levels
•  Machine learning solution
–  Identify events that predict customer needs
–  Isolate best targets and best offers for individual customers
•  Predict what offer or service would prevent a customer from switching
–  Discover purchase patterns and profiles of customers who leave for a deeper understanding
•  Business benefit
–  Reduced churn and increased customer loyalty
–  Increased margins and marketing effectiveness
–  Improved up/cross sell opportunities
Performance Studies by Customers
Next Logical Product – Right Offer to Right Customer
•  Global Fortune 20 Company
•  Major Pain Points: Speed & Accuracy of Legacy Approach
•  Current Solution: Homegrown (35% accurate)
•  1M Data Points for a “Pilot”
Results: 20% increase in recommendation relevance in a fraction of the time.
Runtime (mins): LEGACY 97, SKYTREE .07
Precision@5 (%): LEGACY 35%, SKYTREE 42%
“We are literally speechless” - Skytree Customer
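For readers unfamiliar with the metric, Precision@5 is the fraction of the top five recommendations that turn out to be relevant. The sketch below shows that calculation on made-up items, and then checks that the move from 35% to 42% reported above corresponds to a 20% relative increase. Function names and sample data are assumptions for illustration.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def relative_lift(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline

# Made-up ranked recommendations and the items the customer actually wanted.
recommended = ["A", "B", "C", "D", "E", "F"]
relevant = {"B", "E", "Z"}
p5 = precision_at_k(recommended, relevant)  # 2 hits in the top 5

# Slide figures: legacy Precision@5 of 35%, Skytree 42%.
lift = relative_lift(0.35, 0.42)

print(p5)              # 0.4
print(round(lift, 2))  # 0.2, i.e. a 20% relative increase
```

The seven-point absolute gain (35% to 42%) and the 20% relative gain are the same result expressed two ways.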
Real-Time Fraud Detection
•  Business challenge
–  Growing complexity of fraud patterns
–  Increased frequency of fraud
–  Minimize false positives without compromising fraud accuracy
•  Machine Learning solution
–  Leverage diverse big data for better context
–  Real-time update of model parameters
–  Faster and more accurate model for better fraud detection
•  Business benefit
–  More accurate and agile fraud detection system
–  Improved customer satisfaction
–  Improved financial results
Global 2000 Credit Card Network – Before
Workflow, in the order shown on the slide:
–  Transaction data transferred from database to Linux server
–  Modeling: fraud model created to detect fraud; model is exported
–  Model is re-coded by a new set of engineers for the mainframe
–  Real-time detection: new model is “loaded” so fraud can be detected in real time
•  Customer wanted a more accurate model
•  Current model in system was designed to be updated on a yearly basis
•  Running a model on a large dataset took over 2 days
•  Skytree’s goal is to move update of the model to daily or real time
Hardware: Linux x86 Server, Mainframe
Software: Internally developed random decision forests
SLA: Fraud scored in real time. Fraud model updated yearly
Global 2000 Credit Card Network - Now
Modeling & Real-Time Score Environment:
–  Data stored on MapR Hadoop cluster
–  Fraud model and unsupervised ML models created (the tool appears as a logo on the slide)
–  Fraud model updated daily / real-time
•  Customer can use the same environment for modeling and for production
•  Models can be updated on a daily or real-time basis depending on needs
•  More frequent updates lead to a significant increase in lift
Hardware: Linux x86 Server
Software: MapR, Skytree fraud detection models
SLA: Fraud scored in real time. Fraud model updated daily / real-time
Global 2000 Credit Card Network - Now
“Key to increasing fraud detection accuracy”
•  Use all of the data: sampling can decrease accuracy of results
•  Semi-supervised learning: a combination of supervised and unsupervised learning can improve fraud detection rates
•  Weight transactions based on date: Skytree Server allows each transaction to be weighted differently and allows fraud models to preferentially weight recent fraud over older fraud
•  Use the most important variables:
o  Were the last few transactions at an un-manned location?
o  Is the transaction over the credit limit?
o  Which day of the week was the fraud committed?
o  Has the card been reported for fraud before?
o  And more…
•  Weight based on transaction value: we should care more about larger transactions
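The two weighting ideas on this slide (recent fraud counts more than old fraud, large transactions count more than small ones) can be sketched as a per-transaction training weight. This is not Skytree Server's actual scheme, which the slides do not describe; the exponential half-life, the logarithmic value scaling and the function name are all assumptions chosen for illustration.

```python
import math
from datetime import date, timedelta

def transaction_weight(txn_date, amount, today, half_life_days=90.0):
    """Training weight that favours recent and high-value transactions.

    Assumed scheme: the recency factor halves every `half_life_days`,
    and the value factor grows with log(1 + amount) so that very large
    amounts do not completely dominate.
    """
    age_days = (today - txn_date).days
    recency = 0.5 ** (age_days / half_life_days)
    value = math.log1p(amount)
    return recency * value

today = date(2014, 7, 23)
recent_small = transaction_weight(today, 50.0, today)
recent_large = transaction_weight(today, 5000.0, today)
old_large = transaction_weight(today - timedelta(days=90), 5000.0, today)

print(recent_small, recent_large, old_large)
```

Weights like these would typically be passed to a learner as per-sample weights, so a recent, large fraudulent transaction influences the fitted model more than an old, small one.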
Skytree Maximizes Predictive Accuracy
Advantages and Benefits
1  Advantage: Breadth of Advanced Methods (more powerful/advanced methods and options). Benefit: greater chance of having the best model for your data
2  Advantage: Speed & Scalability (use more data, test more parameters). Benefit: improved accuracy in the time available
3  Advantage: Automation / Ease of Use (shorter time to most accurate models). Benefit: more productive modelers, more people in the company can use it
Skytree is designed from the ground up for these benefits.
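Sources of Generalization Error
Motivations: Sources of Generalization Error
The excess error splits into three terms, attributed on the slide to an improper model, finite samples, and algorithmic accuracy respectively (the struck-through terms on the slide cancel across the brackets):
\[
\underbrace{\mathbb{E}_{\xi}\big[ f(x_t,\xi) - \inf_{x\in H^*} f(x,\xi) \big]}_{\text{Excess Error}}
=
\underbrace{\mathbb{E}_{\xi}\big[ \inf_{x\in H} f(x,\xi) - \inf_{x\in H^*} f(x,\xi) \big]}_{\mathrm{Err}_{\text{Approximation}}}
+
\underbrace{\mathbb{E}_{\xi}\big[ f(x^*(N),\xi) - \inf_{x\in H} f(x,\xi) \big]}_{\mathrm{Err}_{\text{Estimation}}}
+
\underbrace{\mathbb{E}_{\xi}\big[ f(x_t,\xi) - f(x^*(N),\xi) \big]}_{\mathrm{Err}_{\text{Expected-Optimization}}}
\]
ξ : data sample;
N : number of data samples;
H : hypothesis space of the model;
H* : “true” hypothesis space that contains the optimal x*
Source: Hua Ouyang, Optimal Stochastic & Distributed Algorithms for Machine Learning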
First Principles: Sources of prediction error
[Same excess-error decomposition as on the previous slide, annotated with a remedy for each source]
Improper model: use the right model (try many)
Finite samples: use more data (all of it)
Algorithmic accuracy: use the right parameters (try many)
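"Try many" parameters usually means fitting one model per candidate setting and keeping the one with the lowest error on held-out validation data. The sketch below does this for a deliberately tiny model, a one-dimensional ridge regression with a closed-form fit. The data, the candidate values and the function names are all made up for illustration and have nothing to do with Skytree's own automation.

```python
def fit_ridge_slope(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept:
    w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the line y = w*x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, valid, lambdas):
    """Fit one model per lambda on `train`, score each on `valid`,
    and return (validation error, lambda, slope) for the best one."""
    results = []
    for lam in lambdas:
        w = fit_ridge_slope(train[0], train[1], lam)
        results.append((mse(w, valid[0], valid[1]), lam, w))
    return min(results)

# Made-up modeling and validation sets, as (xs, ys) pairs.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.1, 11.8])
lambdas = [0.0, 0.1, 1.0, 10.0]

best_error, best_lam, best_w = grid_search(train, valid, lambdas)
print(best_lam, round(best_w, 3), round(best_error, 4))
```

The same loop generalises to trying many model families as well as many parameters; the cost is one fit per candidate, which is why speed and scalability matter for this approach.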
Skytree and Spark
[Architecture diagram labels: MapR Data Platform (1.x, 2.x/YARN), Spark, ZooKeeper, Web Services, Command Line Interface, Data Sources/Targets (OLTP / EDW)]
Why Skytree?
Why do companies pick us for Big Data analytics?
Built on Solid Foundation
INVESTORS (22M+)
SAME DATA.
BETTER RESULTS.
Thank You.
www.skytree.net

Más contenido relacionado

Was ist angesagt?

Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with HadoopDataWorks Summit
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Renato Bonomini
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesCarol McDonald
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark Summit
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning PrimerMathieu Dumoulin
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 
On-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy ModelsOn-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy ModelsDatabricks
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, HortonworksHortonworks
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesDataWorks Summit
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
 

Was ist angesagt? (20)

Predictive Analytics with Hadoop
Predictive Analytics with HadoopPredictive Analytics with Hadoop
Predictive Analytics with Hadoop
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
Capacity Management and BigData/Hadoop - Hitchhiker's guide for the Capacity ...
 
Real World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in ProductionReal World Use Cases: Hadoop and NoSQL in Production
Real World Use Cases: Hadoop and NoSQL in Production
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
Spark in the Hadoop Ecosystem-(Mike Olson, Cloudera)
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
On-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy ModelsOn-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy Models
 
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing:  Herb Cunitz, HortonworksDemystify Big Data Breakfast Briefing:  Herb Cunitz, Hortonworks
Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
MapR 5.2 Product Update
MapR 5.2 Product UpdateMapR 5.2 Product Update
MapR 5.2 Product Update
 
Hive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenchesHive at Yahoo: Letters from the trenches
Hive at Yahoo: Letters from the trenches
 
Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 

Andere mochten auch

Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distributionmcsrivas
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッション
MapR & Skytree:

  • 1. ®© 2014 MapR Technologies 1 ® © 2014 MapR Technologies July 23, 2014
  • 2. Our Speakers
      • Jin Kim, VP, Marketing, Skytree
      • Nitin Bandugula, Product Marketing, MapR
  • 3. Agenda
      • Introduction to Hadoop
      • Machine Learning on Hadoop
      • Advanced Machine Learning
      • Customer Case Studies
  • 4. Big Data is Overwhelming Traditional Systems
      • Diagram: Enterprise Data Architecture (enterprise users, operational systems, analytical systems, outside sources)
      • Production requirements: mission-critical reliability, transaction guarantees, deep security, real-time performance, backup and recovery
      • Production requirements: interactive SQL, rich analytics, workload management, data governance, backup and recovery
  • 5. Hadoop: The Disruptive Technology at the Core of Big Data [Chart: job trends from Indeed.com, Jan '06 to Jan '14]
  • 6. Hadoop Relieves the Pressure from Enterprise Systems
      • Diagram: operational systems, analytical systems, enterprise users
      • Hadoop workloads: data staging; archive; data transformation; data exploration; streaming, interactions
      • Keys for Production Success: 1. Reliability and DR; 2. Interoperability; 3. High performance; 4. Supports operations and analytics
  • 7. MapR: Best Hadoop Distribution for Customer Success
      • Top Ranked
      • Exponential Growth
      • 500+ Customers
      • Premier Investors
      • 3X bookings Q1 '13 – Q1 '14
      • 80% of accounts expand 3X
      • 90% software licenses
      • <1% lifetime churn
      • >$1B in incremental revenue generated by 1 customer
  • 8. The Power of the Open Source Community
      • Diagram: Apache Hadoop and OSS ecosystem on the MapR Data Platform, with Management and Security layers
      • Group labels: Execution Engines; Data Governance and Operations; Batch; Streaming; NoSQL & Search; ML, Graph; SQL; Workflow & Data Governance; Data Integration & Access; Provisioning & coordination
      • Components shown: YARN, MapReduce v1 & v2, Pig, Cascading, Spark, Tez*, Spark Streaming, Storm*, HBase, Solr, Accumulo*, Mahout, MLLib, GraphX, Hive, Impala, Shark, Drill*, Sentry*, Knox*, Oozie, Falcon*, ZooKeeper, Sqoop, Flume, HttpFS, Hue, Juju, Savannah*, Whirr
      • * Certification/support planned for 2014
  • 9. Machine Learning Stack [Diagram: the same Apache Hadoop and OSS ecosystem stack as on slide 8; * Certification/support planned for 2014]
  • 10. Machine Learning Cuts Across All Use Cases
      • Enterprise Data Hub: multi-structured data staging & archive; ETL / DW optimization; mainframe optimization; data exploration
      • Marketing Optimization: recommendation engines & targeting; Customer 360; click-stream analysis; social media analysis; ad optimization
      • Risk & Security Optimization: network security monitoring; security information & event management; fraudulent behavioral analysis
      • Operations Intelligence: supply chain & logistics; system log analysis; manufacturing quality assurance; preventative maintenance; smart meter analysis
  • 11. How Does Big Data Help Machine Learning
      • Big Data => Better Models
      • A machine that has played 1 million checkers games will be smarter than one that has played just 100 games
      • Improves the accuracy of the model, especially for unsupervised learning
      • Unlikely to overfit because of the variety of data
      • Diagram: Past Data, Model, New Data, Results
  • 12. Common Machine Learning Use Cases on Hadoop
      • Linear/Polynomial Regression: fit to an equation (e.g. predict prices)
      • Logistic Regression: probability of occurrence (e.g. classify spam)
      • K-means Clustering: group things together (e.g. customer segmentation)
      • Recommender Systems and Collaborative Filtering (e.g. product recommendation)
      • Anomaly Detection (e.g. credit card fraud)
      • The data scientist decides what works best
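As a hedged illustration of the first use case above (fitting an equation to predict prices), here is a minimal ordinary least squares sketch in plain Python. The size and price figures are invented for the example and do not come from the presentation, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: size (square metres) -> price (thousands).
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

slope, intercept = fit_line(sizes, prices)

# Predict the price of an unseen 100 square metre home.
predicted = slope * 100.0 + intercept
```

With real data the points would not lie exactly on a line, and the fitted line would minimise the squared error rather than reproduce every price.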
  • 13. Machine Learning on Hadoop
  • 14. Modeling Process – Constant Iterations / Free to Fail
      • Modeling Data Set + Validation Data Set
      • Constant iterations and plotting
          – Underfit vs. overfit
          – Feature manipulation
          – Adjusting learning rates
          – False positives vs. false negatives (precision levels)
          – Measuring error, etc.
      • Legacy applications, libraries, and code are used to manipulate data
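The false positive vs. false negative trade-off mentioned above can be made concrete with precision and recall measured on a validation set. This is a minimal sketch in plain Python; the labels and predictions are invented for illustration, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1
        elif truth == 0 and guess == 1:
            fp += 1
        elif truth == 1 and guess == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical validation labels and model predictions.
validation_labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_predictions = [1, 0, 1, 0, 1, 0, 1, 0]

precision, recall = precision_recall(validation_labels, model_predictions)
```

Recomputing these two numbers after each change to features or learning rate is one simple way to track the iterations the slide describes.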
  • 15. Development and Deployment Process (players and their activity)
      • Mathematicians: need newer data sets from production for model building and validation; need complete autonomy for inventions
      • Developers: develop the final solution based on models, then test and deploy working with Ops; need to coordinate heavily
      • Operations Staff: need to provide data and deploy apps while ensuring data consistency, data compliance, HA, DR, etc.
      • Lots of Operational Issues
  • 16. Volumes and Mirroring
      • The Conflict: the experimental, free-to-fail modeling process needs production data
      • Solutions:
          1. Same cluster: separate volumes, multi-tenancy, labels, queues, data placement control, etc.
          2. Different cluster for R&D purposes: mirroring (efficient, less network bandwidth, across the globe, easy to deploy and maintain)
  • 17. Snapshots
      • The Idea: version control of data as well as models
      • Data version control:
          – How does my model work against new validation sets?
          – How did it change across many validation sets?
      • Model version control:
          – How can I go back and check my new model against old datasets?
          – How do I prove that what I came up with worked for the data we had at the time (replicate scenarios)?
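The questions above amount to scoring each model version against each data version. The sketch below shows that comparison in plain Python under loud assumptions: the `snapshots` dictionary is an in-memory stand-in for point-in-time copies of a validation set, the two models are toy functions, and no MapR snapshot API is used or implied.

```python
def mean_abs_error(model, rows):
    """Average absolute difference between the model output and the target."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


# Hypothetical snapshot name -> validation rows (input, target) as they
# existed at that point in time.
snapshots = {
    "2014-05": [(1.0, 2.0), (2.0, 4.0)],
    "2014-06": [(1.0, 2.5), (2.0, 4.5)],
}


def old_model(x):
    return 2.0 * x


def new_model(x):
    return 2.0 * x + 0.5


# For every data version, record the error of both model versions.
report = {
    name: (mean_abs_error(old_model, rows), mean_abs_error(new_model, rows))
    for name, rows in snapshots.items()
}
```

Here the old model is exact on the older data and the new model is exact on the newer data, which is the kind of evidence the "worked for the data we had at the time" question asks for.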
  • 18. Read/Write NFS Access
      • Existing applications and custom libraries all work out of the box
      • Browsers, modeling languages, and scripts work out of the box
      • Data ingestion is easy
          – Quickly move data in and out without having to wait for developers and administrators to build and maintain a Flume cluster
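The point of the slide above is that ordinary file I/O is enough once the cluster is reachable through an NFS mount. The sketch below uses only the Python standard library; the temporary directory is a stand-in for the mount point (an assumption, so the example runs anywhere), and the file name and rows are invented.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file using plain file I/O."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_rows(path):
    """Read every row back as a list of string lists."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


# Stand-in for the directory where the cluster is NFS-mounted (assumption).
mount_point = tempfile.mkdtemp()
data_file = os.path.join(mount_point, "clicks.csv")

# Ingest two hypothetical click events, then read them back.
append_rows(data_file, [["user1", "home"], ["user2", "cart"]])
rows = read_rows(data_file)
```

Nothing in the code is specific to Hadoop, which is the claim the slide makes about existing applications and scripts.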
  • 19. Machine Learning Options
  • 20. Apache Spark
      • Spark: in-memory processing framework
          – Works well with iterative machine learning algorithms, because the matrices can be pulled into memory
          – 100x better performance (in-memory) compared to MapReduce
      • MLLib
          – Built-in libraries for a variety of algorithms
          – Python and NumPy support
      • GraphX
          – Libraries to model relationships between entities (e.g. social media)
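Iterative algorithms make many passes over the same data, which is why holding it in memory pays off. The sketch below shows that access pattern with a one-dimensional k-means loop in plain Python. It is not Spark or MLLib code; the points, starting centers, and the `kmeans_1d` name are assumptions made for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Basic k-means on scalar values; every iteration rescans all points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: attach each point to its nearest center.
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster,
        # keeping the old center if the cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


# Hypothetical data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers = kmeans_1d(points, [0.0, 5.0])
```

Each of the ten iterations reads the full `points` list again; on a cluster, re-reading from disk on every pass is the cost that in-memory processing avoids.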
  • 21. Apache Mahout
      • Built-in algorithms for popular techniques such as recommenders, classification, collaborative filtering, etc.
      • Moving towards running on Spark
  • 22. Advanced Machine Learning with Skytree
      • Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
      • MapR Data Platform: MapR-DB, MapR-FS
      • MapR Distribution for Hadoop: Batch (MR, Spark, Hive, Pig, …); Interactive (Impala, Drill, …); Streaming (Spark Streaming, Storm, …)
      • Data warehouse and data marts: offload, re-load
      • Adv. Modeling – Exploration – Analytics
  • 23. Skytree
  • 24. Q&A: Engage with us!
      1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox
      2. Download machine learning e-books from Ted Dunning: http://www.mapr.com/resources/white-papers#e-books
      3. Visit Skytree at www.skytree.net
      4. Learn best practices for Hadoop ETL: www.mapr.com/EDH
  • 25. THE MACHINE LEARNING COMPANY ® SAME DATA. BETTER RESULTS. Jin H. Kim, VP of Marketing, jin@skytree.net
  • 26. Our Vision: The Data Driven Enterprise Powered by Machine Learning
      • Machine learning: the modern science of finding patterns and making predictions from data
      • Also known as: multivariate statistics, data mining, pattern recognition, advanced/predictive analytics
  • 27. Machine Learning has finally arrived
      • Timeline axes: Technology Evolution, Application Evolution
      • 1st Wave (50's-70's): Artificial Intelligence, Pattern Recognition; Universities
      • 2nd Wave (80's-90's): Neural Networks, Data Mining; Science, Credit scoring, OCR
      • 3rd Wave (mid 90's - today): Machine Learning: Convergence; Sales / Marketing, Finance, Biotech, Retail, Telco, Government
      • Now: Machine Learning on Big Data
  • 28. Skytree: Machine Learning for High-Value, High-Complexity Problems
      • Predictive optimal decision-making
          – High-frequency algorithmic trading
          – Online advertising exchanges
          – Fast customer targeting and churn analysis
      • Predictive monitoring/discovery assistance
          – Point-of-compromise fraud tips/cues
          – Network fault monitoring/diagnosis
          – Predictive maintenance of network of devices
          – Fraud analysis in claims
          – Insider threat/DLP and cyber security
  • 29. High-Value, High-Complexity Problems: Critical Elements in Common
      1. High accuracy needed (needle-finding)
          – Small number of known examples
          – Identify anomalies with no prior examples
      2. Complex data fusion needed (unified objects)
          – Spatial-temporal behavior/event pattern-finding and tracking
          – Inference of activities, entities/identities, relations
      3. Automation needed (augment human analysts)
          – Value-based attention-focusing, recommendation of relevant content
          – Real-time interactivity without waiting
          – Fast construction of new reports for agility
  • 30. Use Case Examples • Financial Services: Fraud Analysis, Credit Scoring, Pricing, Churn Analysis, SDN/SON • Government: Fraud Analysis, Scoring, Anomaly Detection, Fault Analysis, SDN/SON • Retail: Segmentation, Recommendation, Churn Analysis, Lead Scoring, Pricing • Asset Intensive: Preventative Maintenance, Defect/Fault Detection, Supply Chain Management, Cost Forecasting, Failure Analysis
  • 31. Global Leaders Select Skytree. World's leading: • Logistics & Shipping: anomaly detection • Consumer Electronics: content recommendation • Automobile: on-board destination recommendation • Web Portal: ad targeting • Financial Services & Credit Card: customer lead scoring, fraud, credit risk scoring
  • 32. Who's Who of Advanced Analytics • “10 Hot Big Data Startups to Watch” • “Skytree Looms in Big Data Forest with New Funding” • “Skytree Uses Machine Learning To Crunch Big Data” • Skytree named “Big Data Analytics Vendor to Watch” • “The Ten Coolest Big Data Startups in 2013” • “One giant leap for machinekind” • Skytree among “10 Emerging Technologies for Big Data” • “…could change the face of Big Data”
  • 33. Insurance: Targeted Auto Policies with Telemetric Data • Business challenge – Inaccurate policy pricing based on demographics and actuarial data • Example: many teens are good drivers but they often incur higher premiums – Availability of new data sources including telemetry data • Machine learning solution – Use telematics to price insurance based on near-real-time driving habits – Base rates on an individual’s actual driving history – Data fusion to personalize and increase objectivity and accuracy in pricing and claims processing • Business benefit – Targeted customer pricing and policies – Improved customer retention – Higher customer satisfaction and margins
  • 34. Customers’ Use of Skytree: Targeting – Find New Customers • Global 100 Financial Institution • Major pain points: speed & accuracy of current approach • Current solution: SAS, Hadoop, homegrown. “I want our analysts to create models rather than writing software” - Skytree Customer. Runtime (minutes): CURRENT: 1,200 cores @ 100-node Hadoop cluster; runtime: 100 minutes; accuracy (Gini): 57%. SKYTREE SERVER: 12 cores @ 1 node; runtime: 8 minutes; accuracy (Gini): 60%. 1250x speedup (12.5x shorter runtime on 100x fewer cores).
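The 1250x figure on this slide is consistent with a per-core comparison rather than a wall-clock one. The arithmetic can be checked with a short sketch; the dictionary layout and the `core_minutes` helper are illustrative names, and reading the headline number as a core-minutes ratio is an assumption that the slide does not state explicitly.

```python
# Figures copied from the slide. Interpreting "1250x" as a ratio of
# core-minutes is an assumption, not something the slide spells out.
current = {"cores": 1200, "runtime_min": 100, "gini": 0.57}
skytree = {"cores": 12, "runtime_min": 8, "gini": 0.60}


def core_minutes(run):
    """Total compute consumed: number of cores times minutes of runtime."""
    return run["cores"] * run["runtime_min"]


# Wall-clock comparison: how much sooner the job finishes.
wall_clock_speedup = current["runtime_min"] / skytree["runtime_min"]

# Hardware comparison: how many fewer cores were used.
core_reduction = current["cores"] / skytree["cores"]

# Per-core comparison: ratio of total core-minutes.
per_core_speedup = core_minutes(current) / core_minutes(skytree)

# Accuracy change in Gini percentage points.
gini_gain_points = round((skytree["gini"] - current["gini"]) * 100, 1)

print(wall_clock_speedup, core_reduction, per_core_speedup, gini_gain_points)
```

Under this reading the job finishes 12.5 times sooner on one hundredth of the cores, and the two factors multiply to the quoted 1250.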
  • 35. Asset Intensive: Predict Parts Failure through Telemetric Data • Business challenge – Early infant mortality of parts due to rapid aging is not easily detectable during manufacturing and environmental acceptance tests – Utilize diagnostic data such as impedance, voltage, temperature (multidimensional data) • Machine learning solution – Detect transient indicators of rapid aging through telemetric data • Time between Beginning of Life and first transient is random • Time between first transient and End of Life is deterministic – Automatic parameter tuning – Data fusion • Business benefit – Efficient parts inventory management – Higher customer satisfaction – Optimize preventative maintenance scheduling based on predicted Time To Failure (TTF)
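The observation that the interval from first transient to End of Life is deterministic suggests a simple estimate: once the first transient is seen, the remaining life is a fixed interval minus the time already elapsed. The following is a minimal sketch of that idea only. The threshold-based detector, the function names, and the sample readings are all hypothetical; the slide does not describe how Skytree detects transients.

```python
from typing import Optional, Sequence


def first_transient_index(readings: Sequence[float], baseline: float,
                          tolerance: float) -> Optional[int]:
    """Return the index of the first reading that deviates from the
    baseline by more than the tolerance, or None if none does.
    A fixed threshold is a stand-in for a learned detector."""
    for i, value in enumerate(readings):
        if abs(value - baseline) > tolerance:
            return i
    return None


def predicted_time_to_failure(readings: Sequence[float], baseline: float,
                              tolerance: float,
                              transient_to_eol: int) -> Optional[int]:
    """Estimate remaining life in sampling steps.

    Assumes the interval from first transient to End of Life is a known
    constant (transient_to_eol), as the slide's bullet suggests.
    Returns None while no transient has been observed, because the
    time from Beginning of Life to first transient is random.
    """
    idx = first_transient_index(readings, baseline, tolerance)
    if idx is None:
        return None
    elapsed_since_transient = (len(readings) - 1) - idx
    return max(transient_to_eol - elapsed_since_transient, 0)


# Hypothetical voltage telemetry with one spike at index 4.
voltage = [5.0, 5.01, 4.99, 5.0, 5.6, 5.02, 5.0]
ttf = predicted_time_to_failure(voltage, baseline=5.0, tolerance=0.3,
                                transient_to_eol=100)
print(ttf)
```

In practice the constant interval would itself be estimated from manufacturing test data, and the detector would use several signals (impedance, voltage, temperature) rather than one.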
  • 36. Predict Parts Failure through Telemetric Data. Data stored on Hadoop cluster: Manufacturing Data, Telemetric Data, Geo-location Data. 1. Build failure model from manufacturing test data 2. Blend in data from telemetric and other big data sources 3. Real-time discovery of transient part behavior patterns to predict Time-To-Failure
  • 37. Improve Customer Retention with Machine Learning • Business challenge – Attracting a new customer costs many times more than retaining an existing one – Greater customer sophistication and competition increase churn levels • Machine learning solution – Identify events that predict customer needs – Isolate best targets and best offers for individual customers • Predict what offer or service would prevent a customer from switching – Discover purchase patterns and profiles of customers who leave for a deeper understanding • Business benefit – Reduced churn and increased customer loyalty – Increased margins and marketing effectiveness – Improved up/cross sell opportunities
  • 38. Skytree Confidential. Performance Studies by Customers: Next Logical Product – Right Offer to Right Customer • Global Fortune 20 Company • Major pain points: speed & accuracy of legacy approach • Current solution: homegrown (35% accurate) • 1M data points for a “pilot”. Results: Precision@5: LEGACY 35%, SKYTREE 42%. Runtime (mins): LEGACY 97, SKYTREE 0.07. 20% increase in recommendation relevance in a fraction of the time. “We are literally speechless” - Skytree Customer
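Precision@5 measures the share of the top five recommendations that turn out to be relevant. A minimal sketch of the metric, together with the arithmetic behind the slide's figures, is shown here; the item lists are made-up illustration data, and the calculation assumes the "20% increase" is the relative change from 35% to 42%.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the first k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranking and ground truth, for illustration only.
recommended = ["a", "b", "c", "d", "e", "f"]
relevant = {"b", "e", "z"}
p5 = precision_at_k(recommended, relevant)  # 2 of the top 5 are relevant

# Figures from the slide.
legacy_p5, skytree_p5 = 0.35, 0.42
legacy_runtime_min, skytree_runtime_min = 97, 0.07

# Relative improvement in Precision@5 (assumed meaning of "20% increase").
relative_gain = (skytree_p5 - legacy_p5) / legacy_p5

# How many times shorter the runtime is.
runtime_ratio = legacy_runtime_min / skytree_runtime_min

print(p5, round(relative_gain, 2), round(runtime_ratio))
```

The seven-point absolute gain corresponds to a 20% relative gain, and the runtime ratio is on the order of a thousand.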
  • 39. Real-Time Fraud Detection • Business challenge – Growing complexity of fraud patterns – Increased frequency of fraud – Minimize false positives without compromising fraud accuracy • Machine learning solution – Leverage diverse big data for better context – Real-time update of model parameters – Faster and more accurate model for better fraud detection • Business benefit – More accurate and agile fraud detection system – Improved customer satisfaction – Improved financial results
  • 40. Global 2000 Credit Card Network: Before. Workflow: (1) Transaction data is transferred from the database to a Linux server. (2) Modeling: a fraud model is created to detect fraud, then exported. (3) The model is re-coded by a new set of engineers for the mainframe. (4) Real-time detection: the new model is “loaded” so fraud can be detected in real time. • Customer wanted a more accurate model • Current model in system was designed to be updated on a yearly basis • Running a model on a large dataset took over 2 days • Skytree’s goal is to move the model update to daily or real time. Hardware: Linux x86 server, mainframe. Software: internally developed random decision forests. SLA: fraud scored in real time; fraud model updated yearly.
  • 41. Global 2000 Credit Card Network: Now. Modeling & real-time score environment • Customer can use the same environment for modeling and for production • Models can be updated on a daily or real-time basis depending on needs • More frequent updates lead to a significant increase in lift. Hardware: Linux x86 server. Software: MapR, Skytree fraud detection models. SLA: fraud scored in real time; fraud model updated daily / in real time. Diagram: data stored on MapR Hadoop cluster → fraud model created using … → fraud model updated daily / real-time; data stored on MapR Hadoop cluster → unsupervised ML models created using … → fraud model updated daily / real-time.
  • 42. Global 2000 Credit Card Network: Now. “Key to increasing fraud detection accuracy” • Use all of the data: sampling can decrease accuracy of results • Semi-supervised learning: combining supervised and unsupervised learning can improve fraud detection rates • Weight transactions based on date: Skytree Server allows each transaction to be weighted differently, so fraud models can weight recent fraud more heavily than older fraud • Use the most important variables: o Were the last few transactions at an unmanned location? o Is the transaction over the credit limit? o Which day of the week was the fraud committed? o Has the card been reported for fraud before? o And more… • Weight based on transaction value: we should care more about larger transactions
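The two weighting ideas on this slide (favor recent fraud, care more about larger transactions) can be combined into a single per-transaction weight. The sketch below is one possible scheme, not Skytree Server's actual interface: the exponential half-life decay, the logarithmic value scaling, the parameter names, and the sample transactions are all assumptions chosen for illustration.

```python
import math
from datetime import date


def transaction_weight(tx_date, amount, today,
                       half_life_days=90.0, amount_scale=100.0):
    """Combine a recency weight and a value weight for one transaction.

    Recency: weight halves every `half_life_days` (assumed decay shape).
    Value: grows with the amount, damped by a log so that a few very
    large transactions do not dominate (assumed scaling).
    """
    age_days = (today - tx_date).days
    recency = 0.5 ** (age_days / half_life_days)
    value = math.log1p(amount / amount_scale)
    return recency * value


today = date(2014, 7, 23)

# Same amount, different dates: the 90-day-old one gets half the weight.
w_recent = transaction_weight(date(2014, 7, 23), 100.0, today)
w_old = transaction_weight(date(2014, 4, 24), 100.0, today)

# Same date, different amounts: the larger one gets more weight.
w_large = transaction_weight(date(2014, 7, 23), 5000.0, today)

print(w_recent, w_old, w_large)
```

Weights of this kind are typically passed to a learner as per-sample weights during training, so that errors on recent or high-value fraud cost more than errors on old or small transactions.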
  • 43. Skytree Maximizes Predictive Accuracy. Advantages and benefits: 1. Breadth of Advanced Methods (more powerful/advanced methods and options): greater chance of having the best model for your data. 2. Speed & Scalability (use more data, test more parameters): improved accuracy in the time available. 3. Automation / Ease of Use (shorter time to most accurate models): more productive modelers, more people in the company can use it. Skytree is designed from the ground up for these benefits.
  • 44. Sources of Generalization Error. Motivations: the excess error decomposes into three terms, caused respectively by an improper model, finite samples, and algorithmic accuracy:
$$\underbrace{\mathbb{E}_{\xi}\Big[f(x_t,\xi)-\inf_{x\in\mathcal{H}^{*}}f(x,\xi)\Big]}_{\text{Excess Error}}=\underbrace{\mathbb{E}_{\xi}\Big[\inf_{x\in\mathcal{H}}f(x,\xi)-\inf_{x\in\mathcal{H}^{*}}f(x,\xi)\Big]}_{\mathrm{Err}_{\text{Approximation}}}+\underbrace{\mathbb{E}_{\xi}\Big[f(x^{*}(N),\xi)-\inf_{x\in\mathcal{H}}f(x,\xi)\Big]}_{\mathrm{Err}_{\text{Estimation}}}+\underbrace{\mathbb{E}_{\xi}\Big[f(x_t,\xi)-f(x^{*}(N),\xi)\Big]}_{\mathrm{Err}_{\text{Expected-Optimization}}}$$
where $\xi$: data sample; $N$: number of data samples; $\mathcal{H}$: hypothesis space of the model; $\mathcal{H}^{*}$: “true” hypothesis space that contains the optimal $x^{*}$. Source: Hua Ouyang, “Optimal Stochastic & Distributed Algorithms for Machine Learning”.
  • 45. First Principles: Sources of Prediction Error. Motivations: the excess error decomposes into three terms, caused respectively by an improper model, finite samples, and algorithmic accuracy:
$$\underbrace{\mathbb{E}_{\xi}\Big[f(x_t,\xi)-\inf_{x\in\mathcal{H}^{*}}f(x,\xi)\Big]}_{\text{Excess Error}}=\underbrace{\mathbb{E}_{\xi}\Big[\inf_{x\in\mathcal{H}}f(x,\xi)-\inf_{x\in\mathcal{H}^{*}}f(x,\xi)\Big]}_{\mathrm{Err}_{\text{Approximation}}}+\underbrace{\mathbb{E}_{\xi}\Big[f(x^{*}(N),\xi)-\inf_{x\in\mathcal{H}}f(x,\xi)\Big]}_{\mathrm{Err}_{\text{Estimation}}}+\underbrace{\mathbb{E}_{\xi}\Big[f(x_t,\xi)-f(x^{*}(N),\xi)\Big]}_{\mathrm{Err}_{\text{Expected-Optimization}}}$$
where $\xi$: data sample; $N$: number of data samples; $\mathcal{H}$: hypothesis space of the model; $\mathcal{H}^{*}$: “true” hypothesis space that contains the optimal $x^{*}$. Source: Hua Ouyang, “Optimal Stochastic & Distributed Algorithms for Machine Learning”. Remedies: • Improper model: use the right model (try many) • Finite samples: use more data (all of it) • Algorithmic accuracy: use the right parameters (try many)
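The "try many" advice amounts to searching over candidate models or parameter settings and keeping whichever generalizes best on held-out data. The sketch below illustrates the idea with a deliberately simple learner, one-dimensional k-nearest-neighbor regression written from scratch, where each value of k plays the role of a different hypothesis space. The synthetic data, the learner, and the candidate list are assumptions for illustration and are unrelated to Skytree's own methods.

```python
import math
import random

# Synthetic data: noisy samples of sin(3x) on [-1, 1] (illustrative only).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, math.sin(3.0 * x) + rng.gauss(0.0, 0.1)))

train, val = data[:150], data[150:]


def knn_predict(train_points, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train_points, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_mse(train_points, val_points, k):
    """Mean squared error on held-out data for a given k."""
    errors = [(knn_predict(train_points, x, k) - y) ** 2
              for x, y in val_points]
    return sum(errors) / len(errors)


# "Try many": score every candidate setting on the validation split.
candidates = [1, 3, 5, 10, 25, 50, 150]
scores = {k: validation_mse(train, val, k) for k in candidates}
best_k = min(scores, key=scores.get)

print(best_k, scores[best_k])
```

With k equal to the full training set size the learner can only predict the global mean, which is the "improper model" case: no amount of extra data fixes it. Smaller k values trade that approximation error against estimation error from averaging too few noisy points, which is why the search over settings matters.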
  • 46. Skytree and Spark. Architecture diagram components: MapR Data Platform; 1.x; 2.x / YARN; Spark; ZooKeeper; Web Services; Command Line Interface; Data Sources/Targets (OLTP / EDW).
  • 47. Why Skytree? Why do companies pick us for Big Data analytics? Built on Solid Foundation. INVESTORS (22M+)
  • 48. SAME DATA. BETTER RESULTS. Thank You. www.skytree.net
  • 49. Q&A: Engage with us! 1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox 2. Download machine learning e-books from Ted Dunning: http://www.mapr.com/resources/white-papers#e-books 3. Visit Skytree at www.skytree.net 4. Learn best practices for Hadoop ETL: www.mapr.com/EDH