Real-Time Machine Learning at Industrial scale
... the battle of accuracy vs latency

Michael Cutler @cotdp
9th October 2012

tumra.com
@tumra
TUMRA LTD, Building 3, Chiswick Park, 566 Chiswick High Road, W4 5YA
$ whoami
Michael Cutler (@cotdp)
●   Previously at British Sky Broadcasting
    ○   Last 7 years in R&D
    ○   Created several patented systems & algorithms
    ○   Kicked off ‘Big Data’ initiative at Sky in 2008

●   Co-founder & CTO @ TUMRA since March '12
    ○   Real-time big data science platform
    ○   Alpha-testing with selected clients
Agenda
●   Background
●   Real-Time vs Batch processing
●   Accuracy vs Latency
●   Use Cases
    ○   eCommerce
    ○   Financial Services
    ○   Media
●   Questions
Background
Big Data is "in vogue", but what does it mean?
 ● Distributed processing
 ● Massively scalable
 ● Commodity hardware

Apache Hadoop is the "kernel" of the Big Data OS:
 ● Distributed Filesystem (HDFS)
 ● Parallel Processing (MapReduce, YARN)
Background (cont'd)
Solving problems with Big Data is hard:
 ● Tools are all low-level (Pig, Hive, etc.)
 ● Skills are hard to find

What is "Data Science"?
 ● Understanding data & solving problems
 ● Applies the following skills:
    ○   Statistical Analysis
    ○   Machine Learning
    ○   Communicating Results
Real-Time vs Batch processing
Batch - Hoppers, Bins, Buckets
Image credit: http://bit.ly/Q71u4W
Real-Time - Flows & Streams
Image credit: http://bit.ly/NOslqf
Real-Time vs Batch processing
Similarities to the Industrial Revolution:
 ● From handicraft to Batch & Real-Time
 ● Complexity increases

Need for "Real-Time":
 ● Wherever the data changes faster than you can retrain models
 ● When you can't pre-compute everything ahead of time
Accuracy vs Latency
Netflix Prize winning entry:
 ● Ensemble of hundreds of models
 ● Massively compute-intensive solution
 ● Marginally better than much simpler models

IBM won the KDD Cup 2009 (Orange):
 ● IBM Watson team won by sheer brute force
 ● Used a "one of everything" approach, generating hundreds of models
Accuracy vs Latency (cont'd)
Mathematical navel-gazing:
 ● Often the factor we're optimising for isn't the thing we measure improvement in:
    ○ User ratings vs. customer longevity/value
    ○ Overfitting outliers vs. missing clear fraud

Given the choice between a "best guess" now
and a "marginally better" answer later, I'd take
the "best guess" every time.
However, that doesn't mean...
Accuracy vs Latency (cont'd)
It's a trade-off:
 ● Sometimes "best guess" is good enough,
 ● Other times we can wait for the accuracy,
 ● And of course, occasionally we want both!

Key objective:
 ● Most appropriate solution for the use-case
 ● Hybrid solutions part batch, part real-time
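
One way to read the trade-off above is as a latency budget: answer with the most accurate model that still fits the budget, and fall back to the fastest one otherwise. The sketch below is only an illustration. The model names, latency figures and accuracy scores are invented, and the selection rule is an assumption rather than anything described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    expected_latency_ms: float   # assumed to be measured offline
    expected_accuracy: float     # assumed to be measured offline
    predict: Callable[[dict], float]


def pick_model(models: Sequence[Model], budget_ms: float) -> Model:
    """Return the most accurate model that fits the latency budget.

    Falls back to the fastest model when nothing fits, so the caller
    always gets a "best guess" instead of no answer.
    """
    if not models:
        raise ValueError("at least one model is required")
    affordable = [m for m in models if m.expected_latency_ms <= budget_ms]
    if affordable:
        return max(affordable, key=lambda m: m.expected_accuracy)
    return min(models, key=lambda m: m.expected_latency_ms)


# Hypothetical line-up: a cheap heuristic, an online model, a batch ensemble.
MODELS = [
    Model("popularity_baseline", 2.0, 0.70, lambda features: 0.50),
    Model("online_linear", 15.0, 0.78, lambda features: 0.60),
    Model("batch_ensemble", 400.0, 0.80, lambda features: 0.65),
]

if __name__ == "__main__":
    for budget in (1, 50, 1000):
        print(budget, "ms ->", pick_model(MODELS, budget).name)
```

With a 50 ms budget the slow ensemble is skipped even though it scores slightly higher, which is the "best guess now" choice in miniature.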
Use Case
eCommerce
Use Case - eCommerce
Objective - Increase profits
How:
●   Match potential customers to the right products
●   Personalise user experience on web & email
●   Customer lifecycle management

Method:
●   Ensemble of real-time models
●   Collect lots of implicit feedback data
Use Case - eCommerce (cont'd)
Detail:
●   Clustering - behavior, demographics
●   Simple predictors - keywords to products
●   Bayesian Bandit - blend the output

Requirements:
●   Predictions in < 50 ms
●   Online learning models
●   Occasional batch updates are OK
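
The "Bayesian Bandit" step can be sketched with Thompson sampling over Beta posteriors: each predictor is an arm, a click counts as a success, and the bandit learns online which predictor to trust. This is a generic textbook version, not TUMRA's implementation. The arm names and click rates in the simulation are made up.

```python
import random


class BetaBandit:
    """Thompson sampling over Beta(successes + 1, failures + 1) posteriors."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        self.successes = {arm: 0 for arm in arms}
        self.failures = {arm: 0 for arm in arms}

    def choose(self):
        # Draw one plausible click rate per arm and pick the largest draw.
        draws = {
            arm: self.rng.betavariate(self.successes[arm] + 1,
                                      self.failures[arm] + 1)
            for arm in self.successes
        }
        return max(draws, key=draws.get)

    def update(self, arm, clicked):
        # Online update: one implicit-feedback event at a time.
        if clicked:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, rounds=5000, seed=42):
    """Run the bandit against made-up click rates; return pulls per arm."""
    bandit = BetaBandit(true_rates, seed=seed)
    feedback = random.Random(seed + 1)
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, feedback.random() < true_rates[arm])
    return pulls


if __name__ == "__main__":
    # Hypothetical predictors and click-through rates.
    print(simulate({"clusters": 0.02, "keywords": 0.20, "popular": 0.05}))
```

Choosing and updating are both constant-time per arm, so this kind of blending fits comfortably inside a budget like the < 50 ms requirement.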
When eCommerce #FAILs
I've only ever bought Cat food...
... wait there's more, no Cat food
Even Amazon can #FAIL
Use Case
Financial Services
Use Case - Financial Services
Objective - Reduce Fraud
How:
●   Compute patterns/predictors for individuals
●   Cluster individuals and recompute for clusters
●   Compute baselines across all data

Method:
●   Hybrid and Hierarchical Clustering models
●   Simple predictors for individuals, clusters & baseline
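
The individual / cluster / baseline hierarchy above might be wired together as a simple fallback: use the individual's own history when there is enough of it, otherwise the cluster's, otherwise the global baseline. A minimal sketch follows. The `MIN_SAMPLES` threshold, the use of a plain mean as the "simple predictor", and the sample data are all assumptions.

```python
from collections import defaultdict
from statistics import mean

MIN_SAMPLES = 5  # assumed threshold for trusting a level of the hierarchy


def build_profiles(transactions, cluster_of):
    """Group (customer_id, amount) pairs by individual, cluster and overall."""
    by_customer = defaultdict(list)
    by_cluster = defaultdict(list)
    everything = []
    for customer, amount in transactions:
        by_customer[customer].append(amount)
        by_cluster[cluster_of[customer]].append(amount)
        everything.append(amount)
    return by_customer, by_cluster, everything


def expected_amount(customer, cluster_of, profiles, min_samples=MIN_SAMPLES):
    """Return (level_used, expected_amount) from the most specific level
    that has at least min_samples observations."""
    by_customer, by_cluster, everything = profiles
    history = by_customer.get(customer, [])
    if len(history) >= min_samples:
        return "individual", mean(history)
    peers = by_cluster.get(cluster_of.get(customer), [])
    if len(peers) >= min_samples:
        return "cluster", mean(peers)
    return "baseline", mean(everything)  # needs at least one transaction


# Hypothetical customers, clusters and amounts.
CLUSTER_OF = {"alice": "commuters", "bob": "commuters", "carol": "travellers"}
TRANSACTIONS = [("alice", 10.0)] * 6 + [("bob", 20.0)] * 2 + [("carol", 300.0)]

if __name__ == "__main__":
    profiles = build_profiles(TRANSACTIONS, CLUSTER_OF)
    for who in ("alice", "bob", "carol"):
        print(who, expected_amount(who, CLUSTER_OF, profiles))
```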
Use Case - Financial Services (cont'd)
Detail:
●   CHEAT!!! ... Cluster to nearest centroid
     ○ will degrade over time (Hunchback Clusters)
●   Use simple metrics to alert (stddev)

Requirements:
●   Ability to alert/intervene in near real-time (< 1 second)
●   Adapt to rapid changes (within baseline & clusters)
●   Periodic batch processing to recompute clusters
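
The "cheat" and the stddev alert can be sketched together: assign each new point to the nearest existing centroid instead of re-clustering, and keep running statistics so an alert fires when a value sits several standard deviations from the mean. This assumes Euclidean distance, Welford's online algorithm for the running statistics, and a 3-sigma threshold. None of those specifics come from the slides.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the closest centroid (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


class RunningStats:
    """Welford's online algorithm for mean and sample standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x, threshold=3.0):
        """True when x is more than `threshold` standard deviations out."""
        sd = self.stddev
        return sd > 0 and abs(x - self.mean) > threshold * sd


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (10.0, 10.0)]  # assumed output of a batch job
    print(nearest_centroid((1.0, 2.0), centroids))

    stats = RunningStats()
    for amount in (10, 12, 11, 9, 13, 10, 11):  # made-up transaction amounts
        stats.add(amount)
    print(stats.is_anomalous(11.5), stats.is_anomalous(250.0))
```

Both operations are cheap enough for a sub-second alert. The centroids themselves go stale, which is why the periodic batch recompute is still needed.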
Use Case
 Media
Use Case - Media
Objective - Generating Metadata
Why:
●   Drive second screen applications
●   Create new streams of information for resale

How:
●   Video / Audio analysis
●   Closed caption or subtitle text processing
●   Knowledgebase: People, Places, Products & Things
Use Case - Media (cont'd)
Method:
●   Natural Language Processing
    ○   Named Entity Recognition
    ○   Topic Extraction & Disambiguation
●   Graph databases & algorithms

Requirements:
●   Responses in < 1 second
●   Ability to learn new 'Things'
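
To make "learn new 'Things'" concrete, here is a deliberately naive stand-in for entity recognition: a dictionary lookup against a small knowledgebase that can be extended at runtime. Real Named Entity Recognition is statistical and handles ambiguity; this sketch does neither. The `Knowledgebase` class, its methods and the sample entities are hypothetical.

```python
import re


class Knowledgebase:
    """Toy gazetteer mapping lower-cased names to (name, entity_type)."""

    def __init__(self):
        self._entities = {}

    def learn(self, name, entity_type):
        # New 'Things' become taggable immediately, with no retraining step.
        self._entities[name.lower()] = (name, entity_type)

    def tag(self, text):
        """Return (name, entity_type) pairs found in text, longest names first."""
        found = []
        remaining = text.lower()
        for key in sorted(self._entities, key=len, reverse=True):
            pattern = r"\b" + re.escape(key) + r"\b"
            if re.search(pattern, remaining):
                found.append(self._entities[key])
                # Blank out the match so shorter names inside it are not re-tagged.
                remaining = re.sub(pattern, " ", remaining)
        return found


if __name__ == "__main__":
    kb = Knowledgebase()
    kb.learn("Sky News", "Product")
    kb.learn("Sky", "Product")
    kb.learn("London", "Place")
    print(kb.tag("Sky News reports from London tonight"))

    print(kb.tag("Live from Chiswick"))   # unknown so far
    kb.learn("Chiswick", "Place")
    print(kb.tag("Live from Chiswick"))   # known after learning
```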
Example of 12,000 entities from our Knowledgebase...
Summary
Key points:
●   Clear move towards distributed algorithms
●   Low latency is often more valuable than accuracy
●   Trade-offs are dependent on the use-case

Further reading:
●   Apache Mahout - http://mahout.apache.org/
●   Storm Project - http://storm-project.net/
●   Data Science London - http://datasciencelondon.org/
●   Machine Learning Meetup - http://bit.ly/w8V8f6
Almost finished!
Introducing TUMRA Labs
API access to some of our real-time models:
●   Probabilistic Demographics

Coming Soon:
●   Language detection
●   Sentiment analysis
●   Metadata Generation


Free to sign up and easy to get started!
http://labs.tumra.com/
Questions?
  Work          Personal
tumra.com      cotdp.com
 @tumra         @cotdp

Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)

Digital and data journey demystified: how it all works
Digital and data journey demystified: how it all worksDigital and data journey demystified: how it all works
Digital and data journey demystified: how it all worksMichal Hodinka
 
[DSC Croatia 22] Building smarter ML and AI models and making them more accur...
[DSC Croatia 22] Building smarter ML and AI models and making them more accur...[DSC Croatia 22] Building smarter ML and AI models and making them more accur...
[DSC Croatia 22] Building smarter ML and AI models and making them more accur...DataScienceConferenc1
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsImply
 
How to Use Deep Learning by Mu Sigma Product Manager
How to Use Deep Learning by Mu Sigma Product ManagerHow to Use Deep Learning by Mu Sigma Product Manager
How to Use Deep Learning by Mu Sigma Product ManagerProduct School
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning India Quotient
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupOlivier Koch
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Deploying AI Applications in Enterprises
Deploying AI Applications in EnterprisesDeploying AI Applications in Enterprises
Deploying AI Applications in EnterprisesAnandSRao1962
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Sri Ambati
 
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with AnalyticsWSO2
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Chris Jang
 
Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Peter Schleinitz
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Albert Y. C. Chen
 
Transformacion del Negocio Financiero por medio de Tecnologias Cloud
Transformacion del Negocio Financiero por medio de Tecnologias CloudTransformacion del Negocio Financiero por medio de Tecnologias Cloud
Transformacion del Negocio Financiero por medio de Tecnologias CloudRaul Goycoolea Seoane
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)Laura Chiticariu
 
SDD2017 - 03 Abed Ajraou - putting data science in your business a first uti...
SDD2017 - 03 Abed Ajraou  - putting data science in your business a first uti...SDD2017 - 03 Abed Ajraou  - putting data science in your business a first uti...
SDD2017 - 03 Abed Ajraou - putting data science in your business a first uti...Dario Mangano
 
Witekio introducing-predictive-maintenance
Witekio introducing-predictive-maintenanceWitekio introducing-predictive-maintenance
Witekio introducing-predictive-maintenanceWitekio
 

Ähnlich wie Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012) (20)

Digital and data journey demystified: how it all works
Digital and data journey demystified: how it all worksDigital and data journey demystified: how it all works
Digital and data journey demystified: how it all works
 
[DSC Croatia 22] Building smarter ML and AI models and making them more accur...
[DSC Croatia 22] Building smarter ML and AI models and making them more accur...[DSC Croatia 22] Building smarter ML and AI models and making them more accur...
[DSC Croatia 22] Building smarter ML and AI models and making them more accur...
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
How to Use Deep Learning by Mu Sigma Product Manager
How to Use Deep Learning by Mu Sigma Product ManagerHow to Use Deep Learning by Mu Sigma Product Manager
How to Use Deep Learning by Mu Sigma Product Manager
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Deploying AI Applications in Enterprises
Deploying AI Applications in EnterprisesDeploying AI Applications in Enterprises
Deploying AI Applications in Enterprises
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session
 
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics
[WSO2Con EU 2017] Deriving Insights for Your Digital Business with Analytics
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
 
Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0
 
Transformacion del Negocio Financiero por medio de Tecnologias Cloud
Transformacion del Negocio Financiero por medio de Tecnologias CloudTransformacion del Negocio Financiero por medio de Tecnologias Cloud
Transformacion del Negocio Financiero por medio de Tecnologias Cloud
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
SystemT: Declarative Information Extraction (invited talk at MIT CSAIL)
 
Exploring the Cloud
Exploring the CloudExploring the Cloud
Exploring the Cloud
 
SDD2017 - 03 Abed Ajraou - putting data science in your business a first uti...
SDD2017 - 03 Abed Ajraou  - putting data science in your business a first uti...SDD2017 - 03 Abed Ajraou  - putting data science in your business a first uti...
SDD2017 - 03 Abed Ajraou - putting data science in your business a first uti...
 
Witekio introducing-predictive-maintenance
Witekio introducing-predictive-maintenanceWitekio introducing-predictive-maintenance
Witekio introducing-predictive-maintenance
 

Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)

  • 1. Real-Time Machine Learning at Industrial scale ... the battle of accuracy vs latency tumra.com @tumra 9th October 2012 TUMRA LTD, Building 3, Chiswick Park, 566 Chiswick High Road, W4 5YA Michael Cutler @cotdp
  • 2. $ whoami Michael Cutler (@cotdp) ● Previously at British Sky Broadcasting ○ Last 7 years in R&D ○ Created several patented systems & algorithms ○ Kicked off ‘Big Data’ initiative at Sky in 2008 ● Co-founder CTO @ TUMRA in March '12 ○ Real-time big data science platform ○ Alpha-testing with selected clients
  • 3. Agenda ● Background ● Real-Time vs Batch processing ● Accuracy vs Latency ● Use Cases ○ eCommerce ○ Financial Services ○ Media ● Questions
  • 4. Background Big Data is "in vogue", but what does it mean: ● Distributed processing ● Massively scalable ● Commodity Apache Hadoop is "Kernel" of Big Data OS: ● Distributed Filesystem (HDFS) ● Parallel Processing (Map/Reduce, YARN)
  • 5. Background (cont'd) Solving problems with Big Data is hard: ● Tools are all low-level (Pig, Hive etc.) ● Skills are hard to find What is "Data Science": ● Understanding data & solving problems ● Applies the following skills: ○ Statistical Analysis ○ Machine Learning ○ Communicating Results
  • 7. Batch - Hoppers, Bins, Buckets Credit: http://bit.ly/Q71u4W
  • 8. Real-Time - Flows & Streams Credit: http://bit.ly/NOslqf
  • 9. Real-Time vs Batch processing Similarities to the Industrial Revolution: ● From handicraft to Batch & Real-Time ● Complexity increases Need for "Real-Time": ● Wherever the variation can change faster than you can retrain models ● When you can't pre-compute everything ahead of time
  • 11. Accuracy vs Latency Netflix Prize winning entry :- ● Ensemble of 100's of models ● Massively compute intensive solution ● Marginally better than much simpler models IBM won the KDD Cup 2009 (Orange) :- ● IBM Watson team won by sheer brute force ● Used a "one of everything" approach generating hundreds of models
  • 12. Accuracy vs Latency (cont'd) Mathematical navel-gazing: ● Often the factor we're optimising for isn't the thing we measure improvement in: ○ User ratings vs. customer longevity/value ○ Overfitting outliers vs. missing clear fraud Given the choice between a "best guess" now and a "marginally better" answer later, I'd take the "best guess" every time.
  • 14. Accuracy vs Latency (cont'd) It's a trade-off: ● Sometimes "best guess" is good enough, ● Other times we can wait for the accuracy, ● And of course, occasionally we want both! Key objective: ● Most appropriate solution for the use-case ● Hybrid solutions part batch, part real-time
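The trade-off on slides 12 and 14 can be illustrated with a latency budget: ask the slower, more accurate model first, and fall back to the cheap "best guess" if it does not answer in time. This is only a sketch under assumptions, not the speaker's architecture; `predict_within_budget`, `fast_model`, `slow_model` and `budget_s` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time


def predict_within_budget(fast_model, slow_model, features, budget_s, executor):
    """Return (prediction, source), preferring slow_model if it meets the budget.

    fast_model and slow_model are hypothetical callables taking `features`.
    """
    future = executor.submit(slow_model, features)
    try:
        # Wait at most budget_s seconds for the more accurate answer.
        return future.result(timeout=budget_s), "accurate"
    except TimeoutError:
        # Budget exceeded: serve the cheap answer now. The slow call keeps
        # running in the background and its result is simply discarded here.
        return fast_model(features), "best-guess"


def cheap_guess(features):
    # Stand-in for a simple predictor, e.g. a popularity baseline.
    return 0.5


def expensive_ensemble(features):
    # Stand-in for a compute-heavy ensemble; the sleep simulates its latency.
    time.sleep(0.3)
    return 0.52


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(predict_within_budget(cheap_guess, expensive_ensemble, {}, 0.05, pool))
```

With a 50 ms budget the 300 ms ensemble misses the deadline, so the caller receives the best guess; raising the budget above the ensemble's latency returns the more accurate answer instead.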
  • 16. Use Case - eCommerce Objective - Increase profits How: ● Match potential customers to the right products ● Personalise user experience on web & email ● Customer lifecycle management Method: ● Ensemble of real-time models ● Collect lots of implicit feedback data
  • 17. Use Case - eCommerce (cont'd) Detail: ● Clustering - behavior, demographics ● Simple predictors - keywords to products ● Bayesian Bandit - blend the output Requirements: ● Predictions in < 50 ms ● Online learning models ● Occasional batch updates are OK
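One common way to realise a "Bayesian Bandit" that blends model output is Beta-Bernoulli Thompson sampling: each model is an arm, implicit feedback (for example a click) is the reward, and the model shown is the one whose sampled success rate is highest. The slides do not say which variant was used, so the sketch below is an assumption; the arm names and click rates are made up for the simulation.

```python
import random


class BayesianBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms (models)."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # [alpha, beta] per arm, starting from a uniform Beta(1, 1) prior.
        self.params = {arm: [1, 1] for arm in arms}

    def choose(self):
        # Draw a plausible success rate for each arm and pick the highest.
        samples = {
            arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        # Online learning step: one counter increment per observed outcome.
        if reward:
            self.params[arm][0] += 1
        else:
            self.params[arm][1] += 1

    def mean(self, arm):
        a, b = self.params[arm]
        return a / (a + b)


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against simulated click rates (hypothetical data)."""
    rng = random.Random(seed)
    bandit = BayesianBandit(true_rates, seed=seed + 1)
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, rng.random() < true_rates[arm])
    return bandit, pulls


if __name__ == "__main__":
    bandit, pulls = simulate({"clusters": 0.05, "keywords": 0.20}, rounds=5000)
    print(pulls)
```

Both `choose` and `update` are constant-time per arm, which fits the "predictions in < 50 ms" and "online learning" requirements; a periodic batch job could reset or re-seed the counters.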
  • 19. When eCommerce #FAILs
  • 20. I've only ever bought Cat food...
  • 21. ... wait there's more, no Cat food
  • 24. Use Case - Financial Services Objective - Reduce Fraud How: ● Compute patterns/predictors for individuals ● Cluster individuals and recompute for clusters ● Compute baselines across all data Method: ● Hybrid and Hierarchical Clustering models ● Simple predictors for individuals, clusters & baseline
  • 25. Use Case - Financial Services Detail: ● CHEAT!!! ... Cluster to nearest centroid ○ will degrade over time (Hunchback Clusters) ● Use simple metrics to alert (stddev) Requirements: ● Ability to alert/intervene near real-time < 1 second ● Adapt to rapid changes (within baseline & clusters) ● Periodic batch processing to recompute clusters
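The "cheat" on slide 25 can be sketched as two cheap steps: assign each new observation to the nearest precomputed centroid rather than re-clustering, then alert when a value lies more than k standard deviations from that group's running mean. The sketch assumes Euclidean distance, Welford's online variance and a threshold of k = 3; none of these choices are stated in the slides, and the sample amounts are invented.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


class RunningStats:
    """Welford's online mean and variance, updated one event at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x, k=3.0):
        sd = self.stddev()
        return sd > 0 and abs(x - self.mean) > k * sd


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (10.0, 10.0)]  # from the periodic batch job
    stats = [RunningStats() for _ in centroids]
    cluster = nearest_centroid((1.0, 2.0), centroids)
    for amount in [10, 12, 11, 9, 10, 11, 12, 10]:
        stats[cluster].update(amount)
    print(cluster, stats[cluster].is_outlier(500))
```

Because centroids are fixed between batch runs, assignments drift as behaviour changes, which is the degradation the slide warns about; the periodic batch recompute restores them.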
  • 26. Use Case - Financial Services
  • 28. Use Case - Media Objective - Generating Metadata Why: ● Drive second screen applications ● Create new streams of information for resale How: ● Video / Audio analysis ● Closed caption or subtitle text processing ● Knowledgebase :- People, Places, Products & Things
  • 29. Use Case - Media (cont'd) Method: ● Natural Language Processing ○ Named Entity Recognition ○ Topic Extraction & Disambiguation ● Graph databases & algorithms Requirements: ● Responses in < 1 second ● Ability to learn new 'Things' Example of 12,000 entities from our Knowledgebase...
  • 34. Summary Key points: ● Clear move towards distributed algorithms ● Low latency is often more valuable than extra accuracy ● Trade-offs are dependent on the use-cases Further reading: ● Apache Mahout - http://mahout.apache.org/ ● Storm Project - http://storm-project.net/ ● Data Science London - http://datasciencelondon.org/ ● Machine Learning Meetup - http://bit.ly/w8V8f6
  • 36. Introducing TUMRA Labs API access to some of our real-time models: ● Probabilistic Demographics Coming Soon: ● Language detection ● Sentiment analysis ● Metadata Generation Free to signup and easy to get started! http://labs.tumra.com/
  • 37. Questions? Work Personal tumra.com cotdp.com @tumra @cotdp