From square to round wheels...
       ...moving from batch to real-time machine learning


                                        tumra.com
                                         @tumra
TUMRA LTD, Building 3, Chiswick Park,
566 Chiswick High Road, W4 5YA                      Michael Cutler - 6th Sept 2012
Batch
Processing
Credit: http://bit.ly/Q71u4W
In Manufacturing...
Batch processing brought advantages :-
 ● Increased scale of production

 ● Reduced manufacturing cost

 ● Economies of scale (reusable parts)




However :-
● Machinery is complex & expensive

● Each product requires some bespoke parts
In Technology...
Batch processing has been around since the 1950s on mainframes

Hadoop (Map/Reduce) advantages :-
● Increased scale of processing

● Reduced processing cost **

● Economies of scale (reusable code)




However :-
● Complex & expensive **

● Most jobs require some bespoke code
Map/Reduce != FUN
Sure, it's "just Java" but...
 ● Requires certain mindset

 ● Multi-stage algorithm complexity

 ● If you get stuck, R.T.F.S.
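
To make the "certain mindset" concrete, here is a minimal sketch of the map, shuffle and reduce phases as a word count in plain Python. It is an illustration only and not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real job would run each phase distributed across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) pairs for one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Anything that does not fit a single map/reduce pass has to be chained as several such jobs, which is where the multi-stage complexity comes from.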




Alleviated to an extent by tools like :-
 ● Pig, Hive, Cascading, Crunch




Typically requires bespoke code / algorithms
Continuous
Processing
Credit: http://bit.ly/NOslqf
In manufacturing...
Described as:
  "a method used to manufacture, produce, or
  process materials without interruption"

Key features :-
 ● Materials are processed in flows & streams

 ● Can run continuously (exc. maintenance)

 ● Latency e2e can be from seconds to hours




                                            Credit: Wikipedia
In Technology...
We have a problem... most Hadoop-related
technologies are inherently batch!

The trend towards real-time continuous
computation requires :-
 ● New tools (Storm?)

 ● Better algorithms




So what's the solution?
Credit: Scott Simmerman
     http://bit.ly/9cxaHt
It's a hybrid of both!
Batch does have its place...
Map/Reduce is great for 'boil the ocean' jobs;
● tasks that take hours or days

● typically non-interactive with users

● works well for pattern mining, clustering etc.




However, the 'perfect' answer is useless if it
arrives so late it's irrelevant...
Real-time machine learning
Quite simply "data is never at rest"...
● processed in streams not batches

● best for 'supervised learning' models

● end-to-end latency can be in seconds




Key criteria :-
 ● model always has a 'best answer' available

 ● feedback used to train the model
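
The two key criteria can be sketched as a tiny online classifier: predict_proba always returns a 'best answer', even before any training, and learn folds one piece of feedback into the model. This is a minimal sketch assuming a binary label and logistic regression trained by stochastic gradient descent; the class name, method names and learning rate are hypothetical and are not taken from the talk.

```python
import math


class OnlineLogisticModel:
    """Binary classifier updated one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Current best estimate of P(label = 1); usable before any training."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        score = max(-30.0, min(30.0, score))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def learn(self, features, label):
        """Fold one piece of feedback (label is 0 or 1) into the model."""
        error = label - self.predict_proba(features)
        self.bias += self.learning_rate * error
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
```

Each call to learn costs time proportional to the number of features, so the model can keep up with a stream without ever re-reading old data.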
So what works well in real-time?
Classification :-
 ● Easiest to implement




Clustering :-
 ● Periodically batch recompute clusters

 ● Add new data points to the nearest centroid

 ● Rinse, repeat
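
The clustering loop above can be sketched as follows: each new point is assigned to its nearest centroid straight away, and the centroids are recomputed in a batch step every so often. This is a minimal single-process sketch under stated assumptions: Euclidean distance, a few rounds of Lloyd's k-means as the batch step, and hypothetical names (nearest, recompute, StreamingClusters, recompute_every). In practice the batch step would run as a separate job rather than inline.

```python
import math


def nearest(centroids, point):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(centroids[i], point))


def recompute(centroids, points, iterations=10):
    """Batch step: a few rounds of Lloyd's k-means over all stored points."""
    for _ in range(iterations):
        members = [[] for _ in centroids]
        for p in points:
            members[nearest(centroids, p)].append(p)
        # Move each centroid to the mean of its members; keep empty ones as-is.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else c
            for c, group in zip(centroids, members)
        ]
    return centroids


class StreamingClusters:
    def __init__(self, centroids, recompute_every=1000):
        self.centroids = [tuple(c) for c in centroids]
        self.points = []
        self.recompute_every = recompute_every

    def add(self, point):
        """Real-time step: assign immediately, recompute periodically."""
        point = tuple(point)
        self.points.append(point)
        label = nearest(self.centroids, point)
        if len(self.points) % self.recompute_every == 0:
            self.centroids = recompute(self.centroids, self.points)
        return label
```

The caller gets a cluster label for every point with no waiting, while the periodic recompute stops the centroids drifting out of date.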




Collaborative filtering :-
The machine learning gap...
Academic                      Practical
Machine learning gap...
Academia are 'way out there' with new
approaches and algorithms almost every day :-
 ● Many hard to implement in a parallel way




We need more focus on :-
● Inherently distributed algorithms

● Practical implementations

● Speed over marginal accuracy improvements
Mathematical navel gazing
We need practical solutions to real-world
problems...



Recommendations Rant!?!?!?!?!
 ● Most recommenders are 2D matrices

 ● Humans are not very 2D

 ● Is there an N-dimensional solution?
Hybrid approach
Hybrid approach
Example Use-cases
Examples;
 ● eCommerce optimisation

 ● Targeted advertising

 ● Financial services (risk modeling)

 ● Detecting anomalies in M2M data

 ● Automated metadata generation




... many more!
Almost finished!
Introducing TUMRA Labs
API access to some of our real-time models :-
 ● Probabilistic Demographics

 ● Language detection **

 ● Sentiment analysis **

 ● Metadata Generation (entity extraction and

   disambiguation) **

    Free to sign up and easy to get started!

            http://labs.tumra.com/
Questions?
  tumra.com
   @tumra
