Real-time Learning

©MapR Technologies - Confidential       1
whoami – Ted Dunning

     Chief Application Architect, MapR Technologies
     Committer, member, Apache Software Foundation
       –   particularly Mahout, Zookeeper and Drill


                       (we’re hiring)

     Contact me at
       tdunning@maprtech.com
       tdunning@apache.org
       ted.dunning@gmail.com
       @ted_dunning



     Slides and such (available late tonight):
       –   http://www.mapr.com/company/events/devoxx-3-29-2013
     Hash tags: #mapr #devoxxfr




Agenda

     What is real-time learning?
     A sample problem
     Philosophy, statistics and the nature of knowledge
     A solution
     System design




What is Real-time Learning?

     Training data arrives one record at a time


     The system improves a mathematical model based on a small
      amount of training data


     We retain at most a fixed amount of state


     Each learning step takes O(1) time and memory
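
As a small illustration of these constraints, the sketch below keeps a running mean and variance with Welford's update. The class name `RunningStats` and the sample values are hypothetical and not from the talk; the point is only that the state is three numbers and each record costs O(1) time and memory.

```python
class RunningStats:
    """Running mean and variance with fixed state (Welford's method)."""

    def __init__(self):
        self.n = 0        # records seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # One record at a time; constant work and constant memory per step.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two records.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

The model in the rest of the talk is richer than a mean, but the same shape applies: a bounded state object with an `update` method that sees one record and then forgets it.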




We have a product to sell … from a web-site


     What picture?
     What tag-line?
       –   Bogus Dog Food is the Best! Now available in handy 1 ton bags!
     What call to action?
       –   Buy 5!




The Challenge

     Design decisions affect probability of success
       –   Cheesy web-sites don’t even sell cheese



     The best designers do better when allowed to fail
       –   Exploration juices creativity




     But failing is expensive
       –   If only because we could have succeeded
       –   But also because offending or disappointing customers is bad


A Quick Diversion

     You see a coin
       –   What is the probability of heads?
       –   Could it be larger or smaller than that?
     I flip the coin and while it is in the air ask again
     I catch the coin and ask again
     I look at the coin (and you don’t) and ask again
     Why does the answer change?
       –   And did it ever have a single value?




A Philosophical Conclusion

     Probability as expressed by humans is subjective and depends on
      information and experience
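
One way to make "probability depends on information" concrete is a Beta distribution over the coin's probability of heads, updated as flips are observed. This is a sketch under assumptions: the uniform prior and the flip counts are invented for illustration, and the function names are hypothetical.

```python
def beta_update(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing some coin flips."""
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Expected probability of heads under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Uniform prior: before any flips, every value of P(heads) is equally plausible.
prior = (1.0, 1.0)
# Hypothetical evidence: 7 heads and 3 tails.
posterior = beta_update(*prior, heads=7, tails=3)

prior_estimate = beta_mean(*prior)          # belief before seeing data
posterior_estimate = beta_mean(*posterior)  # belief after seeing data
```

Two observers with different evidence hold different posteriors for the same coin, which is the sense in which the answer changes as information arrives.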




So now you understand Bayesian probability



Another Quick Diversion

     Let’s play a shell game
     This is a special shell game
     It costs you nothing to play
     The pea has constant probability of being under each shell
           (trust me)
     How do you find the best shell?
     How do you find it while maximizing the number of wins?




Pause for short con-game



Conclusions

     Can you identify winners or losers without trying them out?
       No


     Can you ever completely eliminate a shell with a bad streak?
       No


     Should you keep trying apparent losers?
       Yes, but at a decreasing rate




So now you understand multi-armed bandits



Is there an optimum strategy?



Thompson Sampling

     Select each shell according to the probability that it is the best


     Probability that it is the best can be computed using posterior

       $$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta$$
     But I promised a simple answer
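
The integral for P(i is best) rarely has a closed form, but it can be approximated by drawing many samples of θ from the posterior and counting how often each option comes out on top. The sketch below assumes win/lose rewards with independent Beta posteriors per shell; the posterior parameters and the function name `prob_best` are hypothetical.

```python
import random


def prob_best(posteriors, draws=20000, seed=0):
    """Monte Carlo estimate of P(arm i is best) under Beta posteriors."""
    rng = random.Random(seed)
    wins = [0] * len(posteriors)
    for _ in range(draws):
        # One draw of theta: a plausible win rate for every arm.
        sample = [rng.betavariate(a, b) for a, b in posteriors]
        # The indicator I[...] is 1 for the arm with the largest sampled rate.
        wins[sample.index(max(sample))] += 1
    return [w / draws for w in wins]


# Hypothetical posteriors, written as (wins + 1, losses + 1) for three shells.
posteriors = [(3, 9), (6, 6), (9, 3)]
estimate = prob_best(posteriors)
```

Each draw inside the loop is already a valid way to choose a shell, which is why the full estimate is never needed in practice.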




Thompson Sampling – Take 2

     Sample θ

       $$\theta \sim P(\theta \mid D)$$
     Pick i to maximize reward

       $$i = \arg\max_j E[r_j \mid \theta]$$

     Record result from using i




Nearly Forgotten until Recently

     Citations for Thompson sampling




Bayesian Bandit for the Shells

     Compute distributions based on data so far
     Sample p1, p2 and p3 from these distributions
     Pick shell i where $i = \arg\max_i p_i$


     Lemma 1: The probability of picking shell i will match the
      probability it is the best shell


     Lemma 2: This is as good as it gets
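
The three steps for the shell game can be sketched as a short loop. This assumes win/lose outcomes and a Beta(1, 1) prior on each shell, neither of which the slide states; the true win probabilities below are invented so that the simulation has something to discover.

```python
import random


def thompson_shells(true_p, rounds=5000, seed=1):
    """Play a win/lose bandit with Thompson sampling over Beta posteriors."""
    rng = random.Random(seed)
    wins = [0] * len(true_p)
    losses = [0] * len(true_p)
    for _ in range(rounds):
        # Sample p1, p2, p3 from the distributions implied by the data so far.
        samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
        # Pick the shell whose sampled win rate is largest.
        i = samples.index(max(samples))
        # Play that shell and record the result.
        if rng.random() < true_p[i]:
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses


# Hypothetical win probabilities, unknown to the player.
wins, losses = thompson_shells([0.2, 0.3, 0.5])
plays = [w + l for w, l in zip(wins, losses)]
```

In a run like this the weaker shells keep getting occasional plays, but less and less often as their posteriors tighten, which matches the "decreasing rate" conclusion from the shell game.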




And it works!


     [Figure: regret (0 to 0.12) versus number of trials n (0 to 1100), comparing ε-greedy with ε = 0.05 against the Bayesian Bandit with Gamma-Normal]
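
A comparison of this kind can be reproduced in miniature. The sketch below measures average expected regret per round for ε-greedy and for Thompson sampling. It is not the experiment behind the plot: it assumes win/lose rewards with Beta posteriors rather than the Gamma-Normal model named in the legend, and the win probabilities are invented.

```python
import random


def average_regret(policy, true_p, rounds=2000, epsilon=0.05, seed=2):
    """Average expected regret per round on a win/lose bandit."""
    rng = random.Random(seed)
    k = len(true_p)
    wins, plays = [0] * k, [0] * k
    best = max(true_p)
    regret = 0.0
    for _ in range(rounds):
        if policy == "epsilon-greedy":
            if rng.random() < epsilon:
                i = rng.randrange(k)  # explore uniformly at random
            else:
                # Exploit the best observed win rate; unplayed arms count as 0.
                rates = [w / n if n else 0.0 for w, n in zip(wins, plays)]
                i = rates.index(max(rates))
        elif policy == "thompson":
            samples = [rng.betavariate(w + 1, n - w + 1) for w, n in zip(wins, plays)]
            i = samples.index(max(samples))
        else:
            raise ValueError(f"unknown policy: {policy}")
        plays[i] += 1
        wins[i] += 1 if rng.random() < true_p[i] else 0
        regret += best - true_p[i]  # expected loss from not playing the best arm
    return regret / rounds


true_p = [0.2, 0.3, 0.5]  # hypothetical win probabilities
eg = average_regret("epsilon-greedy", true_p)
ts = average_regret("thompson", true_p)
```

A single seed proves little either way; averaging over many seeds and plotting regret against n gives a fairer picture of how the two policies compare.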




Video Demo




The Basic Idea

     We can encode a distribution by sampling
     Sampling allows unification of exploration and exploitation


     Can be extended to more general response models




The Original Problem

     x1 = picture
     x2 = tag-line (Bogus Dog Food is the Best! Now available in handy 1 ton bags!)
     x3 = call to action (Buy 5!)




Mathematical Statement

     Logistic or probit regression

       $$P(\text{conversion}) = w\left(\sum x_i \theta_{ij}\right)$$

     Logistic:
       $$w(x) = \frac{1}{1 + e^{-x}}$$
     Probit:
       $$w(x) = \frac{\operatorname{erf}(x) + 1}{2}$$
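
The two link functions are short enough to write out directly. In this sketch the parameters are flattened to a single index (one weight per feature), which is a simplification of the slide's notation, and the feature and weight values are made up.

```python
import math


def logistic(x):
    """Logistic link: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def erf_link(x):
    """The erf-based link as written on the slide: (erf(x) + 1) / 2."""
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic):
    """P(conversion) = w(sum_i x_i * theta_i) for a feature vector x."""
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


# Hypothetical features and weights whose weighted sum is exactly zero.
p = p_conversion([1, 0, 1], [0.5, 2.0, -0.5])
```

Both links map any real-valued score into the interval (0, 1) and give 0.5 at a score of zero; they differ mainly in how quickly the tails approach 0 and 1.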



Same Algorithm

     Sample θ

       $$\theta \sim P(\theta \mid D)$$
     Pick design x to maximize reward


       $$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum x_i \theta_{ij}$$
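
For a page with a few design variables, "sample θ, then pick the best x" can be done by brute force. The sketch below reads θ_ij as the weight of option j for design variable i, which is one plausible interpretation of the slide's indices, and it assumes an independent Gaussian posterior per weight. The posterior values and the name `pick_design` are hypothetical.

```python
import itertools
import random


def pick_design(posterior, rng):
    """Sample theta from the posterior, then return the best design under it.

    posterior[i][j] is a (mean, std) pair for option j of design variable i.
    """
    theta = [[rng.gauss(m, s) for m, s in options] for options in posterior]
    designs = itertools.product(*(range(len(options)) for options in theta))
    # The score of a design is the sum of sampled weights of its chosen options.
    return max(designs, key=lambda d: sum(theta[i][j] for i, j in enumerate(d)))


rng = random.Random(4)
# Hypothetical posterior: picture (2 options), tag-line (3), call to action (2).
posterior = [
    [(0.1, 0.5), (0.9, 0.5)],
    [(0.5, 0.5), (0.2, 0.5), (0.3, 0.5)],
    [(0.0, 0.5), (1.0, 0.5)],
]
design = pick_design(posterior, rng)  # one option index per design variable
```

Because the link function is monotonic, maximizing the linear score also maximizes the predicted conversion probability, so the link can be skipped during selection.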




Context Variables

     x1 = picture
     x2 = tag-line (Bogus Dog Food is the Best! Now available in handy 1 ton bags!)
     x3 = call to action (Buy 5!)

     y1 = user.geo
     y2 = env.time
     y3 = env.day_of_week
     y4 = env.weekend


Two Kinds of Variables

     The web-site design - x1, x2, x3
       –   We can change these
       –   Different values give different web-site designs


     The environment or context – y1, y2, y3, y4
       –   We can’t change these
       –   They can change themselves


     Our model should include interactions between x and y




Same Algorithm, More Greek Letters

     Sample θ, π, φ

       $$(\theta, \Pi, \Phi) \sim P(\theta, \Pi, \Phi \mid D)$$
     Pick design x to maximize reward, y’s are constant

       $$x^* = \arg\max_x E[r_x \mid \theta]$$

       $$\phantom{x^*} = \arg\max_x \left( \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \varphi_i \right)$$

     This looks very fancy, but is actually pretty simple
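
To show how simple it is, the sketch below scores candidate designs with the three sums and picks the best one for a given context. It assumes θ, π and φ have already been sampled from the posterior; all numeric values, and the reading of y as a weekday/weekend indicator, are invented for illustration.

```python
def score(x, y, theta, pi, phi):
    """Linear score with design terms, design-context interactions, context terms."""
    design = sum(xi * ti for xi, ti in zip(x, theta))
    interaction = sum(
        x[i] * y[j] * pi[i][j] for i in range(len(x)) for j in range(len(y))
    )
    context = sum(yj * pj for yj, pj in zip(y, phi))  # same for every design
    return design + interaction + context


def best_design(candidates, y, theta, pi, phi):
    """Pick the candidate x with the highest score; y is held constant."""
    return max(candidates, key=lambda x: score(x, y, theta, pi, phi))


# Hypothetical sampled parameters for two design features and two context features.
theta = [1.0, 0.0]
pi = [[0.0, 0.0], [0.0, 3.0]]
phi = [0.5, 0.5]
candidates = [(1, 0), (0, 1)]

weekday_choice = best_design(candidates, (1, 0), theta, pi, phi)
weekend_choice = best_design(candidates, (0, 1), theta, pi, phi)
```

The context-only sum adds the same amount to every candidate, so it never changes which design wins; the interaction sum is what lets the best design differ between contexts.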


Surprises

     We cannot record a non-conversion until we have waited long enough


     We cannot record a conversion until we have waited the same length of time


     Learning from conversions requires delay


     We don’t have to wait very long
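
One way to apply the same delay to both outcomes is to hold every impression in a pending set and only emit its label once a fixed window has passed. This is a sketch, not the system from the talk: the class name, the use of plain numbers for time, and the 60-second window in the usage below are all assumptions.

```python
class ConversionRecorder:
    """Hold each impression for a fixed window before emitting a training label."""

    def __init__(self, window):
        self.window = window   # how long to wait before labeling
        self.pending = {}      # impression id -> [shown_at, converted]

    def impression(self, imp_id, now):
        self.pending[imp_id] = [now, False]

    def conversion(self, imp_id):
        # Note the conversion, but do not emit it yet.
        if imp_id in self.pending:
            self.pending[imp_id][1] = True

    def flush(self, now):
        """Return (imp_id, converted) for impressions whose window has elapsed."""
        ready = [i for i, (t, _) in self.pending.items() if now - t >= self.window]
        return [(i, self.pending.pop(i)[1]) for i in ready]


recorder = ConversionRecorder(window=60)
recorder.impression("a", now=0)
recorder.impression("b", now=10)
recorder.conversion("a")
early = recorder.flush(now=30)   # nothing is old enough yet
first = recorder.flush(now=60)   # "a" is labeled as a conversion
second = recorder.flush(now=70)  # "b" is labeled as a non-conversion
```

In this sketch a conversion that arrives after its impression has been flushed is ignored, so the window should be chosen long enough to catch most conversions.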




Required Steps

     Learn distribution of parameters from data
       –   Logistic regression or probit regression (can be on-line!)
       –   Need Bayesian learning algorithm


     Sample from posterior distribution
       –   Generally included in Bayesian learning algorithm


     Pick design
       –   Simple sequential search


     Record data
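
The learning and sampling steps can be sketched with an on-line logistic regression that keeps a Gaussian belief per weight. This is one possible approximation and not necessarily the algorithm used in the talk: it assumes a diagonal Gaussian posterior (mean `m`, precision `q`) and takes a single Newton-like step per record. All names are hypothetical.

```python
import math
import random


class OnlineLogistic:
    """On-line logistic regression with a diagonal Gaussian posterior (a sketch)."""

    def __init__(self, dim, prior_precision=1.0):
        self.m = [0.0] * dim              # posterior mean of each weight
        self.q = [prior_precision] * dim  # posterior precision of each weight

    def predict(self, x):
        z = sum(mi * xi for mi, xi in zip(self.m, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Learn from one record: features x and outcome y in {0, 1}."""
        p = self.predict(x)
        for i, xi in enumerate(x):
            # Precision grows with the curvature of the log-likelihood.
            self.q[i] += xi * xi * p * (1.0 - p)
            # Gradient step on the mean, scaled by the current uncertainty.
            self.m[i] -= (p - y) * xi / self.q[i]

    def sample(self, rng):
        """Draw one plausible weight vector from the posterior."""
        return [rng.gauss(mi, 1.0 / math.sqrt(qi)) for mi, qi in zip(self.m, self.q)]


model = OnlineLogistic(2)
model.update([1.0, 1.0], 1)                # record one conversion
theta = model.sample(random.Random(5))     # weights to use for the next design choice
```

The state is two fixed-size vectors and each update touches each weight once, so it fits the fixed-state, constant-work-per-record definition of real-time learning given earlier.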


Required system design



Hadoop is Not Very Real-time

     [Figure: timeline t ending at "now", with regions labeled "Fully processed", "Latest full period" and "Unprocessed Data"; annotation: "Hadoop job takes this long for this data"]

Real-time and Long-time together

     [Figure: timeline t ending at "now", under a "Blended view"; annotations: "Hadoop works great back here" and "Storm works here"]
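
A blended view can be as simple as adding recent real-time increments on top of the last batch totals. The sketch below assumes the batch job reports counts up to a known cutoff time and that recent events carry timestamps; the function name, keys and numbers are hypothetical.

```python
from collections import Counter


def blended_view(batch_counts, batch_cutoff, recent_events):
    """Combine batch totals with events that arrived after the batch cutoff."""
    view = Counter(batch_counts)
    for timestamp, key in recent_events:
        if timestamp >= batch_cutoff:  # skip events the batch job already counted
            view[key] += 1
    return view


# Hypothetical batch output covering everything before t = 100.
batch = {"design-a": 100, "design-b": 40}
# Hypothetical real-time events as (timestamp, key) pairs.
recent = [(95, "design-a"), (101, "design-a"), (105, "design-c")]
view = blended_view(batch, batch_cutoff=100, recent_events=recent)
```

When a newer batch result lands, the cutoff moves forward and the real-time side can discard everything older than it, so the real-time state stays small.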



Traditional Hadoop Design

     Can use Kafka cluster to queue log lines
     Can use Storm cluster to do real time learning
     Can host web site on NAS
     Can use Flume cluster to import data from Kafka to Hadoop
     Can record long-term history on Hadoop Cluster


     How many clusters?




     [Diagram: components of the traditional design: Users, Web Site, Kafka, Kafka Cluster (three nodes), Kafka API, Storm, Design Targeting, Web Service, NAS, Flume, Hadoop, HDFS, Data]

That is a lot of moving parts!



Alternative Design

     Can host log catcher on MapR via NFS
     Storm can read data directly from queue
     Can host web server directly on cluster


     Only one cluster needed
       –   Total instance count drops by 3x
       –   Admin burden massively decreased




     [Diagram: components of the alternative design: Users, http, Web-server, Catcher, Storm, Topic Queue, Web Data, MapR]




You can do this yourself!



Contact Me!

     We’re hiring at MapR in US and Europe

     MapR software available for research use

     Contact me at tdunning@maprtech.com or @ted_dunning

     Share news with @apachemahout


     Tweet #devoxxfr #mapr #mahout @ted_dunning





 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Devoxx Real-time Learning

  • 2. whoami – Ted Dunning
      • Chief Application Architect, MapR Technologies
      • Committer, member, Apache Software Foundation
          – particularly Mahout, Zookeeper and Drill
      • (we’re hiring)
      • Contact me at tdunning@maprtech.com, tdunning@apache.com, ted.dunning@gmail.com, @ted_dunning
  • 3. Slides and such (available late tonight):
          – http://www.mapr.com/company/events/devoxx-3-29-2013
      • Hash tags: #mapr #devoxxfr
  • 4. Agenda
      • What is real-time learning?
      • A sample problem
      • Philosophy, statistics and the nature of the knowledge
      • A solution
      • System design
  • 5. What is Real-time Learning?
      • Training data arrives one record at a time
      • The system improves a mathematical model based on a small amount of training data
      • We retain at most a fixed amount of state
      • Each learning step takes O(1) time and memory
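The four properties on slide 5 can be illustrated with a very small sketch. The example below is not from the talk; it assumes the "model" is just a running mean, and the class name `RunningMean` is hypothetical. It shows one record arriving at a time, fixed state (a count and a mean), and an O(1) update.

```python
class RunningMean:
    """Fixed-size state: a count and a mean, updated one record at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental mean: constant time and memory per record,
        # no matter how many records have been seen before.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


# Hypothetical stream of outcomes (1.0 = conversion, 0.0 = no conversion).
model = RunningMean()
for record in [1.0, 0.0, 1.0, 1.0]:
    model.update(record)
```

The state never grows with the number of records, which is the property the later bandit algorithms rely on.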
  • 6. We have a product to sell … from a web-site
  • 7. [Mock-up of the ad, annotated with three questions]
      • What picture?
      • What tag-line? ("Bogus Dog Food is the Best! Now available in handy 1 ton bags!")
      • What call to action? ("Buy 5!")
  • 8. The Challenge
      • Design decisions affect probability of success
          – Cheesy web-sites don’t even sell cheese
      • The best designers do better when allowed to fail
          – Exploration juices creativity
      • But failing is expensive
          – If only because we could have succeeded
          – But also because offending or disappointing customers is bad
  • 9. A Quick Diversion
      • You see a coin
          – What is the probability of heads?
          – Could it be larger or smaller than that?
      • I flip the coin and while it is in the air ask again
      • I catch the coin and ask again
      • I look at the coin (and you don’t) and ask again
      • Why does the answer change?
          – And did it ever have a single value?
  • 10. A Philosophical Conclusion
      • Probability as expressed by humans is subjective and depends on information and experience
  • 11. So now you understand Bayesian probability
  • 12. Another Quick Diversion
      • Let’s play a shell game
      • This is a special shell game
      • It costs you nothing to play
      • The pea has constant probability of being under each shell (trust me)
      • How do you find the best shell?
      • How do you find it while maximizing the number of wins?
  • 13. Pause for short con-game
  • 14. Conclusions
      • Can you identify winners or losers without trying them out? No
      • Can you ever completely eliminate a shell with a bad streak? No
      • Should you keep trying apparent losers? Yes, but at a decreasing rate
  • 15. So now you understand multi-armed bandits
  • 16. Is there an optimum strategy?
  • 17. Thompson Sampling
      • Select each shell according to the probability that it is the best
      • Probability that it is the best can be computed using posterior
        $P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta$
      • But I promised a simple answer
  • 18. Thompson Sampling – Take 2
      • Sample θ: $\theta \sim P(\theta \mid D)$
      • Pick i to maximize reward: $i = \arg\max_j E[r_j \mid \theta]$
      • Record result from using i
  • 19. Nearly Forgotten until Recently
      • Citations for Thompson sampling
  • 20. Bayesian Bandit for the Shells
      • Compute distributions based on data so far
      • Sample p1, p2 and p3 from these distributions
      • Pick shell i where $i = \arg\max_i p_i$
      • Lemma 1: The probability of picking shell i will match the probability it is the best shell
      • Lemma 2: This is as good as it gets
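The three steps on slide 20 can be sketched in a few lines. The sketch below is illustrative rather than the speaker's code: it assumes each shell pays out with a fixed win probability, uses a Beta posterior with a uniform prior (the slides do not name the distribution used for the shells), and the function names `choose_shell` and `play` are hypothetical.

```python
import random


def choose_shell(wins, losses, rng):
    """Sample one win rate per shell from its posterior and pick the largest."""
    # Assumption: Beta(1 + wins, 1 + losses) posterior, i.e. a uniform prior
    # over each shell's win probability.
    samples = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=lambda i: samples[i])


def play(true_rates, rounds, seed=0):
    """Simulate the shell game; true_rates are hidden from the player."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        i = choose_shell(wins, losses, rng)
        # Record the result from using shell i.
        if rng.random() < true_rates[i]:
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses


# Hypothetical win probabilities for three shells.
wins, losses = play([0.2, 0.5, 0.8], rounds=2000)
pulls = [w + l for w, l in zip(wins, losses)]
```

Apparent losers are still tried occasionally, but less and less often as their posteriors narrow, which matches the conclusion on slide 14.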
  • 21. And it works! [Plot: regret (0 to 0.12) versus n (0 to 1100) for ε-greedy with ε = 0.05 and for the Bayesian Bandit with Gamma-Normal]
  • 22. Video Demo
  • 23. The Basic Idea
      • We can encode a distribution by sampling
      • Sampling allows unification of exploration and exploitation
      • Can be extended to more general response models
  • 24. The Original Problem [Mock-up of the ad ("Bogus Dog Food is the Best! Now available in handy 1 ton bags!", "Buy 5!") with the design variables x1, x2 and x3 marked]
  • 25. Mathematical Statement
      • Logistic or probit regression
        $P(\text{conversion}) = w\left(\sum x_i \theta_{ij}\right)$
        $w(x) = \dfrac{1}{1 + e^{-x}}$
        $w(x) = \dfrac{\operatorname{erf}(x) + 1}{2}$
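The two link functions on slide 25 translate directly into code. This is a minimal sketch: the second function follows the slide's formula literally, and `p_conversion` assumes the linear term is a plain dot product of design features and weights, which simplifies the subscripts shown on the slide. All function names are hypothetical.

```python
import math


def logistic(x):
    # w(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))


def erf_link(x):
    # w(x) = (erf(x) + 1) / 2, exactly as written on the slide.
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic):
    # Assumption: the linear term is sum_i x_i * theta_i.
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


# Hypothetical design vector and weights whose linear term is exactly zero.
p = p_conversion([1, 0, 1], [0.5, 2.0, -0.5])
```

Both links map a zero linear term to a conversion probability of one half and squash large positive or negative terms toward 1 or 0.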
  • 26. Same Algorithm
      • Sample θ: $\theta \sim P(\theta \mid D)$
      • Pick design x to maximize reward: $x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum x_i \theta_{ij}$
  • 27. Context Variables [Mock-up of the ad with the design variables x1, x2 and x3 marked, plus the context variables below]
      • y1 = user.geo
      • y2 = env.time
      • y3 = env.day_of_week
      • y4 = env.weekend
  • 28. Two Kinds of Variables
      • The web-site design: x1, x2, x3
          – We can change these
          – Different values give different web-site designs
      • The environment or context: y1, y2, y3, y4
          – We can’t change these
          – They can change themselves
      • Our model should include interactions between x and y
  • 29. Same Algorithm, More Greek Letters
      • Sample θ, π, φ: $(\theta, \Pi, \Phi) \sim P(\theta, \Pi, \Phi \mid D)$
      • Pick design x to maximize reward, y’s are constant
        $x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \phi_i$
      • This looks very fancy, but is actually pretty simple
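To show why slide 29 is "actually pretty simple", here is a hedged sketch of the selection step. It is not the speaker's implementation: it assumes binary design variables, an independent Gaussian posterior for each weight, and exhaustive search over designs; the names `sample_params`, `score` and `pick_design` are hypothetical.

```python
import itertools
import random


def sample_params(mean, std, rng):
    # Assumption: independent Gaussian posterior for each weight.
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def score(x, y, theta, pi):
    # sum_i x_i theta_i + sum_{i,j} x_i y_j pi_ij. The y-only term
    # (sum_i y_i phi_i) is the same for every design, so it cannot
    # change the argmax and is left out here.
    main = sum(xi * ti for xi, ti in zip(x, theta))
    inter = sum(x[i] * y[j] * pi[i][j]
                for i in range(len(x)) for j in range(len(y)))
    return main + inter


def pick_design(y, theta, pi, n_design):
    # Simple sequential search over every binary design vector.
    designs = itertools.product([0, 1], repeat=n_design)
    return max(designs, key=lambda x: score(x, y, theta, pi))


# Hypothetical posterior means and spreads for three design weights,
# and a hypothetical interaction matrix for two context variables.
rng = random.Random(1)
theta = sample_params([0.2, -0.1, 0.4], [0.05, 0.05, 0.05], rng)
pi = [[0.0, 0.0], [0.5, 0.0], [0.0, -1.0]]
best = pick_design(y=[1, 0], theta=theta, pi=pi, n_design=3)
```

Because the context y is fixed for a given visitor, only the terms involving x matter when choosing the design; the same sampled weights can yield different designs for different contexts.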
  • 30. Surprises
      • We cannot record a non-conversion until we wait
      • We cannot record a conversion until we wait for the same time
      • Learning from conversions requires delay
      • We don’t have to wait very long
  • 35. Required Steps
      • Learn distribution of parameters from data
          – Logistic regression or probit regression (can be on-line!)
          – Need Bayesian learning algorithm
      • Sample from posterior distribution
          – Generally included in Bayesian learning algorithm
      • Pick design
          – Simple sequential search
      • Record data
  • 36. Required system design
  • 37. Hadoop is Not Very Real-time [Timeline diagram along t, ending at "now". Labels: "Fully processed", "Latest full period", "Unprocessed data", "Hadoop job takes this long for this data"]
  • 38. Real-time and Long-time together [Timeline diagram along t, ending at "now". Labels: "Blended view", "Hadoop works great back here", "Storm works here"]
  • 39. Traditional Hadoop Design
      • Can use Kafka cluster to queue log lines
      • Can use Storm cluster to do real time learning
      • Can host web site on NAS
      • Can use Flume cluster to import data from Kafka to Hadoop
      • Can record long-term history on Hadoop Cluster
      • How many clusters?
  • 40. [Architecture diagram. Labels: Users, Web Site, Web Service, NAS, Kafka API, Kafka Cluster, Storm, Design Targeting, Flume, Hadoop, HDFS Data]
  • 41. That is a lot of moving parts!
  • 42. Alternative Design
      • Can host log catcher on MapR via NFS
      • Storm can read data directly from queue
      • Can host web server directly on cluster
      • Only one cluster needed
          – Total instances drops by 3x
          – Admin burden massively decreased
  • 43. [Architecture diagram. Labels: Users, http, Web-server, Catcher, Storm, Topic, Queue, Web, Data, MapR]
  • 44. You can do this yourself!
  • 45. Contact Me!
      • We’re hiring at MapR in US and Europe
      • MapR software available for research use
      • Contact me at tdunning@maprtech.com or @ted_dunning
      • Share news with @apachemahout
      • Tweet #devoxxfr #mapr #mahout @ted_dunning