3. What is Mahout?
• Recommendations (people who x this also x that)
• Clustering (segment data into groups)
• Classification (learn decision making from examples)
• Stuff (LDA, SVD, frequent item-set, math)
7. Classification in Detail
• Naive Bayes Family
– Hadoop based training
• Decision Forests
– Hadoop based training
• Logistic Regression (aka SGD)
– fast on-line (sequential) training
– Now with MORE topping!
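The "fast on-line (sequential) training" of the SGD variant can be sketched as follows. This is a minimal Python illustration of sequential logistic-regression updates, not Mahout's actual (Java) OnlineLogisticRegression implementation; the feature encoding, learning rate, and toy data are made up:

```python
import math

# Minimal sketch of on-line (sequential) SGD training for logistic
# regression: one gradient step per example, no batching.
def sgd_train(examples, num_features, rate=0.1):
    w = [0.0] * num_features
    for x, y in examples:            # x: indices of active binary features, y: 0 or 1
        score = sum(w[i] for i in x)
        p = 1.0 / (1.0 + math.exp(-score))   # predicted probability of class 1
        for i in x:                  # update only the weights of active features
            w[i] += rate * (y - p)
    return w

# Toy data: feature 0 indicates class 1, feature 1 indicates class 0.
data = [([0], 1), ([1], 0)] * 200
w = sgd_train(data, 2)               # w[0] ends up positive, w[1] negative
```

Because each example is processed once and immediately discarded, this style of training needs no Hadoop pass over the data, which is the point of the contrast drawn on the slide.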
9. And Another
From: Dr. May Acquah
Date: Thu, May 20, 2010 at 10:51 AM
Re: Proposal for over-invoice Contract Benevolence
Dear Sir,
Based on information gathered from the India
hospital directory, I am pleased to propose a
confidential business deal for our mutual
benefit. I have in my possession, instruments
(documentation) to transfer the sum of
33,100,000.00 eur (thirty-three million one hundred
thousand euros, only) into a foreign company's
bank account for our favor.
...
From: George <george@fumble-tech.com>
Hi Ted, it was a pleasure talking to you last night
at the Hadoop User Group. I liked the idea of
going for lunch together. Are you available
tomorrow (Friday) at noon?
13. How it Works
• We are given “features”
– Often binary values in a vector
• Algorithm learns weights
– Weighted sum of feature * weight is the key
• Each weight is a single real value
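The scoring step described above can be sketched directly; this is an illustrative Python fragment (names and values made up), showing a binary feature vector dotted with learned real-valued weights and squashed into a probability:

```python
import math

# "Weighted sum of feature * weight is the key": dot the feature
# vector with the weight vector, then map the score to (0, 1).
def classify(features, weights):
    score = sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-score))   # logistic link

# Features 0 and 2 are active; their weights dominate the score.
p = classify([1, 0, 1], [2.0, -1.0, 0.5])   # sigmoid(2.5), about 0.92
```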
14. A Quick Diversion
• You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
• I flip the coin and while it is in the air ask again
• I catch the coin and ask again
• I look at the coin (and you don’t) and ask again
• Why does the answer change?
– And did it ever have a single value?
15. A First Conclusion
• Probability as expressed by humans is
subjective and depends on information and
experience
16. A Second Conclusion
• A single number is a bad way to express
uncertain knowledge
• A distribution of values might be better
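The coin example above can be made concrete with a standard conjugate-prior sketch (my illustration, not from the slides): after observing some flips, belief about the heads probability is a Beta distribution rather than a single number, and sampling from it expresses the remaining uncertainty.

```python
import random

# After h heads and t tails, starting from a uniform prior, the
# posterior over the heads probability is Beta(h + 1, t + 1).
def posterior_samples(heads, tails, n=10000, seed=42):
    rng = random.Random(seed)
    return [rng.betavariate(heads + 1, tails + 1) for _ in range(n)]

samples = posterior_samples(heads=7, tails=3)
mean = sum(samples) / len(samples)   # near (7 + 1) / (7 + 3 + 2) = 2/3
```

More data narrows the distribution; new information (like seeing the coin land) collapses it further, which is exactly why the answer kept changing in the diversion above.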
23. Which One to Play?
• One may be better than the other
• The better machine pays off at some rate
• Playing the other will pay off at a lesser rate
– Playing the lesser machine has “opportunity cost”
• But how do we know which is which?
– Explore versus Exploit!
25. Bayesian Bandit
• Compute distributions based on data
• Sample p1 and p2 from these distributions
• Put a coin in bandit 1 if p1 > p2
• Else, put the coin in bandit 2
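The four steps on this slide are Thompson sampling, and they fit in a few lines. This is a self-contained Python sketch; the true payoff rates, trial count, and priors are made-up illustration values:

```python
import random

# Bayesian bandit: keep a Beta posterior over each bandit's payoff
# rate, sample p1 and p2 from those posteriors, and put the coin in
# bandit 1 when p1 > p2 (else bandit 2).
def bayesian_bandit(true_rates, trials=2000, seed=1):
    rng = random.Random(seed)
    wins = [0] * len(true_rates)    # payoffs observed per bandit
    plays = [0] * len(true_rates)   # coins put in each bandit
    for _ in range(trials):
        # Sample a plausible payoff rate from each posterior (uniform prior).
        samples = [rng.betavariate(wins[i] + 1, plays[i] - wins[i] + 1)
                   for i in range(len(true_rates))]
        i = samples.index(max(samples))
        plays[i] += 1
        if rng.random() < true_rates[i]:
            wins[i] += 1
    return plays

plays = bayesian_bandit([0.2, 0.1])   # bandit 1 pays off at a higher rate
```

Early on the posteriors are wide, so both bandits get played (exploration); as evidence accumulates, the better bandit's samples win more often and it receives most of the coins (exploitation), with no explicit switch between the two regimes.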
26. [graph-only slide; see the speaker notes at the end]
27. [graph-only slide; see the speaker notes at the end]
28. The Basic Idea
• We can encode a distribution by sampling
• Sampling allows unification of exploration and
exploitation
• Can be extended to more general response
models
29. Deployment with Storm/MapR
[Architecture diagram: a Targeting Engine talks over RPC to a Model
Selector and to several Online Models; Impression Logs and Click Logs
feed Online Training, and a Conversion Detector feeds model Training;
results appear in a Conversion Dashboard. All state is managed
transactionally in the MapR file system.]
30. Service Architecture
[Architecture diagram: the same pipeline hosted under MapR Pluggable
Service Management, with Storm running the online path (Targeting
Engine, Model Selector, Online Models, and Conversion Detector,
connected over RPC) and Hadoop running model Training from the
Impression Logs and Click Logs; a Conversion Dashboard reports
results. Storage sits on MapR Lockless Storage Services.]
31. Find Out More
• Me: tdunning@mapr.com
ted.dunning@gmail.com
tdunning@apache.org
• MapR: http://www.mapr.com
• Mahout: http://mahout.apache.org
• Code: https://github.com/tdunning
Speaker notes
Having no information would give a relative expected payoff of -0.25. This graph shows the 25th-, 50th- and 75th-percentile results for sampled experiments with uniform random probabilities. Convergence to the optimum goes nearly as sqrt(n). Note the log scale on the number of trials.
Here is how the system converges in terms of how likely it is to pick the better bandit when the payoff probabilities are only slightly different. After 1000 trials, the system is already giving 75% of the bandwidth to the better option. This graph was produced by averaging several thousand runs with the same probabilities.