1. Learning Linear Models with Hadoop
Ulrich Rückert
© 2012 Datameer, Inc. All rights reserved.
Thursday, March 28, 2013
2. Agenda
What are linear models anyway?
How to learn linear models with Hadoop
Demo
Tips, tricks and caveats
Conclusion
3. Predictive Analytics

Example learning task
• Ad on a bookseller's web page
• Will a customer buy this book?
• Training set: observations on previous customers
• Test set: new customers
• Let's learn a linear model!

Training data (Age and Income are the attributes, BuysBook is the target attribute):

Age   Income   BuysBook
24    60000    yes
65    80000    no
60    95000    no
35    52000    yes
20    45000    yes
43    75000    yes
26    51000    yes
52    47000    no
47    38000    no
25    22000    no
33    47000    yes

Test data, passed through the learned model for prediction:

Age   Income   BuysBook   Prediction
22    67000    ?          yes
39    41000    ?          no
4. Linear Models

What's in the black box?
• Let's pretend all attributes are expert ratings
• Large positive value means yes
• Small value means no
• Intermediate value: don't know

Let the experts vote
• Sum over the ratings in each row
• Larger than the threshold: predict yes
• Smaller: predict no

Expert1   Expert2   BuysBook
24        60        ?
64        80        ?
60        96        ?
5. Linear Models

What's in the black box?
• Let's pretend all attributes are expert ratings
• Large positive value means yes
• Small value means no
• Intermediate value: don't know

Let the experts vote
• Sum over the ratings in each row
• Larger than the threshold: predict yes
• Smaller: predict no

Threshold: 97

Expert 1   Expert 2     Sum   > threshold?
24       + 60       =   84    no
64       + 80       =   144   yes
60       + 96       =   156   yes

Expert1   Expert2   Prediction
24        60        no
64        80        yes
60        96        yes
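A minimal sketch of this voting rule in Python (the function name `predict_vote` is chosen here, not from the deck; the rows and the threshold of 97 are the slide's example):

```python
def predict_vote(ratings, threshold):
    """Let the experts vote: sum the ratings and compare to a threshold."""
    return "yes" if sum(ratings) > threshold else "no"

# The three rows from the slide's table, with threshold 97:
rows = [(24, 60), (64, 80), (60, 96)]
predictions = [predict_vote(row, 97) for row in rows]
# sums 84, 144, 156 -> "no", "yes", "yes"
```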
6. Linear Models

Assign a weight to each expert
• Expert is mostly correct: large weight
• Expert is uninformative: zero weight
• Expert is consistently wrong: negative weight

Learning models
• A linear model consists of the weights and the threshold
• Learn by finding the weights with the lowest error on the training data

Weight 1   Weight 2   Threshold
0.75       0.25       48

Expert 1      Expert 2          Sum   > threshold?
0.75 • 24   + 0.25 • 60    =    33    no
0.75 • 64   + 0.25 • 80    =    68    yes
0.75 • 60   + 0.25 • 96    =    69    yes

Expert1   Expert2   Prediction
24        60        no
64        80        yes
60        96        yes
7. Linear Models

Assign a weight to each expert
• Expert is mostly correct: large weight
• Expert is uninformative: zero weight
• Expert is consistently wrong: negative weight

Learning models
• A linear model consists of the weights and the threshold
• Learn by finding the weights with the lowest error on the training data

Weight 1   Weight 2   Threshold
0          0.25       18

Expert 1    Expert 2         Sum   > threshold?
0 • 24    + 0.25 • 60   =    15    no
0 • 64    + 0.25 • 80   =    20    yes
0 • 60    + 0.25 • 96   =    24    yes

Expert1   Expert2   Prediction
24        60        no
64        80        yes
60        96        yes
8. Linear Models

Assign a weight to each expert
• Expert is mostly correct: large weight
• Expert is uninformative: zero weight
• Expert is consistently wrong: negative weight

Learning models
• A linear model consists of the weights and the threshold
• Learn by finding the weights with the lowest error on the training data

Weight 1   Weight 2   Threshold
-0.5       0.25       -8

Expert 1       Expert 2         Sum   > threshold?
-0.5 • 24    + 0.25 • 60   =    3     yes
-0.5 • 64    + 0.25 • 80   =    -12   no
-0.5 • 60    + 0.25 • 96   =    -6    yes

Expert1   Expert2   Prediction
24        60        yes
64        80        no
60        96        yes
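The weighted vote on these slides is just a weighted sum compared against a threshold. A short Python sketch (names are illustrative, not from the deck), reproducing two of the slides' weight settings:

```python
def predict_linear(weights, attributes, threshold):
    """Weighted expert vote: weighted sum compared against the threshold."""
    score = sum(w * a for w, a in zip(weights, attributes))
    return "yes" if score > threshold else "no"

rows = [(24, 60), (64, 80), (60, 96)]

# Slide 6: weights (0.75, 0.25), threshold 48 -> no, yes, yes
preds_6 = [predict_linear((0.75, 0.25), row, 48) for row in rows]

# Slide 8: Expert1 weighted negatively, threshold -8 -> yes, no, yes
preds_8 = [predict_linear((-0.5, 0.25), row, -8) for row in rows]
```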
9. Learning Linear Models

Stochastic Gradient Descent (SGD)
• Main idea: start with default weights
• For each row, check whether the current weights predict correctly
• On a misclassification: adjust the weights

How to adjust the weights?
• If the row's class is positive: add the row to the weights
• If it is negative: subtract the row

Flow: start with default weights → read the next training row → do the weights predict the correct label? → yes: read the next row; no: adjust the weights, then read the next row
10. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
1          -1         0

Next training row:

Age   Income   BuysBook
24    60       +1

Age         Income         Sum   > threshold?
1 • ?     + -1 • ?     =   ?     ?
11. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
1          -1         0

Age          Income          Sum    > threshold?
1 • 24     + -1 • 60     =   -36    -1

Age   Income   BuysBook
24    60       +1

The model predicts -1 but the true class is +1: the row is misclassified, so the weights get adjusted.
12. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
25         59         0

(weights after adding the row: 1 + 24 = 25, -1 + 60 = 59)

Age           Income          Sum     > threshold?
25 • 24     + 59 • 60     =   4140    +1

Age   Income   BuysBook
24    60       +1
13. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
25         59         -1

(threshold after the update: 0 - (+1) = -1)

Age           Income          Sum     > threshold?
25 • 24     + 59 • 60     =   4140    +1

Age   Income   BuysBook
24    60       +1
14. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
25         59         -1

Next training row:

Age   Income   BuysBook
30    30       -1

Age          Income         Sum   > threshold?
25 • ?     + 59 • ?     =   ?     ?
15. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
25         59         -1

Age           Income          Sum     > threshold?
25 • 30     + 59 • 30     =   2520    +1

Age   Income   BuysBook
30    30       -1

The model predicts +1 but the true class is -1: another misclassification, so the weights get adjusted again.
16. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
-5         29         -1

(weights after subtracting the row: 25 - 30 = -5, 59 - 30 = 29)

Age           Income          Sum    > threshold?
-5 • 30     + 29 • 30     =   720    +1

Age   Income   BuysBook
30    30       -1

Note the row is still scored above the threshold: a single update does not always fix a misclassified row.
17. Learning Linear Models

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
-5         29         0

(threshold after the update: -1 - (-1) = 0)

Age           Income          Sum    > threshold?
-5 • 30     + 29 • 30     =   720    +1

Age   Income   BuysBook
30    30       -1
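The walk-through on slides 10-17 can be reproduced with a small runnable version of the pseudocode (Python here; names such as `perceptron_step` are illustrative, not from the deck):

```python
def predict(weights, threshold, attributes):
    """Predict +1 if the weighted sum exceeds the threshold, else -1."""
    score = sum(w * a for w, a in zip(weights, attributes))
    return +1 if score > threshold else -1

def perceptron_step(weights, threshold, attributes, label):
    """One iteration of the slides' update rule: on a misclassification,
    add (or subtract) the row to the weights and bump the threshold."""
    if predict(weights, threshold, attributes) != label:
        weights = [w + label * a for w, a in zip(weights, attributes)]
        threshold += -label
    return weights, threshold

# Initial model from slide 10, then the two training rows from the slides:
weights, threshold = [1, -1], 0
weights, threshold = perceptron_step(weights, threshold, [24, 60], +1)
# now weights == [25, 59], threshold == -1 (slides 12-13)
weights, threshold = perceptron_step(weights, threshold, [30, 30], -1)
# now weights == [-5, 29], threshold == 0 (slides 16-17)
```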
18. Learning - Convergence

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += row.class * row.attributes;
        threshold += -row.class;
    endif
end
20. Learning - Convergence

repeat
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += 0.001 * row.class * row.attributes;
        threshold += -row.class;
    endif
end

A small fixed step size (here 0.001) keeps any single row from yanking the weights around.
22. Learning - Convergence

for i = 1 to ∞
    row = readNextRow();
    if(predict(weights, row.attributes) != row.class)
        weights += (1/i) * row.class * row.attributes;
        threshold += -row.class;
    endif
end

With a decreasing step size of 1/i, later corrections get ever smaller, which lets the weights converge.
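A runnable version of the decreasing-step-size loop, following the slide's pseudocode (which decays the weight updates as 1/i but leaves the threshold step at ±1); the toy data and names are illustrative:

```python
def sgd_decreasing_rate(rows, n_passes=100):
    """Perceptron-style SGD where the i-th correction is scaled by 1/i,
    so later corrections get ever smaller and the weights can settle."""
    weights, threshold, i = [0.0, 0.0], 0.0, 0
    for _ in range(n_passes):
        for attributes, label in rows:
            i += 1
            score = sum(w * a for w, a in zip(weights, attributes))
            if (1 if score > threshold else -1) != label:
                weights = [w + (1.0 / i) * label * a
                           for w, a in zip(weights, attributes)]
                threshold += -label
    return weights, threshold
```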
24. Learning - Margin

for i = 1 to ∞
    row = readNextRow();
    if(margin(weights, row.attributes, threshold) <= 1)
        weights += (1/i) * row.class * row.attributes;
        threshold += -row.class;
    endif
end

Weight 1   Weight 2   Threshold
0.5        0.25       26.5

Age            Income           Sum   > threshold?
0.5 • 24     + 0.25 • 60    =   27    +1

Age   Income   BuysBook
24    60       +1

The row is classified correctly, but only by a margin of 0.5 over the threshold. Because the margin is at most 1, the update still fires, pushing the decision boundary further away from the row.
27. Learning - Regularization

for i = 1 to ∞
    row = readNextRow();
    if(margin(weights, row.attributes, threshold) <= 1)
        weights += (1/i) * row.class * row.attributes;
        threshold += -row.class;
    endif
end

Attributes are often correlated
• Contributions cancel out
• This leads to unreasonably large weights...
• ... and models which are not robust to noise

Regularization
• Make sure the weights don't get too large
• L2 regularization: weights stay proportional to attribute quality

Weight 1   Weight 2   Threshold
0.5        0.5        30

Age            Income          Sum   > threshold?
0.5 • 24     + 0.5 • 60    =   42    +1

Age   Income   BuysBook
24    60       +1
28. Learning - Regularization

for i = 1 to ∞
    row = readNextRow();
    if(margin(weights, row.attributes, threshold) <= 1)
        weights += (1/i) * row.class * row.attributes;
        threshold += -row.class;
    endif
end

Attributes are often correlated
• Contributions cancel out
• This leads to unreasonably large weights...
• ... and models which are not robust to noise

Regularization
• Make sure the weights don't get too large
• L2 regularization: weights stay proportional to attribute quality

Weight 1   Weight 2   Threshold
1000       -399.3     30

Age             Income             Sum   > threshold?
1000 • 24     + -399.3 • 60    =   42    +1

Age   Income   BuysBook
24    60       +1

These wildly different weights produce the same score of 42 on this row: correlated attributes let huge contributions cancel out.
29. Learning - Regularization

for i = 1 to ∞
    row = readNextRow();
    if(margin(weights, row.attributes, threshold) <= 1)
        weights += (1/i) * row.class * row.attributes;
        threshold += -row.class;
    endif
    weights = i/(i+r) * weights;
end

Attributes are often correlated
• Contributions cancel out
• This leads to unreasonably large weights...
• ... and models which are not robust to noise

Regularization
• Make sure the weights don't get too large
• L2 regularization: weights stay proportional to attribute quality

Weight 1   Weight 2   Threshold
1000       -399.3     30

Age             Income             Sum   > threshold?
1000 • 24     + -399.3 • 60    =   42    +1

Age   Income   BuysBook
24    60       +1
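The shrinking step from the pseudocode, `weights = i/(i+r) * weights`, in isolation (a sketch; `r` is a regularization strength, and a larger `r` shrinks harder):

```python
def shrink(weights, i, r):
    """L2-style regularization step: after iteration i, scale every
    weight by i/(i+r), pulling the weight vector toward zero."""
    factor = i / (i + r)
    return [factor * w for w in weights]

# The oversized weights from slide 28 get pulled back toward zero:
w = shrink([1000.0, -399.3], i=1, r=1)   # factor 1/2: halves each weight
```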
30. Implementation on Hadoop

Map-Reduce
• Input data must be in random order
• Mapper: send the data to the reducer in random order
• Reducer: run the actual stochastic gradient descent

Evaluation and Parameter Selection
• Perform several runs with varying parameters
• Learn on the training set, evaluate on the test set
• Many runs with partial data are often better than one run with all the data
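A toy sketch of that split in Python (this simulates the shuffle locally; in a real job the mapper would emit (random key, row) pairs and Hadoop's sort would deliver them to the reducer in key order — the names here are illustrative, not Datameer's implementation):

```python
import random

def mapper(rows):
    """Tag each row with a random key; sorting by key randomizes the order."""
    return [(random.random(), row) for row in rows]

def reducer(keyed_rows):
    """Receive rows ordered by their random keys and run the actual SGD."""
    weights, threshold = [0.0, 0.0], 0.0
    for _, (attributes, label) in sorted(keyed_rows):
        score = sum(w * a for w, a in zip(weights, attributes))
        if (1 if score > threshold else -1) != label:
            weights = [w + label * a for w, a in zip(weights, attributes)]
            threshold += -label
    return weights, threshold

# Two training rows from the earlier slides:
model = reducer(mapper([([24, 60], +1), ([30, 30], -1)]))
```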
31. Demo
32. Learning Linear Models

Stochastic Gradient Descent: Pros and Cons
• One sweep over the data: easy to implement on top of Hadoop
• Flexible: supports support vector machines, logistic regression, etc.
• Provides a good-enough estimate instead of the optimum
• Parameter selection and evaluation are crucial

Alternative: convex optimization
• Formulate learning as a numerical optimization problem
• On Hadoop: usually L-BFGS
• See Vowpal Wabbit for a large-scale implementation
33. Conclusion
Linear Models
• Prediction based on weighted vote and threshold
Stochastic Gradient Descent
• Adjust weight vector iteratively for each misclassified row
• Decreasing step size to ensure convergence
• Margins and regularization for robustness
Implementation
• Mapper provides random order, reducer performs SGD
• Evaluation and parameter selection are crucial
34. Thanks
urueckert@datameer.com