Machine learning can be a daunting subject to tackle, much less to apply in a meaningful way. In this session, attendees will learn how to take their existing data, shape it, and create models that can automatically make principled business decisions directly in their applications. The discussion will cover the data acquisition and shaping process, and attendees will learn the basics of machine learning, primarily the supervised learning problem.
3. Agenda
1) data science
2) prediction
3) process
4) models
5) AzureML
4. data science
• key word: “science”
• try stuff
• it (might not | won’t) work the first time
• question: this might work…
• research: wikipedia time
• hypothesis: I have an idea
• experiment: try it out
• analysis: did this even work?
• conclusion: time for a better idea
5. machine learning
• finding (and exploiting) patterns in data
• replacing “human writing code” with
“human supplying data”
• system figures out what the person wants
based on examples
• need to abstract from “training” examples
to “test” examples
• most central issue in ML: generalization
6. machine learning
• split into two (ish) areas
• supervised learning
• predicting the future
• learn from past examples to predict future
• unsupervised learning
• understanding the past
• making sense of data
• learning structure of data
• compressing data for consumption
11. making decisions
• what kinds of decisions are we making?
• binary classification
• yes/no, 1/0, male/female
• multi-class classification
• {A, B, C, D, F} (Grade),
{1, 2, 3, 4} (Class),
{teacher, student, secretary}
• regression
• number between 0 and 100, real value
13. data
Class   | Outlook  | Temp | Windy
Play    | Sunny    | Low  | Yes
No Play | Sunny    | High | Yes
No Play | Sunny    | High | No
Play    | Overcast | Low  | Yes
Play    | Overcast | High | No
Play    | Overcast | Low  | No
No Play | Rainy    | Low  | Yes
Play    | Rainy    | Low  | No
?       | Sunny    | Low  | No
label (y)
play / no play
features
outlook, temp, windy
values (x)
[Sunny, Low, Yes]
A labeled dataset is a collection of (X, Y) pairs.
Given a new x, how do we predict y?
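The table above can be sketched directly as (X, Y) pairs in Python (feature order [outlook, temp, windy], as in the slide):

```python
# The labeled dataset: each example is a pair (x, y) where
# x = [outlook, temp, windy] and y = "Play" / "No Play".
dataset = [
    (["Sunny",    "Low",  "Yes"], "Play"),
    (["Sunny",    "High", "Yes"], "No Play"),
    (["Sunny",    "High", "No"],  "No Play"),
    (["Overcast", "Low",  "Yes"], "Play"),
    (["Overcast", "High", "No"],  "Play"),
    (["Overcast", "Low",  "No"],  "Play"),
    (["Rainy",    "Low",  "Yes"], "No Play"),
    (["Rainy",    "Low",  "No"],  "Play"),
]

new_x = ["Sunny", "Low", "No"]   # the row whose label we want to predict
```

Supervised learning is exactly this: use the eight labeled pairs to predict the y for `new_x`.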
14. clean / transform / maths
Class   | Outlook      | Temp   | Windy
Play    | Sunny        | Lowest | Yes
No Play | ?            | High   | Yes
No Play | Sunny        | High   | KindOf
Play    | Overcast     | ?      | Yes
Play    | Turtle Cloud | High   | No
Play    | Overcast     | ?      | No
No Play | Rainy        | Low    | 28%
Play    | Rainy        | Low    | No
?       | Sunny        | Low    | No
need to clean up data
need to convert to model-able form (linear algebra)
yak shaving
Any apparently useless activity
which, by allowing you to
overcome intermediate difficulties,
allows you to solve a larger
problem.
I was doing a bit of yak shaving
this morning, and it looks like it
might have paid off.
http://en.wiktionary.org/wiki/yak_shaving
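A minimal sketch of that yak shaving: flag invalid values, then one-hot encode the categorical features into a numeric vector (the "model-able form"). The helper names (`clean_row`, `one_hot`) and the valid-value sets are illustrative, not from the slides:

```python
# Sketch: clean messy rows, then one-hot encode categorical features
# into a 0/1 vector suitable for linear algebra.
VALID = {
    "Outlook": {"Sunny", "Overcast", "Rainy"},
    "Temp":    {"Low", "High"},
    "Windy":   {"Yes", "No"},
}

def clean_row(row):
    """Replace unknown/invalid values ('?', 'Turtle Cloud', '28%') with None."""
    return {col: (val if val in VALID[col] else None)
            for col, val in row.items()}

def one_hot(row):
    """Encode a cleaned row as a 0/1 vector, one slot per (column, value)."""
    vec = []
    for col in ("Outlook", "Temp", "Windy"):
        for val in sorted(VALID[col]):
            vec.append(1 if row[col] == val else 0)
    return vec

row = clean_row({"Outlook": "Turtle Cloud", "Temp": "High", "Windy": "No"})
print(row["Outlook"])   # None: "Turtle Cloud" is not a valid outlook
print(one_hot(row))     # [0, 0, 0, 1, 0, 1, 0]
```

A real pipeline would also decide what to do with the `None`s (drop the row, or impute a value); the encoding step is what turns the table into vectors a model can consume.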
15. clean / transform / maths
Class   | Outlook  | Temp | Windy
Play    | Sunny    | Low  | Yes
No Play | Sunny    | High | Yes
No Play | Sunny    | High | No
Play    | Overcast | Low  | Yes
Play    | Overcast | High | No
Play    | Overcast | Low  | No
No Play | Rainy    | Low  | Yes
Play    | Rainy    | Low  | No
?       | Sunny    | Low  | No
need to clean up data
need to convert to model-able form (linear algebra)
16. model
Class   | Outlook  | Temp | Windy
Play    | Sunny    | Low  | Yes
No Play | Sunny    | High | Yes
No Play | Sunny    | High | No
Play    | Overcast | Low  | Yes
Play    | Overcast | High | No
Play    | Overcast | Low  | No
No Play | Rainy    | Low  | Yes
Play    | Rainy    | Low  | No
?       | Sunny    | Low  | No
19. linear classifiers
• in order to classify things properly we need:
• a way to mathematically represent examples
• a way to separate classes (yes/no)
• “decision boundary”
• Excel example
• graph example
MODELS
20. linear classifiers
• dot product of vectors
• [ 3, 4 ] ● [ 1, 2 ] = (3 × 1) + (4 × 2) = 11
• a ● b = | a | × | b | cos θ
• When does this equal 0?
• why would this be useful?
• decision boundary can be represented using a single vector
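The bullets above can be checked directly; a tiny sketch of the dot product, including the example from the slide and a perpendicular pair where it equals 0:

```python
def dot(a, b):
    # component-wise products, summed
    return sum(x * y for x, y in zip(a, b))

print(dot([3, 4], [1, 2]))    # 11, as on the slide

# a . b = |a| |b| cos(theta): zero exactly when the vectors are
# perpendicular -- which is why a single vector can represent a
# decision boundary (the set of points whose dot product with it is 0).
print(dot([1, 2], [-2, 1]))   # 0: perpendicular vectors
```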
22. linear classifiers
• Frank Rosenblatt, Cornell 1957
• let’s make a line (by using a single vector)
• take the dot product between the line and the new point
• > 0 belongs to class 1
• < 0 belongs to class 2
• == 0 on the decision boundary (we don’t know; pick a convention)
• for each example, if we make a mistake, move the line
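The steps above can be sketched as a few lines of Python: classify with the sign of w ∙ x, and on every mistake "move the line" by adding y · x to w. The toy data and labels (+1/−1) are illustrative:

```python
# Sketch of the perceptron update rule described above, on a
# toy linearly separable dataset with labels +1 / -1.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def train_perceptron(examples, epochs=10):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in examples:            # y is +1 (class 1) or -1 (class 2)
            if y * dot(w, x) <= 0:       # mistake (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]   # move the line
    return w

toy = [([2.0, 1.0], 1), ([1.0, 3.0], 1),
       ([-1.0, -2.0], -1), ([-3.0, -1.0], -1)]
w = train_perceptron(toy)
print(all(dot(w, x) * y > 0 for x, y in toy))   # True: every point classified
```

If the data is linearly separable, this loop is guaranteed to stop making mistakes (the perceptron convergence theorem); if it is not, it never settles, which motivates the optimization view on the following slides.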
28. perceptron
• minimize mistakes by moving w
$$\arg\min_{(\mathbf{w},\,b)} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 \quad \text{subject to:} \quad y_i\,(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1$$
REMINDER
29. perceptron
• eventually this becomes an optimization problem
$$L(\alpha) = \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, \mathbf{x}_i^{T}\mathbf{x}_j$$
subject to:
$$\alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$$
31. perceptron
• eventually this becomes an optimization problem
$$L(\alpha) = \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)$$
subject to:
$$\alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$$
here $k(\mathbf{x}_i, \mathbf{x}_j)$ replaces the dot product $\mathbf{x}_i^{T}\mathbf{x}_j$
32. perceptron
• Frank Rosenblatt, Cornell 1957
• let’s make a line (by using a single vector)
• take the dot product between the line and the new point
• > 0 belongs to class 1
• < 0 belongs to class 2
• == 0 on the decision boundary (we don’t know; pick a convention)
• for each example, if we make a mistake, move the line
33. kernel (one weird trick….)
• store the dot products in a table:
$$K = \begin{bmatrix} \mathbf{x}_0^{T}\mathbf{x}_0 & \cdots & \mathbf{x}_0^{T}\mathbf{x}_j \\ \vdots & \ddots & \vdots \\ \mathbf{x}_i^{T}\mathbf{x}_0 & \cdots & \mathbf{x}_i^{T}\mathbf{x}_j \end{bmatrix}$$
• call it the “kernel matrix” and “kernel trick”
• project into any space and still learn a linear model
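A small sketch of the kernel matrix: a table K[i][j] = k(xᵢ, xⱼ) of pairwise kernel values. With the plain dot product as k it is exactly the matrix of xᵢᵀxⱼ above; swapping in another kernel (a polynomial one is shown as an example) projects into a new space while the learning algorithm stays unchanged:

```python
# Kernel matrix: pairwise kernel values over a dataset.
def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))      # plain dot product

def poly_kernel(a, b, d=2):
    return (1 + linear_kernel(a, b)) ** d        # degree-d polynomial kernel

def kernel_matrix(xs, k):
    return [[k(xi, xj) for xj in xs] for xi in xs]

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(kernel_matrix(xs, linear_kernel))
# [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```

The "trick" is that a learner written purely in terms of these table lookups never needs the projected coordinates themselves, only the kernel values.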
34. support vector machines
• this method is the basis for SVMs
• the trained model keeps only a small set of “support vectors” (far fewer than n examples) to make decisions
• the kernel effectively changes the space to make the classes separable
43. decision trees
Class   | Outlook  | Temp | Windy
Play    | Sunny    | Low  | Yes
No Play | Sunny    | High | Yes
No Play | Sunny    | High | No
Play    | Overcast | Low  | Yes
Play    | Overcast | High | No
Play    | Overcast | Low  | No
No Play | Rainy    | Low  | Yes
Play    | Rainy    | Low  | No
?       | Sunny    | Low  | No
44. decision trees
• how should the computer split?
• information gain (with entropy)
• entropy measures how disorganized your
answer is.
• information gain says:
• if I separate the answer by the values in a
particular column, does the answer become
*more* organized?
45. decision trees
• calculating information gain:
$$IG(y, a) = H(y) - H(y \mid a), \qquad a \in \mathrm{Attr}(x)$$
• $H(y)$ – how messy the answer is
• $H(y \mid a)$ – how messy the answer is once we know $a$