This talk is a primer on Machine Learning. I will give a brief introduction to what ML is and how it works. I will walk you through the Machine Learning pipeline: data gathering, data normalization and feature engineering, common supervised and unsupervised algorithms, training models, and delivering results to production. I will also recommend tools that help you get the best ML experience, including programming languages and libraries.
If there is time at the end of the talk, I will walk through two coding examples using the RMS Titanic passenger list: one in Python scikit-learn, using the random-trees algorithm to check whether ML can correctly predict passenger survival, and one in R, for feature engineering of the same dataset.
Note to data scientists and programmers: If you sign up to attend, plan to visit my GitHub repository! I have many Machine Learning coding examples in Python scikit-learn, GNU Octave, and R.
https://github.com/jefftune/gitw-2017-ml
4. Why?: Curiosity
1. Math-driven education: Physics and Applied Mathematics
2. First love of application programming was Artificial Intelligence and Neural Nets
3. Unsatiated curiosity to understand Machine Learning
24. 1959, Arthur Samuel
Machine learning is the practice of
giving computers the ability to learn
without being explicitly programmed
to do so.
25. 1997, Tom Mitchell
A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.
E * T = P
35. Awareness is like consciousness. Soul
is like spirit. But soft is not like
hard and weak is not like strong. A
mechanic can be both soft and hard, a
stewardess can be both weak and
strong. This is called philosophy or
a world-view.
36. BILL. I love a child.
MARCELLA. Children are fortunately captivating.
BILL. Yet my love is excellent.
MARCELLA. My love is spooky yet we must have a
child, a spooky child.
BILL. Do you follow me?
38. Machine Learning Pipeline
a framework
converting raw data to usable data
training an ML algorithm
delivering the trained model
using model to perform actions
41. Machine Learning Pipeline: Training Phase
Training Data is passed to a Learning Algorithm, which creates a Hypothesis
and checks its Cost against Weighted Errors.
42. Machine Learning Pipeline: Training Phase
Generate Model using the Hypothesis with the Least Weighted Errors.
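The two training-phase slides above can be sketched in a few lines. This is only an illustration under stated assumptions: the training data, the candidate slopes, and the use of mean squared error as the cost are all hypothetical choices, not taken from the talk.

```python
# Hypothetical training data: (input, expected output) pairs.
training_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def hypothesis(slope, x):
    """A candidate hypothesis: a line through the origin with the given slope."""
    return slope * x


def cost(slope, data):
    """Mean squared error of the hypothesis over the training data (assumed cost)."""
    return sum((hypothesis(slope, x) - y) ** 2 for x, y in data) / len(data)


# The "learning algorithm" here simply scores a handful of candidate hypotheses.
candidate_slopes = [0.5, 1.0, 1.5, 2.0, 2.5]
costs = {slope: cost(slope, training_data) for slope in candidate_slopes}

# The generated model is the hypothesis with the least error.
best_slope = min(costs, key=costs.get)


def model(x):
    return hypothesis(best_slope, x)
```

Real learning algorithms search the space of hypotheses far more efficiently (for example with gradient descent), but the idea is the same: score hypotheses by cost and keep the best one.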
61. Titanic Passengers List
Jack had a higher
likelihood of surviving if
Rose had stayed on the
lifeboat, already
saved.
Jack (single)
62. Titanic Passengers List
Instead, an extra hour of
Jack and Rose running around
a sinking ship, ending with a
frozen Jack, and Rose saved
(again). Thanks, Rose.
Jack (attached)
63. How is data handled within the
Machine Learning pipeline?
68. ● Dataset is a Matrix of measures or events.
● Measure has a set of useful Features.
● Feature is either undefined or has a value from a normalized set.
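As a rough illustration of the three bullets above, the sketch below builds a small dataset matrix in plain Python. The records, the feature names, the min-max scaling, and the 0/1 encoding are hypothetical choices made for the example; an undefined feature is represented as `None`.

```python
# Hypothetical raw measures: one record per passenger-like event.
raw_measures = [
    {"age": 22.0, "sex": "male", "fare": 7.25},
    {"age": 38.0, "sex": "female", "fare": 71.28},
    {"age": None, "sex": "female", "fare": 7.92},  # age is undefined
]

# Categorical feature mapped onto a normalized set of values.
SEX_CODES = {"male": 0.0, "female": 1.0}


def min_max(values):
    """Scale defined values into [0, 1]; undefined values stay None."""
    defined = [v for v in values if v is not None]
    lo, hi = min(defined), max(defined)
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


ages = min_max([m["age"] for m in raw_measures])
fares = min_max([m["fare"] for m in raw_measures])
sexes = [SEX_CODES[m["sex"]] for m in raw_measures]

# The dataset is a matrix: one row per measure, one column per feature.
dataset = [list(row) for row in zip(ages, sexes, fares)]
```

Each row of `dataset` is one measure, and every defined feature now lies in a normalized range.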
78. Types of Machine Learning: Supervised Learning
● Predictive Model, Labeled data
● Classification
● Numeric prediction
In supervised algorithms, you may not know the inner relations of the
data you are processing, but you know very well what output
you need from your model.
Training usually uses part of the data to “learn” and
part of the data to validate and measure how accurate the model is.
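The learn/validate split described above might look like the following sketch. Everything here is assumed for illustration: the labeled data, the 75/25 split, the seed, and the deliberately simple threshold "model".

```python
import random

# Hypothetical labeled data: (hours_studied, passed) pairs.
labeled = [(hours, 1 if hours >= 6 else 0) for hours in range(12)]

# Shuffle with a fixed seed so the split is repeatable.
rng = random.Random(42)
shuffled = labeled[:]
rng.shuffle(shuffled)

# Use 75% of the data to learn and hold back 25% to validate.
split = int(len(shuffled) * 0.75)
train, valid = shuffled[:split], shuffled[split:]


def learn_threshold(rows):
    """'Learn' a cut-off halfway between the two classes seen in training."""
    zeros = [x for x, label in rows if label == 0]
    ones = [x for x, label in rows if label == 1]
    return (max(zeros) + min(ones)) / 2


threshold = learn_threshold(train)


def predict(x):
    return 1 if x >= threshold else 0


# Accuracy is measured only on data the model never saw during training.
correct = sum(1 for x, label in valid if predict(x) == label)
accuracy = correct / len(valid)
```

The key design point is that `accuracy` is computed on `valid` only, so it estimates how the model behaves on unseen data.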
80. Types of Machine Learning: Unsupervised Learning
● Descriptive model, Unlabeled data
● Clustering
● Pattern discovery
With unsupervised algorithms, you don’t yet know what you want to
get out of the model.
You probably suspect that there have to be some kinds of relationships
or correlations in the data you have, but the data is too complex to
guess them.
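Clustering is one way to let the data reveal such relationships. Below is a minimal K-Means sketch in plain Python; the points, the choice of two clusters, and the starting centroids are hypothetical, and a production pipeline would more likely use a library implementation.

```python
import math

# Hypothetical unlabeled 2-D points forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def mean(cluster):
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    labels = [closest(p, centroids) for p in points]
    return centroids, labels


centroids, labels = k_means(points, centroids=[points[0], points[3]])
```

No labels were supplied; the two groups emerge from the distances between the points alone.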
82. Types of Machine Learning: Reinforcement Learning
● Rewards-based modeling
Reinforcement learning is the field that studies the problems and
techniques in which a model feeds the results of its actions back into itself in order to improve.
It relies on being able to monitor the response to the actions taken and
measure it against a definition of a “reward”.
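A toy version of that feedback loop is sketched below. The two actions, their fixed rewards, and the "try everything once, then exploit" policy are hypothetical simplifications; real reinforcement learning deals with noisy rewards, states, and exploration strategies.

```python
# Hypothetical environment: each action always returns the same reward.
REWARDS = {"left": 0.2, "right": 1.0}


def take_action(action):
    return REWARDS[action]


estimates = {action: 0.0 for action in REWARDS}  # learned value of each action
counts = {action: 0 for action in REWARDS}
history = []

for step in range(10):
    # Try every action once, then keep choosing the best current estimate.
    untried = [a for a in REWARDS if counts[a] == 0]
    action = untried[0] if untried else max(estimates, key=estimates.get)

    reward = take_action(action)
    counts[action] += 1
    # Feed the observed reward back into the running-average estimate.
    estimates[action] += (reward - estimates[action]) / counts[action]
    history.append(action)
```

After the first two exploratory steps, the agent settles on the action with the higher measured reward.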
85. Types of Machine Learning: Semi-Supervised Learning
● When Labelled Data is Costly
A class of supervised learning tasks and techniques that also make
use of unlabeled data for training.
Typically a small amount of labeled data with a large amount of
unlabeled data.
91. These are the most important math concepts that underlie
Machine Learning and I will demonstrate where they are
applied to “Learning”.
Gradient Descent
Cross-Entropy Loss
Bayes’ Theorem
K-Means
Linear Regression
97. Regression models the past relationship
between variables to predict their future
behavior.
Regression analysis is a set of statistical
processes for estimating the relationships
among variables.
from Statistical Learning, the underpinnings of ML
98. Example: Food truck Profit to Population Profitability
[Figure: scatter plot of Profit (in $10,000s) against Population (in 10,000s)]
You are the CEO of a restaurant franchise and are considering
different cities for opening a new outlet. The chain already
has food trucks in various cities, and you have data for Profits
and Populations from those cities.
99. Food truck Profit to Population Profitability
Quiz: What would be the approximate Profit if a Food Truck
is located in a City with Population 150,000?
[ ] $50,000
[ ] $100,000
[ ] $150,000
[ ] $200,000
100. Food truck Profit to Population Profitability
Quiz: What would be the approximate Profit if a Food Truck
is located in a City with Population 150,000?
[ ] $50,000
[ ] $100,000
[X] $150,000
[ ] $200,000
[Figure annotation: $138,000]
101. Food truck Profit to Population Profitability
Linear Regression Algorithm
Finding the straight line that best fits the values of a linear
function, plotted on a scatter graph as data points, used as
the basis for estimating future values.
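One common way to find that best-fit line is ordinary least squares, sketched here in plain Python. The data points are hypothetical stand-ins, not the food truck dataset from the talk, and the talk may have fitted the line differently (for example with gradient descent).

```python
# Hypothetical (population, profit) pairs, both in units of 10,000.
data = [(5.0, 2.0), (7.0, 4.5), (10.0, 8.0), (15.0, 13.5), (20.0, 19.5)]


def fit_line(points):
    """Ordinary least squares for a single input variable."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    # The best-fit line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(data)

# Estimate profit for a city of 150,000 people (15 in units of 10,000).
predicted_profit = slope * 15.0 + intercept
```

Once `slope` and `intercept` are known, predicting a new value is a single multiplication and addition.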
102. Food truck Profit to Population Profitability
[Figure: scatter plot of Profit (in $10,000s) against Population (in 10,000s), with the linear regression line]
103. ● Data is Food Truck Profit to Population
● Algorithm for Training is using Linear Regression
● Model, when Trained, will have values filled into this function:
105. Finding the “Linear Regression Line”
Given a line of slope m
that intersects the y-axis
at location b,
and a known data point (x, y) on the line,
the y-intercept of this line is b = y - mx.
114. Linear Regression Algorithm: Summary
Find the linear regression line that
best cuts the value data in two
in order to make the
best value prediction
with the
least possible error.
116. University Admissions Acceptance
Using historical data from
previous applicants, predict
whether a student gets
admitted into a university
based upon the applicant’s
scores on two exams.
[Figure: scatter plot of Exam 2 score against Exam 1 score; points marked Admitted or Not-Admitted]
118. University Admissions Acceptance
Quiz: For a student with scores
for Exam 1 at 45 and Exam 2 at
85, does the student get accepted?
[X] Yes
[ ] No
We predict admission with a
probability of 77.6289%
119. University Admissions Acceptance
Logistic Regression Algorithm
Predicts the probability that an observation falls into one of two
categories of a dichotomous dependent variable, based on
one or more independent variables.
120. ● Data is University Admissions: Exam 1, Exam 2, Class: Admitted
● Algorithm for Training is using Logistic Regression
● Model, when Trained, will have values filled into this function:
p = sigmoid(\beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2})
121. Logistic Regression Algorithm
It is a regression model
where the dependent variable is
binary and categorical, that is,
it can take only two values,
“0” or “1”, representing for
example:
Yes or No
Accept or Not Accept
Cat or Dog
[Figure: logistic curve; vertical axis marked 0, 0.5, and 1]
129. x1, x2 : exams
f(x1, x2) : university acceptance
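To make f(x1, x2) concrete, the sketch below evaluates the sigmoid form p = sigmoid(β0 + β1·x1 + β2·x2) for the quiz student. The coefficient values are assumptions chosen for illustration; the slides do not state the trained values.

```python
import math

# Assumed coefficients (not given in the slides), for illustration only.
BETA_0 = -25.161272
BETA_1 = 0.206233  # weight for the Exam 1 score
BETA_2 = 0.201470  # weight for the Exam 2 score


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def admission_probability(exam1, exam2):
    return sigmoid(BETA_0 + BETA_1 * exam1 + BETA_2 * exam2)


def is_admitted(exam1, exam2, cutoff=0.5):
    """Turn the probability into a yes/no decision at the 0.5 cutoff."""
    return admission_probability(exam1, exam2) >= cutoff


p = admission_probability(45, 85)
```

With these assumed coefficients, `p` lands close to the 77.6% quoted on the quiz slide, and `is_admitted(45, 85)` is `True`.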
130. Logistic Regression Algorithm: Summary
Find the logistic regression line that
best cuts the classifications in two
in order to make the
best class decision
with the
least possible error.
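The "least possible error" in the summary above is typically measured with cross-entropy loss and reduced with gradient descent, two of the concepts listed earlier in the talk. The sketch below shows one way this could be done; the training rows, learning rate, and iteration count are hypothetical.

```python
import math

# Hypothetical training rows: (exam1, exam2, admitted), scores scaled to [0, 1].
rows = [(0.30, 0.40, 0), (0.35, 0.70, 0), (0.45, 0.50, 0),
        (0.60, 0.85, 1), (0.75, 0.60, 1), (0.80, 0.90, 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x1, x2):
    return sigmoid(beta[0] + beta[1] * x1 + beta[2] * x2)


def cross_entropy(beta, rows):
    """Average cross-entropy loss: small when predictions match the labels."""
    total = 0.0
    for x1, x2, y in rows:
        p = predict(beta, x1, x2)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def gradient_step(beta, rows, learning_rate):
    """One gradient descent update of all three coefficients."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in rows:
        error = predict(beta, x1, x2) - y
        for j, feature in enumerate((1.0, x1, x2)):
            grad[j] += error * feature / len(rows)
    return [b - learning_rate * g for b, g in zip(beta, grad)]


beta = [0.0, 0.0, 0.0]
initial_loss = cross_entropy(beta, rows)
for _ in range(5000):
    beta = gradient_step(beta, rows, learning_rate=0.5)
final_loss = cross_entropy(beta, rows)
```

Each step nudges the coefficients in the direction that lowers the loss, so `final_loss` ends up below `initial_loss`.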
136. University Admissions Acceptance
Maybe two lines, which works
better. This solution requires
using a
Neural Network
Algorithm.
[Figure: Exam 2 score against Exam 1 score]
140. Determine Which Fruit
Weight Texture Class
150g Bumpy Orange
170g Bumpy Orange
150g Smooth Apple
130g Smooth Apple
Shape: (4, 2)
Features: ['Weight (grams)', 'Texture']
Classes: ['apple', 'orange']
141. Determine Which Fruit
Weight Texture Class
150g Bumpy Orange
170g Bumpy Orange
150g Smooth Apple
130g Smooth Apple
Quiz: Between Weight and
Texture, which seems more
decisive for predicting
which Fruit it is?
[ ] Weight
[ ] Texture
144. Determine Which Fruit
Weight Texture Class
150g Bumpy Orange
170g Bumpy Orange
150g Smooth Apple
130g Smooth Apple
Quiz: Between Weight and
Texture, which seems more
decisive for predicting
which Fruit it is?
[ ] Weight
[X] Texture
145. Determine Which Fruit
Weight Texture Class
150g Bumpy Orange
170g Bumpy Orange
150g Smooth Apple
130g Smooth Apple
Decision Tree Algorithm
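The quiz answer can be checked numerically using the four fruit samples from the slide. The sketch below compares the two features by Gini impurity, which is one common split criterion for decision trees; the talk does not say which criterion was used, so treat that choice as an assumption.

```python
# The four fruit samples from the slide: (weight in grams, texture, class).
samples = [(150, "Bumpy", "Orange"), (170, "Bumpy", "Orange"),
           (150, "Smooth", "Apple"), (130, "Smooth", "Apple")]


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    if not labels:
        return 0.0
    return 1.0 - sum((labels.count(c) / len(labels)) ** 2 for c in set(labels))


def split_impurity(groups):
    """Size-weighted average impurity of the groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)


# Candidate split on Texture: Bumpy versus everything else.
texture_split = [[c for _, t, c in samples if t == "Bumpy"],
                 [c for _, t, c in samples if t != "Bumpy"]]


# Candidate splits on Weight: "weight <= threshold" for each observed weight.
def weight_split(threshold):
    return [[c for w, _, c in samples if w <= threshold],
            [c for w, _, c in samples if w > threshold]]


weight_thresholds = sorted({w for w, _, _ in samples})[:-1]
best_weight_impurity = min(split_impurity(weight_split(t))
                           for t in weight_thresholds)
texture_impurity = split_impurity(texture_split)
```

Texture separates the classes perfectly (impurity 0), while no weight threshold does, because a 150g fruit appears in both classes. A decision tree would therefore split on Texture first.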
161. Detecting Spam e-mails
[Figure: Spam vs. Not-Spam e-mails containing “Winner”]
Quiz: If an e-mail contains the
word “Winner”, what is the
Probability of it being Spam?
[ ] 40%
[ ] 60%
[ ] 80%
162. Detecting Spam e-mails
Quiz: If an e-mail contains the
word “Winner”, what is the
Probability of it being Spam?
[ ] 40%
[ ] 60%
[X] 80%
Conclusion: If an e-mail
contains the word “Winner”,
then the Probability of it being
Spam is 80%.
[Figure: of e-mails containing “Winner”, 80% Spam and 20% Not-Spam]
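The 80% figure is an application of Bayes' Theorem. The counts below are hypothetical, picked only so that the arithmetic reproduces the slide's answer; the actual counts shown in the talk are not in this transcript.

```python
# Hypothetical counts, chosen for illustration only.
total_emails = 100
spam_emails = 25
spam_with_winner = 20      # spam e-mails containing "Winner"
not_spam_with_winner = 5   # legitimate e-mails containing "Winner"

# Prior, likelihood, and evidence.
p_spam = spam_emails / total_emails
p_winner_given_spam = spam_with_winner / spam_emails
p_winner = (spam_with_winner + not_spam_with_winner) / total_emails

# Bayes' Theorem: P(Spam | Winner) = P(Winner | Spam) * P(Spam) / P(Winner)
p_spam_given_winner = p_winner_given_spam * p_spam / p_winner
```

Equivalently, 20 of the 25 e-mails that contain "Winner" are spam, which gives the same 80%.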
167. Tools to build ML Learning Pipeline
Python has a rich set
of packages for
delivering Machine
Learning solutions.
168. Tools to build ML Learning Pipeline
GNU Octave is an open
source high-level
programming language
intended for numerical
computations.
Great for understanding
and defining Algorithms
within Machine Learning.
Andrew Ng
169. Tools to build ML Learning Pipeline
R is an open source
programming language for
statistical computing and
graphics.
Faster in getting up to speed
with Machine Learning.
Great for in depth data analysis
and feature engineering.
170. Tools to build ML Learning Pipeline
Machine Learning Philosophy
Machine Learning places a
greater emphasis on
predictive accuracy.
Scikit-learn focuses more
on helping you to
maximize the accuracy of
your models.
171. Tools to build ML Learning Pipeline
Statistical Learning Philosophy
Statistical Learning emphasizes
model interpretability and
uncertainty.
R and GNU Octave tend to
offer more capabilities for
understanding your models and
the data gathered.
184. "Mathew, where's the lamb chop?" whispered Helene.
"Lamb chops, you mean," sang Mathew; "you, me, Wendy, and
John can't all swallow one lamb chop."
"And Mark, he also desires lamb chops," said Wendy.
"Now wait," sang Mathew; "let's struggle to understand where
spooky old Mark is."
"Mark said that he was rambling over to eat with us," cried
Helene; "he's sashaying up some turnpike right now."
"Mark, oh, Mark, skip briskly; it would facilitate us to start bolting
our lamb chops speedily," chanted John carefully.
185. Meanwhile Mark winged in, whispering, "A supper, a breakfast,
a repast, quick; it can be tasty or well cooked or delicious; I
don't care; I'm hungrily famished. I've sauntered some clean
streets; I was thinking about yachts, the sea, and the ocean;
I'm exhausted."
"Yachts?" each of them said.
"Yes, yachts, a hoard of yachts floating on the sea. This yacht
pondering let me be unwound during my skip over here."
"Better yachts in the sea than a sickening electron in a
revolting galaxy," hummed Helene.
https://thenewstack.io/what-machine-learning-can-and-cant-do/
context that a machine learning algorithm would need to understand in order to predict needs in a given business situation
machine learning as a subdiscipline of artificial intelligence. Both disciplines overlap, but they also have separate areas of knowledge. In the case of machine learning, it would include data mining techniques (a term generally applied to information extraction techniques based on raw data), which are not part of artificial intelligence.
Statistical learning theory deals with the problem of finding a predictive function based on data, which has led to successful applications in fields such as computer vision, speech recognition, bioinformatics and baseball.
After all, you’re teaching machines that work in ones and zeros to reach their own conclusions about the world. You’re teaching them how to think! However, it’s not nearly as hard as the complex and formula-laden literature would have you believe.
Like all of the best frameworks we have for understanding our world, e.g. Newton’s Laws of Motion, Jobs to be Done, Supply & Demand — the best ideas and concepts in machine learning are simple. The majority of literature on machine learning, however, is riddled with complex notation, formulae and superfluous language. It puts walls up around fundamentally simple ideas.
Siraj Raval
Training Phase:
Converting raw data to data usable by ML algorithm
Training an ML algorithm
Test the prediction-accuracy of the pipeline
Real-Time Prediction Phase:
Using the output of the ML algorithm to perform actions in the real-world
It’s easy to believe that machine learning is hard. An arcane craft known only to a select few academics.
p = sigmoid(\beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2})