Primer to Machine Learning

or How to Get Machines to Think
Jeff Tanner, Engineer @ TUNE

Takeaway from this Talk:
What underlies Machine Learning,
explained by how it learns.

Why?: Curiosity
1. Math-driven education: Physics and
Applied Mathematics
2. First love of application programming
was Artificial Intelligence and Neural
Nets
3. Unsatisfied curiosity to understand
Machine Learning

Graduate and Postgraduate:
Artificial Intelligence
International consultant:
Knowledge Engineering

Bright idea #1
Learn more about
Machine Learning
Satisfy Curiosity
(and perform a functional
test on a rebooted brain)

Read lots and watched lots (repeat many
times)
Siraj Raval, Andrew Ng

Bright idea #2
Eager to Share Curiosity
Lecture

Topics:
● What is it?
● What can it do?
● What is the pipeline?
● Data
● Types
● Algorithms
● Shallow dive

Code:
● Breakout to checkout:
○ Python Scikit-Learn
○ GNU Octave (MATLAB)
○ R Programming
https://github.com/jefftune/gitw-2017-ml

What is Machine Learning?
People: Learn from Experience
Traditional Programming Models: Follow Instructions
Machine Learning Models: Learn from Data

Machine Learning is an artificial
intelligence technology that
learns over time in an
autonomous fashion,
without being explicitly
programmed.

What Machine
Learning Can Do:
● Data Modeling
● Classification
● Value Prediction
● Serving Content
● Pattern Recognition

What Machine
Learning Cannot Do:
● Clean the Data
● Leap over Pareto’s Principle

Statistical Learning
underlies Machine Learning

● Mechanophobia (fear of machines)
● Sophophobia (fear of learning)
● Mechano-Sophophobia?
Still scared of
Machine Learning?

● Easy to believe that ML is hard.
● Embrace predictions.

Defining “Learning” of Machines

1959, Arthur Samuel
Machine learning is the practice of
giving computers the ability to learn
without being explicitly programmed
to do so.

1997, Tom Mitchell
A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.
E * T = P

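[Figure: training examples labeled “this is a cat” (E) pass through a Learning Algorithm to produce a Model; the Model performs the task Cat / Not Cat (T), and its prediction “Cat: 80%” is measured (P).]
E * T = P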

Checkers:
E * T = P
E: practice
T: playing
P: % won

Handwriting:
E * T = P
E: handwritten
words
T: classifying
words
P: % words correctly
classified
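A minimal sketch of Mitchell's definition on a handwriting task, assuming scikit-learn is installed. Its bundled `load_digits` dataset (8x8 images of handwritten digits) stands in for handwritten words; the `performance` helper, the logistic regression classifier and the sample sizes are illustrative choices, not the talk's code. The program "learns" if P rises as E grows.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Handwritten digits stand in for handwritten words (assumption).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

def performance(n_examples):
    """P: fraction of held-out digits classified correctly (task T)
    after learning from the first n_examples training images (experience E)."""
    model = LogisticRegression(max_iter=5000)
    model.fit(X_train[:n_examples], y_train[:n_examples])
    return model.score(X_test, y_test)

for n in (30, 300, len(X_train)):
    print(f"E = {n:4d} examples -> P = {performance(n):.3f}")
```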

Self-Driving:
E * T = P
E: observe human
driver
T: drive
P: % error as
judged by a human

Using Machine Learning
Algorithms

Supervised: Predict Value
[Figure: Profitability scatter plot; x-axis: Population in 10,000s, y-axis: Profit in $10,000s]

Supervised: Classification
[Figure: scatter plot; x-axis: Exam 1 score, y-axis: Exam 2 score]

Unsupervised: Clusters
Customer Segmentation

Reinforcement: Bot Conversation Example
Charlotte Brontë, Thomas Holcroft,
Christopher Marlowe

Awareness is like consciousness. Soul
is like spirit. But soft is not like
hard and weak is not like strong. A
mechanic can be both soft and hard, a
stewardess can be both weak and
strong. This is called philosophy or
a world-view.

BILL. I love a child.
MARCELLA. Children are fortunately captivating.
BILL. Yet my love is excellent.
MARCELLA. My love is spooky yet we must have a
child, a spooky child.
BILL. Do you follow me?

What is the Machine Learning
pipeline?

Machine Learning Pipeline
a framework for
converting raw data to usable data,
training an ML algorithm,
delivering the trained model, and
using the model to perform actions

Training/Learning Phase
Real-Time Prediction Phase

Machine Learning Pipeline: Training Phase
Gather lots of Data and prepare it for Learning.

Training Data is passed to a Learning Algorithm, which creates a Hypothesis
and checks its Cost against the Weighted Errors.

Generate the Model using the Hypothesis with the Least Weighted Errors.
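The training phase can be sketched in plain Python as "try hypotheses, score each by its cost, keep the cheapest". This is a minimal illustration under stated assumptions: hypotheses are straight lines h(x) = theta0 + theta1 * x, the cost is the mean of squared errors (one common way to weight errors), the search is a brute-force grid rather than a real optimizer, and the data is hypothetical.

```python
def cost(theta0, theta1, xs, ys):
    """Mean squared error of hypothesis h(x) = theta0 + theta1 * x."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, candidates):
    """Check every candidate hypothesis and keep the one with least error."""
    return min(candidates, key=lambda theta: cost(theta[0], theta[1], xs, ys))

# Hypothetical training data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# A coarse grid of candidate hypotheses (theta0, theta1) in steps of 0.5.
grid = [(t0 / 2, t1 / 2) for t0 in range(-4, 5) for t1 in range(-4, 9)]
model = train(xs, ys, grid)
print("model:", model, "cost:", cost(*model, xs, ys))
```

Real learning algorithms replace the grid with a search such as gradient descent, but the idea of "least cost wins" is the same.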

Tool: R-Programming: Data and Feature
Engineering

Tool: GNU Octave: Algorithm Design

Tool: Python Scikit-Learn: ML Delivery

Fictional Prediction
discovered doing
Machine Learning
with R-Programming

Titanic Passengers List
Should Rose have stayed on the
lifeboat?

Jack had a higher
likelihood to survive if
Rose had stayed on the
lifeboat, already
saved.
Jack (single)

Instead, an extra hour of
Jack and Rose running around
a sinking ship, ending with a
frozen Jack, and Rose saved
(again). Thanks Rose.
Jack (attached)

How is Data handled within the
Machine Learning pipeline?

● A Dataset is a Matrix of measures
or events.
● Each measure has a set of useful
Features.
● Each Feature is either undefined or
has a value from a normalized set.

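A small NumPy sketch of such a dataset: rows are measures, columns are features, and `np.nan` marks an undefined feature value. The values are hypothetical, and filling undefined values with the column mean followed by min-max normalization to [0, 1] is one common preparation choice assumed here, not necessarily the one used in the talk's notebooks.

```python
import numpy as np

# Hypothetical dataset: each row is one measure, each column one feature.
# np.nan marks a feature whose value is undefined for that measure.
data = np.array([
    [22.0, 7.25, 1.0],
    [38.0, 71.28, np.nan],
    [np.nan, 7.92, 0.0],
    [35.0, 53.10, 1.0],
])

# Fill undefined values with the mean of the defined values in that column.
col_means = np.nanmean(data, axis=0)
filled = np.where(np.isnan(data), col_means, data)

# Min-max normalize each feature column to the range [0, 1].
mins, maxs = filled.min(axis=0), filled.max(axis=0)
normalized = (filled - mins) / (maxs - mins)
print(normalized.round(3))
```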

Types of Machine Learning
● Supervised: Predictive Model, Labeled data
● Classification
● Numeric prediction
In supervised algorithms, you may not know the inner relations of the
data you are processing, but you do know exactly what output
you need from your model.
The training of the model usually uses part of the data to “learn”, and
part of the data to validate and measure how accurate the model is.
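A minimal sketch of that learn/validate split with scikit-learn, assuming hypothetical labeled data generated from y = 2x + 1 plus noise. `train_test_split` holds back 25% of the rows, and the R^2 score on those unseen rows measures how accurate the model is.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: the desired output y is known for every input x.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=200)

# Use part of the data to learn and hold back the rest to validate.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# R^2 on rows the model never saw during training (1.0 would be perfect).
print("learned slope:", round(model.coef_[0], 3))
print("validation R^2:", round(model.score(X_val, y_val), 3))
```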

● Unsupervised: Descriptive model, Unlabeled data
● Clustering
● Pattern discovery
With unsupervised algorithms, you do not yet know what you want to
get out of the model.
You probably suspect that there has to be some kind of relationship
or correlation within the data you have, but the data is too complex
to guess it.
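A sketch of clustering with scikit-learn's `KMeans`, in the spirit of the Customer Segmentation example. The "customers" here are hypothetical points from `make_blobs`; their true group is generated only to check the result afterwards and is never shown to the algorithm. Asking for exactly 3 clusters is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical customers described by two features. y_hidden is NOT given
# to the algorithm; it is only used to check the discovered segments.
X, y_hidden = make_blobs(n_samples=300, centers=3, cluster_std=0.6,
                         random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)

print("cluster centers:\n", kmeans.cluster_centers_.round(2))
# 1.0 means the discovered grouping matches the hidden one exactly.
print("agreement:", round(adjusted_rand_score(y_hidden, segments), 3))
```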

● Reinforcement: Rewards based modeling
Reinforcement learning is the field that studies the problems and
techniques that feed results back into the model in order to improve it.
It relies on being able to monitor the response to the actions taken, and
measure it against a definition of a “reward”.
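One of the simplest reward-driven learners is an epsilon-greedy multi-armed bandit, sketched below in plain Python. The three actions and their hidden reward probabilities are hypothetical. Each observed reward is fed back into the model (a running average per action), so the learner drifts toward the action that pays off most while still exploring occasionally.

```python
import random

# Hypothetical environment: three actions with hidden reward probabilities.
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]

def pull(action, rng):
    """Return reward 1.0 with the action's hidden probability, else 0.0."""
    return 1.0 if rng.random() < TRUE_REWARD_PROB[action] else 0.0

def epsilon_greedy(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(TRUE_REWARD_PROB)
    estimates = [0.0] * n_actions  # the model: estimated reward per action
    counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:  # explore: try a random action
            action = rng.randrange(n_actions)
        else:                       # exploit: best action found so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = pull(action, rng)
        counts[action] += 1
        # Feed the reward back into the model (incremental running mean).
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy()
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times chosen:", counts)
```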

● Semi-supervised: When Labelled Data is Costly
A class of supervised learning tasks and techniques that also make
use of unlabeled data for training,
typically a small amount of labeled data with a large amount of
unlabeled data.
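A sketch of semi-supervised learning with scikit-learn's `LabelSpreading`, using the Iris dataset as a stand-in. To simulate costly labels we assume only every 5th flower (30 of 150) keeps its label; the rest are marked `-1`, which is how scikit-learn denotes "unlabeled". The model spreads the known labels to similar unlabeled samples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Pretend labels are costly: keep every 5th label, mark the rest as -1.
y_partial = np.full_like(y, -1)
y_partial[::5] = y[::5]

model = LabelSpreading()  # default RBF similarity kernel
model.fit(X, y_partial)

# How well were the 120 hidden labels recovered?
unlabeled = y_partial == -1
accuracy = (model.transduction_[unlabeled] == y[unlabeled]).mean()
print("accuracy on unlabeled samples:", round(accuracy, 3))
```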

You do not need to know
the following to deliver
Machine Learning solutions

However…
If you truly want to understand or
create Machine “learning”
algorithms, then Math is helpful.

And…
Understanding the Math of
“learning” is
addictively fun,
Ask my spouse, or maybe not

These are the most important math concepts that underlie
Machine Learning, and I will demonstrate where they are
applied to “Learning”:
Gradient Descent
Cross-Entropy Loss
Bayes’ Theorem
K-Means
Linear Regression

Calculus
Linear Algebra
Probability Theory
Statistics
And these math concepts are based upon
the fundamentals of ...

Basic Algebra
Have no fear!
Applied Mathematics breaks down
all Math behind Machine Learning to ...

This is
the Math of Intelligence
behind how
Machines learn from Data

Learning behind Supervised
Algorithms

Regression models the past relationship
between variables to predict their future
behavior.
Regression analysis is a set of statistical
processes for estimating the relationships
among variables.
From Statistical Learning, the underpinnings of ML.

Example: Food truck Profit to Population Profitability
[Figure: Profit (in $10,000s) vs. Population scatter plot]
Suppose you are the CEO of a restaurant
franchise and are considering
different cities for opening a
new outlet. The chain already
has food trucks in various cities,
and you have data for Profits
and Populations from those
cities.

Food truck Profit to Population Profitability
Quiz: What would be the
approximate Profit if Food
Truck is located in City with
Population 150,000?
[ ] $50,000
[ ] $100,000
[X] $150,000
[ ] $200,000
The trained model predicts about $138,000,
so the closest option is $150,000.

Linear Regression
Algorithm
finds the straight line that
best fits the data points
plotted on a scatter graph, and
uses it as the basis for
estimating future values.

[Figure: linear regression line fitted through the Profit vs. Population data]

● Data is Food Truck Profit to Population
● Algorithm for Training is using Linear Regression
● Model when Trained will have its values filled into this function: $f(x) = m x + b$

Finding the “Linear Regression Line”
Take an actual data point instance $(x, y)$.

Given a line of slope $m$
that intersects the y-axis
at location $b$: $y = m x + b$.
For a known data point $(x, y)$,
the y-intercept of this line is then $b = y - m x$.
The line with the least error sits at the lowest point of the cost function, a.k.a. the Global Minimum.

x : population
f(x) : predicted profit
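A NumPy sketch of finding the best-fit line $f(x) = m x + b$ by gradient descent: start from $m = b = 0$ and repeatedly step against the gradient of the squared-error cost until reaching its minimum. The population/profit points are hypothetical and lie exactly on a line so the result is easy to check; the learning rate `alpha` and iteration count are assumed values that happen to converge for this data.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=20000):
    """Fit f(x) = m*x + b by gradient descent on the cost
    J(m, b) = 1/(2n) * sum((m*x + b - y)**2)."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        error = m * x + b - y         # predicted minus actual profit
        m -= alpha * (error @ x) / n  # step against dJ/dm
        b -= alpha * error.sum() / n  # step against dJ/db
    cost = ((m * x + b - y) ** 2).sum() / (2 * n)
    return m, b, cost

# Hypothetical points lying exactly on profit = 1.2 * population - 3.6
# (population in 10,000s, profit in $10,000s).
population = np.array([5.0, 7.5, 10.0, 15.0, 20.0])
profit = 1.2 * population - 3.6

m, b, cost = gradient_descent(population, profit)
print(f"m = {m:.3f}, b = {b:.3f}, cost = {cost:.2e}")
```

Because this cost is a convex bowl, the minimum gradient descent reaches is the global one.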

Non-Linear or Polynomial Regression

Linear Regression Algorithm: Summary
Find the linear regression line that
best fits the value data
in order to make the
best value prediction
with the
least possible error.

Supervised: Logistic Regression

University Admissions Acceptance
Using historical data from
previous applicants, predict
whether a student gets
admitted into a university
based upon the applicant’s
scores on two exams.
[Figure: scatter plot of Exam 1 score (x-axis) vs. Exam 2 score (y-axis), marked Admitted / Not-Admitted]

Quiz: For a student with scores
for Exam 1 at 45 and Exam 2 at
85, does the student get accepted?
[X] Yes
[ ] No
We predict admission with a
probability of 77.6289 %

Logistic Regression
Algorithm
predicts the probability that an
observation falls into one of the two
categories of a dichotomous
dependent variable based on
one or more independent
variables.

● Data is University Admissions: Exam 1, Exam 2, Class: Admitted
● Algorithm for Training is using Logistic Regression
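● Model when Trained will have its values filled into this function: $f(x_1, x_2) = g(w_1 x_1 + w_2 x_2 + b)$, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid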

Logistic Regression Algorithm
It is a regression model
where the dependent variable is
binary and categorical, that is,
it can take only two values:
“0” OR “1”, representing for
example:
Yes or No
Accept or Not Accept
Cat or Dog
[Figure: sigmoid curve rising from 0 to 1, with 0.5 as the decision threshold]

Example:
Swallow: African OR European,
based on airspeed velocity.

$x_1$: “Exam 1” score, $x_2$: “Exam 2” score

Logistic Regression Algorithm
[Figure: sigmoid output (0 to 1, threshold 0.5) over Exam 1 and Exam 2; panels: prediction_start, prediction_min]

Gradient Descent using Log-Loss
Log Loss quantifies
the accuracy of a
classifier by penalising
false classifications.

x1, x2 : exams
f(x1, x2) : university acceptance
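A NumPy sketch of logistic regression trained by gradient descent on log loss (cross-entropy). The exam scores are hypothetical, with an assumed admission rule "Exam 1 + Exam 2 > 120"; the features are standardized so one learning rate suits both exams. Log loss falls from about 0.693 (all predictions at 0.5) as the sigmoid output is pushed toward the correct class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y):
    """Cross-entropy: penalizes confident wrong classifications heavily."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical exam scores; assume admission when the sum exceeds 120.
rng = np.random.default_rng(0)
exams = rng.uniform(30, 100, size=(200, 2))
admitted = (exams.sum(axis=1) > 120).astype(float)

# Standardize the features and prepend a bias column of ones.
mu, sd = exams.mean(axis=0), exams.std(axis=0)
X = np.column_stack([np.ones(len(exams)), (exams - mu) / sd])

theta = np.zeros(3)
start_loss = log_loss(sigmoid(X @ theta), admitted)
for _ in range(5000):
    p = sigmoid(X @ theta)
    theta -= 0.1 * X.T @ (p - admitted) / len(admitted)  # log-loss gradient
end_loss = log_loss(sigmoid(X @ theta), admitted)

accuracy = ((sigmoid(X @ theta) >= 0.5) == admitted).mean()
print(f"log loss {start_loss:.3f} -> {end_loss:.3f}, accuracy {accuracy:.2f}")

# Probability of admission for a student with Exam 1 = 45 and Exam 2 = 85.
student = np.concatenate([[1.0], (np.array([45.0, 85.0]) - mu) / sd])
print("P(admitted):", round(float(sigmoid(student @ theta)), 3))
```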

Logistic Regression Algorithm: Summary
Find the logistic regression line that
best cuts the classifications in two
in order to make the
best class decision
with the
least possible error.

University Admissions Acceptance
What if admission into a
university based upon the
applicant’s scores on two
exams was more restrictive?
[Figure: Exam 1 score vs. Exam 2 score, marked Admitted / Not-Admitted]

And there were fewer actual
admissions in the historical
data.

A single line will not cut the
data in two.

Maybe a circle?

Maybe two lines, which works
better. This solution requires
using a
Neural Network
Algorithm.
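A sketch of why a neural network helps when one line cannot cut the data in two. As a hypothetical stand-in for the restrictive admissions data, scikit-learn's `make_circles` places one class inside a ring of the other. Logistic regression (a single line) scores near chance, while a small `MLPClassifier` with one hidden layer of 16 units (an assumed size) learns a curved boundary.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Hypothetical data: an inner circle of one class, a ring of the other.
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)

line = LogisticRegression().fit(X, y)
net = MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                    max_iter=2000, random_state=0).fit(X, y)

print("single line (logistic regression):", round(line.score(X, y), 2))
print("neural network:", round(net.score(X, y), 2))
```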

Supervised: Decision Tree: Fruit

Determine Which Fruit
Dataset shape: (4, 2)
Features: ['Weight (grams)', 'Texture']
Classes: ['apple', 'orange']

Weight   Texture   Class
150g     Bumpy     Orange
170g     Bumpy     Orange
150g     Smooth    Apple
130g     Smooth    Apple

Quiz: Between Weight and
Texture, which seems more
decisive for predicting what
Fruit will be determined?
[ ] Weight
[X] Texture

Decision Tree Algorithm
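A scikit-learn sketch of a decision tree on the four fruits from the table, with Texture encoded as 0 = Bumpy and 1 = Smooth (the encoding is an assumption). The tree splits on Texture alone, because that one question separates apples from oranges perfectly, while Weight cannot (both classes include a 150g fruit).

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# The fruit table: [weight in grams, texture], texture 0 = Bumpy, 1 = Smooth.
features = [[150, 0], [170, 0], [150, 1], [130, 1]]
labels = ["orange", "orange", "apple", "apple"]

tree = DecisionTreeClassifier(random_state=0).fit(features, labels)

print(export_text(tree, feature_names=["weight_grams", "texture"]))
print("feature importances:", tree.feature_importances_)
print(tree.predict([[160, 0]]))  # a 160 g bumpy fruit
```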

Supervised: Decision Tree: Iris

Classification of Iris Varieties
Dataset shape: (150, 4)
Features: ['sepal length (cm)', 'sepal width (cm)',
'petal length (cm)', 'petal width (cm)']
Classes: ['setosa', 'versicolor', 'virginica']
First sample [ 5.1 3.5 1.4 0.2] is setosa

Classification of Iris Varieties: Decision Tree

Accuracy Score
0.911111111111
Accuracy Score
0.622222222222

[2 0 0 0 0 0 0 0 0 1 0 0 0 1 1 2 0 0 0 2 2 0 2 2 1 1 2 1 1 0 1 0 1 2 0 1 2
0 2 2 0 1 2 2 1 1 2 2 1 2 2 2 0 0 1 1 1 1 1 1 2 1 2 2 0 0 2 0 2 2 0 2 1 0
2]
Predictions
[2 2 1 1 2 0 0 0 2 2 1 0 0 0 1 0 0 0 1 0 0 2 2 1 0 2 2 1 0 1 1 0 2 1 0 2 2
0 0 0 2 2 1 1 1 1 1 1 2 0 1 2 1 0 2 0 2 1 0 2 1 2 2 0 0 0 2 1 0 1 2 0 1 2
0]
Accuracy Score
0.96

Classification of Iris Varieties: Random Forest

Accuracy Score
0.622222222222
Accuracy Score
0.955555555556

[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2
1 0
1 1 1 2 0 2 0 0]
Predictions
[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2
1 0
2 1 1 2 0 2 0 0]
Accuracy Score
0.955555555556
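A scikit-learn sketch comparing a single Decision Tree with a Random Forest (100 trees, an assumed setting) on the Iris dataset. Holding out 30% of the 150 flowers gives 45 test samples; the exact accuracies depend on the split and random seeds, so they need not match the scores shown above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

scores = {}
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    predictions = model.fit(X_train, y_train).predict(X_test)
    scores[type(model).__name__] = accuracy_score(y_test, predictions)

for name, score in scores.items():
    print(f"{name}: accuracy {score:.3f} on {len(y_test)} test samples")
```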

Detecting Spam e-mails
100 emails looked at already

[Figure: the e-mails split into Spam and Not-Spam, highlighting those containing “Winner”]
Quiz: If an e-mail contains the
word “Winner”, what is the
probability of it being Spam?
[ ] 40%
[ ] 60%
[X] 80%
Conclusion: If an e-mail
contains the word “Winner”,
then the probability of it being
Spam is 80% (and Not-Spam, 20%).

“Winner” 80 %
Spelling error 60 %
Missing title 90 %
etc.

Naive Bayes Algorithm
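A plain-Python sketch of Naive Bayes for the spam example. The real counts behind the slide are not in the text, so the counts below are hypothetical, chosen so that "Winner" alone gives an 80% spam probability via Bayes' theorem. Naive Bayes then combines several clues by assuming they are independent within each class: multiply the class prior by each clue's likelihood, and normalize.

```python
# Hypothetical counts for 100 e-mails (not the slide's actual data).
TOTAL = 100
SPAM = 25
NOT_SPAM = TOTAL - SPAM

# How many e-mails of each class contain each clue (hypothetical).
feature_counts = {
    "winner":         {"spam": 20, "not_spam": 5},
    "spelling_error": {"spam": 15, "not_spam": 10},
}

def p_spam_given(features):
    """Naive Bayes: prior times each clue's likelihood, assuming clues are
    independent within a class, then normalize over the two classes."""
    spam_score = SPAM / TOTAL
    not_spam_score = NOT_SPAM / TOTAL
    for f in features:
        spam_score *= feature_counts[f]["spam"] / SPAM
        not_spam_score *= feature_counts[f]["not_spam"] / NOT_SPAM
    return spam_score / (spam_score + not_spam_score)

print(round(p_spam_given(["winner"]), 3))                    # one clue
print(round(p_spam_given(["winner", "spelling_error"]), 3))  # two clues
```

With one clue this reduces to Bayes' theorem: P(Spam | Winner) = P(Winner | Spam) P(Spam) / P(Winner) = 0.8 × 0.25 / 0.25 = 0.8.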

Tools to build ML Learning Pipeline

Python has a rich set
of packages for
delivering Machine
Learning solutions.

GNU Octave is an open
source high-level
programming language
intended for numerical
computations.
Great for understanding
and defining Algorithms
within Machine Learning, as taught
in Andrew Ng’s course.

R is an open source
programming language for
statistical computing and
graphics.
Fast for getting up to speed
with Machine Learning.
Great for in-depth data analysis
and feature engineering.

Machine Learning Philosophy
Machine Learning places a
greater emphasis on
predictive accuracy.
Scikit-learn focuses more
on helping you to
maximize the accuracy of
your models.

Statistical Learning Philosophy
Statistical Learning emphasizes
model interpretability and
uncertainty.
R and GNU Octave tend to
offer more capabilities for
understanding your models and
the data gathered.

Machine Learning Examples:
Jupyter Notebooks and Code Files

Jupyter Notebooks with Examples:

Two end-to-end Solutions: Titanic Passengers

Iris Dataset
Decision Tree
Random Decision Forest

Off-Topic:
Computer Generated Novelette

"Mathew, where's the lamb chop?" whispered Helene.
"Lamb chops, you mean," sang Mathew; "you, me, Wendy, and
John can't all swallow one lamb chop."
"And Mark, he also desires lamb chops," said Wendy.
"Now wait," sang Mathew; "let's struggle to understand where
spooky old Mark is."
"Mark said that he was rambling over to eat with us," cried
Helene; "he's sashaying up some turnpike right now."
"Mark, oh, Mark, skip briskly; it would facilitate us to start bolting
our lamb chops speedily," chanted John carefully.

Meanwhile Mark winged in, whispering, "A supper, a breakfast,
a repast, quick; it can be tasty or well cooked or delicious; I
don't care; I'm hungrily famished. I've sauntered some clean
streets; I was thinking about yachts, the sea, and the ocean;
I'm exhausted."
"Yachts?" each of them said.
"Yes, yachts, a hoard of yachts floating on the sea. This yacht
pondering let me be unwound during my skip over here."
"Better yachts in the sea than a sickening electron in a
revolting galaxy," hummed Helene.

Steps of Training Predictive Modelling
