Machine Learning pt1: Classification, Regression,
and Artificial Neural Networks
Self-Driving-Cars Los Angeles
By Jonathan Mitchell
github.com/jonathancmitchell
linkedin.com/in/jonathancmitchell
jmitchell1991@gmail.com
Self Driving Cars Los Angeles
https://www.meetup.com/Los-Angeles-Self-Driving-Car-Meetup/
Welcome to Machine
Learning
aka computational statistics
How did I learn this?
Sources:
● Udacity’s Self-Driving Car Nanodegree problem (udacity.com/drive)
● MIT Self-Driving Car program (selfdrivingcars.mit.edu)
● Stanford’s cs-231n (cs231n.github.io)
Topics
A) Probability basics - Basics to Logits
B) Linear Classification/ Logistic regression overview
C) Perceptron
D) Perceptron (biological inspiration)
E) Neuron
F) Forward Pass
G) Computing a loss function
H) Visualizing Hidden Layers
I) Setting up training data
J) Preprocessing / Normalization
K) Overfitting / Hyperparameter intro
L) Epochs
M) Minibatch
N) Gradient Descent / Stochastic Gradient Descent
O) Backpropagation
P) Cross Entropy Loss
Probability basics
Probability p = all outcomes of interest / all possible outcomes
P ranges from 0 to 1, inclusive. P = 1 means 100% likelihood; P = 0 means 0% likelihood.
Odds: the likelihood of an event happening relative to it not happening.
Coin Toss: Toss a coin in the air, it can either be heads or tails.
P_heads = 0.5 => P_tails = 0.5 = 1 - P_heads
P1 = Heads Probability (0, 1), P0 = Tails Probability (0,1)
Odds ratio can be written as Odds1 : Odds2. In this case 1:1. Equal chance of getting
Heads or tails
Odds_heads = P(Heads) / P(Not Heads)
Odds_tails = P(Tails) / P(Not Tails)
Bernoulli Probability (A specific case of binomial distribution)
Bernoulli trial: a yes-or-no question with two possible outcomes, success and failure.
p = probability of success (in one trial); in practice p is an unknown probability we want to estimate.
q = probability of failure (in one trial) = 1 - p
N = number of trials = 1
The binomial distribution gives the probability of K successes in N trials; for a Bernoulli trial N = 1, so K is either 0 or 1.
A Bernoulli distribution is a special case of a Binomial Distribution with N = 1 trial.
Binomial Probability
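To make this concrete, here is a minimal Python sketch of the binomial formula above (the function name and example values are illustrative, not from the slides):

from math import comb

def binomial_prob(k, n, p):
    # probability of exactly k successes in n independent trials,
    # each succeeding with probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_heads = 0.5
# Bernoulli: the special case with N = 1 trial (K is 0 or 1)
print(binomial_prob(1, 1, p_heads))   # 0.5 -> one success (heads)
print(binomial_prob(0, 1, p_heads))   # 0.5 -> zero successes (tails)
# Binomial: probability of exactly 3 heads in 5 tosses
print(binomial_prob(3, 5, p_heads))   # 0.3125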
Probability Basics -> Logistic Regression
Goal: Estimate an unknown probability p for any given linear combination of the
independent variables.
Link independent variables to the Ber(p) distribution.
Logistic regression: estimate an unknown probability p for any given linear
combination of the independent variables.
Estimate p = p^
We need a function that maps a linear combination of variables, which can take any real
value, onto the Bernoulli probability distribution's domain from 0 to 1.
Use Logit: Natural log of the odds ratio
Logistic Regression
Logit: natural log of the odds ratio, logit(p) = ln(p / (1 - p))
Undefined at P = 0 and P = 1
Defined for P in the domain (0, 1)
Linear Combination
graph from http://www.graphpad.com/support/faqid/1465/
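A minimal NumPy sketch of the logit and its inverse (the sigmoid), which maps any real-valued linear combination back into (0, 1); the names and values here are illustrative:

import numpy as np

def logit(p):
    # natural log of the odds ratio; undefined at p = 0 and p = 1
    return np.log(p / (1 - p))

def inverse_logit(alpha):
    # logistic (sigmoid) function: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-alpha))

p = 0.8
a = logit(p)                 # ~1.386 (log of 4:1 odds)
print(a, inverse_logit(a))   # inverse_logit recovers p = 0.8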
Logistic Regression
Logit:
α will be the linear combination of
independent variables and their
coefficient
Recall:
logit(p) = ln(p) - ln(1 - p) = ln(p / (1 - p))
Inverse logit gives us the
probability that dependent var (p)
is a “1”
Linear Combination
Probability of x with linear
combination mapping (B and B0)
Binary Output variable Y. We
want to model the conditional
probability Pr(Y = 1 | X = x) as a
function of x; any unknown
parameters are to be estimated
by max likelihood
graph from http://www.graphpad.com/support/faqid/1465/
Logistic Regression -> Linear Classification
To classify we seek a binary output variable Y = 1 or 0.
Recall Pr(Y = 1 | X = x). We modeled this as p(x;b,w)
Predict Y = 1 when p >= 0.5. Y = 1 = Class A
Predict Y = 0 when P < 0.5. Y = 0 = Class B
Guess 1 when B·x + B0 is non-negative
Guess 0 when B·x + B0 is negative
This is a linear classifier.
We can also infer that the probabilities depend on the distance from the boundary.
This is known as a Binary Logistic Classifier (Binary = 2 options, Class A or Class B)
The decision boundary
separates the two predicted
classes and is the solution to
this equation
Graph from
http://pubs.rsc.org/services/images/RSCpubs.ePlatform.Service.FreeContent.ImageServic
e.svc/ImageService/Articleimage/2010/AN/b918972f/b918972f-f7.gif
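A minimal sketch of the decision rule above, with illustrative coefficients B and intercept B0 (not values from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

B = np.array([2.0, -1.0])    # illustrative coefficients
B0 = 0.5                     # illustrative intercept

def predict(x):
    p = sigmoid(np.dot(B, x) + B0)   # Pr(Y = 1 | X = x)
    return 1 if p >= 0.5 else 0      # equivalently: guess 1 when B.x + B0 >= 0

print(predict(np.array([1.0, 1.0])))    # B.x + B0 = 1.5  -> class A (1)
print(predict(np.array([-1.0, 1.0])))   # B.x + B0 = -2.5 -> class B (0)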
Neuron: Building block of a neural network
src: MIT-Self-Driving-Cars, Fridman,
A Neuron is a computational
building block of the brain.
Human brain: ~1,000 trillion synapses,
orders of magnitude more than an
artificial neural network.
An Artificial Neuron is a
computational building block of
an artificial neural network.
Artificial neural networks: ~1-10B synapses (connections)
*Takes a set of inputs
*Places a weight on each input
*Sums them together
*Adds a bias value for the neuron
*Uses an activation function that
takes in the sum plus bias and
squeezes the value into a
probability-like range (0, 1)
Takes a few inputs and produces an
output (sketched in code below)
Classification: output: 1 or 0
This can serve as a linear classifier
src: MIT-Self-Driving-Cars, Fridman,
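A minimal NumPy sketch of one artificial neuron as just described (weighted sum of the inputs, plus a bias, squeezed by a sigmoid activation); the input and weight values are made up for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.1, 0.9])    # inputs X1, X2, X3
w = np.array([0.4, -0.6, 0.2])   # one weight per input
b = 0.1                          # bias for this neuron

z = np.dot(w, x) + b             # weighted sum plus bias
a = sigmoid(z)                   # activation squeezes z into (0, 1)
output = 1 if a >= 0.5 else 0    # classification output: 1 or 0
print(z, a, output)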
Perceptron Algorithm
X1
X3
X2
Output
1. Initialize the perceptron with random weights.
2. Compute the perceptron's output
3. If the output does not match the known output
a. if the output should have been 0 but was 1, decrease the weights that had an input of 1
b. if the output should have been 1 but was 0, increase the weights that had an input of 1
4. Move on to the next example in the training set until the perceptron makes no more mistakes (see the sketch below)
src: MIT-Self-Driving-Cars, Fridman,
If output does not match
expected output = Punish!
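A minimal sketch of the update rule above (the learning rate, toy training set, and hard-threshold activation are illustrative assumptions, not from the slides):

import numpy as np

def step(z):
    return 1 if z >= 0 else 0

# illustrative training set: learn the logical OR of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=2)       # 1. initialize with random weights
b = 0.0
lr = 0.1

for epoch in range(20):
    mistakes = 0
    for xi, target in zip(X, y):
        out = step(np.dot(w, xi) + b)          # 2. compute the perceptron's output
        if out != target:                      # 3. punish when wrong:
            w += lr * (target - out) * xi      #    decrease weights if out was 1, increase if 0
            b += lr * (target - out)
            mistakes += 1
    if mistakes == 0:                          # 4. stop once it makes no more mistakes
        break

print(w, b)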
Your output
neurons
didn’t match
the expected
output.
X1
X3
X2
Output
Expected Output: Cat but we got Burrito
Training
Images
Perceptron
Why Neural Networks are great.
X1
X3
X2
Output
Perceptron
We can use the Hidden
Layer to approximate any
function
Universality: We can
closely approximate any
function f(x) with a single
hidden layer.
Driving: Input (sensor data
from the world)
Output: Drive (use steering
data etc)
src: MIT-Self-Driving-Cars, Fridman, Lex
Dual class Linear Classification with Binary Logistic Regression
Input Data
Goal: To predict class A or class
B from input data.
Two possible outputs!
x
Linear Combination
Logistic Regression
Predictor
Class A is Y >= 0.5
Class B is Y < 0.5
P = 1
P = 0
Squeezes Values
between 0 and 1
Scores
(0,1)
range
Notation changeup:
logit-1 -> sigmoid
Input Data
Two possible outputs!
x
Linear Combination
Logistic Regression
P = 1
P = 0
Squeezes Values
between 0 and 1
puts into probability
distribution
Predictor
Class A is Y >= 0.5
Class B is Y < 0.5
Scores
(0,1)
range
Unnormalized log probabilities
Generalizing Logistic Regression to multiple classes
If we have two classes we can
have two possible outputs: 1 or 0
What if we have 10 classes?
Binary - Two Outputs
Y either 1 or 0
Suppose we have k classes.
Let’s switch up some notation:
Now set each score s to the
result of that function.
Probability that output Y = class
K.
We have J possible classes.
Perform softmax on scores
Softmax Classifier
is Binary Logistic regression
applied to multiple classes
Output = scores b/w 0 and 1
Scores
Notation changeup:
logit-1 -> sigmoid
Input Data
Two possible outputs!
x
Linear Combination
Logistic Regression
P = 1
P = 0
Predictor
Class A is Y >= 0.5
Class B is Y < 0.5
Scores
(0,1)
range
Unnormalized log probabilities
Notation changeup:
logit-1 -> sigmoid
Input Data
Two possible outputs!
x
Linear Combination
Logistic Regression
Predictor
Class A is Y >= 0.5
Class B is Y < 0.5
Scores
(0,1)
range
Unnormalized log probabilities
Output of Linear function. AKA
Linear Scores
Linear(x) = xW+b or Wx + b
Textbooks: Wx+b
Tensorflow: xW+b
Computing derivatives is easier for xW + b.
A few notes
f(xi, W, b) = xW + b
Assume image x has all of its pixels flattened out into a single row vector. x =
X’s size is [n x m]. n: # examples/images m: # features (pixels in this case per image)
Matrix W of size [m x k]. m = # features, k = # classes
Bias b of size [1 x k] (one value per class)
Consider our input data (xi, yi) as being fixed. We can set W and b to approximate any function
(remember universality principle).
We use the training data to learn W and b. Once our model has been trained, we can discard the training
data and test our model on test data (or anything else, for that matter).
W and b will be tensors if you are using TensorFlow. They can be arrays if you are using Numpy.
(Illustration: pixels x[0] ... x[4], each in the range 0-255, flattened into a single row.)
Example: The biases allow
these lines to NOT all cross
through the origin.
W causes the lines to rotate
about our pixel space;
b pushes the lines away
from the origin
src: Andrej Karpathy
Bias Trick (in practice)
It would be annoying to worry about the bias term separately during classification.
Therefore we simply append the bias row vector to the end of our weights matrix (and append a constant 1 to each input vector x).
Weights (4 x 3):
0.1 0.25 0.3
0.63 0.12 -0.64
0.26 0.62 0.58
0.99 -0.14 0.333
Bias (1 x 3):
0.12 3.1 -0.5
Combined weights with the bias appended as the last row (5 x 3):
0.1 0.25 0.3
0.63 0.12 -0.64
0.26 0.62 0.58
0.99 -0.14 0.333
0.12 3.1 -0.5
You may see this in the code as: logits = tf.add(tf.matmul(x, weights), bias) OR
logits = tf.matmul(x, weights)
logits = tf.nn.bias_add(logits, bias)
Input image
X
n x m
1 x 4
20 254 40 1img1
1 image, 4 pixels
Each pixel is a feature.
1 image, 4 features
Pixels range (0, 255)
0.1 0.25 0.3
0.63 0.12 -0.64
0.26 0.62 0.58
0.99 -0.14 0.333
Weights
m x k
4 x 3
m: # features (pixels per img) = 4
n: # images = 1
k: # classes = 3 (Cat, Car, Dog)
pretend this image only
has 4 pixels
Bias
1 x k
1 x 3
Linear Scores
stretch pixels into single row
Output
1 x 3
Linear Scores = xW + b
Cat Car Dog
0.12 3.1 -0.5 3.2 5.1 -1.7
values from Andrej Karpathy
Initialize weights with values
b/w 0 and 1. You can
initialize biases to start at 0
or very small values if you
like
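A minimal NumPy sketch of this computation with the same shapes (the weights here are random, so the printed scores will not match the illustrative 3.2 / 5.1 / -1.7 from Karpathy's example):

import numpy as np

n, m, k = 1, 4, 3   # 1 image, 4 pixel features, 3 classes (Cat, Car, Dog)

x = np.array([[20, 254, 40, 1]], dtype=float)   # input image, shape (n, m) = (1, 4)
W = np.random.uniform(0, 1, size=(m, k))        # weights, shape (m, k) = (4, 3)
b = np.zeros((1, k))                            # biases, shape (1, k) = (1, 3)

scores = x @ W + b      # linear scores xW + b
print(scores.shape)     # (1, 3): one score per class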
Linear Scores, f(x; w, b)
Applying softmax
Apply
exponential
Unnormalized log
probabilities
Unnormalized
probabilities
probabilities
Normalize so
sum = 1
k = # specific class, different from k on the last slide.
J = # classes
Cat Car Dog
3.2 5.1 -1.7 24.5 164 0.18
0.13 0.87 0.00
Cat Car Dog
values from Andrej Karpathy
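A minimal NumPy sketch of the softmax step, reproducing the numbers above:

import numpy as np

scores = np.array([3.2, 5.1, -1.7])        # linear scores for Cat, Car, Dog

unnormalized = np.exp(scores)              # apply exponential -> ~[24.5, 164.0, 0.18]
probs = unnormalized / unnormalized.sum()  # normalize so the sum = 1

print(np.round(unnormalized, 2))           # [ 24.53 164.02   0.18]
print(np.round(probs, 2))                  # [0.13 0.87 0.  ]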
Input image
Normalized
Probabilities
3 x 1
stretch pixels to single row
X
n x m
1 x 4
20 254 40 1img1 0.1 0.25 0.3
0.63 0.12 -0.64
0.26 0.62 0.58
0.99 -0.14 0.333
Weights
m x k
4 x 3
Bias
1 x k
1 x 3
Linear Scores
Linear
Scores
1 x 3
Cat Car Dog
0.12 3.1 -0.5 3.2 5.1 -1.7
0.13 0.87 0.00
Cat Car Dog
Process so far:
Each pixel can be considered
a neuron
values from Andrej Karpathy
Input image
Normalized
Probabilities
3 x 1
stretch pixels to single row
X
n x m
1 x 4
20 254 40 1img1 0.1 0.25 0.3
0.63 0.12 -0.64
0.26 0.62 0.58
0.99 -0.14 0.333
Weights
m x k
4 x 3
Bias
1 x k
1 x 3
Linear Scores
Linear
Scores
1 x 3
Cat Car Dog
0.12 3.1 -0.5 3.2 5.1 -1.7
0.13 0.87 0.00
Cat Car Dog
Process so far: Forward Pass
Loss Function: How we learn
Recall:
Your output
neurons
didn’t match
the expected
output.
Input image
Normalized
Probabilities
3 x 1
stretch pixels to single row
X
n x m
1 x 4
20 254 40 1img1 0.1 0.25 0.3
0.63 0.12 -0.64
0.26 0.62 0.58
0.99 -0.14 0.333
Weights
m x k
4 x 3
Bias
1 x k
1 x 3
Linear Scores
Linear
Scores
1 x 3
Cat Car Dog
0.12 3.1 -0.5 3.2 5.1 -1.7
0.13 0.87 0.00
Cat Car Dog
Process so far: Forward Pass
Loss Function: How we learn
Normalized
Probabilities
3 x 1
0.13 0.87 0.00
Cat Car Dog Maximize the log likelihood
of true class
OR
Minimize the negative log
likelihood of true class.
(easier to do negative
feedback loop than positive
feedback loop)
values from Andrej Karpathy
Use the loss to manipulate
the weights of the incorrect
classifying inputs.
There are many different types of loss functions.
More of this later
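A minimal sketch of the negative log likelihood loss for a single example, using the probabilities above and assuming, purely for illustration, that the true class is Cat:

import numpy as np

probs = np.array([0.13, 0.87, 0.00])     # network's probabilities for Cat, Car, Dog
true_class = 0                           # assume the true class is Cat (illustrative)

eps = 1e-12                              # avoid log(0)
loss = -np.log(probs[true_class] + eps)  # negative log likelihood of the true class
print(loss)                              # ~2.04: large, because the network was confident and wrong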
Visualizing a hidden layer
X
Linear:
L1
W1
b1
Linear:
L2
W2
b2
Soft
max
X
4 x 10
W1
10 x 100
b1
1 x 100
xW1
4 x 100
b1
1 x 100
Since b1 has a dimension of
size 1, its values can be
broadcast across the rows of the
xW1 product automatically
X: n x m (examples x features)
W: m x k (features x classes)
b: 1 x k (row vector, one value per class)
L1
4 x 100
L1
W2
100 x 10
b2
1 x 10
L2
4 x 10
L2
We can add a wide layer by
adding columns to W1 and
then add a skinny layer by
giving W2 only k columns so
that our output still has the
desired shape of examples
x classes
These layers are hidden
because we cannot see
their output as we run the
graph
Desired output size:
4 x 10
examples x classes
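A minimal NumPy sketch of the shapes in this walkthrough (random values; biases written as 1-row vectors so the broadcasting works as described):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

X  = np.random.rand(4, 10)     # 4 examples x 10 features
W1 = np.random.rand(10, 100)   # wide hidden layer: 100 columns
b1 = np.zeros((1, 100))        # broadcast across the rows of X @ W1
W2 = np.random.rand(100, 10)   # skinny layer back down to k = 10 classes
b2 = np.zeros((1, 10))

L1 = X @ W1 + b1               # hidden layer output, shape (4, 100)
L2 = L1 @ W2 + b2              # linear scores, shape (4, 10) = examples x classes
probs = softmax(L2)            # one probability distribution per example

print(L1.shape, L2.shape, probs.shape)   # (4, 100) (4, 10) (4, 10)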
Neurons are not classes, or objects, they are values.
They are the values that are moving through the pipeline. Follow
a pixel of an example image through a network and consider it to
be a neuron.
Neuron
When we implement a
neural network we use a
graph.
X
Linear:
L1
W1
b1
Softmax
S1
Probabilities
L1 S1
0.13 0.87 0.00
Training Data Images
Labels
Labels tell you the true class
of each image.
Softmax
S1
Sigmoid
S1
Note: for a binary output these are equivalent (the sigmoid is the two-class special case of softmax)
X
Linear:
L1
W1
b1
Sigmoid
S1
Probabilities
(Logits)
L1 S1
0.13 0.87 0.00
Training Data Images
Labels
Labels tell you the true class
of each image.
0 1 0
One-Hot-Training Labels
Cat 0 0 1
Car 0 1 0
Dog 1 0 0
‘Cat’
‘Car’
‘Dog’
Training Labels
Error! should be 0 1 0
If we run our network on just
one image in the training set
and take its corresponding
label
X
Linear:
L1
W1
b1
Sigmoid
S1
logits
L1 S1
Training Data Images
Labels
Run network on all training
data and training labels
One-Hot-Training Labels
0 0 1
0 1 0
1 0 0
‘Cat’
‘Car’
‘Dog’
Training Labels
Cat
Car
Dog
0.13 0.87 0.12
0.55 0.91 0.2
0.88 0.66 0.11
Cat Car Dog
Run network on 3 images
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Find accuracy of our training network. More of this later.
Setting up training data
Training Data Test Data
Training Data Validation Data Test Data
Split up your training data into validation data and
training data. Use validation data as test data as
you train and tune your network.
Train Data: 80% of original training data
Validation Data: 20% of original training data
Then shuffle!
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

# this splits train / validation 80 / 20
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train, y_train, train_size=0.80, test_size=0.20)

# this shuffles and keeps label indices intact
X_train, y_train = shuffle(X_train, y_train)
Preprocessing: Normalization
In our examples we used raw pixel values (0, 255) as our inputs to train our network.
In practice, we preprocess this data before running it through our network.
Mean-centered normalization: we subtract the mean pixel value from each pixel and divide by the standard deviation.
This gives a roughly Gaussian distribution of values, approximately in [-1, 1].
(0, 255) => (-1, 1)
Min-max normalization: we subtract the minimum and divide by the difference (max - min).
This gives us a range of (0, 1).
(0, 255) => (0, 1)
(image from Wikipedia)
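A minimal NumPy sketch of both normalizations for 8-bit pixel values (the sample pixel values are illustrative):

import numpy as np

pixels = np.array([0, 20, 128, 254, 255], dtype=float)   # raw values in (0, 255)

# mean-centered normalization: roughly (-1, 1)
mean_centered = (pixels - pixels.mean()) / pixels.std()

# a common quick variant for images: assume mean ~128 and half-range ~128
quick_centered = (pixels - 128.0) / 128.0

# min-max normalization: (0, 1)
min_max = (pixels - pixels.min()) / (pixels.max() - pixels.min())

print(mean_centered)
print(quick_centered)   # about [-1, -0.84, 0, 0.98, 0.99]
print(min_max)          # about [0, 0.08, 0.5, 1.0, 1.0]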
Overfitting - Introduction to Hyperparameters
The goal of building an artificial neural network is to generalize.
● We want to apply new data to our network and classify inputs
● If we overtrain / overfit our network to our training data then our accuracy will be
deceiving. It might work very well for training data, but will not work on test data.
● In order to prevent overfitting we
implement preprocessing
techniques and tune our hyper
parameters.
● Tuning hyperparameters is
basically all that we can do after
we set up our network
architecture
● It should be the last step in
setting up your network
● Test on validation data while you
tune (don’t touch test data)
http://docs.aws.amazon.com/machine-learning/latest/dg/images/mlc
oncepts_image5.png
Epochs
An epoch is a single forward pass and backward pass over the entire training set.
It is a hyperparameter, and we must tune the number of epochs to fit our data / increase our accuracy.
The larger the number of epochs, the longer it takes to train.
We increase epochs to increase the number of training passes. If we increase them too much, we may overfit.
X
Linear:
L1
W1
b1
Softmax
S1
logits
L1 S1
backward
forward
More on the backward pass to come
Stop early!
https://qph.ec.quoracdn.net/main-qimg-d23fbbc85b7d18b4e07b794
2ecdfd856?convert_to_webp=true
Minibatch
We don't feed all of the training examples into our network at once. Instead, we choose a
batch of examples and feed them in, perform forward and backward propagation
on them, and then feed in the next batch.
We do this so we can perform Stochastic Gradient Descent, and prevent our
network from overfitting.
So in mini-batch gradient descent, you process a small subset of training data
forwards and backwards and update the weights / biases with the gradient update
formula (shown on the next page)
Mini batch
We feed only a segment of the training data into our neural network at a time.
Training Data
Batch
Batch
Network
● The number of examples
in each batch (the batch
size) is a hyperparameter.
● This also depends on
GPU memory size
● Typically use 128 or
256 (see the sketch below)
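A minimal sketch of splitting the training data into batches (the helper name and batch size are illustrative):

import numpy as np

def batches(X, y, batch_size=128):
    # yield consecutive (features, labels) batches; the last one may be smaller
    for start in range(0, len(X), batch_size):
        end = start + batch_size
        yield X[start:end], y[start:end]

X = np.random.rand(1000, 4)        # 1000 examples, 4 features
y = np.random.randint(0, 3, 1000)  # 1000 labels

for X_batch, y_batch in batches(X, y, batch_size=256):
    print(X_batch.shape)           # (256, 4) three times, then (232, 4) for the final batch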
Gradient Descent
The “Learning” in Machine Learning.
Update the values of X (punish them) when they are wrong.
X: weights or biases
η: Learning Rate (typically 0.01 to 0.001)
η :The rate at which our network learns. This can change over time with
methods such as Adam, Adagrad etc. (hyperparameter)
∇(x): Gradient of X
We seek to update the weights and biases by a value indicating how “off”
they were from their target.
The gradient points in the direction of steepest increase, so we put a negative sign in
front of it to move downhill
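A minimal sketch of the update X -= η∇(x) on a made-up one-parameter loss, just to show the mechanics:

# toy loss L(x) = (x - 3)^2, so the gradient is dL/dx = 2 * (x - 3)
def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0      # think of x as a weight or bias
eta = 0.1    # learning rate

for step in range(100):
    x -= eta * grad(x)   # move against the gradient (downhill)

print(x)      # ~3.0, the minimum of the toy loss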
Stochastic Gradient Descent
Recall Gradient Descent: X -= η∇(x) (eq 1)
Stochastic Gradient Descent (SGD) is a version of Gradient Descent where on
each forward pass, a batch of data is randomly sampled from the total dataset and
gradient descent is performed on that batch.
The more batches processed by the network = the better the approximation
1. Randomly sample a batch of data (1) from the total dataset
2. Run the network forward and backward to calculate the gradient from data (1)
3. Apply the gradient descent update (eq 1)
4. Repeat 1-3 until convergence or epoch limit
Visualizing Batch and SGD
256
256
256
232
If we start out with 1000
images and use a batch size
of 256, the final batch will
only have 232 images in it
(1000 - 3 x 256 = 232).
Training Images
batch size
Stochastic Gradient Descent sample
sizes:
5 images
Maybe take ~5 images from the 256-image
batch at a time and run SGD on
them. Then go back and select 5
more.
X -= η∇(x)
Each X is an image in the
SGD batch
Backpropagation
We need to figure out how to alter a parameter to minimize the cost (loss). First we must find out what
effect that parameter has on the cost.
(we can’t just blindly change parameter values and hope that our network converges)
The gradient captures the effect each parameter has on the cost.
How to determine the effect of a parameter on the cost?
We use Backpropagation - which is an application of the chain rule from calculus
Did somebody
say Chain
Rule?
Backpropagation
Derivative Review:
In order to know the effect x
has on f, we must first find
the effect g has on f, then
the effect x has on g (chain rule: df/dx = df/dg · dg/dx)
Backpropagation
You want to stage backpropagation locally at each gate. This is much easier to implement than
storing each weight value and trying to compute everything at the end. Simply add up the gradients along an
individual neuron's path.
Andrej Karpathy
f
X
Y
Z
Change in Loss with respect to Z
Change in Z with respect to X
Change in L with respect to Z
X
b1
W1
Linear L1
S1
S1 b/c it goes to sigmoid
S = WX + b
(Loss w respect to X)
More Backpropagation
f
X
Y
Z
Change in Loss with respect to Z
Change in Z with respect to X
Change in L with respect to Z
X
b1
W1
Linear L1
S1
S1 b/c it goes to sigmoid
S = WX + b
(Loss w respect to X)
More Backpropagation
This comes together on the next slide!
X
b1
W1
Linear L1
S1
S1 b/c it goes to sigmoid
S = WX + b
(Loss w respect to X)
Sigmoid S1
Any
Gate
Output
X has a relationship to L1, S1 has a relationship
to L1. We can use that relationship in an
application of the chain rule to compute the
change in L1 with respect to X. Then we perform
a gradient descent update on X.
Accumulator of all the gradients up to the L1 gate
(sum of all gradients in red box). aka Accumulated Loss
(Gradient Desc Eqn)
(Update X)
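A minimal NumPy sketch of the chain rule applied to this single linear + sigmoid stage (a squared-error loss is used here purely for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# forward pass: x -> linear (W1, b1) -> sigmoid -> loss
x, W1, b1, target = 0.5, 0.8, 0.1, 1.0
L1 = W1 * x + b1                   # linear gate
S1 = sigmoid(L1)                   # sigmoid gate
loss = 0.5 * (S1 - target) ** 2    # illustrative loss

# backward pass: multiply local gradients along the path (chain rule)
dloss_dS1 = S1 - target            # change in loss with respect to S1
dS1_dL1 = S1 * (1 - S1)            # change in S1 with respect to L1 (sigmoid derivative)
dL1_dW1 = x                        # change in L1 with respect to W1
dL1_dx  = W1                       # change in L1 with respect to x

dloss_dW1 = dloss_dS1 * dS1_dL1 * dL1_dW1
dloss_dx  = dloss_dS1 * dS1_dL1 * dL1_dx

eta = 0.01
W1 -= eta * dloss_dW1              # gradient descent update on the weight
print(loss, dloss_dW1, W1)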
Backpropagation cont
Andrej Karpathy
X
Linear:
L1
W1
b1
Sigmoid
S1
logits
L1 S1
Training Data Images
Labels
Run network on all training
data and training labels
One-Hot-Training Labels
0 0 1
0 1 0
1 0 0
‘Cat’
‘Car’
‘Dog’
Training Labels
Cat
Car
Dog
0.13 0.87 0.12
0.55 0.91 0.2
0.88 0.66 0.11
Cat Car Dog
Run network on 3 images Cross
Entropy
Cross Entropy(distance)
X
Input
2.0
1.0
0.1
Wx+b
y
Logit
Linear
0.7
0.2
0.1
S(Y)
Softmax
S(Y)
1.0
0.0
0.0
L
Labels
D(S,L)
Cross Entropy
Tells us how accurate we are
Minimize cross entropy
● Want a high distance for
incorrect class
● Want a low distance for correct
class
● Training loss = average cross
entropy over the entire
training set.
● Want all the distances to be
small
● want loss to be small
● So we attempt to minimize this
function.
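A minimal NumPy sketch of the cross-entropy distance D(S, L) using the numbers from the diagram above:

import numpy as np

S = np.array([0.7, 0.2, 0.1])   # softmax output S(Y)
L = np.array([1.0, 0.0, 0.0])   # one-hot label

D = -np.sum(L * np.log(S))      # cross entropy D(S, L)
print(D)                        # ~0.36: small, because the correct class got most of the probability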
Training Loss
weight 1
weight 2
src: Udacity
Cross Entropy Loss (continued)
weight 1
weight 2
src: Udacity
We want to find the weights that
make this loss the smallest.
This turns the machine learning
problem into a numerical
optimization problem
weight 1
weight 2
Training Loss
Average cross entropy over
entire training set
Minimize this function
Training Loss
● Take the derivative of Loss with respect to
parameters and follow the derivative by taking a
step backwards.
● Repeat until you get to the bottom.
● In this case we have 2 parameters (w1, w2)
● Typically we have millions of parameters
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))
Cross Entropy
Loss
Installing Dependencies
You can use pip3 or pip. I recommend using an anaconda environment with python3:
https://www.continuum.io/downloads to Download Anaconda, (get Python 3.4+ version)
conda create --name=IntroToTensorFlow python=3 anaconda
source activate IntroToTensorFlow (Your conda environment is named “IntroToTensorFlow”)
conda install -c anaconda numpy=1.11.3
conda install -c conda-forge matplotlib=2.0.0
conda install -c anaconda scipy=0.18.1
conda install scikit-learn
or pip install -U scikit-learn
conda install -c conda-forge tensorflow
conda install -c menpo opencv3=3.2.0
jupyter notebook (to run in browser)
git clone https://github.com/JonathanCMitchell/TensorFlowLab.git
Installing TensorFlow
Recommended: Python 3.4 or higher and Anaconda
Install TensorFlow
conda create --name=IntroToTensorFlow python=3 anaconda
source activate IntroToTensorFlow
conda install -c conda-forge tensorflow
docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow (Docker if you need it)
# Hello World!
import tensorflow as tf

# create a tensorflow object called a tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
git clone https://github.com/JonathanCMitchell/TensorFlowLab.git
If you have questions here is my info:
Jonathan Mitchell
github.com/jonathancmitchell
linkedin.com/in/jonathancmitchell
jmitchell1991@gmail.com
Self Driving Cars Los Angeles
https://www.meetup.com/Los-Angeles-Self-Driving-Car-Meetup/
Thank you!
Weitere ähnliche Inhalte

Was ist angesagt?

Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationJason Anderson
 
Cryptography Baby Step Giant Step
Cryptography Baby Step Giant StepCryptography Baby Step Giant Step
Cryptography Baby Step Giant StepSAUVIK BISWAS
 
26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-Means26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-MeansAndres Mendez-Vazquez
 
Markov chain monte_carlo_methods_for_machine_learning
Markov chain monte_carlo_methods_for_machine_learningMarkov chain monte_carlo_methods_for_machine_learning
Markov chain monte_carlo_methods_for_machine_learningAndres Mendez-Vazquez
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习AdaboostShocky1
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learningSteve Nouri
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix FactorizationTatsuya Yokota
 
Poggi analytics - distance - 1a
Poggi   analytics - distance - 1aPoggi   analytics - distance - 1a
Poggi analytics - distance - 1aGaston Liberman
 
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...Universitat Politècnica de Catalunya
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417Shuai Zhang
 
Comparitive Analysis of Algorithm strategies
Comparitive Analysis of Algorithm strategiesComparitive Analysis of Algorithm strategies
Comparitive Analysis of Algorithm strategiesTalha Shaikh
 
Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]Palak Sanghani
 

Was ist angesagt? (20)

Variational Autoencoders For Image Generation
Variational Autoencoders For Image GenerationVariational Autoencoders For Image Generation
Variational Autoencoders For Image Generation
 
Cryptography Baby Step Giant Step
Cryptography Baby Step Giant StepCryptography Baby Step Giant Step
Cryptography Baby Step Giant Step
 
26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-Means26 Machine Learning Unsupervised Fuzzy C-Means
26 Machine Learning Unsupervised Fuzzy C-Means
 
Markov chain monte_carlo_methods_for_machine_learning
Markov chain monte_carlo_methods_for_machine_learningMarkov chain monte_carlo_methods_for_machine_learning
Markov chain monte_carlo_methods_for_machine_learning
 
Fmincon
FminconFmincon
Fmincon
 
algorithm Unit 4
algorithm Unit 4 algorithm Unit 4
algorithm Unit 4
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
机器学习Adaboost
机器学习Adaboost机器学习Adaboost
机器学习Adaboost
 
Cheatsheet supervised-learning
Cheatsheet supervised-learningCheatsheet supervised-learning
Cheatsheet supervised-learning
 
Distributed ADMM
Distributed ADMMDistributed ADMM
Distributed ADMM
 
Nonnegative Matrix Factorization
Nonnegative Matrix FactorizationNonnegative Matrix Factorization
Nonnegative Matrix Factorization
 
Poggi analytics - distance - 1a
Poggi   analytics - distance - 1aPoggi   analytics - distance - 1a
Poggi analytics - distance - 1a
 
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
Deep Generative Models II (DLAI D10L1 2017 UPC Deep Learning for Artificial I...
 
Sparse autoencoder
Sparse autoencoderSparse autoencoder
Sparse autoencoder
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
 
Reading group gan - 20170417
Reading group   gan - 20170417Reading group   gan - 20170417
Reading group gan - 20170417
 
Comparitive Analysis of Algorithm strategies
Comparitive Analysis of Algorithm strategiesComparitive Analysis of Algorithm strategies
Comparitive Analysis of Algorithm strategies
 
Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]Lec 9 05_sept [compatibility mode]
Lec 9 05_sept [compatibility mode]
 
algorithm Unit 5
algorithm Unit 5 algorithm Unit 5
algorithm Unit 5
 
algorithm unit 1
algorithm unit 1algorithm unit 1
algorithm unit 1
 

Andere mochten auch

Self Driving Cars V11
Self Driving Cars V11Self Driving Cars V11
Self Driving Cars V11Kevin Root
 
Thinking about nlp
Thinking about nlpThinking about nlp
Thinking about nlpPan Xiaotong
 
Deep learning for text analytics
Deep learning for text analyticsDeep learning for text analytics
Deep learning for text analyticsErik Tromp
 
NLP@Work Conference: email persuasion
NLP@Work Conference: email persuasionNLP@Work Conference: email persuasion
NLP@Work Conference: email persuasionevolutionpd
 
AI Reality: Where are we now? Data for Good? - Bill Boorman
AI Reality: Where are we now? Data for Good? - Bill  BoormanAI Reality: Where are we now? Data for Good? - Bill  Boorman
AI Reality: Where are we now? Data for Good? - Bill BoormanTextkernel
 
Using Deep Learning And NLP To Predict Performance From Resumes
Using Deep Learning And NLP To Predict Performance From ResumesUsing Deep Learning And NLP To Predict Performance From Resumes
Using Deep Learning And NLP To Predict Performance From ResumesBenjamin Taylor
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP Textkernel
 
Deep Learning and Text Mining
Deep Learning and Text MiningDeep Learning and Text Mining
Deep Learning and Text MiningWill Stanton
 
The Self-Driving Car
The Self-Driving CarThe Self-Driving Car
The Self-Driving CarFred Phillips
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)Sumit Raj
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Pythonanntp
 
Portfolio Selection with Artificial Neural Networks
Portfolio Selection with Artificial Neural NetworksPortfolio Selection with Artificial Neural Networks
Portfolio Selection with Artificial Neural NetworksAndrew Ashwood
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Aplication of artificial neural network in cancer diagnosis
Aplication of artificial neural network in cancer diagnosisAplication of artificial neural network in cancer diagnosis
Aplication of artificial neural network in cancer diagnosisSaeid Afshar
 
Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...
Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...
Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...Piotr Marek Smolnicki
 
Artificial Neural Network Based Object Recognizing Robot
Artificial Neural Network Based Object Recognizing RobotArtificial Neural Network Based Object Recognizing Robot
Artificial Neural Network Based Object Recognizing RobotJaison Sabu
 

Andere mochten auch (20)

Self Driving Cars V11
Self Driving Cars V11Self Driving Cars V11
Self Driving Cars V11
 
IA Perceptron
IA PerceptronIA Perceptron
IA Perceptron
 
Thinking about nlp
Thinking about nlpThinking about nlp
Thinking about nlp
 
NLP
NLPNLP
NLP
 
Deep learning for text analytics
Deep learning for text analyticsDeep learning for text analytics
Deep learning for text analytics
 
NLP@Work Conference: email persuasion
NLP@Work Conference: email persuasionNLP@Work Conference: email persuasion
NLP@Work Conference: email persuasion
 
Redes neurais
Redes neuraisRedes neurais
Redes neurais
 
AI Reality: Where are we now? Data for Good? - Bill Boorman
AI Reality: Where are we now? Data for Good? - Bill  BoormanAI Reality: Where are we now? Data for Good? - Bill  Boorman
AI Reality: Where are we now? Data for Good? - Bill Boorman
 
Using Deep Learning And NLP To Predict Performance From Resumes
Using Deep Learning And NLP To Predict Performance From ResumesUsing Deep Learning And NLP To Predict Performance From Resumes
Using Deep Learning And NLP To Predict Performance From Resumes
 
Practical Deep Learning for NLP
Practical Deep Learning for NLP Practical Deep Learning for NLP
Practical Deep Learning for NLP
 
Deep Learning and Text Mining
Deep Learning and Text MiningDeep Learning and Text Mining
Deep Learning and Text Mining
 
The Self-Driving Car
The Self-Driving CarThe Self-Driving Car
The Self-Driving Car
 
Natural language processing (Python)
Natural language processing (Python)Natural language processing (Python)
Natural language processing (Python)
 
Natural Language Processing and Python
Natural Language Processing and PythonNatural Language Processing and Python
Natural Language Processing and Python
 
Final Ppt
Final PptFinal Ppt
Final Ppt
 
Portfolio Selection with Artificial Neural Networks
Portfolio Selection with Artificial Neural NetworksPortfolio Selection with Artificial Neural Networks
Portfolio Selection with Artificial Neural Networks
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Aplication of artificial neural network in cancer diagnosis
Aplication of artificial neural network in cancer diagnosisAplication of artificial neural network in cancer diagnosis
Aplication of artificial neural network in cancer diagnosis
 
Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...
Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...
Autonomous/Automated Automobiles vs. Self-driving Cars/Vehicles vs. Driverles...
 
Artificial Neural Network Based Object Recognizing Robot
Artificial Neural Network Based Object Recognizing RobotArtificial Neural Network Based Object Recognizing Robot
Artificial Neural Network Based Object Recognizing Robot
 

Ähnlich wie Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved

Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)Zihui Li
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonChun-Ming Chang
 
Difference between logistic regression shallow neural network and deep neura...
Difference between logistic regression  shallow neural network and deep neura...Difference between logistic regression  shallow neural network and deep neura...
Difference between logistic regression shallow neural network and deep neura...Chode Amarnath
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1arogozhnikov
 
PRML Chapter 4
PRML Chapter 4PRML Chapter 4
PRML Chapter 4Sunwoo Kim
 
MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4arogozhnikov
 
Notes relating to Machine Learning and SVM
Notes relating to Machine Learning and SVMNotes relating to Machine Learning and SVM
Notes relating to Machine Learning and SVMSyedSaimGardezi
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machinenozomuhamada
 
Episode 50 : Simulation Problem Solution Approaches Convergence Techniques S...
Episode 50 :  Simulation Problem Solution Approaches Convergence Techniques S...Episode 50 :  Simulation Problem Solution Approaches Convergence Techniques S...
Episode 50 : Simulation Problem Solution Approaches Convergence Techniques S...SAJJAD KHUDHUR ABBAS
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer Sammer Qader
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Universitat Politècnica de Catalunya
 

Ähnlich wie Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved (20)

Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)Machine Learning Algorithms Review(Part 2)
Machine Learning Algorithms Review(Part 2)
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018Backpropagation - Elisa Sayrol - UPC Barcelona 2018
Backpropagation - Elisa Sayrol - UPC Barcelona 2018
 
Difference between logistic regression shallow neural network and deep neura...
Difference between logistic regression  shallow neural network and deep neura...Difference between logistic regression  shallow neural network and deep neura...
Difference between logistic regression shallow neural network and deep neura...
 
[ppt]
[ppt][ppt]
[ppt]
 
[ppt]
[ppt][ppt]
[ppt]
 
MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1MLHEP 2015: Introductory Lecture #1
MLHEP 2015: Introductory Lecture #1
 
PRML Chapter 4
PRML Chapter 4PRML Chapter 4
PRML Chapter 4
 
MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4MLHEP 2015: Introductory Lecture #4
MLHEP 2015: Introductory Lecture #4
 
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
The Perceptron - Xavier Giro-i-Nieto - UPC Barcelona 2018
 
Notes relating to Machine Learning and SVM
Notes relating to Machine Learning and SVMNotes relating to Machine Learning and SVM
Notes relating to Machine Learning and SVM
 
2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine2012 mdsp pr13 support vector machine
2012 mdsp pr13 support vector machine
 
Episode 50 : Simulation Problem Solution Approaches Convergence Techniques S...
Episode 50 :  Simulation Problem Solution Approaches Convergence Techniques S...Episode 50 :  Simulation Problem Solution Approaches Convergence Techniques S...
Episode 50 : Simulation Problem Solution Approaches Convergence Techniques S...
 
Data classification sammer
Data classification sammer Data classification sammer
Data classification sammer
 
SVM.ppt
SVM.pptSVM.ppt
SVM.ppt
 
SVM (2).ppt
SVM (2).pptSVM (2).ppt
SVM (2).ppt
 
Presentation on machine learning
Presentation on machine learningPresentation on machine learning
Presentation on machine learning
 
Neural Networks - How do they work?
Neural Networks - How do they work?Neural Networks - How do they work?
Neural Networks - How do they work?
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
 

Kürzlich hochgeladen

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESNarmatha D
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 

Kürzlich hochgeladen (20)

Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Industrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIESIndustrial Safety Unit-I SAFETY TERMINOLOGIES
Industrial Safety Unit-I SAFETY TERMINOLOGIES
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 

Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved

  • 1. Machine Learning pt1: Classification, Regression, and Artificial Neural Networks Self-Driving-Cars Los Angeles By Jonathan Mitchell github.com/jonathancmitchell linkedin.com/in/jonathancmitchell jmitchell1991@gmail.com Self Driving Cars Los Angeles https://www.meetup.com/Los-Angeles-Self-Driving-Car-Meetup/
  • 2. Welcome to Machine Learning aka computational statistics How did I learn this? Sources: ● Udacity’s Self-Driving Car Nanodegree problem (udacity.com/drive) ● MIT Self-Driving Car program (selfdrivingcars.mit.edu) ● Stanford’s cs-231n (cs231n.github.io)
  • 3. Topics A) Probability basics - Basics to Logits B) Linear Classification/ Logistic regression overview C) Perceptron D) Perceptron (biological inspiration) E) Neuron F) Forward Pass G) Computing a loss function H) Visualizing Hidden Layers I) Setting up training data J) Preprocessing / Normalization K) Overfitting / Hyperparameter intro L) Epochs M) Minibatch N) Gradient Descent / Stochastic Gradient Descent O) Backpropagation P) Cross Entropy Loss
  • 4. Probability basics Probability p = all outcomes of interest / all possible outcomes P can range from (0, 1) inclusive. P = 1 = 100% likelihood P = 0 = 0% likelihood Odds: The likelihood of an event P happening: Coin Toss: Toss a coin in the air, it can either be heads or tails. P_heads = 0.5 => P_tails = 0.5 = 1 - P_heads P1 = Heads Probability (0, 1), P0 = Tails Probability (0,1) Odds ratio can be written as Odds1 : Odds2. In this case 1:1. Equal chance of getting Heads or tails P(Not Tails) P(Tails) P(Heads) P(Not Heads) Odds_heads Odds_tails
  • 5. Bernoulli Probability (A specific case of binomial distribution) Bernoulli Probability: A yes or no question. Two possible outcomes: Success and Fail p = probability of success (of one trial) q = probability of fail (of one trial) = 1 - p unknown probability p. N = number of trials = 1 Probability of K successes. K = 1 for Bernoulli A Bernoulli distribution is a special case of a Binomial Distribution with N = 1 trial. Binomial Probability
  • 6. Probability Basics -> Logistic Regression Goal: Estimate an unknown probability p for any given linear combination of the independent variables. Link independent variables to the Ber(p) distribution. Logistic regression: estimate an unknown probability p for any given linear combination of the independent variables. Estimate p = p^ Need function that maps linear combination of variables that can result in any value onto the bernoulli probability distribution with a domain from 0 to 1. Use Logit: Natural log of the odds ratio
  • 7. Logistic Regression Logit: natural log of odds ratio Undefined at P = 0, P = 1 Good P domain (0, 1) Linear Combination graph from http://www.graphpad.com/support/faqid/1465/
  • 8. Logistic Regression Logit: α will be the linear combination of independent variables and their coefficient Recall: Ber(p) = logit(p) - logit(1-p) Inverse logit gives us the probability that dependent var (p) is a “1” Linear Combination Probability of x with linear combination mapping (B and B0) Binary Output variable Y. We want to model the conditional probability Pr(Y = 1 | X = x) as a function of x; any unknown parameters are to be estimated by max likelihood graph from http://www.graphpad.com/support/faqid/1465/
  • 9. Logistic Regression -> Linear Classification To classify we seek a binary output variable Y = 1 or 0. Recall Pr(Y = 1 | X = x). We modeled this as p(x;b,w) Predict Y = 1 when p >= 0.5. Y = 1 = Class A Predict Y = 0 when P < 0.5. Y = 0 = Class B Guess 1 when B + B0 is non-negative Guess 0 when B + B0 is negative This is a linear classifier. We can also infer that the probabilities depend on the distance from the boundary. This is known as a Binary Logistic Classifier (Binary = 2 options, Class A or Class B) The decision boundary separates the two predicted classes and is the solution to this equation Graph from http://pubs.rsc.org/services/images/RSCpubs.ePlatform.Service.FreeContent.ImageServic e.svc/ImageService/Articleimage/2010/AN/b918972f/b918972f-f7.gif
  • 10. Logistic Regression -> Linear Classification To classify we seek a binary output variable Y = 1 or 0. Recall Pr(Y = 1 | X = x). We modeled this as p(x;b,w) Predict Y = 1 when p >= 0.5. Y = 1 = Class A Predict Y = 0 when P < 0.5. Y = 0 = Class B Guess 1 when B + B0 is non-negative Guess 0 when B + B0 is negative This is a linear classifier. We can also infer that the probabilities depend on the distance from the boundary. This is known as a Binary Logistic Classifier (Binary = 2 options, Class A or Class B) The decision boundary separates the two predicted classes and is the solution to this equation Graph from http://pubs.rsc.org/services/images/RSCpubs.ePlatform.Service.FreeContent.ImageServic e.svc/ImageService/Articleimage/2010/AN/b918972f/b918972f-f7.gif
  • 11. Neuron: Building block of a neural network src: MIT-Self-Driving-Cars, Fridman, A Neuron is a computational building block of the brain. Human brain: 1000T synapses 10x that of an Artificial Neuron Artificial Neuron is a computational building block of an artificial neural network. ~1-10B synapses
  • 12. *Takes a set of inputs *Places a weight of each input *sums them together *applies a bias value on each neuron *Uses an activation function that takes in the sum plus bias and squeeze values together into a probability distribution (range 0, 1) Takes a few inputs and places an output Classification: output: 1 or 0 This can serve as a linear classifier src: MIT-Self-Driving-Cars, Fridman,
  • 13. Perceptron Algorithm X1 X3 X2 Output 1. Initialize perceptron with random weights. 2. Compute perceptrons output 3. If output does not match known output a. if output should have been 0 but was 1, decrease the weights that had an input of 1 b. if output should have been 1 but was 0, increase the weights that had an input of 1 4. Move on to next example in the training set until perceptron makes no more mistakes src: MIT-Self-Driving-Cars, Fridman, If output does not match expected output = Punish!
  • 14. Your output neurons didn’t match the expected output. X1 X3 X2 Output Expected Output: Cat but we got Burrito Training Images Perceptron
  • 15. Why Neural Networks are great. X1 X3 X2 Output Perceptron We can use the Hidden Layer to approximate any function Universality: We can closely approximate any function f(x) with a single hidden layer. Driving: Input (sensor data from the world) Output: Drive (use steering data etc) src: MIT-Self-Driving-Cars, Fridman, Lex
  • 16. Dual class Linear Classification with Binary Logistic Regression Input Data Goal: To predict class A or class B from input data. Two possible outputs! x Linear Combination Logistic Regression Predictor Class A is Y >= 0.5 Class B is Y < 0.5 P = 1 P = 0 Squeezes Values between 0 and 1 Scores (0,1) range
  • 17. Notation changeup: logit-1 -> sigmoid Input Data Two possible outputs! x Linear Combination Logistic Regression P = 1 P = 0 Squeezes Values between 0 and 1 puts into probability distribution Predictor Class A is Y >= 0.5 Class B is Y < 0.5 Scores (0,1) range Unnormalized log probabilities
  • 18. Generalizing Logistic Regression to multiple classes If we have two classes we can have two possible outputs: 1 or 0 What if we have 10 classes? Binary - Two Outputs Y either 1 or 0 Supposed we have k classes. Let’s switch up some notation: Now set each score s to the result of that function. Probability that output Y = class K. We have J possible classes. Perform softmax on scores Softmax Classifier is Binary Logistic regression applied to multiple classes Output = scores b/w 0 and 1 Scores
  • 19. Notation changeup: logit⁻¹ -> sigmoid Input Data Two possible outputs! x Linear Combination Logistic Regression P = 1 P = 0 Predictor Class A is Y >= 0.5 Class B is Y < 0.5 Scores (0,1) range Unnormalized log probabilities
  • 20. Notation changeup: logit⁻¹ -> sigmoid Input Data Two possible outputs! x Linear Combination Logistic Regression Predictor Class A is Y >= 0.5 Class B is Y < 0.5 Scores (0,1) range Unnormalized log probabilities Output of the Linear function, AKA Linear Scores
  • 21. Linear(x) = xW+b or Wx + b Textbooks: Wx+b Tensorflow: xW+b Computing derivatives is easier for xW + b.
  • 22. A few notes f(xi, W, b) = xW + b Assume image x has all of its pixels flattened out into a single row vector. X's size is [n x m]. n: # examples/images, m: # features (pixels per image in this case). Matrix W is of size [m x k]: m = # features, k = # classes. Bias b is a length-k row vector ([1 x k]). Consider our input data (xi, yi) as being fixed. We can set W and b to approximate any function (remember the universality principle). We use the training data to learn W and b. Once our model has been trained we can discard the training data and test our model on test data, or anything for that matter. W and b will be tensors if you are using TensorFlow; they can be arrays if you are using Numpy. Pixels x[0], x[1], x[2], x[3], x[4] each take values in the range (0, 255).
  • 23. Example The biases allow these class-score lines to NOT all cross through the origin: W rotates the lines in pixel space, and b pushes them away from the origin. src: Andrej Karpathy
  • 24. Bias Trick (in practice) It would be annoying to handle the bias term separately during classification, so we simply append the bias vector to our weights matrix (and append a constant 1 to every input vector so the matrix multiply picks the bias up). 0.1 0.25 0.3 0.63 0.12 -0.64 0.26 0.62 0.58 0.99 -0.14 0.333 0.12 3.1 -0.5 Weights Bias 0.1 0.25 0.3 0.63 0.12 -0.64 0.26 0.62 0.58 0.99 -0.14 0.333 0.12 3.1 -0.5 Weights You may see this in the code as: logits = tf.add(tf.matmul(x, weights), bias) OR logits = tf.matmul(x, weights) logits = tf.nn.bias_add(logits, bias)
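A quick numpy check of the trick (my own sketch, not the slide's code; the pixel values are borrowed from the next slide, and the weights are random since the slide's exact matrix layout is ambiguous in this transcript):

import numpy as np

x = np.array([[20., 254., 40., 1.]])     # 1 image, 4 pixel features (n x m)
W = np.random.randn(4, 3) * 0.01         # m x k weights
b = np.array([[0.12, 3.1, -0.5]])        # 1 x k bias

scores = x @ W + b                        # ordinary xW + b

W_aug = np.vstack([W, b])                 # append the bias as an extra row of W
x_aug = np.hstack([x, np.ones((1, 1))])   # append a constant 1 feature to x
scores_trick = x_aug @ W_aug              # bias trick: a single matmul

assert np.allclose(scores, scores_trick)  # identical scores either way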
  • 25. Input image X (n x m = 1 x 4): pixels 20, 254, 40, 1, i.e. 1 image, 4 pixels (pretend this image only has 4 pixels). Each pixel is a feature, so 1 image, 4 features; pixels range over (0, 255). Stretch the pixels into a single row. Weights (m x k = 4 x 3): 0.1 0.25 0.3 0.63 0.12 -0.64 0.26 0.62 0.58 0.99 -0.14 0.333. m: # features (pixels per img) = 4, n: # images = 1, k: # classes = 3 (Cat, Car, Dog). Bias (1 x k = 1 x 3): 0.12 3.1 -0.5. Output (1 x 3), Linear Scores = xW + b: Cat 3.2, Car 5.1, Dog -1.7 (values from Andrej Karpathy). Initialize weights with values b/w 0 and 1; you can initialize biases to start at 0 or very small values if you like.
  • 26. Linear Scores, f(x; w, b) Applying softmax: apply the exponential to the linear scores (unnormalized log probabilities) to get unnormalized probabilities, then normalize so they sum to 1 to get probabilities. Here k = one specific class (different from the k on the last slide) and J = # classes. Cat Car Dog: 3.2 5.1 -1.7 (scores) -> 24.5 164 0.18 (exponentiated) -> 0.13 0.87 0.00 (normalized). values from Andrej Karpathy
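A quick numpy check of the numbers on this slide (in practice you would also subtract the max score before exponentiating, for numerical stability):

import numpy as np

scores = np.array([3.2, 5.1, -1.7])     # linear scores for Cat, Car, Dog
exp_scores = np.exp(scores)             # unnormalized probabilities: ~[24.5, 164.0, 0.18]
probs = exp_scores / exp_scores.sum()   # normalize so they sum to 1
print(probs.round(2))                   # -> [0.13 0.87 0.  ]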
  • 27. Input image Normalized Probabilities 3 x 1 stretch pixels to single row X n x m 1 x 4 20 254 40 1img1 0.1 0.25 0.3 0.63 0.12 -0.64 0.26 0.62 0.58 0.99 -0.14 0.333 Weights m x k 4 x 3 Bias 1 x k 1 x 3 Linear Scores Linear Scores 1 x 3 Cat Car Dog 0.12 3.1 -0.5 3.2 5.1 -1.7 0.13 0.87 0.00 Cat Car Dog Process so far: Each pixel can be considered a neuron values from Andrej Karpathy
  • 28. Input image Normalized Probabilities 3 x 1 stretch pixels to single row X n x m 1 x 4 20 254 40 1img1 0.1 0.25 0.3 0.63 0.12 -0.64 0.26 0.62 0.58 0.99 -0.14 0.333 Weights m x k 4 x 3 Bias 1 x k 1 x 3 Linear Scores Linear Scores 1 x 3 Cat Car Dog 0.12 3.1 -0.5 3.2 5.1 -1.7 0.13 0.87 0.00 Cat Car Dog Process so far: Forward Pass
  • 29. Loss Function: How we learn Recall: Your output neurons didn’t match the expected output.
  • 30. Input image Normalized Probabilities 3 x 1 stretch pixels to single row X n x m 1 x 4 20 254 40 1img1 0.1 0.25 0.3 0.63 0.12 -0.64 0.26 0.62 0.58 0.99 -0.14 0.333 Weights m x k 4 x 3 Bias 1 x k 1 x 3 Linear Scores Linear Scores 1 x 3 Cat Car Dog 0.12 3.1 -0.5 3.2 5.1 -1.7 0.13 0.87 0.00 Cat Car Dog Process so far: Forward Pass
  • 31. Loss Function: How we learn Normalized Probabilities 3 x 1 0.13 0.87 0.00 Cat Car Dog Maximize the log likelihood of the true class, OR minimize the negative log likelihood of the true class (it is easier to implement a negative feedback loop than a positive one). values from Andrej Karpathy Use the loss to adjust the weights responsible for incorrectly classified inputs. There are many different types of loss functions. More on this later
  • 32. Visualizing a hidden layer X Linear: L1 W1 b1 Linear: L2 W2 b2 Softmax. X: 4 x 10, W1: 10 x 100, b1: 1 x 100; xW1: 4 x 100. Since b1 has a dimension of size 1, its values can be broadcast across the xW1 product automatically. In general X: n x m (examples x features), W: m x k (features x classes), b: 1 x k (a row vector over classes). L1: 4 x 100. Then L1 W2 (100 x 10) plus b2 (1 x 10) gives L2: 4 x 10. We can add a wide layer by adding columns to W1 and then a skinny layer by giving W2 k columns so that our output still has the desired shape of examples x classes. These layers are hidden because we cannot see their output as we run the graph. Desired output size: 4 x 10 (examples x classes)
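A minimal numpy sketch (not the deck's code) checking those shapes; the biases are written as 1 x k row vectors so broadcasting works, and the nonlinearity between the two linear layers is omitted just as it is on the slide:

import numpy as np

n, m, hidden, k = 4, 10, 100, 10             # examples, features, hidden width, classes

X  = np.random.randn(n, m)
W1 = np.random.randn(m, hidden) * 0.01
b1 = np.zeros((1, hidden))                    # broadcast across the n rows of XW1
W2 = np.random.randn(hidden, k) * 0.01
b2 = np.zeros((1, k))

L1 = X @ W1 + b1                              # (4, 100) wide hidden layer
L2 = L1 @ W2 + b2                             # (4, 10) output scores: examples x classes

exp = np.exp(L2 - L2.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)  # row-wise softmax
print(L1.shape, L2.shape, probs.shape)        # (4, 100) (4, 10) (4, 10)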
  • 33. Neurons are not classes, or objects, they are values. They are the values that are moving through the pipeline. Follow a pixel of an example image through a network and consider it to be a neuron. Neuron When we implement a neural network we use a graph.
  • 34. X Linear: L1 W1 b1 Softmax S1 Probabilities L1 S1 0.13 0.87 0.00 Training Data Images Labels Labels tell you the true class of each image. Softmax S1 Sigmoid S1 Note: These are the same thing
  • 35. X Linear: L1 W1 b1 Sigmoid S1 Probabilities (Logits) L1 S1 0.13 0.87 0.00 Training Data Images Labels Labels tell you the true class of each image. 0 1 0 One-Hot-Training Labels Cat 0 0 1 Car 0 1 0 Dog 1 0 0 ‘Cat’ ‘Car’ ‘Dog’ Training Labels Error! should be 0 1 0 If we run our network on just one image in the training set and take its corresponding label
  • 36. X Linear: L1 W1 b1 Sigmoid S1 logits L1 S1 Training Data Images Labels Run network on all training data and training labels One-Hot-Training Labels 0 0 1 0 1 0 1 0 0 ‘Cat’ ‘Car’ ‘Dog’ Training Labels Cat Car Dog 0.13 0.87 0.12 0.55 0.91 0.2 0.88 0.66 0.11 Cat Car Dog Run network on 3 images correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) Find the accuracy of our training network. More on this later.
  • 37. Setting up training data Training Data Test Data Training Data Validation Data Test Data Split your training data into training data and validation data. Use the validation data in place of the test data while you train and tune your network. Train Data: 80% of the original training data, Validation Data: 20% of the original training data. Then shuffle! from sklearn.utils import shuffle from sklearn.model_selection import train_test_split X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, train_size = 0.80, test_size = 0.20) # splits train/validation 80/20 X_train, y_train = shuffle(X_train, y_train) # shuffles while keeping images and labels aligned
  • 38. Preprocessing: Normalization In our examples we used raw pixel values (0, 255) as the inputs to train our network. In practice, we preprocess this data before running it through the network. Mean-centered normalization: subtract the mean pixel value from each pixel and divide by the standard deviation. This gives a roughly Gaussian, zero-centered distribution of values, approximately (-1, 1) for typical image data: (0, 255) => (-1, 1). Min-max normalization: subtract the min and divide by the difference (max - min). This gives a range of (0, 1): (0, 255) => (0, 1). img from Wikipedia
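A small numpy sketch of the two schemes (my own code; the 127.5 shortcut at the end is an addition, a common approximation when the (0, 255) range is known in advance):

import numpy as np

pixels = np.random.randint(0, 256, size=(32, 32)).astype(np.float32)   # stand-in image

# Min-max normalization: (0, 255) -> (0, 1)
minmax = (pixels - pixels.min()) / (pixels.max() - pixels.min())

# Mean-centered normalization: subtract the mean, divide by the standard deviation
standardized = (pixels - pixels.mean()) / pixels.std()

# Common shortcut for images with a known (0, 255) range: roughly (-1, 1)
centered = (pixels - 127.5) / 127.5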
  • 39. Overfitting - Introduction to Hyperparameters The goal of building an artificial neural network is to generalize. ● We want to apply new data to our network and classify those inputs ● If we overtrain / overfit our network to the training data then our accuracy will be deceiving: it may work very well on training data but fail on test data. ● To prevent overfitting we apply preprocessing techniques and tune our hyperparameters. ● Tuning hyperparameters is basically all we can do after the network architecture is set up ● It should be the last step in setting up your network ● Test on validation data while you tune (don't touch the test data) http://docs.aws.amazon.com/machine-learning/latest/dg/images/mlconcepts_image5.png
  • 40. Epochs An Epoch is a single forward pass and backward pass of the entire training set through the network. It is a hyperparameter, and we must tune the number of epochs to fit our data / increase our accuracy. The larger the number of epochs, the longer training takes. We increase epochs to increase the number of training intervals; if we increase them too much we may overfit. X Linear: L1 W1 b1 Softmax S1 logits L1 S1 backward forward More on the backward pass to come. Stop early! https://qph.ec.quoracdn.net/main-qimg-d23fbbc85b7d18b4e07b7942ecdfd856?convert_to_webp=true
  • 41. Minibatch We don't feed all of the training examples into our network at once. Instead, we choose a batch of examples, perform forward and backward propagation on them, and then feed in the next batch. We do this so we can perform Stochastic Gradient Descent and help prevent our network from overfitting. So in minibatch gradient descent, you process a small subset of the training data forwards and backwards and update the weights/biases with the gradient update formula (shown on the next page)
  • 42. Mini batch We feed only segments of the data into our neural network at a time. Training Data Batch Batch Network ● The number of examples in each batch is a hyperparameter. ● It also depends on GPU memory ● Typically 128 or 256
  • 43. Gradient Descent The “Learning” in Machine Learning. Update the value of X (punish it) when it is wrong. X: weights or biases. η: Learning Rate (typically 0.01 to 0.001), the rate at which our network learns; it can change over time with methods such as Adam, Adagrad etc. (hyperparameter). ∇(x): Gradient of X. We seek to update the weights and biases by a value indicating how “off” they were from their target. The gradient points in the direction of steepest increase, so we put a negative in front of it to go downhill.
  • 44. Stochastic Gradient Descent Recall Gradient Descent: X -= η∇(x) (eq 1) Stochastic Gradient Descent (SGD) is a version of Gradient Descent where on each forward pass, a batch of data is randomly sampled from the total dataset and gradient descent is performed on that batch. The more batches processed by the network = the better the approximation 1. Randomly sample a batch of data (1) from the total dataset 2. Run the network forward and backward to calculate the gradient from data (1) 3. Apply the gradient descent update (eq 1) 4. Repeat 1-3 until convergence or epoch limit
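A minimal sketch of the epoch / minibatch / SGD loop described above (my own code, not the deck's; grad_fn, lr, batch_size, and epochs are placeholder names, and grad_fn is assumed to run the forward and backward pass and return gradients shaped like params):

import numpy as np

def sgd_train(X, y, params, grad_fn, lr=0.01, batch_size=128, epochs=10):
    n = X.shape[0]
    for _ in range(epochs):                        # one epoch = one pass over the data
        idx = np.random.permutation(n)             # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]  # the last batch may be smaller
            grads = grad_fn(params, X[batch], y[batch])
            for name in params:                    # gradient descent update: X -= η∇(x)
                params[name] -= lr * grads[name]
    return params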
  • 45. Visualizing Batch and SGD 256 256 256 232 If we start out with 1000 images and use a batch size of 256, the last batch will only have 232 images in it. Training Images batch size Stochastic Gradient Descent sample sizes: 5 images. Maybe take ~5 images from the 256-image batch at a time and run SGD on them, then go back and select 5 more. X -= η∇(x) Each X is an image in the SGD batch
  • 46. Backpropagation We need to figure out how to alter a parameter to minimize the cost (loss). First we must find out what effect that parameter has on the cost (we can't just blindly change parameter values and hope that our network converges). The gradient tells us the effect each parameter has on the cost. How do we determine the effect of a parameter on the cost? We use Backpropagation, which is an application of the chain rule from calculus. Did somebody say Chain Rule?
  • 47. Backpropagation Derivative Review: In order to know the effect x has on f, we must first find the effect g has on f, then the effect x has on g
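Written out for a composition f(g(x)) (my notation, since the slide's equation image isn't reproduced in this transcript):

df/dx = (df/dg) * (dg/dx)

Backpropagation applies this repeatedly: each gate multiplies the gradient flowing in from above by its own local derivative and passes the product back to its inputs.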
  • 48. Backpropagation You want to stage backpropagation locally at each gate. This is much easier to implement than storing every weight value and trying to compute everything at the end. Simply add up the gradients along an individual neuron's path. Andrej Karpathy
  • 49. f X Y Z Change in Loss with respect to Z; change in Z with respect to X; change in L with respect to Z. X b1 W1 Linear L1 S1 (S1 because it goes to the sigmoid) S = WX + b (Loss with respect to X) More Backpropagation
  • 50. f X Y Z Change in Loss with respect to Z; change in Z with respect to X; change in L with respect to Z. X b1 W1 Linear L1 S1 (S1 because it goes to the sigmoid) S = WX + b (Loss with respect to X) More Backpropagation This comes together on the next slide!
  • 51. X b1 W1 Linear L1 S1 S1 b/c it goes to sigmoid S = WX + b (Loss w respect to X) Sigmoid S1 Any Gate Output X has a relationship to L1, S1 has a relationship to L1. We can use that relationship in an application of the chain rule to compute the change in L1 with respect to X. Then we perform a gradient descent update on X. Accumulator of all the gradients up to the L1 gate (sum of all gradients in red box). aka Accumulated Loss (Gradient Desc Eqn) (Update X)
  • 53. X Linear: L1 W1 b1 Sigmoid S1 logits L1 S1 Training Data Images Labels Run network on all training data and training labels One-Hot-Training Labels 0 0 1 0 1 0 1 0 0 ‘Cat’ ‘Car’ ‘Dog’ Training Labels Cat Car Dog 0.13 0.87 0.12 0.55 0.91 0.2 0.88 0.66 0.11 Cat Car Dog Run network on 3 images Cross Entropy
  • 54. Cross Entropy(distance) X Input 2.0 1.0 0.1 Wx+b y Logit Linear 0.7 0.2 0.1 S(Y) Softmax S(Y) 1.0 0.0 0.0 L Labels D(S,L) Cross Entropy Tells us how accurate we are Minimize cross entropy ● Want a high distance for incorrect class ● Want a low distance for correct class ● Training loss = average cross entropy over the entire training set. ● Want all the distances to be small ● want loss to be small ● So we attempt to minimize this function. Training Loss weight 1 weight 2 src: Udacity
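The distance D(S, L) from this slide computed in numpy (the softmax output and one-hot label are the slide's values; the helper name is mine):

import numpy as np

def cross_entropy(softmax_probs, one_hot_label):
    # D(S, L) = -sum_i L_i * log(S_i): small when the true class gets high probability
    return -np.sum(one_hot_label * np.log(softmax_probs))

S = np.array([0.7, 0.2, 0.1])   # softmax output from the slide
L = np.array([1.0, 0.0, 0.0])   # one-hot label from the slide
print(cross_entropy(S, L))      # ~0.357, i.e. -log(0.7); would be ~2.3 if S[0] were 0.1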
  • 55. Cross Entropy Loss (continued) weight 1 weight 2 src: Udacity We want to find the weights that make this loss the smallest. This turns the ML problem into numerical optimization. weight 1 weight 2 Training Loss Average cross entropy over the entire training set Minimize this function Training Loss ● Take the derivative of the Loss with respect to the parameters and follow the derivative by taking a step downhill. ● Repeat until you get to the bottom. ● In this case we have 2 parameters (w1, w2) ● Typically we have millions of parameters cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax))) Cross Entropy Loss
  • 56. Installing Dependencies You can use pip3 or pip. I recommend using an anaconda environment with Python 3: https://www.continuum.io/downloads to download Anaconda (get the Python 3.4+ version) conda create --name=IntroToTensorFlow python=3 anaconda source activate IntroToTensorFlow (Your conda environment is named “IntroToTensorFlow”) conda install -c anaconda numpy=1.11.3 conda install -c conda-forge matplotlib=2.0.0 conda install -c anaconda scipy=0.18.1 conda install scikit-learn or pip install -U scikit-learn conda install -c conda-forge tensorflow conda install -c menpo opencv3=3.2.0 jupyter notebook (to run in browser) git clone https://github.com/JonathanCMitchell/TensorFlowLab.git
  • 57. Installing TensorFlow Recommended: Python 3.4 or higher and Anaconda Install TensorFlow conda create --name=IntroToTensorFlow python=3 anaconda source activate IntroToTensorFlow conda install -c conda-forge tensorflow docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow (Docker if you need it) # Hello World! import tensorflow as tf # create a tensorflow constant tensor hello_constant = tf.constant('Hello World!') with tf.Session() as sess: # Run the tf.constant operation in the session output = sess.run(hello_constant) print(output) git clone https://github.com/JonathanCMitchell/TensorFlowLab.git
  • 58. If you have questions here is my info: Jonathan Mitchell github.com/jonathancmitchell linkedin.com/in/jonathancmitchell jmitchell1991@gmail.com Self Driving Cars Los Angeles https://www.meetup.com/Los-Angeles-Self-Driving-Car-Meetup/ Thank you!