5 Introduction to neural networks

The fifth lecture in the Machine Learning course series. It covers a short history, the basic types, and the most important principles of neural networks. Practicals that I have designed for this course, in both R and Python, are available on my GitHub (https://github.com/skyfallen/MachineLearningPracticals). I can share the Keynote files; contact me via e-mail: dmytro.fishman@ut.ee.


  1. 1. Introduction to Machine Learning (Neural networks) Dmytro Fishman (dmytro@ut.ee)
  2. 2. The following material was adapted from:
  3. 3. Evolution of ML methods. Rule-based: Input → Fixed set of rules → Output. Adapted from Y.Bengio http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
  4. 4. Evolution of ML methods. Rule-based: Input → Fixed set of rules → Output. Classic Machine Learning: Input → Hand-designed features → Learning → Output. Adapted from Y.Bengio http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
  5. 5. Evolution of ML methods. Rule-based: Input → Fixed set of rules → Output. Classic Machine Learning: Input → Hand-designed features → Learning → Output. Representation Learning: Input → Automated feature extraction → Learning → Output. Adapted from Y.Bengio http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
  6. 6. Evolution of ML methods. Rule-based: Input → Fixed set of rules → Output. Classic Machine Learning: Input → Hand-designed features → Learning → Output. Representation Learning: Input → Automated feature extraction → Learning → Output. Deep Learning: Input → Low-level features → High-level features → Learning → Output. Adapted from Y.Bengio http://videolectures.net/deeplearning2015_bengio_theoretical_motivations/
  7. 7. What is deep learning? Many layers of adaptive non-linear processing to model complex relationships among data Space 1 Space 2
  8. 8. What is deep learning? Many layers of adaptive non-linear processing to model complex relationships among data. Space 1 → Space 2: a grid of raw pixel intensities (Space 1) is mapped to a digit label 0, 1, 2, 3, …, 9 (Space 2).
  9. 9. What is deep learning? Many layers of adaptive non-linear processing to model complex relationships among data Space 1 Space 2 species
  10. 10. What is deep learning? Many layers of adaptive non-linear processing to model complex relationships among data Space 1 Space 2 “We love you”
  11. 11. In practice DL = Artificial Neural Networks with many layers
  12. 12. A Logical Calculus of the Ideas Immanent in Nervous Activity McCulloch & Pitts (1943)
  13. 13. Rosenblatt (1957) Perceptron. New York Times: “(The perceptron) is the embryo of an electronic computer that is expected to be able to walk, talk, see, write, reproduce itself and be conscious of its existence” A Logical Calculus of the Ideas Immanent in Nervous Activity McCulloch & Pitts (1943)
  14. 14. Minsky & Papert (1969) Perceptrons: an introduction to computational geometry Rosenblatt (1957) Perceptron A Logical Calculus of the Ideas Immanent in Nervous Activity McCulloch & Pitts (1943)
  15. 15. Blum & Rivest (1992) Training a 3-node neural network is NP-complete. Minsky & Papert (1969) Perceptrons: an introduction to computational geometry. Rosenblatt (1957) Perceptron. A Logical Calculus of the Ideas Immanent in Nervous Activity McCulloch & Pitts (1943)
  16. 16. Rumelhart, Hinton & Williams (1986) Learning representations by back-propagating errors. Blum & Rivest (1992) Training a 3-node neural network is NP-complete. Minsky & Papert (1969) Perceptrons: an introduction to computational geometry. Rosenblatt (1957) Perceptron. A Logical Calculus of the Ideas Immanent in Nervous Activity McCulloch & Pitts (1943)
  17. 17. Artificial neural network • A collection of simple trainable mathematical units, which collaborate to compute a complicated function • Compatible with supervised, unsupervised, and reinforcement learning • Brain-inspired (loosely)
  18.–22. Artificial Neuron (diagram built up over five slides): inputs x0, x1, x2 are multiplied by weights w0, w1, w2 and summed, Σ_{i≥0} x_i*w_i, to produce the neuron's output.
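The slides above describe the artificial neuron only as a diagram. As a minimal sketch (my own illustration, not code from the slides), the same computation in Python looks like this; the example input, weight, and bias values are the ones used in the backpropagation walkthrough later in the deck:

```python
import math

def neuron(x, w, bias):
    """A single artificial neuron: a weighted sum of the inputs, sum_i x_i * w_i,
    passed through the sigmoid f(z) = 1 / (1 + e^-z) used later in the slides."""
    net = sum(xi * wi for xi, wi in zip(x, w)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Two inputs with weights and a bias (same numbers as in the worked example below)
print(neuron([0.05, 0.10], [0.15, 0.20], bias=0.35))  # ~0.593
```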
  23.–28. Feedforward Neural Network (diagram built up over six slides): an input layer (x0, x1, x2), a fully connected layer in which every neuron computes Σ_{i≥0} x_i*w_i over all of its inputs, and an output layer; each unit is a single artificial neuron like the one above.
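Stacking such neurons gives the feedforward network sketched on these slides: every neuron in the fully connected layer sees every input, and every output neuron sees every hidden activation. A small numpy sketch (layer sizes and random weights are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W_hidden, b_hidden, W_out, b_out):
    """Input layer -> fully connected (hidden) layer -> output layer."""
    h = sigmoid(W_hidden @ x + b_hidden)  # each hidden neuron: sum_i x_i * w_i + bias, then activation
    return sigmoid(W_out @ h + b_out)     # each output neuron does the same over the hidden activations

# Illustrative shapes: 3 inputs, 4 hidden neurons, 2 outputs, random weights
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.2])
print(feedforward(x, rng.normal(size=(4, 3)), np.zeros(4),
                  rng.normal(size=(2, 4)), np.zeros(2)))
```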
  29. 29. Learning algorithm • while not done • pick a random training instance (x, y) • run the neural network on input x • modify connection weights to make the prediction closer to y (a sketch of this loop follows below)
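In code, the loop on this slide might look like the following sketch; `network.predict` and `network.update_weights` are hypothetical placeholders, and the weight update itself is exactly the backpropagation step worked through on the next slides:

```python
import random

def train(network, data, steps=10000):
    """Sketch of the learning loop: pick a random training instance, run the
    network on it, and adjust the weights so the prediction moves closer to y."""
    for _ in range(steps):                        # "while not done"
        x, y = random.choice(data)                # pick a random training instance (x, y)
        prediction = network.predict(x)           # run the neural network on input x
        network.update_weights(x, y, prediction)  # modify connection weights (backpropagation)
```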
  30.–33. Let's backpropagate. INPUT: i1 = 0.05, i2 = 0.10; TARGET: o1 = 0.01, o2 = 0.99. 1. The Forward pass - Compute the total error: net_h1 = w1*i1 + w2*i2 + b1*1 = 0.15*0.05 + 0.2*0.1 + 0.35*1 = 0.3775; with the sigmoid f(x) = 1/(1 + e^(-x)), out_h1 = 1/(1 + e^(-net_h1)) = 1/(1 + e^(-0.3775)) = 0.5933. Repeat for h2, o1, o2: out_h2 = 0.596, out_o1 = 0.751, out_o2 = 0.773.
  34.–38. Let's backpropagate (forward pass, continued). We now have out_o1 and out_o2, so we can compute the total error: E_total = Σ 1/2*(target - output)^2; E_o1 = 1/2*(target_o1 - out_o1)^2 = 1/2*(0.01 - 0.7514)^2 = 0.2748; E_o2 = 0.02356; E_total = E_o1 + E_o2 = 0.2748 + 0.02356 = 0.29836.
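The forward-pass numbers above can be checked with a few lines of Python. This is my own verification script, not from the slides; w1, w2, b1 and w5 appear in the slide equations, while the remaining weights (w3, w4, w6, w7, w8, b2) are assumed from the network diagram, so if your diagram differs the intermediate values will too:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

i1, i2 = 0.05, 0.10          # INPUT
t1, t2 = 0.01, 0.99          # TARGET

w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35   # w3, w4 assumed from the diagram
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60   # w6, w7, w8, b2 assumed from the diagram

# Forward pass
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1 * 1)          # 0.5933
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1 * 1)          # 0.5969
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2 * 1)  # 0.7514
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2 * 1)  # 0.7729

# Total error
E_o1 = 0.5 * (t1 - out_o1) ** 2                       # 0.2748
E_o2 = 0.5 * (t2 - out_o2) ** 2                       # 0.0236
print(E_o1 + E_o2)                                    # ~0.2984
```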
  39.–52. Let's backpropagate. 2. The Backward pass - Updating weights. We want to know how much a change in w5 affects the total error. By the chain rule, ∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5. From E_total = Σ 1/2*(target - output)^2: ∂E_total/∂out_o1 = 2 * 1/2 * (target_o1 - out_o1) * (-1) + 0 = -(0.01 - 0.751) = 0.741. From out_o1 = 1/(1 + e^(-net_o1)): ∂out_o1/∂net_o1 = out_o1 * (1 - out_o1) = 0.1868. From net_o1 = w5*out_h1 + w6*out_h2 + b2*1: ∂net_o1/∂w5 = out_h1 = 0.5933. Putting it together: ∂E_total/∂w5 = 0.7414 * 0.1868 * 0.5933 = 0.0821, and the updated weight is w5_new = w5_old - η * ∂E_total/∂w5 = 0.4 - 0.5 * 0.0821 = 0.3589.
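The same backward-pass step for w5, written out as a short self-contained snippet using the values computed on the slides (learning rate η = 0.5, as on the slide):

```python
# Values taken from the forward pass on the slides
t1, w5 = 0.01, 0.40
out_h1, out_o1 = 0.5933, 0.7514

dE_dout_o1   = -(t1 - out_o1)         # dE_total/dout_o1  ~ 0.7414
dout_dnet_o1 = out_o1 * (1 - out_o1)  # dout_o1/dnet_o1   ~ 0.1868
dnet_dw5     = out_h1                 # dnet_o1/dw5       ~ 0.5933

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5   # chain rule, ~0.0821
eta = 0.5                                       # learning rate used on the slide
w5_new = w5 - eta * dE_dw5                      # ~0.3589
print(round(dE_dw5, 4), round(w5_new, 4))
```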
  53.–57. Let's backpropagate. • Repeat for w6, w7, w8 • Do the same, in an analogous way, for w1, w2, w3, w4 • Compute the total error again: before 0.298371109, now 0.291027924 • Repeat x10,000 times: the error drops to 0.000035085.
  58. 58. http://www.emergentmind.com/neural-network Training Neural Networks
  59. 59. http://playground.tensorflow.org/ Training Neural Networks (part II)
  60. 60. Deep networks were difficult to train: overfitting, dimensionality, vanishing gradients, and a complex error landscape E(w1, w2) over the weights w1, w2.
  61.–63. Why did the DL revolution not happen in 1986? Not enough data (datasets ~1000x too small) and computers were too slow (by a factor of ~1,000,000).
  64. 64. 1.2 million images 1000 categories
  65.–68. Errors on ImageNet by year: 2010 28%, 2011 26%, 2012 16% (AlexNet, A. Krizhevsky et al. 2012), 2013 12%, 2014 7%, 2015 3%, 2016 <3%, compared with a hypothetical super-dedicated fine-grained expert ensemble of human labelers. http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
  69. 69. Convolutional Neural Network Let’s consider the following image
  70. 70. Convolutional Neural Network Let’s consider the following image
  71. 71. Convolutional Neural Network. A convolutional layer works as a filter applied to the original image.
  72. 72. Convolutional Neural Network. A convolutional layer works as a filter applied to the original image. There are many filters in a convolutional layer, and they detect different patterns (here: 4 filters).
  73.–76. Convolutional Neural Network. 4 filters. Each filter, applied to all possible 2x2 patches of the original image, produces one output value per patch.
  77. 77. Convolutional Neural Network Each filter applied to all possible 2x2 patches of the original image produces one output value Repeat this process for all filters in this layer
  78. 78. Convolutional Neural Network Each filter applied to all possible 2x2 patches of the original image produces one output value Repeat this process for all filters in this layer and the next
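As a minimal sketch of what these slides describe (not the slides' own code), here is one 2x2 filter applied to every 2x2 patch of a small image with stride 1; a real convolutional layer learns the filter values and adds a bias and a non-linearity:

```python
import numpy as np

def conv2x2(image, kernel):
    """Slide a 2x2 filter over every 2x2 patch of a 2-D image (stride 1, no padding).
    Each patch produces one output value, so an HxW image gives an (H-1)x(W-1) map."""
    H, W = image.shape
    out = np.zeros((H - 1, W - 1))
    for r in range(H - 1):
        for c in range(W - 1):
            out[r, c] = np.sum(image[r:r + 2, c:c + 2] * kernel)
    return out

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0]], dtype=float)
edge_filter = np.array([[1, -1],
                        [1, -1]], dtype=float)   # one possible pattern detector
print(conv2x2(image, edge_filter))
# A convolutional layer repeats this for each of its filters (4 in the slides),
# producing one feature map per filter, and the next layer does the same again.
```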
  79. 79. Flattening The output of the last convolutional layer is flattened into a single vector (like we did with images) Convolutional Neural Network
  80.–81. Flattening. The output of the last convolutional layer is flattened into a single vector (like we did with images). This vector is fed into a fully connected layer with as many neurons as there are possible classes (the digit classes 0–9 in this example), and each neuron outputs a probability for its class.
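A hedged sketch of the flattening step and the final fully connected layer; softmax is used here to turn the per-class neuron outputs into probabilities, and all shapes and weights are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(feature_maps, W, b):
    """Flatten the last convolutional layer's output into one vector and feed it to a
    fully connected layer with one neuron per class; softmax gives class probabilities."""
    v = feature_maps.reshape(-1)            # e.g. (4, 3, 3) feature maps -> vector of 36 values
    return softmax(W @ v + b)               # W: (n_classes, 36)

rng = np.random.default_rng(0)
maps = rng.random((4, 3, 3))                # 4 filters, each giving a 3x3 feature map
probs = classify(maps, rng.normal(size=(10, 36)), np.zeros(10))
print(probs.round(3), probs.sum())          # 10 class probabilities summing to 1
```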
  82. 82. Training Neural Networks (part III) http://scs.ryerson.ca/~aharley/vis/conv/
  83. 83. http://www.asimovinstitute.org/wp-content/uploads/2016/09/neuralnetworks.png
  84.–88. Remedies for these training difficulties: • Pre-training (weights initialization) (complex landscape) • Efficient descent algorithms (complex landscape) • Activation (vanishing gradient) • Dropout (overfitting) • Domain Prior Knowledge (a small sketch of two of these, activation choice and dropout, follows below)
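The slides list these remedies only by name. As a small illustration (my own sketch, not from the slides) of two of them: the ReLU activation keeps gradients from shrinking for positive inputs, and dropout fights overfitting by randomly switching activations off during training:

```python
import numpy as np

def relu(z):
    """ReLU activation: its gradient is 1 for positive inputs, so it does not squash
    gradients the way a saturated sigmoid does (helps against vanishing gradients)."""
    return np.maximum(0.0, z)

def dropout(activations, p=0.5, training=True, seed=0):
    """Inverted dropout: during training, zero out a random fraction p of the
    activations and rescale the rest; at test time, pass activations through unchanged."""
    if not training:
        return activations
    mask = np.random.default_rng(seed).random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = relu(np.array([-1.0, 0.5, 2.0, -0.3, 1.2]))
print(dropout(h, p=0.5))
```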
  89. 89. Now that we are deep... • Powerful function approximation • Instead of hand-crafted features, let the algorithm build the relevant features for your problem • More representational power for learning
  90. 90. False positives
  91. 91. False negatives
  92. 92. Karpathy, Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions” (2014)
  93. 93. Style transfer. Texture Networks by Dmitry Ulyanov et al.
  94. 94. Visual and Textual Question Answering
  95. 95. Visual and Textual Question Answering
  96. 96. Visual and Textual Question Answering http://cloudcv.org/vqa/
  97. 97. References • Machine Learning by Andrew Ng (https://www.coursera.org/learn/machine-learning) • Introduction to Machine Learning by Pascal Vincent, given at the Deep Learning Summer School, Montreal 2015 (http://videolectures.net/deeplearning2015_vincent_machine_learning/) • Welcome to Machine Learning by Konstantin Tretyakov, delivered at the AACIMP Summer School 2015 (http://kt.era.ee/lectures/aacimp2015/1-intro.pdf) • Stanford CS class: Convolutional Neural Networks for Visual Recognition by Andrej Karpathy (http://cs231n.github.io/) • Data Mining Course by Jaak Vilo at the University of Tartu (https://courses.cs.ut.ee/MTAT.03.183/2017_spring/uploads/Main/DM_05_Clustering.pdf) • Machine Learning Essential Concepts by Ilya Kuzovkin (https://www.slideshare.net/iljakuzovkin) • From the brain to deep learning and back by Raul Vicente Zafra and Ilya Kuzovkin (http://www.uttv.ee/naita?id=23585&keel=eng)
  98. 98. www.biit.cs.ut.ee www.ut.ee www.quretec.ee
