
Introduction to Deep Learning

This lecture provides a comprehensive introduction to basic deep learning concepts, including the feed-forward path, the backpropagation algorithm, activation functions, and vanishing gradients.


  1. 1. Intro to Deep Learning, by Dmytro Fishman (dmytrofishman@gmail.com), Tartu
  2. 2. BioInformatics and Information Technology research group https://biit.cs.ut.ee/
  3. 3. Scientific Software Statistical Analysis Biological Imaging OMICS Teaching & Training Personalised Medicine Data management Machine Learning
  4. 4. Biological Imaging
  5. 5. Biological Imaging: microscopy imaging
  6. 6. Biological Imaging: microscopy imaging, retinopathy detection, classification of skin cancer
  7. 7. Slides are @ https://bit.ly/2LgrKRd
  8. 8. Machine Learning
  9. 9. Machine Learning
  10. 10. Supervised Learning Machine Learning
  11. 11. Machine Learning Supervised Learning Unsupervised Learning
  12. 12. Machine Learning Reinforcement Learning Supervised Learning Unsupervised Learning
  13. 13. Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning
  14. 14. Supervised Learning Machine Learning Unsupervised Learning Reinforcement Learning
  15. 15. Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning
  16. 16. Machine Learning Deep Learning Supervised Learning Unsupervised Learning Reinforcement Learning
  17. 17. What is deep learning?
  18. 18. What is deep learning? An adaptive non-linear mapping from one space to another Space 1 Space 2
  19. 19. What is deep learning? Space 1 Space 2 3 An adaptive non-linear mapping from one space to another
  20. 20. What is deep learning? Space 1 Space 2 SETOSA An adaptive non-linear mapping from one space to another
  21. 21. What is deep learning? Space 1 Space 2 “Hello world!” An adaptive non-linear mapping from one space to another
  22. 22. In practice DL = Artificial Neural Networks with many layers
  23. 23. 4 pixel image
  24. 24. 4 pixel image Categorise images
  25. 25. 4 pixel image Categorise images Solid
  26. 26. 4 pixel image Categorise images Solid Vertical
  27. 27. 4 pixel image Categorise images Solid Vertical Diagonal
  28. 28. 4 pixel image Categorise images Solid Vertical Diagonal Horizontal
  29. 29. 4 pixel image Simple rules won’t do the trick Solid Vertical Diagonal Horizontal
  30. 30. 4 pixel image
  31. 31. 4 pixel image
  32. 32. 4 pixel image. Scale of values: -1.0, -0.5, 0, 0.5, 1.0
  33. 33. 4 pixel image -0.5 0 1.0 -1.0
  34. 34. 4 pixel image -0.5 0 1.0 -1.0 Receptive field of neuron
  35. 35. 4 pixel image -0.5 0 1.0 -1.0 Input layer
  36. 36. 4 pixel image -0.5 0 1.0 -1.0 Input layer
  37. 37. 4 pixel image (-0.5, 0, 1.0, -1.0): the input layer feeds into a sum of inputs (+).
  38. 38. 4 pixel image (-0.5, 0, 1.0, -1.0): each input is multiplied by a weight (1.0, 1.0, 1.0, 1.0).
  39. 39. Weighted sum: -0.5 x 1.0 + 0 x 1.0 + 1.0 x 1.0 + (-1.0) x 1.0 = -0.5
  40. 40. With weights 0.2, -0.3, 0.0, -0.8: -0.5 x 0.2 + 0 x (-0.3) + 1.0 x 0.0 + (-1.0) x (-0.8) = 0.7
  41. 41. With weights 0.2, -0.3, 0.0, -0.8: -0.5 x 0.2 + 0 x (-0.3) + 1.0 x 0.0 + (-1.0) x (-0.8) = 0.7
  42. 42. The weighted sum 0.7 is then passed to an activation function.
  43. 43. Sigmoid activation function (plot of sig(x) against x).
  44. 44. Sigmoid activation function: sig(x) = 1 / (1 + e^(-x))
  45. 45. Sigmoid activation function: feeding in 0.7.
  46. 46. Sigmoid activation function: sig(0.7) = 0.66
  47. 47. Sigmoid activation function.
  48. 48. Sigmoid activation function: a larger input gives 0.88.
  49. 49. Sigmoid activation function: an even larger input gives 0.99.
  50. 50. No matter how big your x grows, sig(x) will always stay between 0 and 1.
  51. 51. 4 pixel image (-0.5, 0, 1.0, -1.0): the weighted sum 0.7 goes through the activation function and comes out as 0.66.
  52. 52. Weighted sum + activation function = neuron.
  53. 53. Weighted sum + activation function = neuron; 0.7 is the input of the neuron.
  54. 54. Weighted sum + activation function = neuron; 0.66 is the output of the neuron.
  55. 55. Weighted sum + activation function = neuron.
  56. 56. Weighted sum + activation function = neuron.
  57. 57. The weighted sum usually is not visualised explicitly.
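  (A minimal Python sketch of the neuron just described. The pixel values and weights are the ones on the slides; the function names are mine:)

      import math

      def sigmoid(x):
          # squashes any real-valued input into the interval (0, 1)
          return 1.0 / (1.0 + math.exp(-x))

      def neuron(inputs, weights):
          # a neuron = weighted sum of its inputs + activation function
          weighted_sum = sum(i * w for i, w in zip(inputs, weights))
          return sigmoid(weighted_sum)

      pixels = [-0.5, 0.0, 1.0, -1.0]    # the 4 pixel values from the slides
      weights = [0.2, -0.3, 0.0, -0.8]   # the weights from slide 40

      print(neuron(pixels, weights))     # weighted sum 0.7 -> sig(0.7) ~ 0.66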
  58. 58. 4 pixel image (-0.5, 0, 1.0, -1.0). Input layer: let's make more neurons, connected by different weights (e.g. 1.0, -1.0, 0.0).
  59. 59. 4 pixel image -0.5 0 1.0 -1.0 Input layer
  60. 60. 4 pixel image -0.5 0 1.0 -1.0 Input layer More complex receptive fields
  61. 61. 4 pixel image -0.5 0 1.0 -1.0 Input layer More complex receptive fields
  62. 62. 4 pixel image -0.5 0 1.0 -1.0 Input layer More complex receptive fields
  63. 63. 4 pixel image -0.5 0 1.0 -1.0 Input layer More complex receptive fields
  64. 64. 4 pixel image -0.5 0 1.0 -1.0 Input layer
  65. 65. 4 pixel image -0.5 0 1.0 -1.0 Input layer Receptive fields get even more complex
  66. 66. 4 pixel image -0.5 0 1.0 -1.0 Input layer Adding one more layer
  67. 67. Rectified Linear Unit (ReLU): plot of max(0, x) against x.
  68. 68. Rectified Linear Unit (ReLU): R(x) = max(0, x)
  69. 69. Rectified Linear Unit (ReLU): if the number is positive, keep it; otherwise it becomes zero.
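  (The same sketch with ReLU instead of the sigmoid; a hedged one-liner:)

      def relu(x):
          # keep positive values as they are, replace negative ones with zero
          return max(0.0, x)

      print(relu(0.7), relu(-0.5))   # 0.7 0.0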
  70. 70. 4 pixel image -0.5 0 1.0 -1.0 Input layer Adding one more layer
  71. 71. 4 pixel image -0.5 0 1.0 -1.0 Input layer Adding one more layer
  72. 72. Add an output layer with one neuron per category: Solid, Vertical, Diagonal, Horizontal.
  73. 73. Example: an image with pixel values -1.0, -1.0, 1.0, 1.0.
  74. 74. Example: the first-layer weighted sums come out as 0.0, 0.0, 2.0, 2.0.
  75. 75. (Recall the sigmoid activation function: sig(x) = 1 / (1 + e^(-x)).)
  76. 76. Example: the first-layer weighted sums are 0.0, 0.0, 2.0, 2.0.
  77. 77. After the sigmoid these become 0.5, 0.5, 0.88, 0.88.
  78. 78. The next layer's weighted sums are 0.0, 1.0, 1.76, 0.0.
  79. 79. After the sigmoid: 0.5, 0.73, 0.85, 0.5.
  80. 80. Later connections carry both positive and negative weights, giving values such as 0.73, -0.73, 0.85, -0.85, 0.5, -0.5.
  81. 81. The output scores are 0.73, 0.5, 0.5, 0.85.
  82. 82. The highest score, 0.85, corresponds to Horizontal, matching the input image.
  83. 83. Consider this 2D example: a scatter plot of two classes of points in the x-y plane.
  84. 84. Consider this 2D example: the point (0.1, 0.05).
  85. 85. Consider this 2D example: the point (0.1, 0.05).
  86. 86. Consider this 2D example: the point (0.1, 0.05).
  87. 87. Consider this 2D example: the point (0.1, 0.05).
  88. 88. Point coordinates as input: the coordinates of the point (0.1, 0.05) enter the network as inputs 0.05 and 0.1.
  89. 89. Hidden layer: two neurons connected to the inputs by weights 0.15 (w1), 0.20 (w2), 0.25 (w3), 0.30 (w4).
  90. 90. Output layer: two neurons connected to the hidden layer by weights 0.40 (w5), 0.45 (w6), 0.50 (w7), 0.55 (w8).
  91. 91. The network's answers will be compared against the truth (0.0 and 1.0) to give an error, still marked ?.
  92. 92. Adding one more type of neurons: bias units with values 0.35 (b1) and 0.6 (b2).
  93. 93. What is the role of bias in Neural Networks?
  94. 94. A single-input neuron without bias computes sig(w1 * x). Source: http://www.uta.fi/sis/tie/neuro/index/Neurocomputing2.pdf
  95. 95. Varying the weight, w1 = [0.5, 1.0, 2.0], changes the steepness of sig(w1 * x).
  96. 96. Now add a bias input b1: the neuron computes sig(w1 * x + b1).
  97. 97. Bias helps to shift the resulting curve: with w1 = [1.0] and b1 = [-4.0, 0.0, 4.0], the curve slides along the x axis.
  98. 98. Bias helps to shift the resulting curve; recall sig(x) = 1 / (1 + e^(-x)).
  99. 99. With a weight and a bias the activation becomes sig(w1 * x + b1) = 1 / (1 + e^(-(wx+b))).
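  (A small sketch of the shifting effect, assuming the weight and bias values shown on the slides:)

      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      w1 = 1.0
      for b1 in (-4.0, 0.0, 4.0):
          # the point where the neuron outputs 0.5 moves to x = -b1/w1
          print(b1, sigmoid(w1 * 0.0 + b1))   # at x = 0: ~0.018, 0.5, ~0.982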
  100. 100. Feed-forward path: let's calculate the scores for each class.
  101. 101. Feed-forward path: what is the input of the first neuron in the hidden layer? input(h1) = ?
  102. 102. input(h1) = x * w1 + y * w2 + b1
  103. 103. input(h1) = 0.05 * 0.15 + 0.1 * 0.2 + 0.35 = 0.3775
  104. 104. What about the output of this neuron? out(h1) = sig(input(h1))
  105. 105. out(h1) = sig(0.3775)
  106. 106. out(h1) = sig(0.3775), read off the sigmoid curve.
  107. 107. out(h1) = sig(0.3775) = 0.5933
  108. 108. Feed-forward path: out(h1) = 0.5933.
  109. 109. Feed-forward path: out(h1) = 0.5933.
  110. 110. What is the input of the second neuron in the hidden layer? input(h2) = ?
  111. 111. input(h2) = ? * ? + ? * ? + ?
  112. 112. input(h2) = x * w3 + y * w4 + b1
  113. 113. input(h2) = 0.05 * 0.25 + 0.1 * 0.3 + 0.35 = 0.3925
  114. 114. The output of the second neuron is out(h2) = sig(0.3925) = 0.5969.
  115. 115. out(h2) = 0.5969
  116. 116. Feed-forward path: out(h1) = 0.5933, out(h2) = 0.5969.
  117. 117. input(o1) = ?
  118. 118. What is the input of the first neuron in the output layer?
  119. 119. input(o1) = out(h1) * w5 + out(h2) * w6 + b2
  120. 120. input(o1) = 0.5933 * 0.4 + 0.5969 * 0.45 + 0.6 = 1.105
  121. 121. input(o1) = 1.105
  122. 122. The output: out(o1) = sig(input(o1))
  123. 123. out(o1) = sig(1.105) = 0.7511
  124. 124. Feed-forward path: out(o1) = 0.7511.
  125. 125. The same computation for the second output neuron gives out(o2) = 0.7732.
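  (The whole feed-forward path as a compact Python sketch; the weights and biases are the slide values, the variable names are mine:)

      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      x, y = 0.05, 0.1                            # point coordinates as input
      w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30     # input -> hidden weights
      w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55     # hidden -> output weights
      b1, b2 = 0.35, 0.6                          # bias of each layer

      out_h1 = sigmoid(x * w1 + y * w2 + b1)              # 0.5933
      out_h2 = sigmoid(x * w3 + y * w4 + b1)              # 0.5969
      out_o1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)    # 0.7511
      out_o2 = sigmoid(out_h1 * w7 + out_h2 * w8 + b2)    # 0.7732

      print(out_o1, out_o2)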
  126. 126. The network answers 0.751 and 0.773; the truth is 0.0 and 1.0.
  127. 127. How good are these predictions?
  128. 128. To find out how good these predictions are, we should compare them to the expected outcomes.
  129. 129. Eo1 = 1/2 (truth - out(o1))^2
  130. 130. Eo1 = 1/2 (truth - out(o1))^2 = 1/2 (0 - 0.751)^2
  131. 131. Eo1 = 1/2 (-0.751)^2
  132. 132. Eo1 = 0.282
  133. 133. Eo2 = 1/2 (truth - out(o2))^2 = 1/2 (1 - 0.773)^2 = 0.025
  134. 134. So Eo1 = 0.282 and Eo2 = 0.025.
  135. 135. How good are these predictions?
  136. 136. Well, they are this good.
  137. 137. Overall the error is: Etotal = Eo1 + Eo2
  138. 138. Etotal = Eo1 + Eo2 = 0.282 + 0.025 = 0.307
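  (Continuing the sketch, the errors come out exactly as on the slides:)

      truth_o1, truth_o2 = 0.0, 1.0           # the expected outcomes

      e_o1 = 0.5 * (truth_o1 - out_o1) ** 2   # 0.282
      e_o2 = 0.5 * (truth_o2 - out_o2) ** 2   # 0.025
      e_total = e_o1 + e_o2                   # 0.307
      print(e_o1, e_o2, e_total)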
  139. 139. Our goal is to reduce the total error (Etotal)!
  140. 140. The only thing we can change is the weights.
  141. 141. How does w5 influence the total error?
  142. 142. We can change w5 randomly (increase or decrease it) and see what happens to Etotal.
  143. 143. Increasing w5: the answer rises to 0.757, Eo1 to 0.286, and Etotal to 0.311.
  144. 144. Back at w5 = 0.40: Etotal = 0.307.
  145. 145. Decreasing w5: the answer falls to 0.745, Eo1 to 0.278, and Etotal to 0.303.
  146. 146. Decreasing w5 further: the answer is 0.740, Eo1 = 0.273, Etotal = 0.299.
  147. 147. So a lower w5 means a lower Etotal, and a higher w5 a higher Etotal.
  148. 148. The problem is that every time we change a weight, we need to recalculate all the neuron values again.
  149. 149. (The network recomputed with the tweaked w5.)
  150. 150. (The network recomputed with the tweaked w5.)
  151. 151. Also, tweaking each weight separately would be a nightmare.
  152. 152. Also, tweaking each weight separately would be a nightmare.
  153. 153. We want a way to efficiently update all our weights so that the total error decreases most substantially.
  154. 154. We want to compute the gradient.
  155. 155. The gradient shows the direction of the greatest increase of the function.
  156. 156. Here the gradient is the partial derivative ∂E/∂w.
  157. 157. To decrease the error, we step in the opposite direction: -∂E/∂w.
  158. 158. It looks OK when there is just one weight (w5) to take care of.
  159. 159. It starts to look a lot scarier when we try to optimise all the weights w1 ... w8.
  160. 160. The gradient can still be computed efficiently using the backpropagation algorithm.
  161. 161. Backpropagation algorithm: how does w5 influence the total error?
  162. 162. ∂Etotal/∂w5 = ...
  163. 163. ∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ... (how does the error on out(o1) influence the total error?)
  164. 164. ∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂out(o1) * ... (how does out(o1) influence the error on out(o1)?)
  165. 165. ∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂out(o1) * ∂out(o1)/∂in(o1) * ... (how does input(o1) influence out(o1)?)
  166. 166. ∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂out(o1) * ∂out(o1)/∂in(o1) * ∂in(o1)/∂w5 (how does w5 influence input(o1)?)
  167. 167. First factor: Etotal = Eo1 + Eo2, so ∂Etotal/∂Eo1 = ?
  168. 168. ∂Etotal/∂Eo1 = 1
  169. 169. ∂Etotal/∂w5 = 1 * ∂Eo1/∂out(o1) * ∂out(o1)/∂in(o1) * ∂in(o1)/∂w5
  170. 170. Second factor: ∂Eo1/∂out(o1).
  171. 171. Eo1 = 1/2 (truth - out(o1))^2, so ∂Eo1/∂out(o1) = ...
  172. 172. ∂Eo1/∂out(o1) = 2 * 1/2 (truth - out(o1)) * (-1) = out(o1) - truth
  173. 173. ∂Eo1/∂out(o1) = 0.751 - 0 = 0.751
  174. 174. Third factor: out(o1) = sig(in(o1)), with sig(x) = 1 / (1 + e^(-x)).
  175. 175. ∂out(o1)/∂in(o1) = out(o1) * (1 - out(o1)) = 0.751 * (1 - 0.751) = 0.186
  176. 176. Fourth factor: ∂in(o1)/∂w5, where in(o1) = ...
  177. 177. in(o1) = out(h1) * w5 + out(h2) * w6 + b2
  178. 178. ∂in(o1)/∂w5 = out(h1) = ...
  179. 179. ∂in(o1)/∂w5 = out(h1) = 0.5933
  180. 180. ∂Etotal/∂w5 = 1 * 0.751 * 0.186 * 0.5933
  181. 181. ∂Etotal/∂w5 = 1 * 0.751 * 0.186 * 0.5933 = 0.082. If the value of the gradient is positive, then if we increase w5, the total error will ...?
  182. 182. If the value of the gradient is positive, then increasing w5 will increase the total error!
  183. 183. So we update w5 against the gradient: w5new = w5old - ∂Etotal/∂w5
  184. 184. w5new = w5old - 0.082
  185. 185. If we update by a lot, we can overshoot the optimum.
  186. 186. Thus, we make a bit smaller step, scaled by a step size η (the learning rate): w5new = w5old - η * ∂Etotal/∂w5
  187. 187. w5new = 0.4 - 0.5 * 0.082
  188. 188. w5new = 0.4 - 0.5 * 0.082 = 0.359
  189. 189. With w5 = 0.359 we run the feed-forward path again: Etotal = ...
  190. 190. The answer for o1 drops to 0.611 (out(o1) = 0.6112).
  191. 191. Eo1 drops to 0.187.
  192. 192. (Compare: randomly lowering w5 earlier only brought Etotal down to 0.303.)
  193. 193. After updating all the remaining weights the total error becomes 0.131. Repeating this 1000 times decreases it to 0.001.
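  (The same chain rule in code, continuing the sketch above; the step size η = 0.5 is the slide value, the helper names are mine:)

      eta = 0.5                              # step size (learning rate)

      # the four chain-rule factors, exactly as on the slides
      d_etotal_d_eo1 = 1.0                   # Etotal = Eo1 + Eo2
      d_eo1_d_out = out_o1 - truth_o1        # 0.751
      d_out_d_in = out_o1 * (1 - out_o1)     # 0.186
      d_in_d_w5 = out_h1                     # 0.5933

      grad_w5 = d_etotal_d_eo1 * d_eo1_d_out * d_out_d_in * d_in_d_w5   # ~0.082
      w5_new = w5 - eta * grad_w5            # 0.4 - 0.5 * 0.082 ~ 0.359
      print(grad_w5, w5_new)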
  194. 194. Why should we understand backpropagation? https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
  195. 195. Vanishing gradients on sigmoids (plot of sig(x)).
  196. 196. Vanishing gradients on sigmoids (plots of sig(x) and its derivative sig'(x)).
  197. 197. The derivative is zero at the tails.
  198. 198. Recall the chain rule: ∂Etotal/∂w5 = ∂Etotal/∂Eo1 * ∂Eo1/∂out(o1) * ∂out(o1)/∂in(o1) * ∂in(o1)/∂w5
  199. 199. So, if x is too high (e.g. > 8) or too low (e.g. < -8), then ∂out(o1)/∂in(o1) = 0 and thus ∂Etotal/∂w5 = 0.
  200. 200. Dying ReLUs (plot of max(0, x)).
  201. 201. Dying ReLUs (plot of the derivative max'(0, x)).
  202. 202. For ReLUs the derivative is zero for all negative inputs.
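  (A quick check of both failure modes, under the same assumptions as the sketches above:)

      import math

      def sigmoid(x):
          return 1.0 / (1.0 + math.exp(-x))

      def sigmoid_prime(x):
          # derivative of the sigmoid: sig(x) * (1 - sig(x))
          s = sigmoid(x)
          return s * (1.0 - s)

      for x in (-8.0, 0.0, 8.0):
          print(x, sigmoid_prime(x))   # ~0.0003 at both tails, 0.25 at the centre

      def relu_prime(x):
          # 1 for positive inputs, 0 otherwise: a unit stuck at x < 0 stops learning
          return 1.0 if x > 0 else 0.0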
  203. 203. Now let's practice: the point (0.1, 0.05).
  204. 204. Now let's practice (scatter plot of the two classes).
  205. 205. Now let's practice.
  206. 206. Now let's practice: https://goo.gl/cX2dGd
  207. 207. http://www.emergentmind.com/neural-network Training Neural Networks (part I)
  208. 208. http://playground.tensorflow.org/ Training Neural Networks (part II)
  210. 210. Resources:
  • Brandon Rohrer's YouTube video: How Convolutional Neural Networks work (https://youtu.be/FmpDIaiMIeA)
  • Brandon Rohrer's YouTube video: How Deep Neural Networks Work (https://youtu.be/ILsA4nyG7I0)
  • Stanford University CS231n: Convolutional Neural Networks for Visual Recognition (github.io version): http://cs231n.github.io/
  • Matt Mazur's A Step by Step Backpropagation Example (https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
  • Andrej Karpathy's blog post: Yes you should understand backprop (https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b)
  • Raul Vicente's lecture: From brain to Deep Learning and back (https://www.uttv.ee/naita?id=23585)
  • Mayank Agarwal's blog: Back Propagation in Convolutional Neural Networks, Intuition and Code (https://becominghuman.ai/back-propagation-in-convolutional-neural-networks-intuition-and-code-714ef1c38199)
  210. 210. Mordor riddle http://fouryears.eu/2009/11/21/machine-learning-in-mordor/
  211. 211. https://biit.cs.ut.ee/ Thank you!
