"Deep Learning" Chap.6 Convolutional Neural Net

These slides are my presentation for a reading circle on the "Machine Learning Professional Series".

The Japanese version is here:
http://www.slideshare.net/matsukenbook/ss-50545587


  1. 1. Chapter 6 Convolutional Neural Network 2015.7.15 wed. @kenmatsu4
  2. 2. Self-introduction ・Twitter account: @kenmatsu4 (please follow me) ・Blog: I write my blog posts on Qiita (Japanese only; categories: statistics, machine learning, Python, etc.) http://qiita.com/kenmatsu4 (over 2000 contributions!) ・My hobbies: - Playing the bass guitar with my band members. - Traveling abroad, especially in Southeast Asia! (Cambodia, Myanmar, Bangladesh, Uyghur, etc.) Pictures from my travels: http://matsu-ken.jimdo.com
  3. 3. Information ・Japanese version of these slides: http://www.slideshare.net/matsukenbook/ss-50545587
  4. 4. “Deep Learning”, Chapter 6: Convolutional Neural Net. Author: Takayuki Okatani. Machine Learning Professional Series, ISBN: 978-4-06-152902-1. These slides are for a study group. It is a very good introductory text on “Deep Learning”. Let’s buy it! Unfortunately, it is available in Japanese only…
  5. 5. MASAKARI Come On !!! Let’s study together https://twitter.com/_inundata/status/616658949761302528
  6. 6. To process images with a neural network, let’s use knowledge from neuroscience!
  7. 7. Using an analogy from neuroscience: • Receptive field • Simple cells • Complex cells
  8. 8. Receptive field ≒ retinal cells. [Figure: an ON-center, OFF-surround field and an OFF-center, ON-surround field, with the ON and OFF regions marked] http://bsd.neuroinf.jp/wiki/%e5%8f%97%e5%ae%b9%e9%87%8e
  9. 9. Receptive field ≒ retinal cells. [Figure: on-center cell and off-center cell] https://en.wikipedia.org/wiki/Hypercomplex_cell
  10. 10. Simple Cells and Complex Cells: Simple Cells. A simple cell is formed by arranging receptive fields in a line. When the + area is exposed to light and the - area is not, an excitatory response occurs. When the + and - areas are exposed to light simultaneously, no excitatory response occurs. https://en.wikipedia.org/wiki/Hypercomplex_cell
  11. 11. Simple Cells and Complex Cells: Complex Cells. A complex cell keeps responding when the stimulus is shifted in parallel, but it does not respond when the stimulus is rotated. http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/V1/lgn-V1.html
  12. 12. The main topic starts here. We treat this knowledge from neuroscience mathematically and apply it to “Object Category Recognition”.
  13. 13. Model of Simple Cells and Complex Cells. [Figure: receptive field → simple cell → complex cell] The pink part is a filter. Blue cells indicate the input signal.
  20. 20. Model of Simple Cells and Complex Cells. The input pattern has been shifted in parallel. The simple cell at the upper left no longer responds because of the position change. [Figure: receptive field → simple cell → complex cell]
  21. 21. Model of Simple Cells and Complex Cells. If the input is rotated, the cell does not respond. [Figure: receptive field → simple cell → complex cell]
  22. 22. Similar methods • Neocognitron: the first application of the two-layer structure (simple cells, complex cells) to engineering pattern recognition. • LeNet: LeNet is considered the root of the convolutional neural net ( http://yann.lecun.com/exdb/lenet/ ).
  23. 23. Whole Structure
  24. 24. Types of layer used in a CNN • fully-connected layer • convolution layer • pooling layer • Local Contrast Normalization (LCN) layer → The fully-connected layer was discussed in the previous chapter: the output of layer l-1 is the input to all units of layer l.
  25. 25. Structure of a typical CNN (example): input (image) → convolution → convolution → pooling → LCN → convolution → pooling → fully-connected → fully-connected → softmax → output (category label). In many cases, a pooling layer is placed after a couple of convolution layers. Sometimes an LCN layer is placed after that. If the purpose is classification, the softmax function, which is a multivariate version of the sigmoid function, is usually used. Softmax function: $f_i(x) = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$
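The softmax formula on this slide can be tried out with a few lines of plain Python. This is only a minimal sketch: the function name `softmax` and the example scores are chosen here for illustration, and subtracting the maximum is a common numerical safeguard that is not part of the slide.

```python
import math


def softmax(x):
    """Softmax f_i(x) = exp(x_i) / sum_j exp(x_j) for a list of scores."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores of a 3-class output layer
probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three positive values, largest for the largest score
print(sum(probs))  # the outputs sum to 1 (up to rounding)
```

With two inputs, `softmax([x, 0.0])[0]` equals the sigmoid of `x`, which is the sense in which softmax generalizes the sigmoid function.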
  26. 26. Example of Chainer (Deep Learning Framework) https://github.com/pfnet/chainer/tree/master/examples/imagenet
      from chainer import Variable
      import chainer.functions as F

      # Forward pass of the network class (a method; conv1 ... conv4b are its layers).
      def forward(self, x_data, y_data, train=True):
          x = Variable(x_data, volatile=not train)
          t = Variable(y_data, volatile=not train)
          # Block 1: three convolutions followed by max pooling
          h = F.relu(self.conv1(x))
          h = F.relu(self.conv1a(h))
          h = F.relu(self.conv1b(h))
          h = F.max_pooling_2d(h, 3, stride=2)
          # Block 2
          h = F.relu(self.conv2(h))
          h = F.relu(self.conv2a(h))
          h = F.relu(self.conv2b(h))
          h = F.max_pooling_2d(h, 3, stride=2)
          # Block 3: pooling first, then dropout on the pooled output
          h = F.relu(self.conv3(h))
          h = F.relu(self.conv3a(h))
          h = F.relu(self.conv3b(h))
          h = F.max_pooling_2d(h, 3, stride=2)
          h = F.dropout(h, train=train)
          # Block 4: average pooling produces one value per class
          h = F.relu(self.conv4(h))
          h = F.relu(self.conv4a(h))
          h = F.relu(self.conv4b(h))
          h = F.reshape(F.average_pooling_2d(h, 6), (x_data.shape[0], 1000))
          # Softmax cross entropy as the loss, plus classification accuracy
          return F.softmax_cross_entropy(h, t), F.accuracy(h, t)
  27. 27. Definition of Convolution
  28. 28. Definition of Convolution. Address map of a W × W pixel image: the pixel at address (i, j), for i, j = 0, …, W-1, has the value $x_{ij}$. [Figure: example of W × W pixel data, a 0/1 image with a short diagonal line of 1s at the upper left] Filter of H × H pixels (example with H = 4):
      0.01 0.02 0.05 0.15
      0.02 0.05 0.15 0.05
      0.05 0.15 0.05 0.02
      0.15 0.05 0.02 0.01
      Definition of the convolution of pixels: $u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,j+q} h_{pq}$ ※ Strictly speaking, the sign before p and q in the index of x is -, but this makes no substantial difference, so + is also fine.
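The double sum above maps directly onto two nested loops. The following is a minimal sketch in plain Python, assuming no padding and a stride of 1, so the output has W - H + 1 rows and columns; the function name `convolve2d` and the 5 × 5 test image are hypothetical, while the filter values are the ones from the slide.

```python
def convolve2d(x, h):
    """u[i][j] = sum_p sum_q x[i+p][j+q] * h[p][q] (no padding, stride 1)."""
    W, H = len(x), len(h)
    out = W - H + 1  # number of positions where the filter fits entirely
    return [[sum(x[i + p][j + q] * h[p][q]
                 for p in range(H) for q in range(H))
             for j in range(out)] for i in range(out)]


# The 4 x 4 filter shown on the slide
h = [[0.01, 0.02, 0.05, 0.15],
     [0.02, 0.05, 0.15, 0.05],
     [0.05, 0.15, 0.05, 0.02],
     [0.15, 0.05, 0.02, 0.01]]

# Hypothetical 5 x 5 input: a diagonal line of 1s running from upper right to lower left
x = [[1 if i + j == 4 else 0 for j in range(5)] for i in range(5)]

u = convolve2d(x, h)
print(u)  # 2 x 2 output; the response is largest where the line overlaps the filter's diagonal
```

The filter has its large weights on the same diagonal as the line, so the output is largest at the positions where the two are aligned.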
  30. 30. Role of Convolution. [Figure: Lenna’s image convolved with a cos filter] https://gist.github.com/matsuken92/5b78c792f2ab98576c5c Convolution: $u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,j+q} h_{pq}$ The convolution extracts light-and-dark (contrast) features from the image.
  32. 32. By the way…
  33. 33. Role of Convolution. The filter size is…
  34. 34. Role of Convolution. The filter size is… like this. [Figure: the size of the filter]
  35. 35. Padding. Without padding, the image size is reduced to $(W - 2\lfloor H/2 \rfloor) \times (W - 2\lfloor H/2 \rfloor)$, where $\lfloor \cdot \rfloor$ means rounding down to an integer. Padding is a preparation step that lets the filter be applied properly at the edges of the image without reducing the image size. [Figure: W × W image with pixel $x_{00}$, an H × H filter, and a border of width $\lfloor H/2 \rfloor$ on each side]
  36. 36. Padding. Question: if we interpret the equation $u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,j+q} h_{pq}$ literally, isn't the reduced area a strip of width H - 1, as in the figure on the left? [Figure: image with pixel $x_{00}$ and a strip of width H - 1]
  37. 37. Zero-padding: the method in which the padding area is filled with 0. → This is widely used in convolutional neural nets.
      0  0  0  0  0  0   0   0   0 0
      0 77 80 82 78 70  82  82 140 0
      0 83 78 80 83 82  77  94 151 0
      0 87 82 81 80 74  75 112 152 0
      0 87 87 85 77 66  99 151 167 0
      0 84 79 77 78 76 107 162 160 0
      0 86 72 70 72 81 151 166 151 0
      0 78 72 73 73 107 166 170 148 0
      0 76 76 77 84 147 180 168 142 0
      0  0  0  0  0  0   0   0   0 0
      Demerit: as a consequence of convolution with zero-padding, the area around the edges becomes dark. Other methods: • Fill with the outermost pixels. • Fill with the pixels folded back at the four sides.
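The three padding methods named on this slide can be compared with a small helper. This is a sketch in plain Python: the function `pad`, its `mode` names, and the 3 × 3 test image are all hypothetical, and "folded back" is interpreted here as mirroring that repeats the edge pixel, which is one of several possible conventions.

```python
def pad(x, n, mode="zero"):
    """Pad a square image x with n pixels on every side (assumes n <= len(x))."""
    W = len(x)

    def src(i):
        # Map a padded coordinate to a source coordinate, or None for "fill with 0".
        if 0 <= i < W:
            return i
        if mode == "edge":      # repeat the outermost pixel
            return min(max(i, 0), W - 1)
        if mode == "reflect":   # fold the image back at the border
            return -i - 1 if i < 0 else 2 * W - 1 - i
        return None             # zero-padding

    out = []
    for i in range(-n, W + n):
        row = []
        for j in range(-n, W + n):
            si, sj = src(i), src(j)
            row.append(0 if si is None or sj is None else x[si][sj])
        out.append(row)
    return out


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
for mode in ("zero", "edge", "reflect"):
    print(mode, pad(x, 2, mode)[0])  # first padded row for each method
```

With edge or reflect padding the border of the output is built from real pixel values, which avoids the darkening near the edges that zero-padding causes.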
  38. 38. Stride. When the filter is moved several pixels at a time, rather than one pixel at a time, while the sums of products are calculated, the step interval is called the “stride”. For a very large image, this keeps the number of output units from becoming too large (at the cost of some performance degradation). $u_{ij} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{si+p,sj+q} h_{pq}$, where s is the stride. Output image size when the stride is applied: $(\lfloor (W-1)/s \rfloor + 1) \times (\lfloor (W-1)/s \rfloor + 1)$. A stride of 2 or more is common in pooling layers. Example image (8 × 8):
      77 80 82 78  70  82  82 140
      83 78 80 83  82  77  94 151
      87 82 81 80  74  75 112 152
      87 87 85 77  66  99 151 167
      84 79 77 78  76 107 162 160
      86 72 70 72  81 151 166 151
      78 72 73 73 107 166 170 148
      76 76 77 84 147 180 168 142
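The strided sum and the output-size formula can be checked on the 8 × 8 example image. The sketch below is plain Python under one explicit assumption: pixels beyond the image border are treated as 0, so that the filter can be placed at every position s·i < W. The function name and the 3 × 3 filter are hypothetical.

```python
def strided_convolve2d(x, h, s):
    """u[i][j] = sum_p sum_q x[s*i + p][s*j + q] * h[p][q] with stride s."""
    W, H = len(x), len(h)
    out = (W - 1) // s + 1  # floor((W - 1) / s) + 1, as on the slide

    def pixel(i, j):
        # Assumption: pixels beyond the image border count as 0 (zero-padding).
        return x[i][j] if i < W and j < W else 0

    return [[sum(pixel(s * i + p, s * j + q) * h[p][q]
                 for p in range(H) for q in range(H))
             for j in range(out)] for i in range(out)]


# The 8 x 8 example image from the slide
x = [[77, 80, 82, 78, 70, 82, 82, 140],
     [83, 78, 80, 83, 82, 77, 94, 151],
     [87, 82, 81, 80, 74, 75, 112, 152],
     [87, 87, 85, 77, 66, 99, 151, 167],
     [84, 79, 77, 78, 76, 107, 162, 160],
     [86, 72, 70, 72, 81, 151, 166, 151],
     [78, 72, 73, 73, 107, 166, 170, 148],
     [76, 76, 77, 84, 147, 180, 168, 142]]

# Hypothetical 3 x 3 filter that simply copies the upper-left pixel of each window
h = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

u = strided_convolve2d(x, h, s=2)
print(len(u), len(u[0]))  # output size for W = 8, s = 2
print(u[0])               # every second pixel of the first image row
```

Because this filter only picks out one pixel, the output is the image subsampled at every s-th pixel, which makes the effect of the stride easy to see.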
  42. 42. Convolution Layer
  43. 43. Convolution Layer. The convolution layer corresponds to the simple cell, as shown in the following figure. [Figure: receptive field → simple cell → complex cell] The pink part is a filter. Blue cells indicate the input signal.
  44. 44. Convolution Layer. A practical convolutional neural net does not use just one grayscale image; it calculates convolutions with several filters in parallel on a multi-channel image (e.g. RGB). W: the number of pixels per side. K: the number of channels, e.g. K = 3 for an RGB image. Image size: W × W × K.
  45. 45. Convolution Layer. In some contexts, an image of size W × W × K is called a “map”. Much larger channel counts (e.g. K = 16, K = 256) are commonly used in hidden layers (convolution or pooling layers).
  46. 46. Convolution Layer. The equation to obtain $u_{ijm}$ is the following: $u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} z^{(l-1)}_{i+p,j+q,k} h_{pqkm} + b_{ijm}$ The bias is commonly set as $b_{ijm} = b_m$, which does not depend on the position (i, j) of the pixel in the image (it acts like an overall density offset for $u_{ijm}$). [Figure, m = 0: the W × W × K input is convolved (∗) with Filter 1 of size H × H × K (weights $h_{pqk0}$), the bias $b_0$ is added to give $u_{ij0}$, and $f(\cdot)$ gives $z_{ij0}$.]
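The triple sum for $u_{ijm}$ can be written out as nested loops. The following plain-Python sketch assumes no padding, a stride of 1, and a position-independent bias $b_m$; the function names and the tiny all-ones example are hypothetical and only meant to make the index layout `z[i][j][k]`, `h[p][q][k][m]` concrete.

```python
def conv_layer(z, h, b, f):
    """Return f(u_ijm) for input z[i][j][k], filters h[p][q][k][m] and biases b[m]."""
    W, K = len(z), len(z[0][0])
    H, M = len(h), len(h[0][0][0])
    out = W - H + 1  # assumption: no padding, stride 1
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            units = []
            for m in range(M):
                u = b[m]  # the bias depends only on the filter index m
                for k in range(K):
                    for p in range(H):
                        for q in range(H):
                            u += z[i + p][j + q][k] * h[p][q][k][m]
                units.append(f(u))
            row.append(units)
        result.append(row)
    return result


def relu(u):
    return max(0.0, u)


# Hypothetical sizes: 3 x 3 image, K = 2 channels, 2 x 2 filters, M = 2 filters
W, K, H, M = 3, 2, 2, 2
z = [[[1.0] * K for _ in range(W)] for _ in range(W)]
# Filter m = 0 has all weights 1, filter m = 1 has all weights 0
h = [[[[1.0, 0.0] for _ in range(K)] for _ in range(H)] for _ in range(H)]
b = [0.5, -1.0]

out = conv_layer(z, h, b, relu)
print(len(out), len(out[0]), len(out[0][0]))  # spatial size and number of output channels
print(out[0][0])                              # one output pixel across the M channels
```

The same `h` and `b` are reused at every position (i, j), so the number of output channels equals the number of filters M regardless of the image size.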
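  47. 47. Convolution Layer. The same weight values $h_{pqk0}$ are used for every pixel $z_{ij0}$; this is called “weight sharing” or “weight tying”. [Figure, m = 0: the W × W × K input is convolved (∗) with Filter 1 of size H × H × K (weights $h_{pqk0}$), the bias $b_0$ is added to give $u_{ij0}$, and $f(\cdot)$ gives $z_{ij0}$.]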
  50. 50. [Figure: the W × W × K input $z^{(l-1)}_{ijk}$ is convolved (∗) with Filter 1, Filter 2 and Filter 3, each of size H × H × K (weights $h_{pqk0}$, $h_{pqk1}$, $h_{pqk2}$, for m = 0, 1, 2). The biases $b_0$, $b_1$, $b_2$ are added to give $u_{ij0}$, $u_{ij1}$, $u_{ij2}$, and $f(\cdot)$ gives $z_{ij0}$, $z_{ij1}$, $z_{ij2}$, which together form the M-channel output $z^{(l)}_{ijm}$.]
  51. 51. Convolution Layer. The output of the convolution layer can be regarded as a multi-channel image of size W × W × M, where the number of filters is interpreted as the number of channels.
  52. 52. Convolution Layer. The parameter size does not depend on the image size (W × W). The parameter size is H × H × K × M, that is, filter size × filter size × number of channels × number of filters. [Figure: three filters of size H × H × K, for M = 3]
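The parameter count H × H × K × M can be compared with what a fully-connected layer between maps of the same size would need. This is a small arithmetic sketch; both function names and the 32 × 32 example sizes are hypothetical, and the bias terms are left out as on the slide.

```python
def conv_param_count(H, K, M):
    """Number of filter weights: filter size x filter size x channels x filters."""
    return H * H * K * M


def fully_connected_param_count(W, K, M):
    """Weights needed to connect every unit of a W x W x K map to a W x W x M map."""
    return (W * W * K) * (W * W * M)


# Hypothetical layer: 32 x 32 RGB input, 5 x 5 filters, 16 output channels
print(conv_param_count(H=5, K=3, M=16))
print(fully_connected_param_count(W=32, K=3, M=16))

# Doubling the image side leaves the convolution layer's count unchanged
print(conv_param_count(H=5, K=3, M=16) == conv_param_count(5, 3, 16))
```

Only the fully-connected count grows with W; the convolution layer's count is fixed by the filter shape, which is the point of weight sharing.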
  53. 53. Convolution Layer. Gradient descent is also applied to the parameter optimization of a convolutional neural net. The targets of the optimization are the filter weights $h_{pqkm}$ and the biases $b_{ijm}$ in $u_{ijm} = \sum_{k=0}^{K-1} \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} z^{(l-1)}_{i+p,j+q,k} h_{pqkm} + b_{ijm}$. Back propagation is also used to calculate the gradient (explained in detail later).
  54. 54. Pooling Layer
  55. 55. Pooling Layer. Generally, a pooling layer is located just after a convolution layer. Example: input (image) → convolution → convolution → pooling → LCN → convolution → pooling → fully-connected → fully-connected → softmax → output (category label)
  56. 56. Pooling Layer. The pooling layer is the final layer in the following figure (the complex cell part). It is designed so that its output stays unchanged even if the position of the target feature changes slightly (a parallel shift). [Figure: receptive field → simple cell → complex cell (pooling layer)] The pink part is a filter. Blue cells indicate the input signal.
  59. 59. Pooling Layer. $P_{ij}$ denotes the set of pixels included in the H × H area around pixel (i, j) of the W × W image (with padding). For every channel k, one pixel value $u_{ijk}$ is obtained from the $H^2$ pixel values in $P_{ij}$.
  60. 60. 3 types of pooling layer: 1. Max pooling 2. Average pooling 3. Lp pooling
  61. 61. 1. Max Pooling: uses the maximum value of the $H^2$ pixels in the area. This is the standard method in image recognition. $u_{ijk} = \max_{(p,q) \in P_{ij}} z_{pqk}$
      Input $z_{pqk}$:
      77 80 82 78  70  82  82 140
      83 78 80 83  82  77  94 151
      87 82 81 80  74  75 112 152
      87 87 85 77  66  99 151 167
      84 79 77 78  76 107 162 160
      86 72 70 72  81 151 166 151
      78 72 73 73 107 166 170 148
      76 76 77 84 147 180 168 142
      Output $u_{ijk}$:
      87 87  87  83 112 152 152 152
      87 87  87  99 151 167 167 167
      87 87  87 107 162 167 167 167
      87 87  87 151 166 167 167 167
      87 87 107 166 170 170 170 170
      87 87 147 180 180 180 180 180
      86 86 147 180 180 180 180 180
      86 86 147 180 180 180 180 180
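The max pooling table on this slide can be reproduced with a short loop. The sketch below is plain Python and rests on two assumptions inferred from the numbers: the pooling window is 5 × 5, centered on each pixel, and at the image border only the pixels that actually exist are considered. The function name `max_pool` is hypothetical.

```python
def max_pool(z, H):
    """Max pooling with an H x H window centered on each pixel (stride 1)."""
    W, r = len(z), H // 2
    out = []
    for i in range(W):
        row = []
        for j in range(W):
            # Window P_ij, clipped to the pixels that lie inside the image
            window = [z[p][q]
                      for p in range(max(0, i - r), min(W, i + r + 1))
                      for q in range(max(0, j - r), min(W, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


# The 8 x 8 input from the slide
z = [[77, 80, 82, 78, 70, 82, 82, 140],
     [83, 78, 80, 83, 82, 77, 94, 151],
     [87, 82, 81, 80, 74, 75, 112, 152],
     [87, 87, 85, 77, 66, 99, 151, 167],
     [84, 79, 77, 78, 76, 107, 162, 160],
     [86, 72, 70, 72, 81, 151, 166, 151],
     [78, 72, 73, 73, 107, 166, 170, 148],
     [76, 76, 77, 84, 147, 180, 168, 142]]

u = max_pool(z, 5)
for row in u:
    print(row)
```

Large values such as 180 spread over a whole neighbourhood of the output, which is why a small shift of the input changes the pooled output very little.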
  69. 69. 2. Average Pooling: uses the average value of the $H^2$ pixels in the area. $u_{ijk} = \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} z_{pqk}$
      Input $z_{pqk}$:
      77 80 82 78  70  82  82 140
      83 78 80 83  82  77  94 151
      87 82 81 80  74  75 112 152
      87 87 85 77  66  99 151 167
      84 79 77 78  76 107 162 160
      86 72 70 72  81 151 166 151
      78 72 73 73 107 166 170 148
      76 76 77 84 147 180 168 142
      Output $u_{ijk}$:
      81.1 80.9 79.8  78.9  82.1  95.5  99.3 107.2
      82.4 81.7 80.0  79.9  85.5  99.6 104.6 115.2
      81.9 81.3 79.7  80.6  88.4 103.0 109.0 120.7
      81.2 80.4 79.5  82.8  94.2 109.8 117.7 131.7
      80.0 79.0 79.4  86.4 101.2 116.8 127.1 142.5
      78.6 78.2 81.6  93.3 110.5 126.0 138.3 152.5
      76.7 76.7 81.9  95.9 114.3 129.5 142.6 155.9
      75.6 75.8 82.9 100.1 119.0 133.7 148.1 160.2
  77. 77. 3. Lp pooling. $u_{ijk} = \left( \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} z_{pqk}^{P} \right)^{1/P}$ Lp pooling is a general form that includes max pooling and average pooling. When $P = 1$, it works as average pooling. When $P = \infty$, it works as max pooling. [Figure: example with input drawn from a uniform distribution] https://gist.github.com/matsuken92/5b78c792f2ab98576c5c#file-03_anim_lp_pooling-py
  78. 78. 3. Lp pooling (same definition as on the previous slide). [Figure: example with input drawn from a beta distribution] https://gist.github.com/matsuken92/5b78c792f2ab98576c5c#file-03_anim_lp_pooling-py
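How Lp pooling moves from the average towards the maximum as P grows can be seen on a single window. The sketch below is plain Python; the function name `lp_pool` is hypothetical, the window holds the nine pixels at the upper-left corner of the 8 × 8 example image, and factoring out the maximum is a numerical safeguard added here so that large P does not overflow.

```python
def lp_pool(values, P):
    """(1/n * sum v**P) ** (1/P) for the non-negative pixel values of one window."""
    m = max(values)
    if m == 0:
        return 0.0
    # Factor out the maximum so that v**P cannot overflow for large P.
    mean_power = sum((v / m) ** P for v in values) / len(values)
    return m * mean_power ** (1.0 / P)


window = [77, 80, 82, 83, 78, 80, 87, 82, 81]

print(sum(window) / len(window), max(window))  # average and maximum for reference
for P in (1, 2, 10, 100, 1000):
    print(P, lp_pool(window, P))               # rises from the average towards the maximum
```

For finite P the result stays slightly below the maximum; max pooling is the limit as P goes to infinity.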
  79. 79. Pooling Layer. Generally, in a pooling layer the calculation is carried out on every input channel independently, so the number of output channels is the same as the number of input channels: the number of channels K is not changed. [Figure: W × W × K input → W × W × K output] ※ Normally, no activation function is applied in a pooling layer. There are no adjustable parameters, since the weights of the pooling layer are fixed.
  80. 80. Stride of the pooling layer. Pooling size: 5 × 5, stride: s = 2. The size of the output layer is $\lfloor (W-1)/s \rfloor + 1$. So, in this example, $\lfloor (8-1)/2 \rfloor + 1 = 4$.
      Input $z_{pqk}$:
      77 80 82 78  70  82  82 140
      83 78 80 83  82  77  94 151
      87 82 81 80  74  75 112 152
      87 87 85 77  66  99 151 167
      84 79 77 78  76 107 162 160
      86 72 70 72  81 151 166 151
      78 72 73 73 107 166 170 148
      76 76 77 84 147 180 168 142
      Output $u_{ijk}$:
      81.1 79.8  82.1  99.3
      81.9 79.7  88.4 109.0
      80.0 79.4 101.2 127.1
      76.7 81.9 114.3 142.6
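The 4 × 4 output of this example can be recomputed as average pooling with a 5 × 5 window and stride 2. This plain-Python sketch assumes, as the numbers in the table suggest, that the window is centered on every s-th pixel and that the average at the border is taken over the pixels that actually exist; the function name `average_pool` is hypothetical.

```python
def average_pool(z, H, s):
    """Average pooling with an H x H window centered on every s-th pixel."""
    W, r = len(z), H // 2
    out = []
    for i in range(0, W, s):  # floor((W - 1) / s) + 1 window positions per axis
        row = []
        for j in range(0, W, s):
            window = [z[p][q]
                      for p in range(max(0, i - r), min(W, i + r + 1))
                      for q in range(max(0, j - r), min(W, j + r + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


# The 8 x 8 input from the slide
z = [[77, 80, 82, 78, 70, 82, 82, 140],
     [83, 78, 80, 83, 82, 77, 94, 151],
     [87, 82, 81, 80, 74, 75, 112, 152],
     [87, 87, 85, 77, 66, 99, 151, 167],
     [84, 79, 77, 78, 76, 107, 162, 160],
     [86, 72, 70, 72, 81, 151, 166, 151],
     [78, 72, 73, 73, 107, 166, 170, 148],
     [76, 76, 77, 84, 147, 180, 168, 142]]

u = average_pool(z, H=5, s=2)
print(len(u), len(u[0]))  # output size for W = 8, s = 2
for row in u:
    print(row)            # agrees with the table on the slide up to rounding
```

Calling `average_pool(z, H=5, s=1)` gives the full 8 × 8 average pooling result instead.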
  84. 84. Local Contrast Normalization (LCN) 1. Normalization for a single channel: 1-1. Subtractive normalization 1-2. Divisive normalization 2. Normalization for multiple channels: 2-1. Subtractive normalization 2-2. Divisive normalization
  85. 85. Contrast. Contrast adjustment is an operation that controls the difference in color intensity in an image. With high contrast, bright and dark areas are easier to distinguish. [Figure: original, high-contrast and low-contrast images, with tone curves plotting output pixel value against input pixel value] http://homepage2.nifty.com/tsugu/sotuken/ronbun/sec3-2.html#0005
  86. 86. Brightness. Brightness adjustment transforms the pixel values with a power function (gamma correction) with parameter γ. [Figure: original, high-brightness and low-brightness images; γ = 1.5, γ = 2.0] http://www.mis.med.akita-u.ac.jp/~kata/image/monogamma.html
  87. 87. Normalization. Calculate the average over the training images for every channel and every pixel: $\tilde{x}_{ijk} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}_{ijk}$, where $x^{(n)}_{ijk}$ is the pixel value at address (i, j) of channel k in the n-th training image. The converted input is the target pixel value minus this average: $x_{ijk} \leftarrow x_{ijk} - \tilde{x}_{ijk}$. This normalization uses the average of pixels over all training images, whereas Local Contrast Normalization is applied to every image individually.
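Subtracting the training-set mean image can be sketched in plain Python as follows. The function name, the nested-list layout `images[n][i][j][k]`, and the two tiny example images are hypothetical choices made for illustration.

```python
def subtract_mean_image(images):
    """images[n][i][j][k]: pixel (i, j), channel k of the n-th training image."""
    N = len(images)
    rows, cols, K = len(images[0]), len(images[0][0]), len(images[0][0][0])
    # Average over the N training images for every pixel and every channel
    mean = [[[sum(img[i][j][k] for img in images) / N for k in range(K)]
             for j in range(cols)] for i in range(rows)]
    # Subtract that average from every image
    centered = [[[[img[i][j][k] - mean[i][j][k] for k in range(K)]
                  for j in range(cols)] for i in range(rows)] for img in images]
    return mean, centered


# Two hypothetical 1 x 2 images with a single channel
images = [[[[1.0], [3.0]]],
          [[[3.0], [5.0]]]]

mean, centered = subtract_mean_image(images)
print(mean)      # per-pixel average over the two images
print(centered)  # each image with the average removed
```

The mean is computed once over the whole training set, in contrast to local contrast normalization, which works on each image by itself.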
  88. 88. LCN: Subtractive Normalization. Single-channel image (grayscale image, etc.). Subtract the average pixel value in the H × H area $P_{ij}$ from each pixel of the W × W input image: $z_{ij} = x_{ij} - \bar{x}_{ij}$. Average: $\bar{x}_{ij} = \frac{1}{H^2} \sum_{(p,q) \in P_{ij}} x_{i+p,j+q}$. Weighted average: $\bar{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q}$.
  89. 89. LCN: Subtractive Normalization. Single-channel image (grayscale image, etc.). Weighted average: $\bar{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q}$. The weights are set so that they sum to 1: $\sum_{(p,q) \in P_{ij}} w_{pq} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} w_{pq} = 1$. The values are arranged as follows: - The maximum value is located at the center. - Values closer to the edge are lower. Example of the weights:
      0.01 0.01 0.01 0.01 0.01
      0.01 0.05 0.05 0.05 0.01
      0.01 0.05 0.44 0.05 0.01
      0.01 0.05 0.05 0.05 0.01
      0.01 0.01 0.01 0.01 0.01
  90. 90. LCN: Subtractive Normalization. [Figure: results for H = 3, H = 5, H = 9 and H = 17] https://gist.github.com/matsuken92/5b78c792f2ab98576c5c
  91. 91. LCN: Divisive Normalization. Single-channel image (grayscale image, etc.). Calculate the variance of the pixels in the area $P_{ij}$ and normalize with it. With the weighted average $\bar{x}_{ij} = \sum_{(p,q) \in P_{ij}} w_{pq} x_{i+p,j+q}$ and $\sum_{(p,q) \in P_{ij}} w_{pq} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} w_{pq} = 1$, the variance of the pixels is $\sigma_{ij}^2 = \sum_{(p,q) \in P_{ij}} w_{pq} (x_{i+p,j+q} - \bar{x}_{ij})^2$ and the normalized value is $z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\sigma_{ij}}$.
  92. 92. LCN: Divisive Normalization. However, if the normalized value $z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\sigma_{ij}}$ is used as it is, there is a demerit: noise is emphasized where the contrast is low. To avoid this, define a constant c; if the standard deviation of the pixels is lower than c, divide by c instead. That is, $z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\max(c, \sigma_{ij})}$. There is also a similar method in which the denominator changes continuously depending on the value of $\sigma_{ij}$: $z_{ij} = \frac{x_{ij} - \bar{x}_{ij}}{\sqrt{c + \sigma_{ij}^2}}$.
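Subtractive and divisive normalization for a single channel can be combined in one small function. The sketch below is plain Python and makes two assumptions that go beyond the slides: the weight window is centered on pixel (i, j), and at the image border the weights of the pixels that exist are renormalized to sum to 1. The function name is hypothetical; the weights are the 5 × 5 example given above.

```python
import math

# The 5 x 5 example weights from the slides (they sum to 1)
WEIGHTS = [[0.01, 0.01, 0.01, 0.01, 0.01],
           [0.01, 0.05, 0.05, 0.05, 0.01],
           [0.01, 0.05, 0.44, 0.05, 0.01],
           [0.01, 0.05, 0.05, 0.05, 0.01],
           [0.01, 0.01, 0.01, 0.01, 0.01]]


def local_contrast_normalize(x, w, c):
    """z_ij = (x_ij - mean_ij) / max(c, sigma_ij) with a weighted local mean and variance."""
    W, H = len(x), len(w)
    r = H // 2
    z = []
    for i in range(W):
        row = []
        for j in range(W):
            # (weight, pixel) pairs of the window, keeping only pixels inside the image
            taps = [(w[p][q], x[i + p - r][j + q - r])
                    for p in range(H) for q in range(H)
                    if 0 <= i + p - r < W and 0 <= j + q - r < W]
            total = sum(wt for wt, _ in taps)  # renormalize at the border (assumption)
            mean = sum(wt * v for wt, v in taps) / total
            var = sum(wt * (v - mean) ** 2 for wt, v in taps) / total
            row.append((x[i][j] - mean) / max(c, math.sqrt(var)))
        z.append(row)
    return z


# Hypothetical 5 x 5 image: one bright pixel on a dark background
x = [[0.0] * 5 for _ in range(5)]
x[2][2] = 10.0
z = local_contrast_normalize(x, WEIGHTS, c=1.0)
print(z[2][2])  # positive: the center is brighter than its weighted surroundings

# A flat image has no local contrast, and the constant c prevents a division by zero
flat = [[7.0] * 5 for _ in range(5)]
print(local_contrast_normalize(flat, WEIGHTS, c=1.0)[2][2])
```

Replacing `max(c, math.sqrt(var))` with `math.sqrt(c + var)` gives the continuously changing variant mentioned on the slide.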
  93. 93. LCN: Subtractive Normalization. Multi-channel image (RGB etc.). To take the interaction between channels into account, use the average over the same area $P_{ij}$ in every channel: $\bar{x}_{ij} = \frac{1}{K}\sum_{k=0}^{K-1}\sum_{(p,q)\in P_{ij}} w_{pq}\,x_{i+p,j+q,k}$. Subtract $\bar{x}_{ij}$, which is shared by all channels, from every pixel (i, j): $z_{ijk} = x_{ijk} - \bar{x}_{ij}$. (Figure: W × W × K input, $\bar{x}_{ij}$ and $z_{ijk}$.)
  94. 94. LCN: Divisive Normalization. Multi-channel image (RGB etc.). The variance of the local area $P_{ij}$: $\sigma^2_{ij} = \frac{1}{K}\sum_{k=0}^{K-1}\sum_{(p,q)\in P_{ij}} w_{pqk}\,(x_{i+p,j+q,k} - \bar{x}_{ij})^2$. Calculation of divisive normalization: $z_{ijk} = \frac{x_{ijk} - \bar{x}_{ij}}{\max(c,\,\sigma_{ij})}$. In the case where the denominator changes continuously with the value of the variance: $z_{ijk} = \frac{x_{ijk} - \bar{x}_{ij}}{\sqrt{c + \sigma^2_{ij}}}$.
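A sketch of the multi-channel case with the continuous denominator. It assumes that one weight window `w` (summing to 1) is shared by all channels, that the window is centered, and that the border is reflection padded; none of these choices is stated on the slides, and `multichannel_lcn` is a hypothetical name.

```python
import numpy as np

def multichannel_lcn(x, w, c=1.0):
    """x: image of shape (W, W, K); w: (H, H) weights summing to 1 (H odd)."""
    x = np.asarray(x, dtype=np.float64)
    h = w.shape[0]
    r = h // 2
    rows, cols, _ = x.shape
    padded = np.pad(x, ((r, r), (r, r), (0, 0)), mode="reflect")
    mean = np.zeros((rows, cols))
    second = np.zeros((rows, cols))
    for p in range(h):
        for q in range(h):
            window = padded[p:p + rows, q:q + cols, :]
            mean += w[p, q] * window.mean(axis=2)            # (1/K) sum over k
            second += w[p, q] * (window ** 2).mean(axis=2)
    # sigma^2_ij: channel-averaged weighted variance around the shared mean.
    var = np.maximum(second - mean ** 2, 0.0)
    # The same mean and variance are applied to every channel k.
    return (x - mean[:, :, None]) / np.sqrt(c + var)[:, :, None]

w = np.full((3, 3), 1.0 / 9.0)
rng = np.random.default_rng(0)
rgb = rng.uniform(0.0, 1.0, size=(6, 6, 3))
z = multichannel_lcn(rgb, w, c=0.1)
z_flat = multichannel_lcn(np.full((6, 6, 3), 5.0), w, c=0.1)
```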
  95. 95. Local Contrast Normalization. For a multi-channel image, the normalization layer applies an interaction between channels. → This introduces the following properties of biological vision into the model: - Sensitive to differences in content. - Insensitive to absolute differences such as brightness or contrast.
  96. 96. Calculation of gradient
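  97. 97. Review. $z^{(l)} = f^{(l)}(u^{(l)}) = f^{(l)}(W^{(l)} z^{(l-1)} + b^{(l)})$, where $z^{(l)}$ and $b^{(l)}$ are m × 1, $W^{(l)}$ is m × n and $z^{(l-1)}$ is n × 1. (Figure: the (l − 1)th layer, the lth layer and the bias $b^{(l)}_j$.)
  98. 98. However, $W^{(l)}$ is not fully connected.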
  99. 99. Convolution layer. $u_{ijm} = \sum_{k=0}^{K-1}\sum_{p=0}^{H-1}\sum_{q=0}^{H-1} z^{(l-1)}_{i+p,j+q,k}\,h_{pqkm} + b_{ijm}$. Weight sharing (weight tying) is applied. (Figure: a W × W × K input and Filter 1 of size H × H, m = 0, producing $u_{ij0}$.)
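The sum above can be written directly in NumPy. This sketch assumes stride 1 and no padding, so the output is (W − H + 1) × (W − H + 1) × M; the function name and array layouts are assumptions for illustration.

```python
import numpy as np

def conv_forward(z, h, b=0.0):
    """z: (W, W, K) input, h: (H, H, K, M) filters, b: scalar or (out, out, M).

    Returns u with u[i, j, m] = sum_{k,p,q} z[i+p, j+q, k] * h[p, q, k, m] + b.
    """
    W = z.shape[0]
    H = h.shape[0]
    M = h.shape[3]
    out = W - H + 1                      # stride 1, no padding (assumption)
    u = np.zeros((out, out, M))
    for i in range(out):
        for j in range(out):
            patch = z[i:i + H, j:j + H, :]                     # (H, H, K)
            # Contract p, q, k; the same h is used at every (i, j).
            u[i, j, :] = np.tensordot(patch, h, axes=([0, 1, 2], [0, 1, 2]))
    return u + b

z = np.ones((4, 4, 2))
h = np.ones((3, 3, 2, 2))
u = conv_forward(z, h)
```

The filter `h` is reused at every position (i, j), which is what weight sharing means here.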
  100. 100. Gradient calculation of convolution. The weight matrix $W^{(l)}$ (m × n) can be constructed from a vector $h$ of length H × H × K × M in which the filter coefficients $h_{pqkm}$ are lined up. How? (Figure: M = 3 filters, each of size H × H × K.)
  101. 101. Gradient calculation of convolution. Example of $h$: with H = 3, K = 2, M = 2, the length of $h$ is 3 × 3 × 2 × 2 = 36. The coefficients $h_{pqkm}$ (written hpqkm below) per filter and channel:
      M = 0, K = 0: h0000 h0100 h0200 / h1000 h1100 h1200 / h2000 h2100 h2200
      M = 0, K = 1: h0010 h0110 h0210 / h1010 h1110 h1210 / h2010 h2110 h2210
      M = 1, K = 0: h0001 h0101 h0201 / h1001 h1101 h1201 / h2001 h2101 h2201
      M = 1, K = 1: h0011 h0111 h0211 / h1011 h1111 h1211 / h2011 h2111 h2211
      Lined up as a vector of length H × H × K × M: $h$ = (h0000, h0100, h0200, h1000, h1100, h1200, h2000, h2100, h2200, h0001, …, h2201, h0010, …, h2210, h0011, …, h2011, h2111, h2211)$^T$.
  102. 102. (Figure: the 3 × 3 filter h0000 … h2200 applied to channel 0 of the input, $Z^{(l-1)}_0$, with pixels $z_{00,0}$ … $z_{22,0}$, producing the output map $U^{(l)}_0$; units are indexed by i and j.)
  103. 103. (Figure, continued: the weight between units i and j is $w_{ji} = h_{0100}$.)
  104. 104. (Figure, continued.)
  105. 105. (Figure, continued.)
  106. 106. Gradient calculation of convolution. Each weight is picked out of $h$ by a vector $t_{ji}$ of length 36 (H × H × K × M): $w_{ji} = t_{ji}^T h$. When (i = 3, j = 2): $t_{ji} = (0, 1, 0, \dots, 0)^T$, with the single 1 at position r = 2 of r = 1, …, 36, so that $w_{ji} = h_{0100}$.
  107. 107. Gradient calculation of convolution. $w_{ji} = t_{ji}^T h$, where $t_{jir}$ denotes the r-th component of $t_{ji}$. When (i = 3, j = 2), $t_{jir} = 1$ for r = 2 and 0 for every other r.
  108. 108. Gradient calculation of convolution. (Figure: the matrix $T_r$ for r = 2, with rows j and columns i each running over 0, 1, …, W−1; all entries shown are 0 except a single 1 at (i = 3, j = 2).)
  109. 109. Gradient calculation of convolution. The partial derivative of $E$ with respect to $W$ on layer $l$: $\frac{\partial E}{\partial W^{(l)}} = \partial W = \delta^{(l)} z^{(l-1)T}$. Find the gradient of the filter $h$ using the $T_r$ calculated on the previous page: $(\partial h)_r = \sum_{i,j} (T_r \odot \partial W)_{ji}$, where r runs over the H × H × K × M components of $h$.
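The following sketch checks this formula numerically on a tiny layer. The sizes (a 4 × 4 input, one 2 × 2 filter, K = M = 1, stride 1, no padding) and the random values are hypothetical, and the reading $(T_r)_{ji} = t_{jir}$, i.e. 1 wherever $w_{ji} = h_r$, is an assumption about how $T_r$ is defined. Note that r is 0-based in the code and 1-based on the slides.

```python
import numpy as np

W_in, H = 4, 2                       # hypothetical sizes; K = M = 1
out = W_in - H + 1
n, m = W_in * W_in, out * out        # units in layer l-1 and in layer l

# index_of[j, i] = r if w_ji = h_r, and -1 if units i and j are not connected.
index_of = -np.ones((m, n), dtype=int)
for a in range(out):
    for b in range(out):
        j = a * out + b
        for p in range(H):
            for q in range(H):
                i = (a + p) * W_in + (b + q)
                index_of[j, i] = p * H + q

rng = np.random.default_rng(0)
h = rng.normal(size=H * H)
z_prev = rng.normal(size=n)          # z^(l-1), flattened input
delta = rng.normal(size=m)           # delta^(l), assumed to be given

# Dense weight matrix built from h: w_ji = t_ji^T h.
W_mat = np.where(index_of >= 0, h[index_of], 0.0)

# dW = delta^(l) z^(l-1)T, as for a fully connected layer.
dW = np.outer(delta, z_prev)

# (dh)_r = sum_{i,j} (T_r * dW)_ji, with T_r the 0/1 mask of entries equal to h_r.
dh = np.array([(dW * (index_of == r)).sum() for r in range(H * H)])

# Direct computation for comparison: dh_pq = sum_ab delta_ab z_{a+p, b+q}.
z_img = z_prev.reshape(W_in, W_in)
d_img = delta.reshape(out, out)
dh_direct = np.array([
    (d_img * z_img[p:p + out, q:q + out]).sum()
    for p in range(H) for q in range(H)
])
```

Both routes give the same gradient; the mask formulation simply sums the entries of $\partial W$ that share one filter coefficient.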
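  110. 110. Review. Differentiate with respect to $w^{(l)}_{ji}$: $\frac{\partial E_n}{\partial w^{(l)}_{ji}} = \frac{\partial E_n}{\partial u^{(l)}_j}\frac{\partial u^{(l)}_j}{\partial w^{(l)}_{ji}} = f'(u^{(l)}_j)\left(\sum_k w^{(l+1)}_{kj}\,\delta^{(l+1)}_k\right) z^{(l-1)}_i$. (Figure: unit $z^{(l-1)}_i$ of the (l − 1)th layer is connected by $w^{(l)}_{ji}$ to unit j of the lth layer, with $u^{(l)}_j$ and $z^{(l)}_j$, which is connected by $w^{(l+1)}_{1j}, \dots, w^{(l+1)}_{kj}, \dots, w^{(l+1)}_{Mj}$ to $\delta^{(l+1)}_1, \dots, \delta^{(l+1)}_k, \dots, \delta^{(l+1)}_M$ of the (l + 1)th layer.)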
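  111. 111. Gradient calculation of convolution. $\frac{\partial E_n}{\partial w^{(l)}_{ji}} = \frac{\partial E_n}{\partial u^{(l)}_j}\frac{\partial u^{(l)}_j}{\partial w^{(l)}_{ji}} = f'(u^{(l)}_j)\left(\sum_k w^{(l+1)}_{kj}\,\delta^{(l+1)}_k\right) z^{(l-1)}_i$, with $\delta^{(l)}_j = f'(u^{(l)}_j)\left(\sum_k w^{(l+1)}_{kj}\,\delta^{(l+1)}_k\right)$. Matrix expression: $\delta^{(l)} = f'^{(l)}(u^{(l)}) \odot (W^{(l+1)T}\delta^{(l+1)})$ (6.5), where $\odot$ means the element-wise product of matrices, and $\frac{\partial E}{\partial W^{(l)}} = \partial W = \delta^{(l)} z^{(l-1)T}$, where $\delta^{(l)}$ is m × 1, $z^{(l-1)T}$ is 1 × n and $\partial W$ is m × n.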
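As a sanity check of (6.5), the sketch below compares the analytic gradient with finite differences on a small network. The layer sizes, the choice f = tanh, the linear output layer with squared error (so that $\delta^{(l+1)} = y - d$) and the omission of biases are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, M = 4, 3, 2                        # hypothetical layer sizes
z_prev = rng.normal(size=(n, 1))         # z^(l-1)
W1 = rng.normal(size=(m, n))             # W^(l)
W2 = rng.normal(size=(M, m))             # W^(l+1)
d = rng.normal(size=(M, 1))              # target output

def loss(W1_):
    """E = 0.5 * ||y - d||^2 with a tanh hidden layer and a linear output."""
    z = np.tanh(W1_ @ z_prev)
    y = W2 @ z
    return 0.5 * float(((y - d) ** 2).sum())

# Forward pass.
u = W1 @ z_prev
y = W2 @ np.tanh(u)

# Backward pass.
delta_next = y - d                                      # delta^(l+1)
delta = (1.0 - np.tanh(u) ** 2) * (W2.T @ delta_next)   # eq. (6.5), f' = 1 - tanh^2
dW = delta @ z_prev.T                                   # (m x 1)(1 x n) = m x n

# Central finite differences for comparison.
eps = 1e-6
dW_num = np.zeros_like(W1)
for a in range(m):
    for b in range(n):
        Wp = W1.copy()
        Wp[a, b] += eps
        Wm = W1.copy()
        Wm[a, b] -= eps
        dW_num[a, b] = (loss(Wp) - loss(Wm)) / (2 * eps)
```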
  112. 112. Handling the Pooling Layer. No gradient needs to be calculated, since the layer has no parameters to learn, so only the back propagation of delta is calculated. Perform calculation (6.5) from the previous page, choosing $W^{(l+1)}$ according to the type of pooling. Average pooling: $w^{(l+1)}_{ji} = \frac{1}{H^2}$ if $i \in P_{ji}$, and 0 otherwise. Max pooling: $w^{(l+1)}_{ji} = 1$ if (i, j) is the position of the max value, and 0 otherwise.
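A sketch of the delta back propagation through a pooling layer, i.e. the term $W^{(l+1)T}\delta^{(l+1)}$ of (6.5) without building the matrix explicitly (the factor $f'$ of the preceding layer is left out). It assumes non-overlapping H × H windows and, for max pooling, that ties go to the first maximum; `pool_backward` is a hypothetical name.

```python
import numpy as np

def pool_backward(z, delta_next, H, mode="max"):
    """z: (W, W) input of the pooling layer; delta_next: (W // H, W // H).

    Returns W^(l+1)T delta^(l+1), reshaped to the shape of z.
    """
    delta = np.zeros(z.shape, dtype=np.float64)
    for a in range(delta_next.shape[0]):
        for b in range(delta_next.shape[1]):
            rows = slice(a * H, (a + 1) * H)
            cols = slice(b * H, (b + 1) * H)
            if mode == "average":
                # w_ji = 1 / H^2 for every unit in the window.
                delta[rows, cols] = delta_next[a, b] / (H * H)
            else:
                # w_ji = 1 only for the unit that had the max value.
                window = z[rows, cols]
                p, q = np.unravel_index(np.argmax(window), window.shape)
                delta[a * H + p, b * H + q] = delta_next[a, b]
    return delta

z = np.arange(16, dtype=np.float64).reshape(4, 4)
delta_next = np.array([[1.0, 2.0], [3.0, 4.0]])
d_max = pool_backward(z, delta_next, H=2, mode="max")
d_avg = pool_backward(z, delta_next, H=2, mode="average")
```

Max pooling routes each delta to a single unit, while average pooling spreads it evenly over the window; in both cases the total is preserved.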
  113. 113. Thanks • Azusa Colors (Keynote template) http://sanographix.github.io/azusa-colors/
