INTRODUCTION TO CONVOLUTIONAL NEURAL NETS
2019. 3.
김홍배
Contents
• Convolutional Neural Nets ?
• Applications of Convolutional Neural Nets
• How Convolutional Neural Nets work
• Evolution of Convolutional Neural Nets
• Brief intro : Invariance and Equivariance
• Limitations of CNN
• Group ConvNet
• Capsule Net
CONVOLUTIONAL NEURAL NETS ?
Neural Nets
[Figure: x_image (28x28) → Reshape (28x28 → 784x1 vector x) → y = softmax(Wx + b) → 10 digits]
# of unknown parameters to estimate = # of weights + # of biases
= 784x10 + 10 = 7,850 !!!
• An ordinary neural net starts directly from the pixel values of the input image
• High-resolution images cannot be processed at high speed, because the parameter count grows with the number of pixels
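The single-layer model y = softmax(Wx + b) and its 7,850 parameters can be sketched in plain Python. The zero initialisation and the helper names `softmax` and `predict` are assumptions made for this illustration; they do not come from the slides.

```python
import math

N_PIXELS = 28 * 28   # 784 inputs after reshaping the image to a vector
N_CLASSES = 10       # 10 digits

# Zero-initialised parameters (an arbitrary choice for this sketch).
W = [[0.0] * N_PIXELS for _ in range(N_CLASSES)]
b = [0.0] * N_CLASSES

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(x):
    """y = softmax(Wx + b) for a flattened 784-element image x."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(z)

# Number of unknowns: one weight per (pixel, class) pair plus one bias per class.
n_params = N_PIXELS * N_CLASSES + N_CLASSES
print(n_params)  # 7850

y = predict([0.5] * N_PIXELS)  # with zero weights every class gets probability 0.1
```

Doubling the image side length would quadruple `N_PIXELS` and with it the number of weights, which is the scaling problem this slide points at.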
• Networks for deep-learning-based visual perception
• A CNN extracts features in units of simple-shaped patches (filters or kernels)
• Higher layers progressively assemble the overall shape of the object
→ The number of parameters to estimate is reduced
Types of inputs
• Color images are three-dimensional and so have a volume
• Time-domain speech signals are 1-D, while frequency-domain representations (e.g. MFCC vectors) take a 2-D form. They can also be looked at as a time sequence.
• Medical images (such as CT, MR, etc.) are multi-dimensional
• Videos have an additional temporal dimension compared to still images
• Variable-length sequences and time series data are again multi-dimensional
• Hence it makes sense to model them as tensors instead of vectors.
APPLICATIONS OF CONVOLUTIONAL NEURAL NETS
CNNs are everywhere
• Image retrieval from database
• Object detection
• Self-driving cars
• Semantic segmentation
• Face recognition (FB tagging)
• Pose estimation
• Disease detection
• Speech recognition
• Text processing
• Analysing satellite data
• Scene description
Visual perception is combined with an RNN that generates sentences, to produce a description of the given image
• Object detection and recognition
Visual perception is used to output each object's class and bounding box (BB)
Various convolution layers
Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
• Semantic segmentation
Visual perception is used to label the image pixel by pixel
• End-to-end learning for self-driving cars
Visual perception and the car's driving actions are learned together to perform autonomous driving
[Figure: sequential front-view images → CLs → FCLs → action A1]
https://youtu.be/qhUvQiKec2U
• Smart picking robot based on deep learning
An industrial robot is trained using visual perception and reinforcement learning
[Figure labels: 224x224 pixels, 90x1]
STRUCTURE OF CONVOLUTIONAL NEURAL NETS
A CNN consists of a feature extraction layer and a classification layer
[Figure: Feature Extraction Layer → Classification Layer]
HOW CONVOLUTIONAL NEURAL NETS WORK
Network Architecture
x_image (28x28)
→ 1st convolutional layer: convolution (5x5, s=1), 32 features → h_conv1 (28x28x32)
→ Max pooling (2x2, s=2) → h_pool1 (14x14x32), 32 channels
→ 2nd convolutional layer: convolution (5x5, s=1), 64 features → h_conv2 (14x14x64)
→ Max pooling (2x2, s=2) → h_pool2 (7x7x64)
→ Reshape: 7 * 7 * 64 tensor → 3,136x1 vector
→ Fully connected layer: 1,024 neurons
→ Readout layer: 10 digits
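The layer shapes in this architecture can be traced with a few lines of Python. This sketch assumes "same" zero padding in the convolutions (inferred from 28x28 staying 28x28 after a 5x5 convolution) and a single-channel input; the helper names and the parameter totals are computed here for illustration and are not stated on the slide.

```python
def conv_same(h, w, c_in, k, c_out):
    """Output shape and parameter count of a k x k convolution with stride 1
    and 'same' zero padding (an assumption of this sketch)."""
    return (h, w, c_out), k * k * c_in * c_out + c_out

def max_pool(h, w, c, size=2, stride=2):
    """Output shape of max pooling without padding; pooling has no parameters."""
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)

def dense(n_in, n_out):
    """Parameter count of a fully connected layer (weights + biases)."""
    return n_in * n_out + n_out

shape = (28, 28, 1)                                  # x_image, one channel assumed
shape, p_conv1 = conv_same(*shape, k=5, c_out=32)    # h_conv1: (28, 28, 32)
shape = max_pool(*shape)                             # h_pool1: (14, 14, 32)
shape, p_conv2 = conv_same(*shape, k=5, c_out=64)    # h_conv2: (14, 14, 64)
shape = max_pool(*shape)                             # h_pool2: (7, 7, 64)
n_flat = shape[0] * shape[1] * shape[2]              # 3136 after reshape
p_fc = dense(n_flat, 1024)                           # fully connected layer
p_out = dense(1024, 10)                              # readout layer
total = p_conv1 + p_conv2 + p_fc + p_out

print(shape, n_flat)                  # (7, 7, 64) 3136
print(p_conv1, p_conv2, p_fc, p_out)  # 832 51264 3212288 10250
```

Under these assumptions the two convolutional layers hold only about 52K of the roughly 3.27M parameters; almost all of the rest sit in the fully connected layer.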
CONVOLUTION
Input or feature map (5x5):
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0
Filter (3x3):
1 0 1
0 1 0
1 0 1
Feature map = convolution of input and filter (3x3):
4 3 4
2 4 3
2 3 4
• Convolution operation: multiply the numbers at the same positions, then add them all up
• First position (top-left): 1x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 0x1 + 0x0 + 1x1 = 4
• The filter then moves sideways and the same operation is repeated
• After it has moved all the way across, it moves down and the same operation is repeated
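The sliding multiply-and-sum described for the 5x5 input and 3x3 filter can be reproduced with a short pure-Python sketch. The function name `conv2d_valid` is hypothetical; stride 1 and no padding are assumed, matching the 3x3 result on the slide.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding). At each
    position, multiply overlapping entries and sum them. As in most deep
    learning libraries, the kernel is not flipped (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The same nine filter weights are reused at all nine positions, which is why a convolutional layer needs far fewer parameters than a fully connected one on the same input.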
RELU (WHY RELU INSTEAD OF SIGMOID ?)
Rectified Linear Unit (ReLU): f(x) = max(0, x)
First feature map:
3 0 1
-2 0 2
0 2 3
→ ReLU →
3 0 1
0 0 2
0 2 3
Second feature map:
1 -1 1
0 -1 -1
3 1 0
→ ReLU →
1 0 1
0 0 0
3 1 0
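As a small illustration of the ReLU step, the following sketch applies f(x) = max(0, x) element-wise to the two example feature maps from this slide. The function name `relu` is chosen here for illustration.

```python
def relu(feature_map):
    """Element-wise ReLU: negative values become 0, the rest pass through."""
    return [[max(0, v) for v in row] for row in feature_map]

fm_a = [[3, 0, 1], [-2, 0, 2], [0, 2, 3]]
fm_b = [[1, -1, 1], [0, -1, -1], [3, 1, 0]]

print(relu(fm_a))  # [[3, 0, 1], [0, 0, 2], [0, 2, 3]]
print(relu(fm_b))  # [[1, 0, 1], [0, 0, 0], [3, 1, 0]]
```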
POOLING LAYER
• Max pooling is the most widely used
2X2 MAX POOLING WITH STRIDE=1
First feature map:
3 0 1
0 0 2
0 2 3
→ max pooling →
3 2
2 3
Second feature map:
1 0 1
0 0 0
3 1 0
→ max pooling →
1 1
3 1
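The 2x2, stride-1 max pooling example can be checked with a minimal sketch. The function name `max_pool2d` and its default arguments are assumptions for illustration.

```python
def max_pool2d(feature_map, size=2, stride=1):
    """Take the maximum of each size x size window, moving by `stride`."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

fm_a = [[3, 0, 1], [0, 0, 2], [0, 2, 3]]
fm_b = [[1, 0, 1], [0, 0, 0], [3, 1, 0]]

print(max_pool2d(fm_a))  # [[3, 2], [2, 3]]
print(max_pool2d(fm_b))  # [[1, 1], [3, 1]]
```

With `stride=2` and a 28x28 input the same function would halve each spatial dimension, as in the architecture slide.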
Why Pooling ?
• Dimension reduction
• Adds spatial invariance to feature maps (mainly to small translations, and only to a limited degree to rotation)
– The feature can still be recognized under small changes of angle, direction or skew.
– The exact location of a feature does not matter, as long as it maintains its relative position to other features.
Key operations in a CNN
Input Image → Convolution (Learned) → Non-linearity (Rectified Linear Unit, ReLU) → Spatial pooling (Max) → Feature maps
Source: R. Fergus, Y. LeCun Slide: Lazebnik
Flattening
• Flattening takes the pooled layer and flattens it in sequential order into a single vector.
• The vector is used as the input to the classifier
FULLY-CONNECTED LAYER
Pooled feature maps:
3 2
2 3
and
1 1
3 1
→ flattened vector: (3, 2, 2, 3, 1, 1, 3, 1)
→ fully connected layer outputs: (2, 1)
→ softmax → Cat: 0.8, Dog: 0.2
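The flatten and softmax steps of this example can be sketched as follows. Reading the values 2 and 1 as the two outputs of the fully connected layer is an assumption, and the weights of that layer are not given on the slide, so they are not modelled here. Note that an exact softmax of (2, 1) gives about 0.73 and 0.27, so the 0.8 / 0.2 shown on the slide is best read as illustrative.

```python
import math

def flatten(feature_maps):
    """Concatenate all feature maps row by row into one vector."""
    return [v for fm in feature_maps for row in fm for v in row]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

pooled = [[[3, 2], [2, 3]], [[1, 1], [3, 1]]]
x = flatten(pooled)
print(x)  # [3, 2, 2, 3, 1, 1, 3, 1]

scores = [2.0, 1.0]            # assumed outputs of the fully connected layer
p_cat, p_dog = softmax(scores)
print(round(p_cat, 2), round(p_dog, 2))  # 0.73 0.27
```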
EVOLUTION OF CONVOLUTIONAL NEURAL NETS
LeNet to ResNet: A Deep Journey
LeNet5 (1998): The origin of convolutional neural network
Characteristics
• Repeat of Convolution – Pooling – Non-Linearity
• Average pooling
• Sigmoid activation for the intermediate layer
• tanh activation at F6
• 5x5 convolution filter
• 7 layers and less than 1M parameters
Key Contributions
• Use of convolution to extract spatial features
• Subsample using spatial average of maps
• Sparse connection matrix between layers to avoid large computational cost
The Gap
• Slow to train
• Hard to train (neurons die quickly)
• Lack of data
• ImageNet is an image database organized according to the WordNet hierarchy
• It is formally a project aimed at (manually) labeling and categorizing images
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• Training data: 1.2 million images, 1000+ categories
• Validation and test data: 150K images (50K validation, the remainder test)
• ImageNet data: http://image-net.org/challenges/LSVRC/2010/browse-synsets
• Multiple challenges: object recognition, localization, etc.
IMAGENET CLASSIFICATION RESULTS
<2012 Result>
• Krizhevsky et al. (AlexNet) – 16.4% error (top-5)
• Next best (non-convnet) – 26.2% error
<2013 Result>
• All top-ranked entries use deep learning (ConvNets)
Revolution of Depth!
ALEXNET (2012)
Characteristics
- 11x11, 5x5 and 3x3 convolutions
- Max pooling
- 3 FC layers
- 60 million parameters
Key Contributions
• GPU and training in parallel
• ReLU activation
• Dropout regularization
• Image augmentation
RELU NON-LINEARITY – SIMPLER ACTIVATION
A 4-layer CNN with ReLUs reaches a 25% training error rate on the CIFAR-10 dataset 6 times faster than an equivalent network with tanh
Ljubljana, June 2016
Deep learning - ReLU
How does the sigmoid function affect learning?
• Enables easy computation of the derivative but has negative effects:
– Neuron output never reaches exactly 1 or 0 → saturating
– Gradient reduces the magnitude of the error
• Leads to two problems:
– Slow learning when neurons are saturated, i.e. large |z| values
– Vanishing gradient problem (the sigmoid derivative is at most 0.25, so each layer passes back at most 25% of the backpropagated error!!)
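The 25% figure comes from the derivative of the sigmoid, σ'(z) = σ(z)(1 − σ(z)), which peaks at 0.25 for z = 0 and shrinks quickly as |z| grows. The sketch below evaluates it at a few points; the sample values of z and the 10-layer example are arbitrary choices for illustration, and the bound considers only the activation factor, not the weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), with s = sigmoid(z)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Derivative at a few pre-activation values: largest at z = 0, tiny when saturated.
grads = {z: sigmoid_grad(z) for z in (0.0, 2.0, 5.0, 10.0)}
for z, g in grads.items():
    print(z, g)

# Upper bound on the activation-derivative factor after 10 sigmoid layers.
bound_10_layers = 0.25 ** 10
print(bound_10_layers)
```

For comparison, the ReLU derivative is exactly 1 for every positive input, so this shrinking factor does not appear on active units.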
• Krizhevsky et al. (AlexNet, 2012) used the Rectified Linear Unit (ReLU) instead of saturating functions such as sigmoid or tanh (ReLU itself had been proposed earlier, e.g. Nair and Hinton, 2010)
• Main purpose of ReLU: reduces saturation and vanishing gradient issues
• Still not perfect:
– Stops learning at negative z values (a piecewise-linear variant such as Parametric ReLU, He et al. 2015 from Microsoft, can be used)
– Outputs are unbounded above, so there is a bigger risk of activations growing very large
Deep learning - dropout
• Too many weights cause overfitting issues
• Weight decay (regularization) helps but is not perfect
– Also adds another hyper-parameter to set manually
• Srivastava et al. (2014) proposed a kind of "bagging" for deep nets (Krizhevsky et al. had already used it in AlexNet in 2012)
• Main point:
– Make the network more robust by randomly disabling neurons during training
– Each neuron has a probability, typically around 0.5, of being disabled
– Remaining neurons must adapt to work without them
• In AlexNet, applied only to the fully connected layers
– Conv. layers are less susceptible to overfitting
Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
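A minimal sketch of randomly disabling neurons is shown below. It uses the "inverted dropout" variant, which rescales the surviving activations during training so that nothing needs to change at test time; this is a common implementation choice and not necessarily the exact formulation used in the cited paper. The function name, the drop probability of 0.5 and the fixed random seed are assumptions for illustration.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: during training, zero each activation with
    probability p_drop and scale survivors by 1 / (1 - p_drop) so the
    expected value is unchanged. At test time, return the input as is."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
h = [1.0] * 10000               # activations of a hypothetical layer

h_train = dropout(h, 0.5, rng)                  # about half are zeroed
h_test = dropout(h, 0.5, rng, training=False)   # unchanged at inference

print(sum(1 for v in h_train if v == 0.0) / len(h_train))  # close to 0.5
print(sum(h_train) / len(h_train))                         # close to 1.0
```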
Deep learning – batch norm
• Input needs to be whitened, i.e. normalized (LeCun 1998, Efficient BackProp)
– Usually done on first layer input only
• The same reason for normalizing the first layer's input applies to the other layers as well
• Ioffe and Szegedy, Batch Normalization, 2015
– Normalize the input to each layer
– Reduce internal covariate shift
– Too slow to normalize over all input data (>1M samples)
– Instead normalize within the mini-batch only
– Learning: normalize with mini-batch statistics
– Inference: normalize with statistics of the whole training data (in practice, running averages collected during training)
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015
Better results while allowing a higher learning rate, faster learning-rate decay, no dropout, and no LRN.
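The difference between normalizing with mini-batch statistics (learning) and with fixed statistics (inference) can be sketched for a single scalar feature. The function names, the example batch, and the default `gamma`, `beta` and `eps` values are assumptions for illustration; a real layer keeps one such set of statistics per channel.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training mode: normalize with the mean and variance of the mini-batch,
    then apply the learnable scale (gamma) and shift (beta)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    normed = [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]
    return normed, mean, var

def batch_norm_inference(v, running_mean, running_var,
                         gamma=1.0, beta=0.0, eps=1e-5):
    """Inference mode: use fixed statistics gathered during training."""
    return gamma * (v - running_mean) / math.sqrt(running_var + eps) + beta

batch = [2.0, 4.0, 6.0, 8.0]
normed, mean, var = batch_norm(batch)
print(mean, var)   # 5.0 5.0
print(normed)      # zero mean, roughly unit variance

# At inference a single sample is normalized with the stored statistics.
print(batch_norm_inference(5.0, running_mean=mean, running_var=var))  # 0.0
```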
VGG (2014)
• Smaller 3x3 convolutions throughout the net
• A sequence of 3x3 convolutions can emulate larger receptive fields, e.g., 5x5 or 7x7
• Use of 1x1 convolution
• Decrease in spatial size and increase in depth through the network
What's the advantage of using 3 layers of 3x3 instead of one layer of 7x7?
• 3 non-linear rectification layers instead of one
• Fewer parameters: 27C² as opposed to 49C² (for C input and C output channels)
Key Points
• Depth is important
• Simplify the network to go deep
• 140M parameters (mostly due to the FC layers)
VGG (2ND PLACE IN 2014)
• Only 3x3 filters are used, repeatedly
• Why??
• Stacking convolution filters gives a larger receptive field
– Two stacked 3x3 filters = one 5x5 filter
– Three stacked 3x3 filters = one 7x7 filter
• Fewer parameters than with a single large filter → regularization effect
“Very Deep Convolutional Networks for Large-Scale Image Recognition”
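The receptive-field and parameter claims for stacked 3x3 filters can be checked with simple arithmetic. This sketch assumes stride 1, C input and C output channels in every layer, and ignores biases; the value C = 64 and the helper names are arbitrary choices for illustration.

```python
def stacked_receptive_field(n_layers, k=3):
    """Receptive field of n stacked k x k convolutions with stride 1:
    each extra layer adds (k - 1) pixels."""
    return 1 + n_layers * (k - 1)

def conv_weights(k, c_in, c_out):
    """Number of weights in one k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

C = 64
three_3x3 = 3 * conv_weights(3, C, C)   # 3 * 9 * C^2 = 27 C^2
one_7x7 = conv_weights(7, C, C)         # 49 C^2

print(stacked_receptive_field(2))  # 5
print(stacked_receptive_field(3))  # 7
print(three_3x3, one_7x7)          # 110592 200704
```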
GOOGLENET OR INCEPTION (2014)
• 22-layer CNN
• Heavy use of 1x1 ‘Network in Network’ convolutions
• Use of average pooling before the classification layer
• Auxiliary classifiers connected to intermediate layers
• During training, the loss of the auxiliary classifiers is added with a discount weight (0.3)
GOOGLENET KEY IDEAS
• Which is better, 3x3 or 5x5 ?
• Let's use them all
→ The amount of computation increases.
Naïve Version: way too many outputs!!!
Modified Idea: use 1x1 convolutions for dimensionality reduction
Why 1x1 convolution?
• Introduced as “Network in Network” in 2014
• Is a way to increase non-linearity and to combine features across feature maps at each spatial position
Only 4M parameters compared to 60M in AlexNet
• 1x1 convolution is used for dimension reduction
• Halving the number of feature maps keeps the total amount of computation about the same
How 1x1 convolution performs dimension reduction
[Figure: input layer x → weights w → output layer y (k-th feature map)]
x_ij : 1x256 vector at pixel (i, j), w_k : 1x256 weight vector, y_ij,k = f(x_ij · w_k), f() : nonlinear function
Same principle as feature dimension reduction with a fully connected NN
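The relation y_ij,k = f(x_ij · w_k) can be sketched directly: a 1x1 convolution is a small fully connected layer applied independently at every pixel. The sketch below uses 4 input channels and 2 output channels instead of 256 to keep the example readable; the weights, the input values and the choice of ReLU for f are assumptions for illustration.

```python
def conv1x1(feature_map, weights, f=lambda v: max(0.0, v)):
    """feature_map: H x W x C_in nested lists; weights: C_out x C_in.
    Each output value is f(x_ij . w_k), i.e. one dot product per pixel
    and per output channel."""
    return [[[f(sum(x * w for x, w in zip(pixel, w_k))) for w_k in weights]
             for pixel in row]
            for row in feature_map]

H, W = 2, 2
fm = [[[1.0, 2.0, 3.0, 4.0] for _ in range(W)] for _ in range(H)]  # 2 x 2 x 4
weights = [
    [0.25, 0.25, 0.25, 0.25],   # w_1: average of the 4 input channels
    [1.0, 0.0, -1.0, 0.0],      # w_2: channel 1 minus channel 3
]

out = conv1x1(fm, weights)       # 2 x 2 x 2: channels reduced from 4 to 2
print(out[0][0])                 # [2.5, 0.0]
```

The spatial size (H x W) is unchanged; only the channel dimension shrinks, which is what makes the following 3x3 or 5x5 convolution cheaper.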
RESNET (RESIDUAL NEURAL NETWORK) (2015)
• Introduces shortcut connections (which exist in prior literature in various forms)
• The key design choice is to skip 2 layers; skipping a single layer did not give much improvement
RESNET
• Is a larger number of layers always better?
• A 56-layer network gave a higher training error than a 20-layer network
• A deeper model should have a lower training error, but
• deep models were found to be hard to optimize (even learning the identity is difficult)
Causes: vanishing/exploding gradients, and the increase in the number of parameters to train
[Figure: a shallower model (18 layers) vs. a deeper model (34 layers)]
“Deep Residual Learning for Image Recognition”
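KEY IDEA OF RESNET
• The identity is passed unchanged to the upper layer, and only the remaining part (the residual) is learned
• The target is not H(x) itself but F(x) = H(x) - x, so the block outputs H(x) = F(x) + x
F(x) ≈ 0, so convergence is fast
• Effects of the identity shortcut
- Even deep networks can be optimized
- Accuracy improves as depth increases
“Deep Residual Learning for Image Recognition”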
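The idea of learning F(x) = H(x) - x and adding x back through the shortcut can be sketched in a few lines. Here `residual_fn` stands in for the stacked layers of a block; the two example residual functions are hypothetical and chosen only to show the behaviour.

```python
def residual_block(x, residual_fn):
    """Output of a residual block: H(x) = F(x) + x, where F is computed by
    the stacked layers (residual_fn) and x arrives via the identity shortcut."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]

def zero_residual(x):
    """F(x) = 0: what the layers compute if all their weights are zero."""
    return [0.0 for _ in x]

def small_residual(x):
    """A hypothetical learned correction that is small compared with x."""
    return [0.1 * xi for xi in x]

x = [1.0, -2.0, 3.0]
print(residual_block(x, zero_residual))    # [1.0, -2.0, 3.0]  (identity mapping)
print(residual_block(x, small_residual))   # x plus a 10% correction
```

With the shortcut in place, the identity mapping corresponds to F(x) = 0, which is easier for the optimizer to reach than fitting an identity through several non-linear layers.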
• BOTTLENECK : A PRACTICAL DESIGN
• # parameters with the bottleneck:
256 x 64 + 64 x 3 x 3 x 64 + 64 x 256 = ~70K
• # parameters using just a 3 x 3 x 256 x 256 conv layer = ~600K
1x1 conv for dimension reduction → 3x3 conv → 1x1 conv for dimension expansion
→ The purpose is to reduce the amount of computation
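The two parameter counts for the bottleneck design can be verified with plain arithmetic. Biases are ignored, as on the slide; the variable names are chosen here for illustration.

```python
# Bottleneck: 1x1 conv (256 -> 64), 3x3 conv (64 -> 64), 1x1 conv (64 -> 256)
reduce_1x1 = 1 * 1 * 256 * 64
conv_3x3 = 3 * 3 * 64 * 64
expand_1x1 = 1 * 1 * 64 * 256
bottleneck = reduce_1x1 + conv_3x3 + expand_1x1

# A single 3x3 convolution applied directly on 256 channels
plain = 3 * 3 * 256 * 256

print(bottleneck)                    # 69632  (~70K)
print(plain)                         # 589824 (~600K)
print(round(plain / bottleneck, 1))  # 8.5
```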
Dilated convolutions
The goal of this layer is to increase the size of the receptive field (the input activations that are used to compute a given output) without using downsampling (in order to preserve local information). Increasing the size of the receptive field allows the network to use more context (information spatially further away).
The idea is to spread the filter weights apart, filling the gaps between them with zeros, and then compute a convolution with this enlarged filter.
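One common way to picture a dilated convolution is as an ordinary convolution with a filter whose weights have been spread apart, with zeros in between. The sketch below builds such a filter for a square kernel; the function name `dilate_kernel` and the example weights are assumptions for illustration.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring weights of a square
    kernel. A k x k kernel grows to size k + (k - 1) * (rate - 1)."""
    k = len(kernel)
    size = k + (k - 1) * (rate - 1)
    out = [[0] * size for _ in range(size)]
    for a in range(k):
        for b in range(k):
            out[a * rate][b * rate] = kernel[a][b]
    return out

kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]

dilated = dilate_kernel(kernel, 2)
for row in dilated:
    print(row)
# [1, 0, 2, 0, 3]
# [0, 0, 0, 0, 0]
# [4, 0, 5, 0, 6]
# [0, 0, 0, 0, 0]
# [7, 0, 8, 0, 9]
```

The dilated filter covers a 5x5 window while still having only 9 non-zero weights, so the receptive field grows without extra parameters and without downsampling.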
Deeper the better!!!
DenseNet
At CVPR 2017, the Densely Connected Network (DenseNet) was presented, a study that brought a major change to network structure
Brief intro : Invariance and Equivariance
ConvNets are translation equivariant
This demonstrates LeNet-5's invariance to small rotations (+/-40 degrees).
How about rotation ?
Limitation of Conventional ConvNet
2D convolution is equivariant under translation, but not under rotation
Invariance
[Figure: image X1 and its transformed version X2 are both mapped by Φ(·) to the same feature Z]
$X_2 = T_g^1 X_1$
$Z = Z_1 = \Phi(X_1) = Z_2 = \Phi(X_2) = \Phi(T_g^1 X_1)$
: The mapping $\Phi(\cdot)$ is independent of the transformation $T_g$, for all $T_g$
To make a Convolutional Neural Network (CNN) transformation-invariant, data augmentation of the training samples is generally used
Equivariance
[Figure: X1 is mapped by Φ(·) to Z1, and X2 to Z2; the transformation $T_g^1$ relates the images and $T_g^2$ relates the features]
$X_2 = T_g^1 X_1$
$Z_2 = T_g^2 Z_1 = T_g^2 \Phi(X_1) = \Phi(T_g^1 X_1)$
: The mapping preserves the algebraic structure of the transformation
$Z_1 \neq Z_2$, but the relationship between them is kept
: Invariance is the special case of equivariance where $T_g^2$ is the identity.
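The two definitions can be made concrete with a one-dimensional toy example. The sketch assumes a circular (wrap-around) correlation so that shifts are exact; with that assumption, shifting the input and then filtering gives the same result as filtering and then shifting (equivariance), while a global maximum over the output does not change at all (invariance). The signal, filter and shift amount are arbitrary.

```python
def circular_corr(signal, kernel):
    """Phi: correlation with wrap-around indexing."""
    n = len(signal)
    return [sum(signal[(i + a) % n] * w for a, w in enumerate(kernel))
            for i in range(n)]

def shift(signal, t):
    """T: cyclic shift to the right by t positions."""
    n = len(signal)
    return [signal[(i - t) % n] for i in range(n)]

x1 = [0, 1, 3, 2, 0, 0, 0, 0]
k = [1, -1, 2]

x2 = shift(x1, 3)              # X2 = T X1
z1 = circular_corr(x1, k)      # Z1 = Phi(X1)
z2 = circular_corr(x2, k)      # Z2 = Phi(X2)

equivariant = (z2 == shift(z1, 3))   # Phi(T X1) == T Phi(X1)
invariant = (max(z2) == max(z1))     # global max pooling discards the shift

print(z1)                      # [5, 2, 1, 2, 0, 0, 0, 2]
print(z2)                      # [0, 0, 2, 5, 2, 1, 2, 0]
print(equivariant, invariant)  # True True
```

Here Z1 and Z2 differ, but they are related by the same shift that relates X1 and X2, which is exactly the equivariance relation.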
Equivariance : Group ConvNet
To capture the rotation or proportion change of a given entity, a group of filters (a combination of rotated and mirror-reflected versions of a filter) is adopted.
For example, the group p4, which contains translations and rotations by multiples of ninety degrees, or p4m, which additionally contains mirror reflections.
[Figure: filter copies under rotation and under mirror reflection]
A filter in a G-CNN detects co-occurrences of features that have the preferred relative pose, and can match such a feature constellation in every global pose through an operation called the G-convolution.
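A rough sketch of the idea for rotations by multiples of ninety degrees: correlate the input with the four rotated copies of one filter, giving four "orientation channels". Rotating the input then rotates each output map and cyclically shifts the orientation channels, instead of producing unrelated features. This covers only the first (lifting) layer under simplifying assumptions (square input, no padding, a single filter); the function names and example values are hypothetical and this is not a reference G-CNN implementation.

```python
def rot90(m):
    """Rotate a square matrix by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*m)][::-1]

def correlate(image, kernel):
    """Plain 2D cross-correlation, stride 1, no padding."""
    k = len(kernel)
    n = len(image) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(n)]
            for i in range(n)]

def p4_lift(image, kernel):
    """Correlate with the 4 rotated copies of one kernel (0, 90, 180, 270 deg)."""
    maps = []
    for _ in range(4):
        maps.append(correlate(image, kernel))
        kernel = rot90(kernel)
    return maps

image = [[1, 2, 0],
         [0, 1, 3],
         [2, 0, 1]]
kernel = [[1, 0],
          [0, -1]]

maps = p4_lift(image, kernel)
maps_rot = p4_lift(rot90(image), kernel)

# Equivariance: each map is rotated and the orientation index shifts by one.
expected = [rot90(maps[(r - 1) % 4]) for r in range(4)]
print(maps_rot == expected)  # True

# A classic single-filter convolution lacks this property for this example.
plain_equivariant = correlate(rot90(image), kernel) == rot90(correlate(image, kernel))
print(plain_equivariant)     # False
```

Taking the maximum over all orientation channels and positions would give a value that does not change when the input is rotated by ninety degrees, which is how invariance can be obtained from the equivariant features when it is wanted.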
G-Convolution
[Figure: visualization of classic 2D convolution]
[Figure: visualization of the G-Conv for the roto-translation group, with filter groups 1 to N]
G-convolution is equivariant under rotation
Latent representations learnt by a CNN and a G-CNN.
- The left part is the result of a typical CNN while the right one is that of a G-CNN.
- In both parts, the outer circles consist of the rotated images while the inner circles consist of the learnt representations.
- Features produced by a G-CNN are equivariant to rotation while those produced by a typical CNN are not.
What we need : EQUIVARIANCE (not invariance)
“Equivariance makes a CNN understand the rotation or proportion change”
Equivariance : Capsule Net
Equivariance of Capsules
“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part.”
[Figure: activity vector map ↔ object]

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
그래픽 최적화로 가...가버렷! (부제: 배치! 배칭을 보자!) , Batch! Let's take a look at Batching! -...
 
3D CNNによる人物行動認識の動向
3D CNNによる人物行動認識の動向3D CNNによる人物行動認識の動向
3D CNNによる人物行動認識の動向
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 I
 
【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative Model
【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative Model【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative Model
【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative Model
 
밑바닥부터 시작하는딥러닝 8장
밑바닥부터 시작하는딥러닝 8장밑바닥부터 시작하는딥러닝 8장
밑바닥부터 시작하는딥러닝 8장
 
VQ-VAE
VQ-VAEVQ-VAE
VQ-VAE
 
Conditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN DecodersConditional Image Generation with PixelCNN Decoders
Conditional Image Generation with PixelCNN Decoders
 
Swin transformer
Swin transformerSwin transformer
Swin transformer
 
論文要約:AUGMIX: A SIMPLE DATA PROCESSING METHOD TO IMPROVE ROBUSTNESS AND UNCERT...
論文要約:AUGMIX: A SIMPLE DATA PROCESSING METHOD TO IMPROVE ROBUSTNESS AND UNCERT...論文要約:AUGMIX: A SIMPLE DATA PROCESSING METHOD TO IMPROVE ROBUSTNESS AND UNCERT...
論文要約:AUGMIX: A SIMPLE DATA PROCESSING METHOD TO IMPROVE ROBUSTNESS AND UNCERT...
 
[DL輪読会]Domain Adaptive Faster R-CNN for Object Detection in the Wild
[DL輪読会]Domain Adaptive Faster R-CNN for Object Detection in the Wild[DL輪読会]Domain Adaptive Faster R-CNN for Object Detection in the Wild
[DL輪読会]Domain Adaptive Faster R-CNN for Object Detection in the Wild
 
Chapter 9 - convolutional networks
Chapter 9 - convolutional networksChapter 9 - convolutional networks
Chapter 9 - convolutional networks
 
[DL輪読会]Libra R-CNN: Towards Balanced Learning for Object Detection
[DL輪読会]Libra R-CNN: Towards Balanced Learning for Object Detection[DL輪読会]Libra R-CNN: Towards Balanced Learning for Object Detection
[DL輪読会]Libra R-CNN: Towards Balanced Learning for Object Detection
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
[DL輪読会]GANとエネルギーベースモデル
[DL輪読会]GANとエネルギーベースモデル[DL輪読会]GANとエネルギーベースモデル
[DL輪読会]GANとエネルギーベースモデル
 
U-Net: Convolutional Networks for Biomedical Image Segmentationの紹介
U-Net: Convolutional Networks for Biomedical Image Segmentationの紹介U-Net: Convolutional Networks for Biomedical Image Segmentationの紹介
U-Net: Convolutional Networks for Biomedical Image Segmentationの紹介
 
動作認識におけるディープラーニングの最新動向1 3D-CNN
動作認識におけるディープラーニングの最新動向1 3D-CNN動作認識におけるディープラーニングの最新動向1 3D-CNN
動作認識におけるディープラーニングの最新動向1 3D-CNN
 
SSD: Single Shot MultiBox Detector (ECCV2016)
SSD: Single Shot MultiBox Detector (ECCV2016)SSD: Single Shot MultiBox Detector (ECCV2016)
SSD: Single Shot MultiBox Detector (ECCV2016)
 
[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger[DL輪読会]YOLO9000: Better, Faster, Stronger
[DL輪読会]YOLO9000: Better, Faster, Stronger
 
Customize renderpipeline
Customize renderpipelineCustomize renderpipeline
Customize renderpipeline
 
강화학습 해부학 교실: Rainbow 이론부터 구현까지 (2nd dlcat in Daejeon)
강화학습 해부학 교실: Rainbow 이론부터 구현까지 (2nd dlcat in Daejeon)강화학습 해부학 교실: Rainbow 이론부터 구현까지 (2nd dlcat in Daejeon)
강화학습 해부학 교실: Rainbow 이론부터 구현까지 (2nd dlcat in Daejeon)
 

Ähnlich wie Convolutional neural networks 이론과 응용

intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
ssuser3aa461
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze
 

Ähnlich wie Convolutional neural networks 이론과 응용 (20)

Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptx
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classification
 
Development of Deep Learning Architecture
Development of Deep Learning ArchitectureDevelopment of Deep Learning Architecture
Development of Deep Learning Architecture
 
convnets.pptx
convnets.pptxconvnets.pptx
convnets.pptx
 
introduction to deeplearning
introduction to deeplearningintroduction to deeplearning
introduction to deeplearning
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural NetworksImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
 
Image Classification using deep learning
Image Classification using deep learning Image Classification using deep learning
Image Classification using deep learning
 
Finding the best solution for Image Processing
Finding the best solution for Image ProcessingFinding the best solution for Image Processing
Finding the best solution for Image Processing
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 

Mehr von 홍배 김

Mehr von 홍배 김 (20)

Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
Automatic Gain Tuning based on Gaussian Process Global Optimization (= Bayesi...
 
Gaussian processing
Gaussian processingGaussian processing
Gaussian processing
 
Lecture Summary : Camera Projection
Lecture Summary : Camera Projection Lecture Summary : Camera Projection
Lecture Summary : Camera Projection
 
Learning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robotsLearning agile and dynamic motor skills for legged robots
Learning agile and dynamic motor skills for legged robots
 
Robotics of Quadruped Robot
Robotics of Quadruped RobotRobotics of Quadruped Robot
Robotics of Quadruped Robot
 
Basics of Robotics
Basics of RoboticsBasics of Robotics
Basics of Robotics
 
Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명Recurrent Neural Net의 이론과 설명
Recurrent Neural Net의 이론과 설명
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Optimal real-time landing using DNN
Optimal real-time landing using DNNOptimal real-time landing using DNN
Optimal real-time landing using DNN
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Machine learning applications in aerospace domain
Machine learning applications in aerospace domainMachine learning applications in aerospace domain
Machine learning applications in aerospace domain
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier
 
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE...
 
Anomaly Detection with GANs
Anomaly Detection with GANsAnomaly Detection with GANs
Anomaly Detection with GANs
 
Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)
 
Convolution 종류 설명
Convolution 종류 설명Convolution 종류 설명
Convolution 종류 설명
 
Learning by association
Learning by associationLearning by association
Learning by association
 
알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder
 
Binarized CNN on FPGA
Binarized CNN on FPGABinarized CNN on FPGA
Binarized CNN on FPGA
 
Visualizing data using t-SNE
Visualizing data using t-SNEVisualizing data using t-SNE
Visualizing data using t-SNE
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Convolutional neural networks 이론과 응용

  • 1. CONVOLUTIONAL NEURAL NETS 소개 2019. 3. 김 홍 배1
  • 2. 목차  Convolutional Neural Nets ?  Convolutional Neural Nets의 응용예  Convolutional Neural Nets의 동작원리  Convolutional Neural Nets의 진화과정  Brief intro : Invariance and Equivariance  Limitations of CNN  Group CovNet  Capsule Net 2
  • 3. 3 x_image (28x28) Reshape 28x28  784x1 vector . . . 10 digits . . W, bx y=softmax(Wx+b) Neural Nets # of unknown parameters to estimate = # of weights + # of bias = 784x10+10 = 7,850 !!! • 일반적인 Neural Net의 경우, 입력 이미지의 pixel 정보로 부터 시작 • 고해상도 이미지를 고속으로 처리가 불가능 CONVOLUTIONAL NEURAL NETS ?
  • 4. CONVOLUTIONAL NEURAL NETS ?  딥러닝 기반 시각인지를 위한 Networks 4 • CNN은 간단한 형상의 Patch(Filter or Kernel) 단위로 특징 추출 • 상위계층으로 진행될 수록 사물의 전체 형상을 구성  추정해야 할 parameter의 수가 줄어듬
  • 5. 5 • Color images are three dimensional and so have a volume • Time domain speech signals are 1-d while the frequency domain representations (e.g. MFCC vectors) take a 2d form. They can also be looked at as a time sequence. • Medical images (such as CT/MR/etc) are multi-dimensional • Videos have the additional temporal dimension compared to stationary images • Variable length sequences and time series data are again multi-dimensional • Hence it makes sense to model them as tensors instead of vectors. CONVOLUTIONAL NEURAL NETS ? Types of inputs
  • 6. 6 • Image retrieval from database • Object Detection • Self driving cars • Semantic segmentation • Face recognition (FB tagging) • Pose estimation • Detect diseases • Speech Recognition • Text processing • Analysing satellite data CONVOLUTIONAL NEURAL NETS의 응용 예 CNNs are everywhere
  • 7. Applications of Convolutional Neural Nets: scene analysis. Visual perception is combined with an RNN that generates sentences, producing a description of a given image.
  • 8. Applications of Convolutional Neural Nets: object detection and recognition. Visual perception is used to output each object's class and bounding box (BB). Various convolution layers. Redmon et al., You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
  • 9. Applications of Convolutional Neural Nets: semantic segmentation. Visual perception is used to label the image pixel by pixel.
  • 10. Applications of Convolutional Neural Nets: end-to-end learning for self-driving cars. Sequential front-view images → CLs → FCLs → action A1. Visual perception and the car's actions are learned together to perform autonomous driving. https://youtu.be/qhUvQiKec2U
  • 11. Applications of Convolutional Neural Nets: smart picking robot based on deep learning. Industrial robot training through visual perception and reinforcement learning. Figure labels: 224x224 pixels, 90x1.
  • 12. Structure of Convolutional Neural Nets: a CNN consists of a feature extraction layer and a classification layer.
  • 14. Network architecture: x_image (28x28) → 1st convolutional layer: convolution (5x5, s=1), 32 features → h_conv1 (28x28x32) → max pooling (2x2, s=2) → h_pool1 (14x14x32), 32 channels → 2nd convolutional layer: convolution (5x5, s=1), 64 features → h_conv2 (14x14x64) → max pooling (2x2, s=2) → h_pool2 (7x7x64) → reshape the 7 * 7 * 64 tensor into a 3,136x1 vector → fully connected layer (1,024 neurons) → readout layer (10 digits)
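The tensor shapes in this architecture can be traced with a short sketch. It assumes zero ("same") padding in the convolutions, which is what the unchanged 28x28 and 14x14 sizes on the slide imply; the helper names and the parameter total are derived here for illustration and do not appear on the slide.

```python
def conv_same(shape, n_filters):
    """Stride-1 convolution with 'same' padding: only the depth changes."""
    height, width, _ = shape
    return (height, width, n_filters)

def max_pool(shape, size=2, stride=2):
    """Pooling without padding shrinks height and width, keeps depth."""
    height, width, depth = shape
    return ((height - size) // stride + 1, (width - size) // stride + 1, depth)

def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out + c_out  # weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

h_conv1 = conv_same((28, 28, 1), 32)   # (28, 28, 32)
h_pool1 = max_pool(h_conv1)            # (14, 14, 32)
h_conv2 = conv_same(h_pool1, 64)       # (14, 14, 64)
h_pool2 = max_pool(h_conv2)            # (7, 7, 64)
flat = h_pool2[0] * h_pool2[1] * h_pool2[2]  # 3136

total_params = (conv_params(5, 1, 32) + conv_params(5, 32, 64)
                + dense_params(flat, 1024) + dense_params(1024, 10))
print(h_pool2, flat, total_params)
```

Under these assumptions almost all parameters sit in the first fully connected layer, not in the convolutions.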
  • 15. CONVOLUTION (How Convolutional Neural Nets work). Input or feature map (5x5): [[1,1,1,0,0], [0,1,1,1,0], [0,0,1,1,1], [0,0,1,1,0], [0,1,1,0,0]]. Filter (3x3): [[1,0,1], [0,1,0], [1,0,1]]. Resulting feature map (3x3): [[4,3,4], [2,4,3], [2,3,4]]. • Convolution operation: multiply the numbers at the same positions and add them all up. • First output: 1x1 + 1x0 + 1x1 + 0x0 + 1x1 + 1x0 + 0x1 + 0x0 + 1x1 = 4. • The filter then moves sideways and repeats the same operation. • After moving all the way across, it moves down and repeats the same operation.
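The sliding multiply-and-sum on this slide can be reproduced in plain Python. A minimal sketch, assuming stride 1 and no padding; `conv2d_valid` is a hypothetical name. Like most CNN libraries it does not flip the filter (strictly a cross-correlation), which makes no difference for the symmetric filter used here.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(out_cols)]
            for i in range(out_rows)]

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```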
  • 17. RELU (WHY RELU INSTEAD OF SIGMOID?) (How Convolutional Neural Nets work). Rectified Linear Unit (ReLU) f: [[3,0,1], [-2,0,2], [0,2,3]] → [[3,0,1], [0,0,2], [0,2,3]] and [[1,-1,1], [0,-1,-1], [3,1,0]] → [[1,0,1], [0,0,0], [3,1,0]].
  • 18. POOLING LAYER (How Convolutional Neural Nets work): max pooling is the most widely used.
  • 19. 2X2 MAX POOLING WITH STRIDE=1 (How Convolutional Neural Nets work). [[3,0,1], [0,0,2], [0,2,3]] → [[3,2], [2,3]] and [[1,0,1], [0,0,0], [3,1,0]] → [[1,1], [3,1]].
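ReLU simply replaces every negative entry of a feature map with zero. The sketch below applies it to the two 3x3 feature maps from the slide; the function name `relu` is chosen here for illustration.

```python
def relu(feature_map):
    """Element-wise max(0, v) over a 2-D feature map."""
    return [[max(0, v) for v in row] for row in feature_map]

first = relu([[3, 0, 1], [-2, 0, 2], [0, 2, 3]])
second = relu([[1, -1, 1], [0, -1, -1], [3, 1, 0]])
print(first)   # [[3, 0, 1], [0, 0, 2], [0, 2, 3]]
print(second)  # [[1, 0, 1], [0, 0, 0], [3, 1, 0]]
```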
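The pooled values on this slide follow from taking the maximum over every 2x2 window. A minimal sketch with a hypothetical `max_pool2d` helper; stride 1 matches the slide, and no padding is assumed.

```python
def max_pool2d(feature_map, size=2, stride=1):
    """Maximum over each size x size window, moving by `stride`."""
    out_rows = (len(feature_map) - size) // stride + 1
    out_cols = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_cols)]
            for i in range(out_rows)]

pooled_first = max_pool2d([[3, 0, 1], [0, 0, 2], [0, 2, 3]])
pooled_second = max_pool2d([[1, 0, 1], [0, 0, 0], [3, 1, 0]])
print(pooled_first)   # [[3, 2], [2, 3]]
print(pooled_second)  # [[1, 1], [3, 1]]
```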
  • 20. Why pooling? (How Convolutional Neural Nets work) • Dimension reduction. • Adds a degree of spatial (translation and rotation) invariance to the feature maps: the network can recognize a feature despite small changes in angle, direction or skew, and does not care exactly where the feature is, as long as it keeps its relative position to other features.
  • 21. Key operations in a CNN: input image → convolution (learned) → non-linearity → spatial pooling → feature maps. Figure: input, feature map. Source: R. Fergus, Y. LeCun. Slide: Lazebnik
  • 22. Key operations: input image → convolution (learned) → non-linearity (Rectified Linear Unit, ReLU) → spatial pooling → feature maps. Source: R. Fergus, Y. LeCun. Slide: Lazebnik
  • 23. Key operations: input image → convolution (learned) → non-linearity → spatial pooling (max) → feature maps. Source: R. Fergus, Y. LeCun. Slide: Lazebnik
  • 24. Flattening (How Convolutional Neural Nets work): flattening takes the pooled layer and flattens it in sequential order into a single vector. • The vector is used as the input to the classifier.
  • 25. FULLY-CONNECTED LAYER (How Convolutional Neural Nets work). The pooled maps [[3,2], [2,3]] and [[1,1], [3,1]] are flattened into the vector [3, 2, 2, 3, 1, 1, 3, 1], which feeds the fully connected layer; softmax then outputs class probabilities, e.g. Cat 0.8, Dog 0.2.
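Flattening and the softmax readout can be sketched as follows. The pooled maps are the ones from the slides; the weight matrix is a made-up placeholder (the slide gives no weights), so the resulting probabilities are illustrative and are not meant to reproduce the 0.8 / 0.2 shown for Cat and Dog.

```python
import math

def flatten(feature_maps):
    """Concatenate all feature maps, row by row, into a single vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(vector, weights, biases):
    """Fully connected layer: one dot product per output neuron."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vector = flatten([[[3, 2], [2, 3]], [[1, 1], [3, 1]]])
print(vector)  # [3, 2, 2, 3, 1, 1, 3, 1]

# Hypothetical weights and biases for two classes (cat, dog).
weights = [[0.10] * 8, [0.05] * 8]
biases = [0.0, 0.0]
probabilities = softmax(dense(vector, weights, biases))
print(probabilities)
```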
  • 26. Evolution of Convolutional Neural Nets. LeNet to ResNet: A Deep Journey. LeNet5 (1998): the origin of the convolutional neural network. Characteristics: • Repeated convolution, pooling and non-linearity • Average pooling • Sigmoid activation for the intermediate layers • tanh activation at F6 • 5x5 convolution filters • 7 layers and fewer than 1M parameters. Key contributions: • Use of convolution to extract spatial features • Subsampling using the spatial average of maps • Sparse connection matrix between layers to avoid large computational cost. The gap: • Slow to train • Hard to train (neurons die quickly) • Lack of data
  • 27. Evolution of Convolutional Neural Nets • ImageNet is an image database organized according to the WordNet hierarchy • It is formally a project aimed at (manually) labeling and categorizing images • ImageNet Large Scale Visual Recognition Challenge (ILSVRC) • Training data: 1.2 million images, 1000+ categories • Validation and test data: 150K images (50K validation, the remainder test) • ImageNet data: http://image-net.org/challenges/LSVRC/2010/browse-synsets • Multiple challenges: object recognition, localization, etc.
  • 28. IMAGENET CLASSIFICATION RESULTS (Evolution of Convolutional Neural Nets). 2012 result: • Krizhevsky et al. (AlexNet): 16.4% error (top-5) • Next best (non-convnet): 26.2% error. 2013 result: • All top-ranked entries use deep learning (ConvNets). Revolution of depth!
  • 29. Evolution of Convolutional Neural Nets. ALEXNET (2012). Key contributions: • GPU use and training in parallel • ReLU activation • Dropout regularization • Image augmentation. Characteristics: • 11x11, 5x5 and 3x3 convolutions • Max pooling • 3 FC layers • 60 million parameters
  • 30. Evolution of Convolutional Neural Nets. RELU NON-LINEARITY: SIMPLER ACTIVATION. A 4-layer CNN with ReLUs reaches a 25% training error rate on the CIFAR-10 dataset 6 times faster than an equivalent network with tanh.
  • 31. Ljubljana, June 2016. Deep learning - ReLU. How does the sigmoid function affect learning? • It enables easy computation of the derivative but has negative effects: – The neuron output only approaches 1 or 0 and never reaches them → saturation – The gradient reduces the magnitude of the error • This leads to two problems: • Slow learning when neurons are saturated, i.e. for z values of large magnitude • Vanishing gradient problem (the sigmoid derivative is at most 0.25, so the gradient is at most 25% of the error from the previous layer!)
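The 25% figure comes from the sigmoid derivative σ'(z) = σ(z)(1 − σ(z)), which peaks at 0.25 when z = 0. The sketch below evaluates it; the choice of 10 layers is an arbitrary illustration and ignores the weight factors that also enter the backpropagated gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

peak_grad = sigmoid_grad(0.0)        # 0.25, the largest possible value
saturated_grad = sigmoid_grad(10.0)  # tiny: the neuron is saturated

# Best case for the activation factors alone across 10 sigmoid layers.
best_case_10_layers = peak_grad ** 10
print(peak_grad, saturated_grad, best_case_10_layers)
```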
  • 32. Deep learning - ReLU • Krizhevsky et al. (AlexNet, 2012) popularized the Rectified Linear Unit (ReLU) in place of the sigmoid function • Main purpose of ReLU: reduce saturation and vanishing-gradient issues • Still not perfect: – Learning stops at negative z values, where the gradient is zero (a piecewise-linear alternative is the Parametric ReLU, He 2015, from Microsoft) – The output is unbounded above, so activations can grow very large
  • 33. Deep learning - dropout • Too many weights cause overfitting • Weight decay (regularization) helps but is not perfect – It also adds another hyper-parameter to set manually • Srivastava et al. (2014) proposed a kind of "bagging" for deep nets (Alex Krizhevsky had already used it in AlexNet in 2012) • Main point: – Make the network more robust by disabling neurons – Each neuron is disabled with some probability (0.4 on this slide; Srivastava et al. typically use 0.5 for hidden units) – The remaining neurons must adapt to work without them • Applied only to fully connected layers – Conv. layers are less susceptible to overfitting. Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
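A minimal sketch of dropout on a vector of activations. It uses the "inverted" variant, which rescales the surviving activations during training so that nothing changes at inference; that is a common implementation choice assumed here, not necessarily the one behind the slide. The function name and the seeded generator are illustrative.

```python
import random

def dropout(activations, p_drop, training, rng):
    """Zero each activation with probability p_drop during training."""
    if not training:
        return list(activations)  # inference: use every neuron unchanged
    keep = 1.0 - p_drop
    # Survivors are scaled by 1/keep so the expected activation is preserved.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 1000
train_out = dropout(activations, p_drop=0.4, training=True, rng=rng)
test_out = dropout(activations, p_drop=0.4, training=False, rng=rng)

dropped_fraction = train_out.count(0.0) / len(train_out)
print(dropped_fraction)  # close to 0.4
```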
  • 34. Deep learning - batch norm • The input needs to be whitened, i.e. normalized (LeCun 1998, Efficient BackProp) – Usually done on the first-layer input only • The reason for normalizing the first layer's input applies to the other layers as well • Batch normalization: – Normalize the input to each layer – Reduce internal covariate shift – Normalizing over all input data (>1M samples) is too slow – Instead, normalize within the mini-batch only – Learning: normalize over the mini-batch data – Inference: normalize with statistics over all the training data • Gives better results while allowing a higher learning rate, higher decay, no dropout and no LRN. Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015
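The training-time normalization can be sketched for a single feature over one mini-batch. This is a simplified illustration: `gamma` and `beta` stand for the learned scale and shift, `eps` is a small stabilizing constant, and the running statistics needed at inference are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
print(out_mean, out_var)  # approximately 0 and 1
```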
  • 35. Evolution of Convolutional Neural Nets. VGG (2014) • Small 3x3 convolutions throughout the net • A sequence of 3x3 convolutions can emulate larger receptive fields, e.g. 5x5 or 7x7 • Use of 1x1 convolution • Decrease in spatial volume and increase in depth of the input. What's the advantage of using 3 layers of 3x3 instead of one layer of 7x7? • 3 non-linear rectification layers • Fewer parameters: 27C² as opposed to 49C² for C channels. Key points: • Depth is important • Simplify the network to go deep • 140M parameters (mostly due to the FC layers)
  • 36. VGG (2ND PLACE IN 2014) • Uses only 3x3 filters, repeated throughout. Why? • Stacking convolution filters gives a larger receptive field: two 3x3 filters = one 5x5 filter, three 3x3 filters = one 7x7 filter • The number of parameters is smaller than when a large filter is used → regularization effect. "Very Deep Convolutional Networks for Large-Scale Image Recognition"
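The 27C² versus 49C² comparison and the receptive-field equivalence can be verified numerically. A small sketch assuming C input and C output channels per layer, stride 1 and no biases; C = 64 is an arbitrary example value and the helper names are hypothetical.

```python
def conv_weights(kernel, channels):
    """Weights of one kernel x kernel conv layer with C inputs and C outputs."""
    return kernel * kernel * channels * channels

def stacked_receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel - 1)

C = 64  # example channel count
three_3x3 = 3 * conv_weights(3, C)  # 27 * C^2
one_7x7 = conv_weights(7, C)        # 49 * C^2

print(three_3x3, one_7x7)
print(stacked_receptive_field(3, 2), stacked_receptive_field(3, 3))  # 5 7
```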
  • 37. Evolution of Convolutional Neural Nets. GOOGLENET OR INCEPTION (2014) • 22-layer CNN • Heavy use of 1x1 "Network in Network" convolutions • Use of average pooling before the classification • Auxiliary classifiers connected to intermediate layers • During training, the losses of the auxiliary classifiers are added with a discount weight (0.3)
  • 38. GOOGLENET KEY IDEAS • Which is better, 3x3 or 5x5? • Let's use all of them → the amount of computation grows. Naïve version: way too many outputs! Modified idea: use 1x1 for dimensionality reduction. Why 1x1 convolution? • Introduced as "Network in Network" in 2014 • It is a way to increase non-linearity and to combine features across feature maps at each spatial position. Only 4M parameters compared to 60M in AlexNet
  • 39. GOOGLENET KEY IDEAS • Use 1x1 convolution for dimension reduction • Halve the number of feature maps so that the total amount of computation stays about the same
  • 40. GOOGLENET KEY IDEAS. How 1x1 convolution reduces dimension. Input layer → k-th feature map of the output layer. $X_{ij}$: 1x256 vector at position (i, j); $w_k$: 1x256 weight vector; $y_{ij,k} = f(X_{ij} \cdot w_k)$, where $f(\cdot)$ is a nonlinear function. This is the same principle as feature dimension reduction with a fully connected NN.
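The formula y_ij,k = f(X_ij · w_k) says that a 1x1 convolution is a small fully connected layer applied independently at every pixel. The sketch below uses 4 input channels and 2 output maps instead of the slide's 256-dimensional vectors, with made-up weights, and takes f to be ReLU as an assumption.

```python
def conv1x1(feature_map, weights):
    """feature_map[i][j] is the channel vector X_ij; weights[k] is w_k."""
    return [[[max(0.0, sum(x * w for x, w in zip(pixel, w_k)))
              for w_k in weights]
             for pixel in row]
            for row in feature_map]

# A 1x2 image with 4 channels per pixel (toy values).
feature_map = [[[1, 2, 3, 4], [0, 1, 0, 1]]]
# Two output maps, so the depth is reduced from 4 to 2 (hypothetical weights).
weights = [[1, 0, 0, 0],
           [0, -1, 0, 0]]

reduced = conv1x1(feature_map, weights)
print(reduced)  # depth per pixel is now 2
```

The spatial size is untouched; only the number of channels changes, which is why the operation is used for cheap dimension reduction before the 3x3 and 5x5 convolutions.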
  • 41. Evolution of Convolutional Neural Nets. RESNET (RESIDUAL NEURAL NETWORK) (2015) • Introduces shortcut connections (these exist in prior literature in various forms) • The key invention is to skip 2 layers. Skipping a single layer didn't give much improvement for some reason
  • 42. RESNET • Are more layers always better? • A network with 56 layers showed a higher training error than a network with 20 layers
  • 43. RESNET • A deeper model should have a lower training error, but • deep models were found to be hard to optimize (even learning the identity is hard). Causes: vanishing/exploding gradients and the growing number of parameters to train. Figure: a shallower model (18 layers) and a deeper model (34 layers). "Deep Residual Learning for Image Recognition"
  • 44. KEY IDEA OF RESNET • The identity is passed straight to the upper layer, and only the remaining part is learned • The target is not H(x) itself but F(x) = H(x) - x; since F(x) ≈ 0, convergence is fast • Effects of the identity shortcut: - even very deep networks can be optimized - accuracy improves as depth increases. "Deep Residual Learning for Image Recognition"
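The relation H(x) = F(x) + x can be illustrated with a toy residual block on plain vectors. The residual functions here are stand-ins (a real block would be two convolution layers); the point is that when F outputs zeros, the block is exactly the identity.

```python
def residual_block(x, residual_fn):
    """H(x) = F(x) + x: add the shortcut input to the learned residual."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_residual(x):
    """Stand-in for layers whose weights are all zero, so F(x) = 0."""
    return [0.0 for _ in x]

def small_residual(x):
    """Stand-in for layers that learned a small correction."""
    return [0.1 * v for v in x]

x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)
corrected_out = residual_block(x, small_residual)
print(identity_out)   # [1.0, -2.0, 3.0]
print(corrected_out)
```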
  • 45. KEY IDEA OF RESNET. BOTTLENECK: A PRACTICAL DESIGN • Number of parameters with the bottleneck: 256 x 64 + 64 x 3 x 3 x 64 + 64 x 256 = ~70K • Number of parameters using just a 3 x 3 x 256 x 256 conv layer: ~600K • A 1x1 conv performs dimension reduction → 3x3 conv → a 1x1 conv performs dimension expansion • The purpose is to reduce the amount of computation
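The two parameter counts on this slide work out as follows (weights only, biases ignored, as on the slide). The variable names are chosen here for illustration.

```python
# Bottleneck: 1x1 reduce (256 -> 64), 3x3 conv (64 -> 64), 1x1 expand (64 -> 256).
reduce_1x1 = 1 * 1 * 256 * 64
conv_3x3 = 3 * 3 * 64 * 64
expand_1x1 = 1 * 1 * 64 * 256
bottleneck_params = reduce_1x1 + conv_3x3 + expand_1x1

# Plain alternative: a single 3x3 conv that keeps all 256 channels.
plain_params = 3 * 3 * 256 * 256

print(bottleneck_params)  # 69632, about 70K
print(plain_params)       # 589824, about 600K
```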
  • 46. Dilated convolutions. The goal of this layer is to increase the size of the receptive field (the input activations that are used to compute a given output) without downsampling, in order to preserve local information. Increasing the size of the receptive field allows the network to use more context (information that is spatially further away). The idea is to spread out the filter, filling the gaps between its weights with zeros, and then compute a convolution.
  • 48. DenseNet. At CVPR 2017, the Densely Connected Network (DenseNet) was presented, a study that brought a groundbreaking change to network structure.
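A 1-D sketch makes the enlarged receptive field concrete. `dilated_conv1d` is a hypothetical helper that samples the input every `dilation` steps, which is equivalent to inserting zeros between the filter weights; stride 1 and no padding are assumed.

```python
def dilated_conv1d(signal, weights, dilation=1):
    """Convolution whose taps are `dilation` samples apart (no padding)."""
    span = (len(weights) - 1) * dilation + 1  # receptive field of one output
    return [sum(w * signal[i + a * dilation] for a, w in enumerate(weights))
            for i in range(len(signal) - span + 1)]

signal = [1, 2, 3, 4, 5, 6, 7]
weights = [1, 1, 1]

dense_out = dilated_conv1d(signal, weights, dilation=1)    # each output sees 3 samples
dilated_out = dilated_conv1d(signal, weights, dilation=2)  # each output sees a span of 5

print(dense_out)    # [6, 9, 12, 15, 18]
print(dilated_out)  # [9, 12, 15]
```

With the same three weights, a dilation of 2 covers a span of five input samples, so the context grows without any pooling or extra parameters.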
  • 50. Brief intro : Invariance and Equivariance
  • 51. Limitation of conventional ConvNets. ConvNets are translation equivariant. The figure demonstrates LeNet-5's invariance to small rotations (+/-40 degrees). What about rotation in general?
  • 52. Limitation of conventional ConvNets. 2D convolution is equivariant under translation, but not under rotation
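This claim can be checked on a tiny example. The sketch uses a wrap-around (circular) correlation so that translation equivariance holds exactly at the borders; the 3x3 input, the asymmetric filter and the helper names are illustrative choices, not taken from the slides.

```python
def circ_corr(x, w):
    """Correlation with wrap-around borders on a square input."""
    n, k = len(x), len(w)
    return [[sum(w[a][b] * x[(i + a) % n][(j + b) % n]
                 for a in range(k) for b in range(k))
             for j in range(n)]
            for i in range(n)]

def shift(x, dy, dx):
    """Cyclically translate a square array by (dy, dx)."""
    n = len(x)
    return [[x[(i - dy) % n][(j - dx) % n] for j in range(n)] for i in range(n)]

def rot90(m):
    """Rotate a square array by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w = [[0, 1], [0, 0]]  # deliberately asymmetric filter

# Translating the input translates the output in the same way.
translation_equivariant = circ_corr(shift(x, 1, 2), w) == shift(circ_corr(x, w), 1, 2)
# Rotating the input does NOT simply rotate the output.
rotation_equivariant = circ_corr(rot90(x), w) == rot90(circ_corr(x, w))

print(translation_equivariant, rotation_equivariant)  # True False
```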
  • 53. Invariance. Mapping function $\Phi(\cdot)$ from image $X$ to feature $Z$; transformation $X_2 = T_g^1 X_1$. $Z = Z_1 = \Phi(X_1) = Z_2 = \Phi(X_2) = \Phi(T_g^1 X_1)$: the mapping is independent of the transformation $T_g$, for all $T_g$.
  • 54. Invariance. To make a Convolutional Neural Network (CNN) transformation-invariant, data augmentation of the training samples is generally used
  • 55. Equivariance. Mapping function $\Phi(\cdot)$ from image $X$ to feature $Z$; transformation $X_2 = T_g^1 X_1$ in image space and $Z_2 = T_g^2 Z_1$ in feature space. $Z_2 = T_g^2 Z_1 = T_g^2 \Phi(X_1) = \Phi(T_g^1 X_1)$: the mapping preserves the algebraic structure of the transformation, so $Z_1 \neq Z_2$ but the relationship between them is kept. Invariance is the special case of equivariance where $T_g^2$ is the identity.
  • 56. Equivariance: Group ConvNet. To understand the rotation or proportion change of a given entity, a group of filters (a combination of rotated and mirror-reflected versions of a filter) is adopted. For example, the group p4, which contains translations and rotations by multiples of ninety degrees, or p4m, which additionally contains mirror reflections. Figure legend: rotation; mirror reflections
  • 57. Equivariance: Group ConvNet. A filter in a G-CNN detects co-occurrences of features that have the preferred relative pose, and can match such a feature constellation in every global pose through an operation called the G-convolution. Figure: filter group 1, filter group 2, ..., filter group N
  • 58. Equivariance: Group ConvNet. G-Convolution. Figure: visualization of classic 2D convolution; visualization of the G-Conv for the roto-translation group
  • 59. Equivariance: Group ConvNet. G-Convolution. G-convolution is equivariant under rotation
  • 60. Equivariance: Group ConvNet. Latent representations learnt by a CNN and a G-CNN. - The left part is the result of a typical CNN, while the right part is that of a G-CNN. - In both parts, the outer circles consist of the rotated images, while the inner circles consist of the learnt representations. - Features produced by a G-CNN are equivariant to rotation, while those produced by a typical CNN are not.
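The rotation equivariance of the first (lifting) layer of a p4 G-convolution can be sketched by correlating the input with all four 90-degree rotations of one filter. This is a toy illustration of the idea under the stated assumptions (square input, no padding, rotations only), not the implementation from the G-CNN paper; all names are hypothetical.

```python
def corr_valid(x, w):
    """Plain 2D correlation, stride 1, no padding, square arrays."""
    n, k = len(x), len(w)
    m = n - k + 1
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(m)]
            for i in range(m)]

def rot90(m):
    """Rotate a square array by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]

def p4_lift(x, w):
    """Correlate x with the filter rotated by 0, 90, 180 and 270 degrees."""
    outputs, rotated = [], w
    for _ in range(4):
        outputs.append(corr_valid(x, rotated))
        rotated = rot90(rotated)
    return outputs

x = [[1, 2, 3, 0], [4, 5, 6, 1], [7, 8, 9, 2], [3, 1, 4, 1]]
w = [[1, 2], [0, -1]]

maps = p4_lift(x, w)
maps_of_rotated = p4_lift(rot90(x), w)

# Rotating the input rotates every map and cyclically shifts the rotation axis.
equivariant = all(maps_of_rotated[r] == rot90(maps[(r - 1) % 4]) for r in range(4))
print(equivariant)  # True
```

A single filter orientation would fail this check; it is the whole stack of rotated copies that transforms predictably, which is the sense in which the feature is equivariant rather than invariant.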
  • 61. Equivariance: Capsule Net. What we need: EQUIVARIANCE (not invariance). "Equivariance makes a CNN understand the rotation or proportion change"
  • 62. Equivariance: Capsule Net. "A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part."
  • 63. Equivariance: Capsule Net. Equivariance of capsules. Figure: object and its activity vector map