SlideShare a Scribd company logo
1 of 30
Download to read offline
A Brief Introduction
to Recent Segmentation Methods
Shunta Saito
Researcher at Preferred Networks, Inc.
Semantic Segmentation?
• Classifying all pixels, so it’s also called “Pixel-labeling”
* Feature Selection and Learning for Semantic Segmentation (Caner
Hazirbas), Master's thesis, Technical University Munich, 2014.
C C
C C
B
B
B
C C
B
B
B
B B
SS
S S S S S S S
S S S
R
R
R
R R
R R
R R
• Tackling this problem with CNN, usually it’s formulated as:
Typical formulation
Image CNN Prediction Label
Cross entropy
• The loss is calculated for each pixel independently
• It leads to the problem: “How can the model consider the context to make a single
prediction for a pixel?”
• How to leverage context information
• How to use law-level features in upper layers to make detailed
predictions
• How to create dense prediction
Common problems
“Fully Convolutional Networks for Semantic Segmentation”, Jonathan Long and Evan Shelhamer et al. appeared in
arxiv on Nov. 14, 2014
Fully Convolutional Network
(1: Reinterpret classification as a coarse prediction)
• The fully connected layers in
classification network can be
viewed as convolutions with
kernels that cover their entire input
regions
• The spatial output maps of
these convolutionalized
models make them a natural
choice for dense problems
like semantic segmentation.
See a Caffe’s example “Net Surgery”: https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb
256-6x6
4096-1x1
4096-1x1
If input is 451x451, output is 8x8 of 1000ch
Fully Convolutional Network (2: Coarse to dense)
• 1 possible way:

“Shift-and-Stitch” trick proposed in OverFeat paper (21 Dec, 2013)

OverFeat is the winner at the localization task of ILSVRC 2013 (not detection)
Shift input and stitch
(=“interlace”) the outputs
プログレッシブとインターレース(改訂版)by 義裕⼤大⾒見見
https://www.youtube.com/watch?v=h785loU4bh4
Fully Convolutional Network (2: Coarse to dense)
To understand OverFeat’s “Shift-and-Stitch” trick
Fully Convolutional Network (2: Coarse to dense)
• Another way: Decreasing subsampling layer (like Max Pooling)
‣ It has tradeoff:
‣ The filters see finer information, but have smaller receptive fields
and take longer to compute (due to large feature maps)
movies and images are from: http://cs231n.github.io/convolutional-networks/
Fully Convolutional Network (2: Coarse to dense)
• Instead of all ways listed
above, finally, they employed
upsampling to make coarse
predictions denser
• In a sense, upsampling with
factor f is convolution with a
fractional input stride of 1/f
• So, a natural way to upsample
is therefore backwards
convolution (sometimes
called deconvolution) with an
output stride of f
Upsampling by deconvolution 74 74
'k '
x ,
.kz
ftp.YE?D ,
"
iEIII÷IiiIEE÷:#
in
,÷÷:±÷ei:#
a-
Output
stride .f
74k ,
l⇒I*.IE?IItiiIe#eEiidYEEEE.*ai
.
Deconvolution
Fully Convolutional Network
(3: Patch-wise training or whole image training)
• Whole image fully convolutional training is identical
to patchwise training where each batch consists of
all the receptive fields of the units below the loss for
an image
yajif
-
ED
yan⇒Ise###
Patchwise training
is loss sampling
• We performed spatial sampling of the loss by making an
independent choice to ignore each final layer cell with some
probability 1 − p
• To avoid changing the effec- tive batch size, we
simultaneously increase the number of images per batch by
a factor 1/p
Fully Convolutional Network (4: Skip connection)
Nov. 14, 2014
• Fuse coarse, semantic and local,
appearance information
: Deconvolution

(initialized as bilinear upsampling, and learned)
Added
: Bilinear upsampling (fixed)
Fully Convolutional Network (5: Training scheme)
1. Prepare a trained model for ILSVRC12 (1000-class image classification)
2. Discard the final classifier layer
3. Convolutionalizing all remaining fully-connected layers
4. Append a 1x1 convolution with the target class number of channels
• MomentumSGD (momentum: 0.9)
• batchsize: 20
• Fixed LR: 10^-4 for FCN-VGG-16 (Doubled LR for biases)
• Weight decay: 5^-4
• Zero-initialize the class scoring layer
• Fine-tuning was for all layers
Other training settings:
Fully Convolutional Network
1. Replacing all fully-connected layers with convolutions
2. Upsampling by backwards convolution, a.k.a. deconvolution (and
bilinear upsampling)
3. Applied skip connections to use local, appearance information in the
final layer
Summary
• “Learning Deconvolution Network for Semantic Segmentation”, Hyeonwoo Noh, et al. appeared in
arxiv on May. 17, 2015
Deconvolution Network
* “Unpooling” here is corresponding to Chainer’s “Upsampling2D” function
• Fully Convolutional Network (FCN) has limitations:
• Fixed-size receptive field yields inconsistent labels for large objects
➡ Skip connection can’t solve this because there is inherent trade-off between boundary
details and semantics
• Interpolating 16 x 16 output to the original input size makes blurred results
➡ The absence of a deep deconvolution network trained on a large dataset makes
it difficult to reconstruct highly non- linear structures of object boundaries accurately.
Deconvolution Network
Let’s do it to perform proper upsampling
Deconvolution Network
Feature extractor Shape generator
Deconvolution Network
Shape generator
14 × 14
deconvolutional
layer
28 × 28
unpooling layer
28 × 28
deconvolutional
layer
56 × 56
unpooling layer
56 × 56
deconvolutional
layer
112 × 112
unpooling layer
112 × 112
deconvolutional
layer
224 × 224
unpooling layer
224 × 224
deconvolutional
layer
Activations of each layer
Deconvolution Network
One can think: “Skip connection is missing…?”
U-Net
“U-Net: Convolutional Networks for Biomedical Image Segmentation”, Olaf Ronneberger,
Philipp Fischer, Thomas Brox, 18 May 2015
U-Net
SegNet
• “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image
Segmentation”, Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, 2 Nov,
2015
SegNet
• The training procedure is a bit complicated
• Encoder-decorder “pair-wise” training
There’s a Chainer implementation:

pfnet-research/chainer-segnet:
https://github.com/pfnet-research/chainer-segnet
Dilated convolutions
• “Multi-Scale Context Aggregation by Dilated Convolutions”, Fisher Yu, Vladlen Koltun, 23 Nov, 2015
• a.k.a stroud convolution, convolution with holes
• Enlarge the size of receptive field without losing resolution
The figure is from “WaveNet: A Generative Model for Raw Audio”
Dilated convolutions
• For example, the feature maps of ResNet are
downsampled 5 times, and 4 times in the 5 are done by
convolutions with stride of 2 (only the first one is by
pooling with stride of 2)
1/4
1/8
1/16
1/32
1/2
Dilated convolutions
• By using dilated convolutions instead of vanilla
convolutions, the resolution after the first pooling can be
kept as the same to the end
1/4
1/8
1/8
1/8
1/2
Dilated convolutions
But, it is still 1/8…
1/4
1/8
1/8
1/8
1/2
RefineNet
• “RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic
Segmentation”, Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid, 20 Nov. 2016
RefineNet
• Each intermediate feature map is refined through “RefineNet module”
RefineNet
* Implementation has been done in Chainer, the codes will be public soon
Semantic Segmentation using Adversarial Networks
• “Semantic Segmentation using Adversarial Networks”, Pauline Luc, Camille Couprie,
Soumith Chintala, Jakob Verbeek, 25 Nov. 2016

More Related Content

What's hot

近年のHierarchical Vision Transformer
近年のHierarchical Vision Transformer近年のHierarchical Vision Transformer
近年のHierarchical Vision TransformerYusuke Uchida
 
(2022年3月版)深層学習によるImage Classificaitonの発展
(2022年3月版)深層学習によるImage Classificaitonの発展(2022年3月版)深層学習によるImage Classificaitonの発展
(2022年3月版)深層学習によるImage Classificaitonの発展Takumi Ohkuma
 
【DL輪読会】"A Generalist Agent"
【DL輪読会】"A Generalist Agent"【DL輪読会】"A Generalist Agent"
【DL輪読会】"A Generalist Agent"Deep Learning JP
 
【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)Deep Learning JP
 
【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked AutoencodersDeep Learning JP
 
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksDeep Learning JP
 
深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定Masaaki Imaizumi
 
画像生成・生成モデル メタサーベイ
画像生成・生成モデル メタサーベイ画像生成・生成モデル メタサーベイ
画像生成・生成モデル メタサーベイcvpaper. challenge
 
SfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法についてSfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法についてRyutaro Yamauchi
 
CVPR2018のPointCloudのCNN論文とSPLATNet
CVPR2018のPointCloudのCNN論文とSPLATNetCVPR2018のPointCloudのCNN論文とSPLATNet
CVPR2018のPointCloudのCNN論文とSPLATNetTakuya Minagawa
 
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開Hironobu Fujiyoshi
 
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料Yusuke Uchida
 
Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2harmonylab
 
【論文紹介】U-GAT-IT
【論文紹介】U-GAT-IT【論文紹介】U-GAT-IT
【論文紹介】U-GAT-ITmeownoisy
 
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までーDeep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までーnlab_utokyo
 
Sift特徴量について
Sift特徴量についてSift特徴量について
Sift特徴量についてla_flance
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural NetworksSeiya Tokui
 
[DL輪読会]When Does Label Smoothing Help?
[DL輪読会]When Does Label Smoothing Help?[DL輪読会]When Does Label Smoothing Help?
[DL輪読会]When Does Label Smoothing Help?Deep Learning JP
 
【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks?
【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks? 【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks?
【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks? Deep Learning JP
 

What's hot (20)

近年のHierarchical Vision Transformer
近年のHierarchical Vision Transformer近年のHierarchical Vision Transformer
近年のHierarchical Vision Transformer
 
(2022年3月版)深層学習によるImage Classificaitonの発展
(2022年3月版)深層学習によるImage Classificaitonの発展(2022年3月版)深層学習によるImage Classificaitonの発展
(2022年3月版)深層学習によるImage Classificaitonの発展
 
【DL輪読会】"A Generalist Agent"
【DL輪読会】"A Generalist Agent"【DL輪読会】"A Generalist Agent"
【DL輪読会】"A Generalist Agent"
 
【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)【DL輪読会】Patches Are All You Need? (ConvMixer)
【DL輪読会】Patches Are All You Need? (ConvMixer)
 
Efficient Det
Efficient DetEfficient Det
Efficient Det
 
【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
【DL輪読会】ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
 
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
[DL輪読会]EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定深層学習による非滑らかな関数の推定
深層学習による非滑らかな関数の推定
 
画像生成・生成モデル メタサーベイ
画像生成・生成モデル メタサーベイ画像生成・生成モデル メタサーベイ
画像生成・生成モデル メタサーベイ
 
SfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法についてSfM Learner系単眼深度推定手法について
SfM Learner系単眼深度推定手法について
 
CVPR2018のPointCloudのCNN論文とSPLATNet
CVPR2018のPointCloudのCNN論文とSPLATNetCVPR2018のPointCloudのCNN論文とSPLATNet
CVPR2018のPointCloudのCNN論文とSPLATNet
 
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
MIRU2020長尾賞受賞論文解説:Attention Branch Networkの展開
 
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
Swin Transformer (ICCV'21 Best Paper) を完璧に理解する資料
 
Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2
 
【論文紹介】U-GAT-IT
【論文紹介】U-GAT-IT【論文紹介】U-GAT-IT
【論文紹介】U-GAT-IT
 
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までーDeep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
Deep Learningによる画像認識革命 ー歴史・最新理論から実践応用までー
 
Sift特徴量について
Sift特徴量についてSift特徴量について
Sift特徴量について
 
論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks論文紹介 Pixel Recurrent Neural Networks
論文紹介 Pixel Recurrent Neural Networks
 
[DL輪読会]When Does Label Smoothing Help?
[DL輪読会]When Does Label Smoothing Help?[DL輪読会]When Does Label Smoothing Help?
[DL輪読会]When Does Label Smoothing Help?
 
【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks?
【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks? 【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks?
【DL輪読会】How Much Can CLIP Benefit Vision-and-Language Tasks?
 

Viewers also liked

強化学習入門
強化学習入門強化学習入門
強化学習入門Shunta Saito
 
DeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural NetworksDeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural NetworksShunta Saito
 
20170211クレジットカード認識
20170211クレジットカード認識20170211クレジットカード認識
20170211クレジットカード認識Takuya Minagawa
 
実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 Preferred Networks
 
Building and road detection from large aerial imagery
Building and road detection from large aerial imageryBuilding and road detection from large aerial imagery
Building and road detection from large aerial imageryShunta Saito
 
視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論Shunta Saito
 
動的輪郭モデル
動的輪郭モデル動的輪郭モデル
動的輪郭モデルArumaziro
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 
20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョン20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョンTakuya Minagawa
 
全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131Hangyo Masatsugu
 
強化学習その2
強化学習その2強化学習その2
強化学習その2nishio
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出Kai Sasaki
 
Crfと素性テンプレート
Crfと素性テンプレートCrfと素性テンプレート
Crfと素性テンプレートKei Uchiumi
 
全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習kwp_george
 
なにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfugなにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfugNatsutani Minoru
 
Convolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をするConvolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をするDaiki Shimada
 
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2Daiki Shimada
 

Viewers also liked (20)

Semantic segmentation
Semantic segmentationSemantic segmentation
Semantic segmentation
 
強化学習入門
強化学習入門強化学習入門
強化学習入門
 
DeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural NetworksDeepPose: Human Pose Estimation via Deep Neural Networks
DeepPose: Human Pose Estimation via Deep Neural Networks
 
20170211クレジットカード認識
20170211クレジットカード認識20170211クレジットカード認識
20170211クレジットカード認識
 
LT@Chainer Meetup
LT@Chainer MeetupLT@Chainer Meetup
LT@Chainer Meetup
 
実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017
 
Building and road detection from large aerial imagery
Building and road detection from large aerial imageryBuilding and road detection from large aerial imagery
Building and road detection from large aerial imagery
 
視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論視覚認知システムにおける知覚と推論
視覚認知システムにおける知覚と推論
 
動的輪郭モデル
動的輪郭モデル動的輪郭モデル
動的輪郭モデル
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョン20160525はじめてのコンピュータビジョン
20160525はじめてのコンピュータビジョン
 
全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131全脳アーキテクチャ若手の会20170131
全脳アーキテクチャ若手の会20170131
 
強化学習その2
強化学習その2強化学習その2
強化学習その2
 
AutoEncoderで特徴抽出
AutoEncoderで特徴抽出AutoEncoderで特徴抽出
AutoEncoderで特徴抽出
 
Crfと素性テンプレート
Crfと素性テンプレートCrfと素性テンプレート
Crfと素性テンプレート
 
全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習全脳アーキテクチャ若手の会 強化学習
全脳アーキテクチャ若手の会 強化学習
 
なにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfugなにわTech20170218(tpu) tfug
なにわTech20170218(tpu) tfug
 
LDA入門
LDA入門LDA入門
LDA入門
 
Convolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をするConvolutional Neural Netwoks で自然言語処理をする
Convolutional Neural Netwoks で自然言語処理をする
 
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
Convolutional Neural Networks のトレンド @WBAFLカジュアルトーク#2
 

Similar to A brief introduction to recent segmentation methods

Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun YooJaeJun Yoo
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningTrong-An Bui
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningSujit Pal
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningssusere5ddd6
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks曾 子芸
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classificationCenk Bircanoğlu
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Yan Xu
 
Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Alex Conway
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptxZainULABIDIN496386
 

Similar to A brief introduction to recent segmentation methods (20)

Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo[PR12] Inception and Xception - Jaejun Yoo
[PR12] Inception and Xception - Jaejun Yoo
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
Autoencoders for image_classification
Autoencoders for image_classificationAutoencoders for image_classification
Autoencoders for image_classification
 
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
Deep Learning Approach in Characterizing Salt Body on Seismic Images - by Zhe...
 
Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017
 
04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx04 Deep CNN (Ch_01 to Ch_3).pptx
04 Deep CNN (Ch_01 to Ch_3).pptx
 

More from Shunta Saito

Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Shunta Saito
 
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)Shunta Saito
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to ChainerShunta Saito
 
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...Shunta Saito
 
Building detection with decision fusion
Building detection with decision fusionBuilding detection with decision fusion
Building detection with decision fusionShunta Saito
 
Automatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learningAutomatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learningShunta Saito
 
集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回Shunta Saito
 

More from Shunta Saito (7)

Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向
 
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
[unofficial] Pyramid Scene Parsing Network (CVPR 2017)
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
[5 minutes LT] Brief Introduction to Recent Image Recognition Methods and Cha...
 
Building detection with decision fusion
Building detection with decision fusionBuilding detection with decision fusion
Building detection with decision fusion
 
Automatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learningAutomatic selection of object recognition methods using reinforcement learning
Automatic selection of object recognition methods using reinforcement learning
 
集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回集合知プログラミングゼミ第1回
集合知プログラミングゼミ第1回
 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

A brief introduction to recent segmentation methods

  • 1. A Brief Introduction to Recent Segmentation Methods Shunta Saito Researcher at Preferred Networks, Inc.
  • 2. Semantic Segmentation? • Classifying all pixels, so it’s also called “Pixel-labeling” * Feature Selection and Learning for Semantic Segmentation (Caner Hazirbas), Master's thesis, Technical University Munich, 2014. C C C C B B B C C B B B B B SS S S S S S S S S S S R R R R R R R R R
  • 3. • Tackling this problem with CNN, usually it’s formulated as: Typical formulation Image CNN Prediction Label Cross entropy • The loss is calculated for each pixel independently • It leads to the problem: “How can the model consider the context to make a single prediction for a pixel?”
  • 4. • How to leverage context information • How to use law-level features in upper layers to make detailed predictions • How to create dense prediction Common problems
  • 5. “Fully Convolutional Networks for Semantic Segmentation”, Jonathan Long and Evan Shelhamer et al. appeared in arxiv on Nov. 14, 2014 Fully Convolutional Network (1: Reinterpret classification as a coarse prediction) • The fully connected layers in classification network can be viewed as convolutions with kernels that cover their entire input regions • The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation. See a Caffe’s example “Net Surgery”: https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb 256-6x6 4096-1x1 4096-1x1 If input is 451x451, output is 8x8 of 1000ch
  • 6. Fully Convolutional Network (2: Coarse to dense) • 1 possible way:
 “Shift-and-Stitch” trick proposed in OverFeat paper (21 Dec, 2013)
 OverFeat is the winner at the localization task of ILSVRC 2013 (not detection) Shift input and stitch (=“interlace”) the outputs
  • 8. Fully Convolutional Network (2: Coarse to dense) • Another way: Decreasing subsampling layer (like Max Pooling) ‣ It has tradeoff: ‣ The filters see finer information, but have smaller receptive fields and take longer to compute (due to large feature maps) movies and images are from: http://cs231n.github.io/convolutional-networks/
  • 9. Fully Convolutional Network (2: Coarse to dense) • Instead of all ways listed above, finally, they employed upsampling to make coarse predictions denser • In a sense, upsampling with factor f is convolution with a fractional input stride of 1/f • So, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of f Upsampling by deconvolution 74 74 'k ' x , .kz ftp.YE?D , " iEIII÷IiiIEE÷:# in ,÷÷:±÷ei:# a- Output stride .f 74k , l⇒I*.IE?IItiiIe#eEiidYEEEE.*ai . Deconvolution
  • 10. Fully Convolutional Network (3: Patch-wise training or whole image training) • Whole image fully convolutional training is identical to patchwise training where each batch consists of all the receptive fields of the units below the loss for an image yajif - ED yan⇒Ise### Patchwise training is loss sampling • We performed spatial sampling of the loss by making an independent choice to ignore each final layer cell with some probability 1 − p • To avoid changing the effec- tive batch size, we simultaneously increase the number of images per batch by a factor 1/p
  • 11. Fully Convolutional Network (4: Skip connection) Nov. 14, 2014 • Fuse coarse, semantic and local, appearance information : Deconvolution
 (initialized as bilinear upsampling, and learned) Added : Bilinear upsampling (fixed)
  • 12. Fully Convolutional Network (5: Training scheme) 1. Prepare a trained model for ILSVRC12 (1000-class image classification) 2. Discard the final classifier layer 3. Convolutionalizing all remaining fully-connected layers 4. Append a 1x1 convolution with the target class number of channels • MomentumSGD (momentum: 0.9) • batchsize: 20 • Fixed LR: 10^-4 for FCN-VGG-16 (Doubled LR for biases) • Weight decay: 5^-4 • Zero-initialize the class scoring layer • Fine-tuning was for all layers Other training settings:
  • 13. Fully Convolutional Network 1. Replacing all fully-connected layers with convolutions 2. Upsampling by backwards convolution, a.k.a. deconvolution (and bilinear upsampling) 3. Applied skip connections to use local, appearance information in the final layer Summary
  • 14. • “Learning Deconvolution Network for Semantic Segmentation”, Hyeonwoo Noh, et al. appeared in arxiv on May. 17, 2015 Deconvolution Network * “Unpooling” here is corresponding to Chainer’s “Upsampling2D” function
  • 15. • Fully Convolutional Network (FCN) has limitations: • Fixed-size receptive field yields inconsistent labels for large objects ➡ Skip connection can’t solve this because there is inherent trade-off between boundary details and semantics • Interpolating 16 x 16 output to the original input size makes blurred results ➡ The absence of a deep deconvolution network trained on a large dataset makes it difficult to reconstruct highly non- linear structures of object boundaries accurately. Deconvolution Network Let’s do it to perform proper upsampling
  • 17. Deconvolution Network Shape generator 14 × 14 deconvolutional layer 28 × 28 unpooling layer 28 × 28 deconvolutional layer 56 × 56 unpooling layer 56 × 56 deconvolutional layer 112 × 112 unpooling layer 112 × 112 deconvolutional layer 224 × 224 unpooling layer 224 × 224 deconvolutional layer Activations of each layer
  • 18. Deconvolution Network One can think: “Skip connection is missing…?”
  • 19. U-Net “U-Net: Convolutional Networks for Biomedical Image Segmentation”, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 18 May 2015
  • 20. U-Net
  • 21. SegNet • “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”, Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, 2 Nov, 2015
  • 22. SegNet • The training procedure is a bit complicated • Encoder-decorder “pair-wise” training There’s a Chainer implementation:
 pfnet-research/chainer-segnet: https://github.com/pfnet-research/chainer-segnet
  • 23. Dilated convolutions • “Multi-Scale Context Aggregation by Dilated Convolutions”, Fisher Yu, Vladlen Koltun, 23 Nov, 2015 • a.k.a stroud convolution, convolution with holes • Enlarge the size of receptive field without losing resolution The figure is from “WaveNet: A Generative Model for Raw Audio”
  • 24. Dilated convolutions • For example, the feature maps of ResNet are downsampled 5 times, and 4 times in the 5 are done by convolutions with stride of 2 (only the first one is by pooling with stride of 2) 1/4 1/8 1/16 1/32 1/2
  • 25. Dilated convolutions • By using dilated convolutions instead of vanilla convolutions, the resolution after the first pooling can be kept as the same to the end 1/4 1/8 1/8 1/8 1/2
  • 26. Dilated convolutions But, it is still 1/8… 1/4 1/8 1/8 1/8 1/2
  • 27. RefineNet • “RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation”, Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid, 20 Nov. 2016
  • 28. RefineNet • Each intermediate feature map is refined through “RefineNet module”
  • 29. RefineNet * Implementation has been done in Chainer, the codes will be public soon
  • 30. Semantic Segmentation using Adversarial Networks • “Semantic Segmentation using Adversarial Networks”, Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek, 25 Nov. 2016