DeconvNet, DecoupledNet, TransferNet in Image Segmentation
Presentation slides used in lab seminar.


  1. DeconvNet, DecoupledNet, TransferNet in Image Segmentation
     NamHyuk Ahn @ Ajou Univ. 2016. 05. 11
  2. Contents
      - Semantic Segmentation
      - Deconvolution Network for Supervised Learning
      - Decoupled Network for Semi-Supervised Learning
      - Transfer Learning in Semantic Segmentation
  3. Semantic Segmentation
  4. Semantic Segmentation
      - Predict a pixel-level class label for every pixel in the image
      [Figure: Shotton et al. 2007]
  5. Datasets
      - PASCAL VOC: 20 classes, 12K training / 1K test images
      - MS COCO: 91 classes, 120K training / 40K test images
  6. Deconvolution Network for Supervised Learning
  7. Problems of FCN
      - FCN can only handle single-scale semantics, since it has a fixed-size receptive field
      - The label map is small, so fine structural details of the object tend to be lost
  8. DeconvNet
      - To address these issues, DeconvNet uses “deconvolution”
      - A convolution network extracts features (VGG-16)
      - A deconvolution network generates a probability map with the same size as the input image
      - The probability map indicates, for each pixel, the probability that it belongs to each class
  9. Deconvolution Network
      - Unpooling
        • Reconstructs the structure of the original activation map
        • The original activation size is recovered, but the map is still sparse
      - Deconvolution
        • Densifies (and enlarges) the sparse activation map
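
      A minimal PyTorch sketch of the two operations on this slide (illustrative shapes, not the authors' code): max pooling that remembers its switch locations, unpooling that puts values back at those locations (still sparse), and a transposed convolution that densifies the result.

      import torch
      import torch.nn as nn

      # Illustrative only: unpooling restores the pre-pooling size but is sparse;
      # the transposed convolution ("deconvolution") densifies it.
      pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
      unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
      deconv = nn.ConvTranspose2d(64, 64, kernel_size=3, padding=1)

      x = torch.randn(1, 64, 32, 32)       # an encoder activation map
      pooled, switches = pool(x)           # 32x32 -> 16x16, switches = argmax locations
      unpooled = unpool(pooled, switches)  # 16x16 -> 32x32, non-zero only at the switches
      dense = deconv(unpooled)             # densified activation map, still 32x32
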
  10. Analysis of DeconvNet
      - DeconvNet segments better because it produces a dense, enlarged pixel-wise map
      - Shallow layers tend to capture the overall structure of the object (shape, region, position), while deep layers capture complicated patterns
      - Unpooling captures example-specific structure, so object details can be reconstructed at higher resolution
      - Deconvolution captures class-specific shape, so activations closely related to the target class are amplified and noisy activations are suppressed
  11. Analysis of DeconvNet
  12. More details of DeconvNet
      - Instance-wise segmentation
      - Batch normalization is used in both networks
      - Two-stage training
      - Ensemble with FCN
        • FCN and DeconvNet are complementary
        • The ensemble gives the best result
  13. Instance-wise Segmentation
      - Feed proposal instances into the network (not the entire image)
      - Proposal instances are obtained with the EdgeBox algorithm
      - Multi-scale proposals identify finer details of the object
      - The search space shrinks, which reduces memory during training
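
      The slide does not spell out how the per-proposal outputs are recombined; as a hedged illustration (NumPy, hypothetical helper, not necessarily the paper's exact procedure), one plausible aggregation pastes each proposal's class score map back into image coordinates and keeps the pixel-wise maximum.

      import numpy as np

      def aggregate_proposals(image_hw, proposals, num_classes=21):
          # proposals: list of ((x1, y1, x2, y2), scores), where scores has shape
          # (num_classes, y2 - y1, x2 - x1); boxes and score maps are assumed to
          # come from EdgeBox + the network, not computed here.
          full = np.zeros((num_classes,) + image_hw, dtype=np.float32)
          for (x1, y1, x2, y2), scores in proposals:
              full[:, y1:y2, x1:x2] = np.maximum(full[:, y1:y2, x1:x2], scores)
          return full.argmax(axis=0)  # per-pixel label map

      # toy usage with two fake proposals on a 16x16 image
      props = [((0, 0, 8, 8), np.random.rand(21, 8, 8)),
               ((4, 4, 16, 16), np.random.rand(21, 12, 12))]
      label_map = aggregate_proposals((16, 16), props)
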
  14. Two-stage Training
      - DeconvNet has many parameters, but there is little segmentation data (10K images in PASCAL VOC)
        • Two-stage training addresses this issue
        • First stage: input center-cropped images
        • Second stage: input proposal sub-images
      - As a result, the network generalizes better
  15. Result
      - Second best among methods trained on PASCAL VOC only
      - Note: the paper reports 72.5 mean IoU, but the presentation files report 74.8
  16. Qualitative Example
  17. Recap
      - Dense, precise segmentation masks are possible thanks to the coarse-to-fine reconstruction
      - With instance-wise segmentation, it can handle object scale variation
      - But it has many parameters (almost 2x VGG-16), so an additional training stage is needed
  18. Decoupled Network for Semi-Supervised Learning
  19. Motivation
      - Creating segmentation ground truth is costly, so take a semi-supervised approach
      - Utilize many image-level annotations and few pixel-level annotations
      - Modify DeconvNet
      - With little strongly annotated data (25 images per class), it achieves a good result (62.5 mean IoU)
  20. Main idea
      - Semantic segmentation can be decomposed into multi-label classification and binary segmentation
      [Figure: semantic segmentation of "Person" and "Bottle" decomposed into multi-label classification and binary segmentation]
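
      A tiny NumPy illustration of this decomposition (toy labels, hypothetical class indices): a semantic label map carries the same information as the set of present labels plus one binary foreground mask per present label.

      import numpy as np

      # 0 = background, 15 = person, 5 = bottle (toy 3x3 label map)
      semantic = np.array([[0, 15, 15],
                           [0, 15,  5],
                           [0,  0,  5]])
      present = sorted(set(semantic.ravel()) - {0})                        # multi-label target: [5, 15]
      binary_masks = {c: (semantic == c).astype(np.uint8) for c in present}  # one fg/bg mask per label
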
  21. Overview
      - Classification network for multi-label classification
      - Segmentation network for binary segmentation
      - Bridging layers for delivering class-specific information to the segmentation network
  22. Architecture
      - Classification Network (same as VGG-16)
      - Segmentation Network
        • Takes the class-specific activation map from the bridging layers and performs binary segmentation (the main difference from DeconvNet)
        • Binary segmentation reduces parameters, so the network can be trained with little pixel-wise annotated data
  23. Architecture
      - Bridging Layers
        • The segmentation network needs class-specific and spatial information to produce a class-specific segmentation mask
        • Spatial information comes from pool5 of the classification network
        • The pool5 feature is useful for shape generation, but it mixes information from all relevant labels → need to isolate class-specific activations
        • A saliency map is used to identify the class-specific activations
  24. Architecture
      - Saliency Map
        1. Produce the score vector, then set its gradient to all zeros except a 1 at the index of the label to track
        2. Backprop to an arbitrary layer (pool5 in this paper)
      - The saliency map gives class-specific information for each label (class)
      [Figure: qualitative example of a saliency map, Karen Simonyan et al. 2014]
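
      A rough PyTorch sketch of steps 1-2 (assumptions: torchvision's VGG-16 stands in for the classification network, and the output of vgg.features is treated as "pool5"): forward to get the score vector, then backprop a gradient that is all zeros except a 1 at the tracked label.

      import torch
      import torchvision

      vgg = torchvision.models.vgg16(weights=None)      # stand-in classification network
      image = torch.randn(1, 3, 224, 224)

      pool5 = vgg.features(image)                       # spatial feature map ("pool5")
      pool5.retain_grad()                               # keep its gradient after backward
      scores = vgg.classifier(torch.flatten(vgg.avgpool(pool5), 1))

      dscore = torch.zeros_like(scores)
      dscore[0, 12] = 1.0                               # 1 only at the label to track (index 12 is arbitrary)
      scores.backward(gradient=dscore)                  # backprop the masked score gradient

      saliency = pool5.grad                             # class-specific saliency at pool5
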
  25. Architecture
      - Bridging Layers
        • Combine the spatial feature (pool5) and the class-specific saliency map to produce a class-specific activation map
        • Pass it through an fc layer and feed it to the segmentation network
        • The result g has both spatial and class-specific information
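
      A hypothetical sketch of the combination (the paper's exact bridging operation is not reproduced here): concatenate the pool5 feature with the class-specific saliency channel-wise and apply a 1x1 convolution as a stand-in for the fc layer.

      import torch
      import torch.nn as nn

      pool5_feat = torch.randn(1, 512, 7, 7)   # spatial information
      saliency = torch.randn(1, 512, 7, 7)     # class-specific information (from the backprop above)

      bridge = nn.Conv2d(512 + 512, 512, kernel_size=1)        # stand-in for the fc layer
      g = bridge(torch.cat([pool5_feat, saliency], dim=1))     # class-specific activation map g
      # g is fed to the segmentation (deconvolution) network for binary segmentation
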
  26. Inference
      - Compute a segmentation map for each identified label
      - Aggregate the per-label segmentation maps M pixel-wise
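
      A small NumPy sketch of the pixel-wise aggregation (the threshold and exact rule are assumptions): each identified label contributes a foreground map, and every pixel takes the label whose map is strongest, or background if none is confident.

      import numpy as np

      def aggregate_label_maps(binary_maps, labels, threshold=0.5):
          stacked = np.stack(binary_maps)            # (num_labels, H, W) foreground probabilities
          winner = stacked.argmax(axis=0)            # strongest map per pixel
          out = np.asarray(labels)[winner]           # map index -> class label
          out[stacked.max(axis=0) < threshold] = 0   # background where nothing is confident
          return out

      # e.g. maps for "person" (15) and "bottle" (5) on a 16x16 image
      seg = aggregate_label_maps([np.random.rand(16, 16), np.random.rand(16, 16)], labels=[15, 5])
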
  27. Training
      - Train the classification network with many image-level annotations
      - Train the segmentation network and bridging layers with few pixel-level annotations
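
      A hedged sketch of this decoupled schedule with placeholder modules (tiny Linear layers stand in for the real networks): the classification network is trained, or taken pre-trained, on image-level labels and then frozen, while only the bridging layers and segmentation network receive gradients from the pixel-level loss.

      import torch

      # Placeholders: tiny Linear layers stand in for the classification network,
      # bridging layers, and segmentation network of the paper.
      cls_net, bridge, seg_net = torch.nn.Linear(8, 4), torch.nn.Linear(4, 4), torch.nn.Linear(4, 2)

      for p in cls_net.parameters():
          p.requires_grad = False                   # classification network stays fixed
      optimizer = torch.optim.SGD(                  # only these parts see pixel-level gradients
          list(bridge.parameters()) + list(seg_net.parameters()), lr=0.01)
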
  28. Result
  29. Qualitative Example
  30. Recap
      - Utilizes many image-level annotations and few pixel-level annotations
      - Adds bridging layers to DeconvNet and performs binary segmentation to reduce parameters
      - The bridging layers output both spatial and class-specific information for each class (label)
      - The two networks are trained separately (decoupled)
        • Performance is worse in the fully-supervised setting, where joint optimization is preferable
      - With little strongly annotated data (25 images per class), it achieves a good result (62.5 mean IoU)
  31. Transfer Learning in Semantic Segmentation
  32. Motivation
      - Pre-train the network, then run inference on a new dataset (e.g. train on MS COCO, infer on PASCAL VOC)
      - This idea does not work well with DecoupledNet
        • DecoupledNet is trained with class-specific input, so it cannot generalize to new classes
        • Instead, train the network with class-independent input!
  33. Overview
      - The attention model identifies the salient region of each class associated with the input image
        • Its output carries the location of each class in the coarse feature map
      - The encoder extracts features; the decoder generates a dense foreground segmentation mask for each attended region
      - Training stage
        • Fix the (pre-trained) encoder and train the decoder and attention model with pixel-level annotations from the source domain
        • Train the attention model with image-level annotations from both domains
      - After training, the decoder has seen only the source domain while attention has seen both domains, so attention is adapted to the target domain
  34. Overview
      - The decoupled encoder-decoder makes it possible to share shape-generation information across different classes
      - The attention model provides
        • Predictions for localization
        • Class-specific information → enables the decoder to adapt to the target domain
      - With the attention model, the information is transferable across domains and provides a useful segmentation prior
  35. Architecture
      - Encoder
        • Extracts a feature descriptor A ∈ R^(M x D)
        • A is obtained from the last conv layer to retain spatial information
        • M and D are the number of hidden units (20x20 spatial locations) and the number of channels, respectively
      - Attention model
        • Trains a weight vector whose entries represent the relevance of each location to each class l
        • An extra technique from [R. Memisevic, 2013] is used to reduce the number of parameters
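
      Since the slide's formal equation did not survive extraction, here is a hedged sketch of the general idea (not the paper's exact parameterization, and without the parameter-reduction trick from [R. Memisevic, 2013]): per-class weights score every location of A, and a softmax over locations gives the attention.

      import torch
      import torch.nn.functional as F

      M, D, L = 20 * 20, 512, 21                  # locations, channels, classes
      A = torch.randn(M, D)                       # feature descriptor from the last conv layer
      W = torch.randn(D, L, requires_grad=True)   # per-class weight vectors (learned)

      scores = A @ W                      # relevance of each location to each class, (M, L)
      alpha = F.softmax(scores, dim=0)    # attention over the M locations, one column per class
      attended = alpha.t() @ A            # class-specific summary of A, (L, D)
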
  36. Architecture
      - Attention model
        • To apply attention in this model, it has to be trainable in both domains
        • Add additional layers on top of the attention model and train both under a classification objective
        • The result z represents a class-specific feature
        • z can be optimized with weak (image-level) annotations from both domains
      [Figure: example of attention]
  37. Architecture
      - Decoder
        • The output of the attention model is sparse due to the softmax, so it may lose information needed for shape generation
        • Feed the additional input A to z (multiplication) → densified attention
        • With the densified attention, optimize the segmentation loss; the procedure is the same as DecoupledNet, but the decoder is optimized with the source domain only
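
      A sketch of the densification step as described on this slide (the exact operation is an assumption): the near one-hot attention of a class is multiplied back onto A, so the decoder receives a full D-channel spatial map instead of a sparse weighting.

      import torch
      import torch.nn.functional as F

      M, D = 20 * 20, 512
      A = torch.randn(M, D)                                # encoder feature
      alpha_c = F.softmax(torch.randn(M), dim=0)           # sparse attention of one class over locations

      densified = alpha_c.unsqueeze(1) * A                 # (M, D): attention-weighted copy of A
      decoder_input = densified.t().reshape(1, D, 20, 20)  # reshaped for the deconvolution decoder
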
  38. Analysis of TransferNet
      - The decoder generates a foreground segmentation of the attended region for each label
      - By decoupling classification (a domain-specific task), it captures class-independent information for shape generation and can be applied to unseen classes
      - Because the attention model is trained not only with pixel-level but also with image-level annotations, it can handle unseen classes
        • In DecoupledNet, the bridging layers are trained with pixel-level data only

  39. Train / Inference
      - Training: optimize a joint objective
        • Training with only class labels works, but jointly training with segmentation labels regularizes the noise
        • After training, the additional classification layers are removed, since they are needed only to learn attention from the target domain
      - Inference
        1. Iteratively obtain the attention and the segmentation mask
        2. Aggregate the masks (same as DecoupledNet)
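
      The equation itself is missing from the extracted slide; as a hedged stand-in, the joint objective described in words would look roughly like a segmentation loss on source-domain pixel labels plus a weighted classification loss on image-level labels from both domains (the weighting and exact loss forms below are assumptions).

      import torch
      import torch.nn.functional as F

      def joint_loss(seg_logits, seg_target, cls_logits, cls_target, lam=0.1):
          seg = F.cross_entropy(seg_logits, seg_target)                       # pixel-level, source domain
          cls = F.binary_cross_entropy_with_logits(cls_logits, cls_target)    # image-level, both domains
          return seg + lam * cls

      # toy shapes: binary (fg/bg) mask logits and 21 image-level labels
      loss = joint_loss(torch.randn(1, 2, 20, 20), torch.randint(0, 2, (1, 20, 20)),
                        torch.randn(1, 21), torch.randint(0, 2, (1, 21)).float())
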
  40. Result
  41. Qualitative Example
  42. Reference
      - Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. "Learning Deconvolution Network for Semantic Segmentation." Proceedings of the IEEE International Conference on Computer Vision, 2015.
      - Seunghoon Hong, Hyeonwoo Noh, and Bohyung Han. "Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation." Advances in Neural Information Processing Systems, 2015.
      - Seunghoon Hong, et al. "Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network." arXiv preprint arXiv:1512.07928, 2015.
      - Hyeonwoo Noh. "Semantic Segmentation and Visual Question Answering." (https://drive.google.com/file/d/0B5xl2L77gZfVRXZxQWNmSGlBemc/view)
