FickleNet is a method for weakly and semi-supervised semantic image segmentation that generates multiple localization maps from a single image using random combinations of hidden units. It aggregates these maps to discover relationships between object locations. This allows it to expand activated regions beyond just discriminative parts. Experiments on PASCAL VOC 2012 show it achieves state-of-the-art performance in both weakly and semi-supervised settings. Key techniques include feature map expansion for efficient inference and center-preserving dropout to relate kernel centers to other locations.
FickleNet: Weakly and Semi-supervised Semantic Image Segmentation using Stochastic Inference
1. FickleNet: Weakly and Semi-supervised Semantic
Image Segmentation using Stochastic Inference
Hwang Seung Hyun
Yonsei University Severance Hospital CCIDS
SNU, Korea | CVPR 2019
2020.03.22
2. Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
3. FickleNet
Introduction – Limitation of Prior Works
• Semantic segmentation in real life requires a large variety of object classes and labeled data
• Current weakly supervised segmentation methods show inferior results to fully supervised
segmentation
• The main obstacle to weakly supervised semantic image segmentation is obtaining pixel-level
information (locations or boundaries)
• Most weakly supervised segmentation methods depend on localization maps obtained from a
classification network.
• These localization maps focus only on the small discriminative parts of objects, making
boundaries hard to locate
Introduction / Related Work / Methods and Experiments / Conclusion
4. FickleNet
Introduction – FickleNet
• Generates a variety of localization maps from a single image using random combinations of
hidden units in a CNN
• Chooses hidden units at random for each sliding window position
(similar to the Dropout technique)
• Random selection of hidden units (a stochastic approach) produces regions of different shapes
• Many existing studies use stochastic regularization (e.g. Dropout) during training, but
not in the inference phase
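The idea of keeping dropout active at inference can be illustrated with a minimal NumPy sketch (the function name and shapes here are made up for the example, not from the paper's code); because the mask is re-sampled on every forward pass, repeated passes over the same input activate different sets of units:

```python
import numpy as np

def stochastic_forward(features, drop_rate, rng):
    """Apply a dropout mask to a feature map at inference time.

    Unlike the usual training-only usage, the random mask is re-sampled
    on every call, so repeated passes on the same input differ.
    (Illustrative sketch; names and shapes are assumptions.)
    """
    # Keep each unit with probability 1 - drop_rate.
    mask = rng.random(features.shape) >= drop_rate
    return features * mask

rng = np.random.default_rng(0)
x = np.ones((4, 4))
a = stochastic_forward(x, drop_rate=0.5, rng=rng)
b = stochastic_forward(x, drop_rate=0.5, rng=rng)
# a and b come from independently re-sampled masks, so the two passes
# generally keep different subsets of the hidden units.
```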
6. FickleNet
Introduction – Contributions
• FickleNet discovers the relationship between locations in an image
and enlarges the regions activated by the classifier.
• Introduces a method of expanding feature maps that makes the
model run faster at only a small cost in GPU memory.
• FickleNet achieved state-of-the-art performance on the PASCAL VOC 2012
benchmark in both weakly and semi-supervised settings
7. Related Work
Image Level Processing
• Class Activation Mapping (CAM) is a good starting point for the
classification of pixels from image-level annotations
• CAM discovers the contribution of each hidden unit in a neural network, but
it tends to focus on the small discriminative regions of a target.
8. Related Work
Feature Level Processing
• Multi-dilated convolution (MDC) uses several convolutional blocks,
dilated at different rates, and aggregates the CAMs obtained from each block
in a manner that resembles ensemble learning
• Dilation rates are limited
• A standard dilated convolution has a square kernel of fixed size, so MDC tends to
identify false-positive regions
10. Related Work
Region Growing
• DSRG (Deep Seeded Region Growing)
→ Seeds for region growing are obtained from CAM
→ VGG as the classification network
→ DeepLab-ASPP as the segmentation network
→ Seeds come only from discriminative parts of objects, making it difficult to grow them into
non-discriminative parts.
11. Methods and Experiments
Stochastic Hidden Unit Selection
• Randomly selects hidden units to associate a non-discriminative part of an
object with a discriminative part of the same object.
12. Methods and Experiments
Stochastic Hidden Unit Selection - Feature Map Expansion
• Applies spatial dropout to the feature map X at each sliding window position.
• This differs from the standard dropout technique, which samples hidden units in the feature
maps only once.
• This way of selecting hidden units can generate receptive fields of many different shapes
and sizes
• Calling the convolution and dropout functions w × h times in each forward pass is very
inefficient
• Therefore, the feature maps are expanded so that no sliding window positions overlap
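The expansion step can be sketched roughly as follows, assuming a simplified im2col-style rearrangement (the function name and shapes are illustrative, not the paper's implementation). After expansion, each window occupies its own non-overlapping block, so a single dropout call samples an independent mask for every window position:

```python
import numpy as np

def expand_feature_map(x, k):
    """Rearrange an H x W map so each k x k sliding window occupies its
    own non-overlapping k x k block (simplified, single-channel sketch).
    """
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1  # number of window positions
    expanded = np.zeros((out_h * k, out_w * k))
    for i in range(out_h):
        for j in range(out_w):
            # Copy the window at (i, j) into its dedicated block.
            expanded[i * k:(i + 1) * k, j * k:(j + 1) * k] = x[i:i + k, j:j + k]
    return expanded

x = np.arange(16, dtype=float).reshape(4, 4)
e = expand_feature_map(x, k=3)
print(e.shape)  # (6, 6): 2 x 2 window positions, each a 3 x 3 block
```

A single spatial-dropout pass over `e` then assigns an independent mask to every window, replacing w × h separate convolution/dropout calls.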
13. Methods and Experiments
Stochastic Hidden Unit Selection – Center preserving spatial dropout
• Does not drop the center of the kernel at each sliding window
position
• This way, relationships between the kernel center and other locations within
each stride can be found
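A center-preserving mask for one window position might look like this minimal sketch (the function name and parameters are assumptions for illustration):

```python
import numpy as np

def center_preserving_mask(k, drop_rate, rng):
    """Dropout mask for a k x k window that never drops the kernel
    center, so the center unit can always be related to whichever
    surrounding units survive (illustrative sketch, not the paper's code).
    """
    mask = (rng.random((k, k)) >= drop_rate).astype(float)
    mask[k // 2, k // 2] = 1.0  # always keep the kernel center
    return mask

rng = np.random.default_rng(0)
m = center_preserving_mask(5, drop_rate=0.9, rng=rng)
print(m[2, 2])  # center is always kept: 1.0
```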
14. Methods and Experiments
Inference Localization Map
• Uses gradient-based CAM (Grad-CAM), which is a generalization of the
class activation map (CAM)
• Grad-CAM discovers the class-specific contribution of each hidden
unit to the classification score from the gradient flow
• From the final output feature map, applies global average
pooling (GAP) and a sigmoid function to obtain the classification score
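The scoring step can be sketched as follows, assuming a class-wise C × H × W output feature map (a minimal illustration, not the paper's code):

```python
import numpy as np

def classification_score(feature_map):
    """Classification score from a C x H x W class-wise feature map:
    global average pooling over the spatial dimensions, then a sigmoid
    per class (minimal sketch of the scoring step).
    """
    gap = feature_map.mean(axis=(1, 2))   # global average pooling -> (C,)
    return 1.0 / (1.0 + np.exp(-gap))     # sigmoid per class

fm = np.zeros((3, 4, 4))  # 3 classes, 4 x 4 spatial map
fm[0] += 2.0              # strong activation for class 0
scores = classification_score(fm)
print(scores[1])  # no activation for class 1 -> sigmoid(0) = 0.5
```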
15. Methods and Experiments
Inference Localization Map – Aggregate localization map
• FickleNet constructs N different localization maps from a single image and
aggregates them into a single localization map.
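The aggregation step can be sketched as below; the exact rule used here (a thresholded pixel-wise maximum, i.e. a union of the activated regions) is a simplifying assumption for illustration:

```python
import numpy as np

def aggregate_maps(maps, threshold=0.5):
    """Aggregate N localization maps into one: a pixel is activated in
    the final map if it is activated in any of the N stochastic maps.
    (Thresholded pixel-wise max; the exact rule is an assumption.)
    """
    stacked = np.stack(maps)  # (N, H, W)
    return (stacked.max(axis=0) >= threshold).astype(float)

m1 = np.array([[0.9, 0.1], [0.0, 0.0]])
m2 = np.array([[0.0, 0.0], [0.8, 0.2]])
agg = aggregate_maps([m1, m2])
print(agg)  # union of the regions activated in either map
```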
16. Methods and Experiments
Inference Localization Map – Training Process
• The aggregated localization map provides pseudo-labels to train a semantic image
segmentation network
• Uses the same background cues as DSRG
• Using the aggregated map as a seed, applies a region growing method based on the
probabilities obtained from the segmentation network.
(Diagram: Aggregated Map → Segmentation Network)
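The growing step can be sketched with a simple rule that expands seeds into 4-connected neighbors whose class probability exceeds a threshold; this is a simplified stand-in for DSRG's criterion, with all names and thresholds chosen for illustration:

```python
import numpy as np
from collections import deque

def grow_region(seed, prob, threshold=0.7):
    """Grow a binary seed mask into neighboring pixels whose class
    probability exceeds a threshold (simplified DSRG-style sketch).
    """
    h, w = seed.shape
    grown = seed.astype(bool).copy()
    queue = deque(zip(*np.nonzero(grown)))  # start from all seed pixels
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < h and 0 <= nj < w and not grown[ni, nj] \
                    and prob[ni, nj] >= threshold:
                grown[ni, nj] = True
                queue.append((ni, nj))
    return grown

seed = np.zeros((3, 3), dtype=bool)
seed[1, 1] = True  # single seed pixel in the center
prob = np.array([[0.1, 0.9, 0.1],
                 [0.9, 0.9, 0.2],
                 [0.1, 0.8, 0.1]])
print(grow_region(seed, prob).sum())  # grows into 3 high-probability neighbors -> 4
```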
17. Methods and Experiments
FickleNet – Experimental Setup
• Dataset – PASCAL VOC 2012 image segmentation
(21 classes, i.e. 20 object classes plus background / 10,582 training images with image-level annotations)
• Based on a VGG-16 network pre-trained on ImageNet
(modified by removing all fully connected layers and the last pooling layer)
• Segmentation is performed by DSRG, which is based on DeepLab-CRF
• The number of different localization maps N is set to 200
18. Methods and Experiments
FickleNet – Weakly Supervised Semantic Segmentation
19. Methods and Experiments
FickleNet – Weakly Supervised Semantic Segmentation with ResNet
20. Methods and Experiments
FickleNet – Semi-Supervised Semantic Segmentation with ResNet
21. Methods and Experiments
FickleNet – Semi and Weakly Supervised Semantic Segmentation
22. Methods and Experiments
Ablation Study
1. Effects of the Map Expansion Technique
• Training and CAM extraction times are reduced by factors of 15.4
and 14.2 respectively, at the cost of a 12% increase in GPU memory use
23. Methods and Experiments
Ablation Study
2. Iterative Inference and Dropout Rate
• Additional random selection identifies more regions of a target object
• The segmentation performance converges as N increases
• A dropout rate of 0.9 allows FickleNet to cover larger regions of the target object
than DSRG – more randomness exposes more non-discriminative parts
24. Methods and Experiments
Ablation Study
3. Comparison to General Dropout
• A hidden unit in FickleNet may be activated at some window positions and dropped
at others, so that every hidden unit is able to affect the classification score
25. Conclusion
• Addressed the problem of semantic image segmentation using only
image-level annotations
• Obtain many different localization maps and aggregate those maps into
a single localization map
• Implemented efficiently by expanding the feature maps
• The results of FickleNet on both weakly supervised and semi-supervised
segmentation are better than those of other state-of-the-art methods