https://telecombcn-dl.github.io/2018-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
5. Proposal-based
Slide Credit: CS231n
Hariharan et al. Simultaneous Detection and Segmentation. ECCV 2014
External segment proposals
Mask out background with mean image
Similar to R-CNN, but with segment proposals
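The "mask out background" step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mask_background` is hypothetical, and a single per-channel mean pixel stands in for the mean image as a simplifying assumption.

```python
import numpy as np

def mask_background(crop, mask, mean_pixel):
    """Replace pixels outside the segment with the mean pixel value.

    crop:       (H, W, 3) float array, image cropped to the proposal's box.
    mask:       (H, W) boolean array, True inside the segment proposal.
    mean_pixel: (3,) array, assumed per-channel mean (stand-in for the mean image).
    """
    # mask[..., None] broadcasts the 2D mask over the three colour channels.
    return np.where(mask[..., None], crop, mean_pixel)

crop = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)
mask = np.array([[True, False], [False, True]])
mean_pixel = np.array([100.0, 110.0, 120.0], dtype=np.float32)
masked = mask_background(crop, mask, mean_pixel)
```

Foreground pixels keep their values, while every background pixel becomes the mean colour.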
6. He et al. Mask R-CNN. ICCV 2017
Proposal-based Instance Segmentation: Mask R-CNN
Faster R-CNN extended to pixel-level segmentation: masks are predicted in parallel with class labels
7. He et al. Mask R-CNN. ICCV 2017
Mask R-CNN
● Classification & box detection losses are identical to those in Faster R-CNN
● Addition of a new loss term for mask prediction:
The network outputs a K x m x m volume for mask prediction, where K is the number of categories and m is the side length of the (square) mask
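A minimal sketch of the mask loss term, assuming the formulation described in the Mask R-CNN paper: a per-pixel sigmoid with average binary cross-entropy, evaluated only on the channel of the ground-truth class. The function name `mask_loss` and the toy shapes are illustrative.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the ground-truth class channel.

    mask_logits: (K, m, m) raw network outputs for one RoI.
    gt_mask:     (m, m) binary target mask (0 or 1).
    gt_class:    integer in [0, K), ground-truth class of the RoI.
    """
    logits = mask_logits[gt_class]  # the other K-1 channels do not contribute
    # Numerically stable BCE with logits: max(x, 0) - x*t + log(1 + exp(-|x|))
    bce = np.maximum(logits, 0) - logits * gt_mask + np.log1p(np.exp(-np.abs(logits)))
    return bce.mean()

K, m = 3, 4
gt_mask = np.zeros((m, m))
gt_mask[1:3, 1:3] = 1.0
uninformed = mask_loss(np.zeros((K, m, m)), gt_mask, gt_class=1)
```

Because only one channel is used per RoI, masks of different classes do not compete with each other; the class itself is chosen by the classification branch.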
9. He et al. Mask R-CNN. ICCV 2017
Mask R-CNN: RoI Align
Reminder: RoI Pool from Fast R-CNN
Hi-res input image with region proposal: 3 x 800 x 600
→ Convolution and pooling → hi-res conv features with region proposal: C x H x W
→ Max-pool within each grid cell → RoI conv features for region proposal: C x h x w
→ Fully-connected layers, which expect low-res conv features: C x h x w
Coordinates are divided by 16 (x/16) and rounded → misalignment! Also not differentiable
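The effect of rounding can be seen in a small sketch of RoI Pool for a single region. This is an illustration under assumptions (stride 16, a 2 x 2 output grid, a hypothetical 40 x 50 feature map), not the Fast R-CNN implementation.

```python
import numpy as np

def roi_pool(features, box, out_size=2, stride=16):
    """Max-pool one RoI using the two rounding steps of RoI Pool.

    features: (C, H, W) feature map; box: (x0, y0, x1, y1) in input-image pixels.
    """
    # Rounding 1: image coordinates -> integer feature-map coordinates.
    x0, y0, x1, y1 = [int(round(v / stride)) for v in box]
    x1, y1 = max(x1, x0 + 1), max(y1, y0 + 1)  # keep at least one cell
    region = features[:, y0:y1, x0:x1]
    # Rounding 2: split the region into an integer out_size x out_size grid.
    ys = np.linspace(0, region.shape[1], out_size + 1).astype(int)
    xs = np.linspace(0, region.shape[2], out_size + 1).astype(int)
    out = np.zeros((features.shape[0], out_size, out_size), dtype=features.dtype)
    for i in range(out_size):
        for j in range(out_size):
            cell = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out

features = np.arange(40 * 50, dtype=np.float32).reshape(1, 40, 50)
pooled_a = roi_pool(features, (100, 60, 300, 250))
pooled_b = roi_pool(features, (97, 58, 303, 252))  # box moved by a few pixels
```

Both boxes snap to the same feature-map cells, so `pooled_a` and `pooled_b` are identical: shifts of a few pixels in the proposal are lost.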
10. Mask R-CNN: RoI Align
RoI Pooling example
Figure Credit: Deepsense.ai
He et al. Mask R-CNN. ICCV 2017
11. Figure Credit: Silvio Galesso
Mask R-CNN: RoI Align
He et al. Mask R-CNN. ICCV 2017
Use bilinear interpolation to sample features at non-integer locations, instead of rounding
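A single-channel sketch of the idea: box coordinates stay fractional, and each output bin averages a few bilinearly interpolated samples. The 2 x 2 sample grid per bin, the averaging, and the coordinate convention (no half-pixel offset) are assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at a fractional location (y, x)."""
    h, w = feature.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature, box, out_size=2, stride=16, samples=2):
    """RoI Align for one channel: no rounding of the box or of the bins."""
    x0, y0, x1, y1 = [v / stride for v in box]  # fractional coordinates
    bin_h, bin_w = (y1 - y0) / out_size, (x1 - x0) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for a in range(samples):
                for b in range(samples):
                    sy = y0 + (i + (a + 0.5) / samples) * bin_h
                    sx = x0 + (j + (b + 0.5) / samples) * bin_w
                    vals.append(bilinear_sample(feature, sy, sx))
            out[i, j] = np.mean(vals)  # average the samples in this bin
    return out

ramp = np.tile(np.arange(50.0), (40, 1))  # feature value equals x position
aligned = roi_align(ramp, (96, 64, 160, 128))
shifted = roi_align(ramp, (100, 64, 164, 128))  # box moved 4 px to the right
```

Unlike RoI Pool, moving the box by 4 pixels (a quarter of a feature cell) changes the output, here by 0.25 on the ramp.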
13. Limitations of Proposal-based models
1. Two objects might share the same bounding box: only one will be kept after the NMS step.
2. The choice of NMS threshold is application dependent.
3. The same pixel can be assigned to multiple instances.
4. The number of predictions is limited by the number of proposals.
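Limitations 1 and 2 can be reproduced with a minimal greedy NMS sketch. The boxes, scores and thresholds below are made-up values for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, format (x0, y0, x1, y1)."""
    ix0 = np.maximum(box[0], boxes[:, 0])
    iy0 = np.maximum(box[1], boxes[:, 1])
    ix1 = np.minimum(box[2], boxes[:, 2])
    iy1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix1 - ix0, 0, None) * np.clip(iy1 - iy0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Two objects with nearly the same box (e.g. heavily overlapping), plus a third.
boxes = np.array([[10, 10, 110, 110],
                  [12, 12, 112, 112],
                  [200, 200, 260, 260]], dtype=np.float64)
scores = np.array([0.9, 0.8, 0.7])
kept_default = nms(boxes, scores, iou_threshold=0.5)   # second box is suppressed
kept_lenient = nms(boxes, scores, iou_threshold=0.95)  # all three survive
```

With the 0.5 threshold the second object is lost even if it is a genuine separate instance; raising the threshold keeps it but would also keep true duplicates.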
16. Recurrent Semantic Instance Segmentation
Salvador, A., Bellver, M., Campos, V., Baradad, M., Marqués, F., Torres, J., & Giro-i-Nieto, X. (2018). From Pixels to Object Sequences: Recurrent Semantic Instance Segmentation.
17. Recurrent Semantic Instance Segmentation
18. Recurrent Semantic Instance Segmentation
Datasets: Pascal VOC, Cityscapes, CVPPP
19. Recurrent Semantic Instance Segmentation
Object discovery patterns
20. Instance Embedding
Basis of Instance Embedding Segmentation
● Each pixel in the output of the network is a point in the embedding space
● Pixels belonging to the same object are close in the embedding space, and pixels belonging to different objects are distant
● Recovering instances from the image embeddings requires a clustering step
embedding dimension K
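The clustering step can be sketched with a simple distance threshold: pick an unlabeled pixel, group every unlabeled pixel within a fixed radius of it, and repeat. This greedy scheme and the radius value are assumptions chosen for illustration; the slides only state that some clustering is needed.

```python
import numpy as np

def cluster_embeddings(embeddings, radius=0.5):
    """Group pixels by embedding distance.

    embeddings: (P, K) array, one K-dimensional vector per pixel.
    Returns a (P,) array of instance labels 0, 1, 2, ...
    """
    labels = np.full(len(embeddings), -1, dtype=int)
    next_label = 0
    for seed in range(len(embeddings)):
        if labels[seed] != -1:
            continue  # pixel already belongs to an instance
        dist = np.linalg.norm(embeddings - embeddings[seed], axis=1)
        members = (dist < radius) & (labels == -1)
        labels[members] = next_label
        next_label += 1
    return labels

# Five pixels with K = 2: three near the origin, two near (5, 5).
pixels = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [0.0, 0.1]])
instance_labels = cluster_embeddings(pixels)
```

This only works when the embedding is well trained: same-object pixels must lie within the radius of each other and different objects must lie farther apart.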
22. Instance Embedding
Mapping pixels to an N-dimensional space where pixels belonging to the same object are close to each other.
Results on Cityscapes
De Brabandere et al. Semantic Instance Segmentation with a Discriminative Loss Function. CVPRW 2017
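A rough sketch of a discriminative loss in the spirit of the cited paper: a pull term draws each pixel toward the mean embedding of its instance, a push term drives instance means apart, and a small regularizer keeps the means near the origin. The margins `delta_v` and `delta_d` and the term weights are illustrative assumptions, not values taken from these slides.

```python
import numpy as np

def discriminative_loss(embeddings, instance_ids, delta_v=0.5, delta_d=1.5,
                        alpha=1.0, beta=1.0, gamma=0.001):
    """embeddings: (P, N) pixel embeddings; instance_ids: (P,) ground-truth labels."""
    ids = np.unique(instance_ids)
    means = np.stack([embeddings[instance_ids == c].mean(axis=0) for c in ids])

    # Pull term: penalize pixels farther than delta_v from their instance mean.
    l_var = 0.0
    for mu, c in zip(means, ids):
        d = np.linalg.norm(embeddings[instance_ids == c] - mu, axis=1)
        l_var += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    l_var /= len(ids)

    # Push term: penalize pairs of instance means closer than 2 * delta_d.
    l_dist = 0.0
    if len(ids) > 1:
        for a in range(len(ids)):
            for b in range(len(ids)):
                if a != b:
                    d = np.linalg.norm(means[a] - means[b])
                    l_dist += np.maximum(2 * delta_d - d, 0.0) ** 2
        l_dist /= len(ids) * (len(ids) - 1)

    # Regularizer: keep the instance means close to the origin.
    l_reg = np.mean(np.linalg.norm(means, axis=1))
    return alpha * l_var + beta * l_dist + gamma * l_reg

separated = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]])
loss_good = discriminative_loss(separated, np.array([0, 0, 1, 1]))
mixed = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
loss_bad = discriminative_loss(mixed, np.array([0, 1, 0, 1]))
```

Tight, well-separated instances incur only the small regularization term, while instances whose means sit closer than the push margin are penalized.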