This is an image semantic segmentation project on satellite imagery. The goal was to predict pixel-wise segmentation maps for various objects in satellite imagery, including buildings, water bodies and roads. The data was taken from the Kaggle competition <https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection>.
We implemented the FCN, U-Net and SegNet deep learning architectures for this task.
2. Kaggle: Can you train an eye in the sky?
Challenge: The Defence Science and Technology Laboratory (DSTL) is seeking novel solutions to alleviate the burden on their image analysts and challenges kagglers to accurately identify and classify objects in overhead satellite imagery.
3. What’s in a picture?
4. How is this useful?
Medical imaging Agriculture Surveillance
5. Data
Input: 25 1km x 1km satellite images in both 3-band and 16-band formats
● Format: GeoTIFF
● Images are taken from the same region, but coordinates are transformed so the location is obscured
Object classes: every class is provided in the form of a multipolygon
● Format: GeoJSON or WKT
6. Object Class Types
Buildings
Misc. Manmade Structures
Roads
Track
Trees
Crops
Waterway
Standing Water
Vehicle Large
Vehicle Small
7. Data Processing of Labels
Match [0,1] coordinates to pixel coordinates
Compute projection factors for the multipolygon
8. Data Processing of Labels
Multipolygons to shapely objects
Project geometry to pixel coordinates
Shapely objects to shapefiles to tiff files
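As a rough illustration of this pipeline, the sketch below converts one class's WKT multipolygon into a binary pixel mask with shapely and rasterio. The grid-size values x_max / y_min and the projection-factor formula follow the competition's published convention; the function name and exact scaling are illustrative assumptions, not our exact code.

```python
import numpy as np
import shapely.wkt
import shapely.affinity
from rasterio import features

def wkt_to_mask(wkt_str, img_h, img_w, x_max, y_min):
    """Rasterize one class's multipolygon into an (img_h, img_w) binary mask."""
    geom = shapely.wkt.loads(wkt_str)                 # multipolygon in normalised coords
    # projection factors mapping the normalised grid onto pixel coordinates
    w_prime = img_w * (img_w / (img_w + 1.0))
    h_prime = img_h * (img_h / (img_h + 1.0))
    geom = shapely.affinity.scale(geom, xfact=w_prime / x_max,
                                  yfact=h_prime / y_min, origin=(0, 0))
    if geom.is_empty:
        return np.zeros((img_h, img_w), dtype=np.uint8)
    return features.rasterize([(geom, 1)], out_shape=(img_h, img_w),
                              fill=0, dtype="uint8")
```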
12. Average Number of Polygons Distribution
13. More Data Processing
Two preprocessing paths from the 25 original ~3300x3300 images:
● DIRECT SCALING: 25 ~3300x3300 images → 25 512x512 images
● PARTITION: 25 ~3300x3300 images → 25 3072x3072 images → 900 512x512 images
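A minimal sketch of the PARTITION path, assuming OpenCV is available for the resize step; 3072 is a multiple of 512, so each image splits into 36 tiles and 25 × 36 = 900 training patches.

```python
import cv2  # assumed available for the resize step

def partition(img, tile=512, target=3072):
    """Resize to target x target, then yield non-overlapping tile x tile patches."""
    img = cv2.resize(img, (target, target), interpolation=cv2.INTER_AREA)
    n = target // tile                                   # 6 tiles per side
    for i in range(n):
        for j in range(n):
            yield img[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]

# 25 images x 36 tiles per image = 900 patches of 512 x 512
```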
14. Methods - Semantic Segmentation with Deep Learning
Important deep learning models for semantic segmentation:
● Fully Convolutional Network [Nov 2014]
● U-Net [May 2015]
● SegNet [Nov 2015]
15. Methods - Semantic Segmentation with Deep Learning
VGG-16:
16. Methods - Semantic Segmentation with Deep Learning
Fully Convolutional Network:
● No fully connected layers
● Skip connections
● Built on VGG-16
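The hedged PyTorch-style sketch below (an FCN-16s-style head, not our exact training code) shows the idea: the fully connected layers of VGG-16 are dropped, class scores are predicted at two depths, and a skip connection adds them before transposed convolutions upsample back to the input resolution.

```python
import torch.nn as nn

class FCNHead(nn.Module):
    """Illustrative FCN-16s-style head on top of VGG-16 pooling outputs."""
    def __init__(self, n_classes):
        super().__init__()
        self.score_pool5 = nn.Conv2d(512, n_classes, kernel_size=1)  # deepest features
        self.score_pool4 = nn.Conv2d(512, n_classes, kernel_size=1)  # skip branch
        self.up2 = nn.ConvTranspose2d(n_classes, n_classes, 4, stride=2, padding=1)
        self.up16 = nn.ConvTranspose2d(n_classes, n_classes, 32, stride=16, padding=8)

    def forward(self, pool4, pool5):
        # skip connection: upsampled deep scores + scores from the earlier pool4 layer
        x = self.up2(self.score_pool5(pool5)) + self.score_pool4(pool4)
        return self.up16(x)   # upsample x16 back to the input resolution
```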
17. Methods - Semantic Segmentation with Deep Learning
U-Net:
18. Methods - Semantic Segmentation with Deep Learning
U-Net:
● Encoder-decoder network.
● Every decoding stage convolves the upsampled maps with trainable filters.
● The encoder feature maps are copied to the corresponding decoder stage.
● Data augmentation [stretching and rotation].
● Weighted cross entropy forces the network to learn the border pixels (a loss sketch follows below).
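A minimal PyTorch-style sketch of the per-pixel weighted cross entropy idea; the weight-map construction from the U-Net paper is assumed to be precomputed and is not shown, and we may have implemented the loss differently.

```python
import torch.nn.functional as F

def weighted_cross_entropy(logits, target, weight_map):
    """logits: (N, C, H, W); target: (N, H, W) class indices; weight_map: (N, H, W).

    weight_map is largest near borders between touching objects, so those
    pixels dominate the loss and the network is forced to learn them.
    """
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # per-pixel loss, (N, H, W)
    return (weight_map * per_pixel).mean()
```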
19. Methods - Encode/Contracting path
Goal:
● Retain context and localization accuracy.
Operations:
● Convolution
● Non-linearity (ReLU)
● Pooling
● But skip the fully connected layers
3x3 convolution with no padding, stride of 2
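A minimal PyTorch-style sketch of one contracting-path block built from the operations listed above; the filter counts and the use of padded convolutions are illustrative assumptions rather than our exact configuration.

```python
import torch.nn as nn

def encoder_block(in_ch, out_ch):
    """Conv -> ReLU -> Conv -> ReLU -> 2x2 max pool; no fully connected layers."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2),   # halves the spatial resolution
    )
```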
20. Methods - Semantic Segmentation with Deep Learning
SegNet Architecture:
21. Methods - Decode/Expansive path
Goal:
● Recover the object details and spatial dimensions
Operations:
● “Up-convolution” / upsampling
● Concatenation with the corresponding cropped encoder feature maps
● Convolution layers
● ReLU
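A matching PyTorch-style sketch of one expansive-path step; with padded convolutions the encoder feature map needs no cropping, which is an illustrative simplification rather than the exact U-Net recipe.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Up-convolution, concatenate the encoder feature map, then conv + ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # "up-convolution": doubles H and W
        x = torch.cat([x, skip], dim=1)   # concatenate the corresponding encoder map
        return self.conv(x)
```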
22. Methods - Semantic Segmentation with Deep Learning
SegNet:
● The encoder is exactly the convolutional part of VGG-16.
● Uses pretrained VGG-16 weights [excluding the fully connected layers].
● The decoder uses the pooling indices from the max pooling step of the corresponding encoder.
● The upsampled maps are convolved with trainable filters.
● Unlike U-Net, it does not copy the entire encoder feature maps.
● Reduces the trainable parameters from 134M → 14.7M.
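A hedged PyTorch illustration of unpooling with stored indices (a toy example, not the full SegNet decoder): the encoder's max pooling returns the argmax indices, and the decoder places values back at those positions instead of copying the whole encoder feature map.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 128, 128)      # an encoder feature map
pooled, indices = pool(x)             # only the indices are kept for the decoder
sparse = unpool(pooled, indices)      # 128x128 again, values at their original positions
# the sparse map is then convolved with trainable filters to densify it
```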
23. Methods - Semantic Segmentation with Deep Learning
SegNet Unpooling:
24. Methods - Semantic Segmentation with Deep Learning
FCN vs SegNet:
26. Methods: How does upsampling work?
Transposed convolution (fractionally strided convolution / “deconvolution”)
● Reconstructs the spatial resolution
● The weights are learnable
● It is NOT the reverse of the convolution process
Transposed convolution over a 2x2 input with no padding, a stride of 2 and a 3x3 kernel
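A one-line PyTorch check of that configuration (3x3 kernel, stride 2, no padding, 2x2 input), showing the spatial resolution growing while the kernel weights stay learnable.

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, stride=2)
x = torch.randn(1, 1, 2, 2)
print(up(x).shape)   # torch.Size([1, 1, 5, 5]) -- the 2x2 input grows to 5x5
```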
29. Transposed convolution as matrix multiplication
(16 x 4)(4 x 1) = (16 x 1)
● The dimensions of input and output swap
● Uses the transpose of the convolution matrix
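A small numpy demo of the picture above: a 3x3 kernel over a 4x4 input defines a 4x16 convolution matrix C, and the transposed convolution is multiplication by C.T, mapping 4 values back to 16. The kernel and input values here are arbitrary.

```python
import numpy as np

k = np.arange(1, 10).reshape(3, 3)            # a 3x3 kernel
C = np.zeros((4, 16))                         # convolution as a matrix: 4x4 input -> 2x2 output
for r in range(2):
    for c in range(2):
        patch = np.zeros((4, 4))
        patch[r:r + 3, c:c + 3] = k
        C[r * 2 + c] = patch.ravel()

x = np.random.rand(4, 1)                      # a flattened 2x2 feature map
y = C.T @ x                                   # (16 x 4)(4 x 1) = (16 x 1)
print(y.reshape(4, 4))                        # the upsampled 4x4 output
```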
30. Preliminary results: partitioned images [900x512x512]
Epoch   Loss     Acc      |  Epoch   Loss   Acc
1       0.2356   0.9587   |  6       NA     NA
2       0.1763   0.9587   |  7       NA     NA
3       ETA: ~1 day       |  8       NA     NA
4       NA       NA       |  9       NA     NA
5       NA       NA       |  10      NA     NA
32. Actual Next Steps:
▫ Include more classes as part of our training.
▫ Tune the hyperparameters of the model.
▫ Make the SegNet work.
Future Work:
▫ Explore more recently published models, e.g. DeepLab v3 [2018].
▫ Use higher computing resources to run the models faster.
33. References:
▫ Ronneberger, O. (2017). Invited Talk: U-Net Convolutional Networks for Biomedical Image Segmentation. Informatik Aktuell, Bildverarbeitung für die Medizin 2017, 3-3. doi:10.1007/978-3-662-54345-0_3
▫ Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2015.7298965
▫ Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481-2495. doi:10.1109/tpami.2016.2644615
▫ https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
▫ https://www.cs.toronto.edu/~frossard/post/vgg16/
▫ https://medium.com/@wilburdes/semantic-segmentation-using-fully-convolutional-neural-networks-86e45336f99b
▫ https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection
36. Methods: dilated/atrous convolutions
Goal:
● Take away the need for pooling layers
Operations:
● Apply predefined gaps between the input pixels sampled by the kernel
● Replace pooling layers in a pretrained classification network with dilated convolutions
e.g. 2-dilated convolution
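An illustrative PyTorch check of a 2-dilated convolution: a 3x3 kernel with dilation 2 covers a 5x5 receptive field, enlarging context without any pooling.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, dilation=2)   # one-pixel gaps between kernel taps
x = torch.randn(1, 1, 7, 7)
print(conv(x).shape)   # torch.Size([1, 1, 3, 3]) -- effective 5x5 kernel, no pooling
```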
37. Kaggle: Evaluation
Average Jaccard Index between the predicted multipolygons and actual multipolygons. The Jaccard Index for two regions is the ratio of the area of the intersection to the area of the union.
Jaccard = TP / (TP + FP + FN) = |A∩B| / |A∪B| = |A∩B| / (|A| + |B| − |A∩B|)
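A minimal sketch of the metric on binary masks, per class; how empty classes are scored here is an assumption, not necessarily Kaggle's exact handling.

```python
import numpy as np

def jaccard(pred, truth):
    """Intersection over union of two binary masks: TP / (TP + FP + FN)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union if union else 1.0   # both masks empty -> perfect score
```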
Editor's notes
In December 2016, Kaggle hosted a 3-month competition in which the UK’s...
But why try to do this?
Medical imaging: detect location of a tumor
Improve precision agriculture, identify plant disease
General surveillance purposes
For this specific challenge, we were provided with….
A multipolygon is a collection of polygons, and these polygons represent objects in an image
There are 10 types of object classes kagglers were challenged to identify...
We also wanted to show you a video of what the different object masks look like when superimposed on the original image...
We also did a quick analysis of our object class distribution...
I also mentioned that our object masks are provided in the form of multipolygons…
A multipolygon of trees is made of a lot of tree polygons, and to a lesser extent...
Ben
Why did we have to scale down to 3072x3072? (multiple of 512)
Convolved feature (feature map), number of features we want to extract (depth, number of filters), stride, zero-padding
-Deconvolution layers allow the model to use every point in the small image to “paint” a square in the larger one.
-Upsampling: use a 2x2 up-convolution that halves the number of feature channels → this is one important modification in U-Net: we have a large number of feature channels and allow the network to propagate context information to higher resolution layers. (This is the reason we can have a higher resolution output.)
-White boxes represent copied feature maps from the contracting path. The reason for doing this? To localize, so that the following layers can learn to assemble a more precise output based on this information.