This document presents an overview of using deep learning for object extraction from satellite imagery. It discusses the required data, the training process, evaluation methods, appropriate tools, and a literature review of the subject. Code samples applying techniques such as VGGNet, Faster R-CNN, YOLO, and fully convolutional networks to datasets such as SpaceNet and DSTL produce preliminary results, with the YOLO model obtaining a maximum F1 score of 0.21 on test data.
2. Presenter
• Aly Osama
• Research Software Development Engineer at Microsoft
• Contact:
• Email: alyosamah@gmail.com
• https://www.linkedin.com/in/alyosama/
• https://github.com/alyosama
3. Agenda
1. Required data and its size
2. How training will proceed
3. How evaluation should be carried out
4. Which learning tools to use and why
5. Literature survey on the subject
6. Code samples / preliminary results
5. 1. Satellite Imagery
• Objects are often very small (~20 pixels in size, e.g., at 0.5 m/pixel resolution)
• Input images are enormous (often hundreds of megapixels)
• Images often have more than the 3 RGB channels; the channels are called bands
• Image formats:
• Images (GeoTIFF, etc.)
• Labels (GeoJSON, WKT)
• On the positive side:
• The physical and pixel scale of objects is usually known in advance
• There is low variation in observation angle
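The multi-band point above can be illustrated with a short NumPy sketch that pulls an RGB composite out of an 8-band array. The band indices, bit depth, and percentile stretch below are assumptions for illustration (the actual R/G/B band mapping depends on the sensor), not code from the talk:

```python
import numpy as np

# Hypothetical 8-band image (bands, height, width), 16-bit as in many GeoTIFFs
img = np.random.randint(0, 2**11, size=(8, 256, 256), dtype=np.uint16)

# Assume bands 5, 3, 2 (0-indexed: 4, 2, 1) correspond to R, G, B
rgb = img[[4, 2, 1], :, :].astype(np.float32)

# Per-band 2nd/98th percentile stretch to 0-255 for display
lo = np.percentile(rgb, 2, axis=(1, 2), keepdims=True)
hi = np.percentile(rgb, 98, axis=(1, 2), keepdims=True)
rgb8 = np.clip((rgb - lo) / (hi - lo) * 255, 0, 255).astype(np.uint8)
```

In practice a library such as GDAL or rasterio would read the GeoTIFF bands; the array handling afterwards looks like the above.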
7. 2. Training
• Experiments
1. VGGNet (baseline)
• Fine-tune the pretrained model (transfer learning) on the available data
• Data augmentation, e.g.:
• Random crops / scales
• Color jitter
2. Faster R-CNN or YOLO
• For detection and localization
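A minimal sketch of the two augmentations named above (random crops and color jitter), operating on a placeholder tile with NumPy. The function names, crop size, and [0, 1] value range are assumptions for illustration, not code from the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    """Random spatial crop of an (H, W, C) image."""
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def color_jitter(img, strength=0.1):
    """Scale each channel by an independent random factor near 1.0."""
    factors = 1.0 + rng.uniform(-strength, strength, size=(1, 1, img.shape[2]))
    return np.clip(img * factors, 0.0, 1.0)

tile = rng.random((256, 256, 3))            # stand-in for a satellite tile in [0, 1]
aug = color_jitter(random_crop(tile, 224))  # one augmented training sample
```

Frameworks such as Keras or torchvision provide equivalent transforms; the sketch only shows what they do to the pixels.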
15. Cheng et al. 2016
• Deep Learning Papers
• (Han et al., 2015; Tang et al., 2015; Wang et al., 2015; Zhou et al., 2015b)
• Datasets
• NWPU VHR-10 dataset (Cheng et al., 2014a)
• SZTAKI-INRIA building detection dataset (Benedek et al., 2012)
• TAS aerial car detection dataset (Heitz and Koller, 2008)
• Overhead imagery research dataset (OIRDS) (Tanner et al., 2009)
• IITM road extraction dataset (Das et al., 2011)
16. FAST AIRCRAFT DETECTION IN SATELLITE IMAGES BASED ON CONVOLUTIONAL NEURAL NETWORKS (Wu et al., 2015)
18. FULLY CONVOLUTIONAL NETWORKS FOR BUILDING AND ROAD EXTRACTION: PRELIMINARY RESULTS (Zilong Zhong, 2016)
• Dataset
• Massachusetts road dataset and building dataset
• 1,711 aerial images, each 3×1500×1500 pixels
• The FCN's computational cost can be much higher than that of ordinary object-recognition models.
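The key structural difference driving that cost: unlike a classifier that emits one label per image, an FCN emits a score for every pixel. A 1×1 convolution over a feature map is simply a per-pixel linear classifier, which the following NumPy sketch illustrates (the shapes and random weights are placeholders, not the paper's network):

```python
import numpy as np

# Feature map from a convolutional backbone: (channels, H, W); values are placeholders
feats = np.random.rand(16, 8, 8)

# A 1x1 convolution is a per-pixel linear map: weights (num_classes, channels)
w = np.random.rand(2, 16)
b = np.zeros((2, 1, 1))
scores = np.tensordot(w, feats, axes=([1], [0])) + b  # (num_classes, H, W)

# Dense per-pixel prediction, e.g. building vs. background
pred = scores.argmax(axis=0)                          # (H, W) label map
```

Producing and upsampling these dense maps at every spatial position is what makes FCN inference heavier than single-label recognition.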
19. Building detection in very high resolution multispectral data with deep learning features (2016)
• AlexNet features (last layer) + SVM
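The features-plus-SVM pipeline above can be sketched in a few lines with scikit-learn. The synthetic Gaussian features below stand in for last-layer AlexNet activations, and the class labels and dimensions are made up for illustration; only the overall pattern (extract deep features, fit a linear SVM) matches the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)

# Stand-ins for last-layer CNN features (real ones would come from AlexNet);
# two synthetic classes, e.g. "building" vs. "background"
n, d = 100, 64
X = np.vstack([rng.normal(0.0, 1.0, (n, d)),
               rng.normal(2.0, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

# Linear SVM on the deep features
clf = LinearSVC(C=1.0).fit(X, y)
acc = clf.score(X, y)
```

This hybrid design was common before end-to-end fine-tuning became standard: the CNN acts as a fixed feature extractor and the SVM does the task-specific classification.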
20. Road network extraction: a neural-dynamic framework based on deep learning and a finite state machine (Wang, 2015)
• CNN + FSM (the FSM handles the sequential road structure)
21. Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains? (Otávio A. B. Penatti, 2016)
• ConvNet using Caffe and OverFeat
22. Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale (Adrian Albert, Massachusetts Institute of Technology, 2017)
• Dataset:
• UC Merced land use dataset [25] (2,100 images spanning 21 classes)
• DeepSat land use benchmark dataset (4 channels)
• Models: VGGNet and ResNet
24. YOLT2
• Achieved an F1 score of 0.21
• Jaccard index between 0.4 and 0.5
https://medium.com/the-downlinq/building-extraction-with-yolt2-and-spacenet-data-a926f9ffac4f
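For reference, the two metrics reported above can be computed from raw detection counts as follows. The counts in the example are illustrative only, not the actual YOLT2 confusion numbers:

```python
def f1_and_jaccard(tp, fp, fn):
    """Detection F1 score and Jaccard index (IoU) from true/false positive
    and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)
    return f1, jaccard

# Illustrative counts only
f1, jac = f1_and_jaccard(tp=30, fp=120, fn=110)
```

Note the Jaccard index for SpaceNet building extraction is typically computed on overlap of predicted and ground-truth footprints rather than detection counts, but the formula (intersection over union) is the same.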
25. CosmiQNet
• Blackbox fully convolutional neural network: CosmiQNet. The inputs are at two resolutions, and the output distance transform matches the lower of the input resolutions. The resolution of the 8-band GeoTIFF is roughly one quarter (in each dimension) of the resolution of the 3-band GeoTIFF; the difference in resolution is depicted by the scale of the GeoTIFFs.
https://medium.com/the-downlinq/object-detection-on-spacenet-5e691961d257