Building and road detection from large aerial imagery
1. Building and road detection from large aerial imagery
Shunta SAITO*, Yoshimitsu AOKI*
* Graduate School of Science and Technology, Keio University, Japan
2. Motivation
• Understanding aerial imagery is in high demand for generating maps, analyzing disaster scale, detecting changes for estate management, etc.
• However, it has usually been done by human experts, which makes it both slow and costly.
• The remote sensing community has long focused on this task, but it is still difficult to detect terrestrial objects automatically from aerial imagery with high accuracy.
Goal
Input: aerial image (RGB)
Output: 3-channel map (R: road, G: building, B: others)
Note the trade-off between different objects at the same pixel.
3. Previous Works
Senaras et al., Building detection with decision fusion, 2013
[Figure: process flow and result]
Process flow: from the input aerial image (Infrared, RGB), several derived images are computed: an Infrared-Red image, an Infrared-Red-Green image, a Hue-Saturation-Intensity (HSI) image, a Normalized Difference Vegetation Index (NDVI) image, a vegetation mask, and a shadow mask. Mean shift segmentation yields segments, 15 different features are extracted from them, each feature feeds a classifier, and the multiple classification results are combined.
Result: predicted labels are compared against the ground truth.
4. Previous Works
Volodymyr Mnih, Machine Learning for Aerial Image Labeling, 2013
• They take a patch-based approach, which is well suited to Convolutional Neural Networks (CNNs).
• They formulate the problem as learning a mapping from an aerial image patch to a label image patch.
• However, they train two CNNs, one for buildings and another for roads, even though there may be trade-offs between them.
[Figure: process flow] The aerial imagery is split into patches, each patch is labeled by a CNN, and a noise model refines the predicted label.
Dataset: aerial images with building labels (x 151) and road labels (x 1109).
5. Convolutional Neural Network
[Figure: training loop] 64 x 64 x 3 (RGB) input patches → CNN → 16 x 16 x 3 predictions (building, road, other); the loss between the predictions and the correct answers is calculated and backpropagated.
Our Approach
We train a Convolutional Neural Network (CNN) as a mapping from an input aerial image patch to a 3-channel label image patch, using stochastic gradient descent.
[Figure: architecture] Input aerial image patch (R, G, B) → C(64, 9x9/2) → P(2/1) → C(128, 7x7/1) → C(128, 5x5/1) → FC(4096) → FC(768) → predicted map patch
• We train a CNN with the architecture above.
• The CNN takes a small RGB image patch as input and outputs a predicted 3-channel label patch.
• The predicted label patch consists of a Road channel, a Building channel, and an Others channel.
• No pre-processing such as segmentation is needed, and we do not need to design any image features: the CNN learns good feature extractors automatically.
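As a sanity check, the spatial sizes through this pipeline can be traced with simple arithmetic. This is a sketch that reads the layer parameters from the C(channels, kernel/stride) notation above; the valid (no-padding) convolution convention is an assumption.

```python
# Trace spatial sizes through:
# C(64, 9x9/2) -> P(2/1) -> C(128, 7x7/1) -> C(128, 5x5/1) -> FC(4096) -> FC(768)

def out_size(size, kernel, stride):
    """Spatial output size of a valid (unpadded) conv/pool layer."""
    return (size - kernel) // stride + 1

size = 64                    # 64 x 64 RGB input patch
size = out_size(size, 9, 2)  # C(64, 9x9/2)  -> 28
size = out_size(size, 2, 1)  # P(2/1)        -> 27
size = out_size(size, 7, 1)  # C(128, 7x7/1) -> 21
size = out_size(size, 5, 1)  # C(128, 5x5/1) -> 17

flat = size * size * 128     # feature map flattened into FC(4096)
assert 768 == 16 * 16 * 3    # FC(768) reshapes to the 16 x 16 x 3 label patch
print(size, flat)
```

The final FC(768) output is exactly the 16 x 16 x 3 label patch, flattened.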
6. Patch-based framework
Using context: the patch-based framework allows the network to use the pixels surrounding the center patch when predicting its labels ("It's a building!").
Let s be an aerial image patch of w_s x w_s pixels and m̃ the predicted label patch of w_m x w_m pixels, where each pixel m̃_i is a 1-of-3 coding over Road, Building, and Otherwise (e.g. m̃_i1 = (1, 0, 0), m̃_i2 = (0, 1, 0), m̃_i3 = (0, 0, 1)). Assuming the label pixels are conditionally independent given s,

p(m̃|s) = ∏_{i=1}^{w_m²} p(m̃_i|s)

We learn p(m̃_i|s) with the CNN.
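The patch geometry above can be sketched as an extraction routine: the input window (w_s = 64) is larger than the label window (w_m = 16), so the network always sees a context border around the pixels it predicts. The helper name and the zero-filled demo arrays are illustrative, not from the authors' code.

```python
import numpy as np

w_s, w_m = 64, 16            # input and label patch sizes from the poster
margin = (w_s - w_m) // 2    # context border on each side: 24 px

def extract_pair(aerial, label, top, left):
    """Return (context input patch, centered label patch) at one location."""
    s = aerial[top:top + w_s, left:left + w_s]          # 64x64 input with context
    m = label[top + margin:top + margin + w_m,
              left + margin:left + margin + w_m]        # centered 16x16 target
    return s, m

aerial = np.zeros((1500, 1500, 3), dtype=np.uint8)      # one dataset-sized image
label = np.zeros((1500, 1500, 3), dtype=np.uint8)
s, m = extract_pair(aerial, label, 0, 0)
print(s.shape, m.shape)  # (64, 64, 3) (16, 16, 3)
```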
7. Loss function
• Each pixel in the predicted label image
• is independent of the other pixels (assumption)
• always belongs to exactly one of the 3 labels (building, road, others)
Example for one pixel: the raw network outputs m̂_i = (1.56, 4.37, 3.11) are passed through a softmax to give the predicted label (0.05, 0.74, 0.21); the correct label is m̃_i = (1, 0, 0).
c : channel (Building, Road, Others)
P : correct label distribution (1-of-3 coding)
Q : predicted label distribution
Asymmetric cross entropy:

L = -Σ_c w_c Σ_i P_i(c) log Q_i(c)

w_c : weight for each channel's loss; w_building = 1.5, w_road = 1.5, w_others = 0
* because the prediction loss in the Others channel is not important
We simply minimize this cross entropy by stochastic gradient descent.
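The weighted loss can be sketched in a few lines of NumPy. The channel weights and the worked pixel example come from this slide; treating the loss for a single pixel as a plain sum over channels is an assumption about the reduction.

```python
import numpy as np

w = np.array([1.5, 1.5, 0.0])  # w_building, w_road, w_others (from the poster)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def weighted_cross_entropy(logits, target):
    """-sum_c w_c * P(c) * log Q(c) for one pixel."""
    q = softmax(logits)
    return float(-(w * target * np.log(q)).sum())

logits = np.array([1.56, 4.37, 3.11])  # raw outputs from the slide's example
target = np.array([1.0, 0.0, 0.0])     # 1-of-3 correct label
q = softmax(logits)
print(np.round(q, 3))                  # close to the slide's (0.05, 0.74, 0.21)
print(weighted_cross_entropy(logits, target))
# A correct "others" pixel contributes nothing, since w_others = 0:
print(weighted_cross_entropy(logits, np.array([0.0, 0.0, 1.0])))  # 0.0
```

Setting w_others = 0 is what makes the loss asymmetric: mistakes on the Others channel never generate a gradient.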
8. Dataset
[Figure] Building label + Road label → 3-channel label, alongside the aerial image
• We combine Volodymyr Mnih's road and building detection datasets* to create our 3-channel map dataset.
• Our dataset contains 147 sets of aerial images and 3-channel label images:
- 137 sets for training
- 10 sets for testing
• Each image is 1500 x 1500 pixels at 1 m^2/pixel resolution.
• The entire dataset covers roughly 340 km^2 of mainly urban regions in Massachusetts, United States.
[Figure: sample aerial images and 3-channel labels from the dataset]
* http://www.cs.toronto.edu/~vmnih/data/
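The label-merging step can be sketched as follows. The channel order (road, building, others) matches the goal slide; how a pixel marked as both road and building is resolved is an assumption (road takes priority here), as the poster does not specify it.

```python
import numpy as np

def merge_labels(road_mask, building_mask):
    """Combine two binary masks into a 3-channel (road, building, others) label."""
    road = road_mask.astype(bool)
    building = building_mask.astype(bool) & ~road   # assumed priority: road first
    others = ~(road | building)
    return np.stack([road, building, others], axis=-1).astype(np.uint8)

# Tiny 2x2 example: one road+building pixel, one building pixel, two empty.
road = np.array([[1, 0], [0, 0]])
building = np.array([[1, 1], [0, 0]])
label = merge_labels(road, building)
print(label[0, 0], label[0, 1], label[1, 0])  # [1 0 0] [0 1 0] [0 0 1]
```

Each output pixel is a valid 1-of-3 coding, matching the loss function's assumption.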
9. Experiment
• Training with 137 images and labels
• Testing with 10 images and labels
• We test several variants of the basic architecture.
[Figure: basic architecture] Input aerial image patch (R, G, B) → C(64, 9x9/2) → P(2/1) → C(128, 7x7/1) → C(128, 5x5/1) → FC(4096) → FC(768) → predicted map patch
Architecture      Activation  Dropout rate  Filter sizes
S-ReLU (Basic)    ReLU        N/A           9-7-5
S-ReLU-Dropout    ReLU        0.5           9-7-5
S-Maxout          Maxout      0.5           9-7-5
ReLU              ReLU        N/A           16-4-3
ReLU-Dropout      ReLU        0.5           16-4-3
Maxout            Maxout      0.5           16-4-3
Tested architectures
10. Example of Test Results
[Figure: input aerial image and the predicted 3-channel label image from the basic architecture]
14. Precision at breakeven point
                  Road    Building
S-ReLU (Basic)    0.8905  0.9241
S-ReLU-Dropout    0.8889  0.9220
S-Maxout          0.8842  0.9185
ReLU              0.8657  0.8984
ReLU-Dropout      0.8650  0.8973
Maxout            0.8548  0.8940
Mnih (2013)       0.8873  0.9150
• Our basic architecture achieved the best results.
• Using Maxout, Dropout, or both does not seem to improve performance.
• Architectures with smaller filters perform better than those with bigger filters.
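The evaluation metric can be sketched on pixel-wise scores: precision at the breakeven point is the precision at the confidence threshold where precision equals recall. The scores and labels below are synthetic, for illustration only.

```python
import numpy as np

def breakeven_precision(scores, labels):
    """Precision at the cutoff where precision and recall are closest."""
    order = np.argsort(-scores)        # sort pixels by descending confidence
    tp = np.cumsum(labels[order])      # true positives at each cutoff
    k = np.arange(1, len(scores) + 1)  # predicted positives at each cutoff
    precision = tp / k
    recall = tp / labels.sum()
    i = np.argmin(np.abs(precision - recall))
    return precision[i]

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])  # synthetic pixel scores
labels = np.array([1, 1, 0, 1, 0, 0])              # synthetic ground truth
print(breakeven_precision(scores, labels))
```

Because precision and recall are evaluated on the same curve, the breakeven value summarizes it with a single threshold-free number.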
Conclusion
• We propose a CNN-based building and road extraction method for aerial imagery.
• Our method needs no hand-designed image features, because good feature extractors are constructed automatically by training the CNN.
• Our CNN predicts building and road regions simultaneously, at state-of-the-art accuracy.
15. Thank you for your kind attention.
All code to generate our dataset, train the CNN, and test the resulting models is available on GitHub:
https://github.com/mitmul/ssai