This document summarizes Diego Cheda's thesis on using monocular depth cues in computer vision applications. The thesis presents methods for coarse depth map estimation, egomotion estimation, background estimation, and pedestrian candidate generation based on monocular cues. For coarse depth map estimation, the author presents a supervised learning approach that segments images into near, medium, far, and very-far depth categories using low-level visual features; experimental results show that the approach outperforms other methods while using fewer features. The thesis also describes egomotion estimation algorithms based on tracking distant regions, and comparisons with other state-of-the-art methods show that the distant-region approach provides accurate rotation estimates.
1. Monocular Depth Cues in
Computer Vision Applications
Diego Cheda
Thesis Advisors:
Dr. Daniel Ponsa
Dr. Antonio López
December 14, 2012
2. We don’t need two eyes to perceive depth.
[Edgar Muller]
3. Motivation
Human depth cues
There are different sources of information supporting depth
perception.
Diego Cheda — Monocular Depth Cues in Computer Vision Applications 3/64
4. Motivation
Depth estimation from a single image
Prior information
Our world is structured / In an abstract world
Paintings by René Magritte: Golconde, Blank Check, The Listening Room, Personal Values
7. Objectives
• Coarse depth map estimation
simple and low-cost
low-level features based on pictorial cues
• Increasing the performance of many applications
Egomotion estimation
Background estimation
Pedestrian candidate generation
8. Objectives
Segmenting an image into depth categories
• Near
Depth is usually estimated using a stereo configuration.
• Very-far
The effect of camera translation at faraway distances is imperceptible.
• Medium and Far
Interesting for potential applications.
10. Coarse depth map estimation
Method
Pipeline of our approach
• Multiclass classification problem
• Supervised learning approach
11. Coarse depth map estimation
Method
Ground truth dataset
• Set of urban outdoor images
Saxena et al.: 400 images for training and 134 for testing.
• Each image has an associated depth map acquired by a laser
scanner.
The depth map is thresholded into depth categories to be used as ground truth.
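The thresholding step that produces the ground-truth categories can be sketched as follows; the 30/50/70 m boundaries are taken from the classifier thresholds mentioned later in the talk, so treat them as an illustrative choice:

```python
def depth_category(depth_m):
    """Map a metric depth value to one of four coarse categories."""
    if depth_m <= 30:
        return "near"
    if depth_m <= 50:
        return "medium"
    if depth_m <= 70:
        return "far"
    return "very-far"

# Threshold a laser depth map (one depth value per pixel) into labels.
depth_map = [[12.0, 45.0], [68.0, 120.0]]
labels = [[depth_category(d) for d in row] for row in depth_map]
print(labels)  # [['near', 'medium'], ['far', 'very-far']]
```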
12. Coarse depth map estimation
Method
Regions
Superpixels vs. regular grid
✓ Superpixels preserve intra-region similarities.
× Superpixels are time consuming to compute.
× A regular grid merges information from different regions.
✓ A regular grid is defined once per camera configuration.
13. Coarse depth map estimation
Method
Features
• Beyond 30 m, monocular pictorial cues are the predominant source of depth information.
• Low-level visual features represent texture, relative height, and atmospheric scattering.
14. Coarse depth map estimation
Method
Features - Texture
Paris street, rainy day - Gustave Caillebotte
At greater distances, texture patterns become finer and appear smoother.
To capture textures we use
• Weibull distribution
• Gabor filters
15. Coarse depth map estimation
Method
Features - Texture: Weibull distribution
• Compact representation (β and γ parameters)
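The Weibull texture descriptor fits a two-parameter Weibull distribution, typically to the image's gradient-magnitude histogram. A minimal maximum-likelihood fit might look like this (the β = scale, γ = shape naming is our assumption, and the gradient-extraction step is omitted):

```python
import math
import random

def weibull_fit(samples, iters=50):
    """MLE fit of a two-parameter Weibull to positive samples.
    Returns (beta, gamma): scale and shape (naming is our assumption)."""
    logs = [math.log(x) for x in samples]
    mean_log = sum(logs) / len(logs)
    gamma = 1.0
    for _ in range(iters):
        # Damped fixed-point iteration for the shape parameter.
        s0 = sum(x ** gamma for x in samples)
        s1 = sum((x ** gamma) * math.log(x) for x in samples)
        gamma = 0.5 * (gamma + 1.0 / (s1 / s0 - mean_log))
    # Closed-form scale given the shape.
    beta = (sum(x ** gamma for x in samples) / len(samples)) ** (1.0 / gamma)
    return beta, gamma

# Sanity check on synthetic Weibull(beta=2.0, gamma=1.5) samples,
# drawn by inverse-transform sampling.
random.seed(0)
data = [2.0 * (-math.log(1.0 - random.random())) ** (1.0 / 1.5)
        for _ in range(5000)]
beta, gamma = weibull_fit(data)
print(round(beta, 2), round(gamma, 2))  # close to 2.0 and 1.5
```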
19. Coarse depth map estimation
Method
Features - Texture: Gabor filter
Images and their Gabor filter responses
• Gabor responses capture smooth and textured regions
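A Gabor response is obtained by convolving the image with oriented sinusoidal kernels under a Gaussian envelope; here is a minimal kernel generator (all parameter values are illustrative, and the convolution itself is omitted):

```python
import math

def gabor_kernel(size=9, theta=0.0, wavelength=4.0, sigma=2.0):
    """Real (even) Gabor kernel: Gaussian envelope times a cosine wave
    oriented at angle theta (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(g * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

k = gabor_kernel()
print(len(k), len(k[0]))  # 9 9
print(round(k[4][4], 3))  # center value: envelope * cos(0) = 1.0
```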
20. Coarse depth map estimation
Method
Features - Relative height
When an object is near the
horizon, it is perceived as distant.
To capture relative height we use
• Location: x and y coordinates
in the image
21. Coarse depth map estimation
Method
Features - Location
Average depth over the ground truth for near, medium, and far regions.
22. Coarse depth map estimation
Method
Features - Atmospheric scattering
The Virgin and Child with St. Anne - Leonardo Da Vinci
Objects farther away appear hazier and less detailed than closer ones.
To capture atmospheric
scattering we use
• RGB histogram
• HSV histogram
23. Coarse depth map estimation
Method
Learning approach
One-vs-All
• Binary classifiers
• Training one classifier per class (near, medium, far, and very-far)
• Low performance due to the small number of positive examples in the medium and far classes.
Our approach
• Training three classifiers: > 30, > 50, > 70 m.
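With the three cumulative classifiers predicting P(d > 30), P(d > 50), and P(d > 70), the four category probabilities can be recovered by differencing the outputs. This differencing scheme is our reading of the slide, not necessarily the exact combination rule used in the thesis:

```python
def category_probs(p30, p50, p70):
    """Turn cumulative classifier outputs P(d>30), P(d>50), P(d>70)
    into probabilities of the four depth categories."""
    return {
        "near": 1.0 - p30,    # d <= 30
        "medium": p30 - p50,  # 30 < d <= 50
        "far": p50 - p70,     # 50 < d <= 70
        "very-far": p70,      # d > 70
    }

probs = category_probs(p30=0.9, p50=0.4, p70=0.1)
print({k: round(v, 2) for k, v in probs.items()})
# {'near': 0.1, 'medium': 0.5, 'far': 0.3, 'very-far': 0.1}
print(max(probs, key=probs.get))  # medium
```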
24. Coarse depth map estimation
Method
Training
25. Coarse depth map estimation
Method
Testing
29. Coarse depth map estimation
Method
Inference
• Conditional random field (CRF)
• Combines the probabilities obtained from the classifiers
• Encourages neighboring regions to belong to the same depth category
• Graph-cut inference to obtain a globally optimal labeling
30. Coarse depth map estimation
Experimental results
Performance measurement
• Performance measure: Jaccard index
J = TP / (TP + FP + FN)
• Measures the level of agreement with respect to an ideal classification result.
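For a single depth category, the Jaccard index can be computed directly from the predicted and ground-truth labels:

```python
def jaccard_index(pred, truth, category):
    """Jaccard index for one depth category: TP / (TP + FP + FN)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == category and t == category)
    fp = sum(1 for p, t in zip(pred, truth) if p == category and t != category)
    fn = sum(1 for p, t in zip(pred, truth) if p != category and t == category)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0

pred  = ["near", "near", "far", "medium", "far"]
truth = ["near", "far",  "far", "medium", "medium"]
print(jaccard_index(pred, truth, "near"))  # 1 TP, 1 FP, 0 FN -> 0.5
```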
31. Coarse depth map estimation
Experimental results
Different region groupings
Performance of our method using different over-segmentation configurations.
Regular grid: 10×10, 15×15, 20×20. TurboPixels: ∼200, ∼400, ∼800 regions.
Algorithm      20×20     15×15     10×10
Superpixels    0.3623    0.3567    0.3561
Grid           0.3586    0.3602    0.3570
• The best-performing configuration uses superpixels.
32. Coarse depth map estimation
Experimental results
Comparison w.r.t. state-of-the-art
Saxena et al.
• A more challenging goal: a photo-realistic 3D model.
• For each superpixel and its neighbors: features for occlusions, geometric, statistical, and spatial information, and textures, at multiple spatial scales.
• Inference methods with a high computational cost.
• MRF (Markov random field)
33. Coarse depth map estimation
Experimental results
Comparison w.r.t. state-of-the-art
Our method uses a remarkably smaller number of low-level features (64 vs. 646).
34. Coarse depth map estimation
Experimental results
Relevance of visual features
35. Coarse depth map estimation
Experimental results
Qualitative results — columns: input image, laser depth map, Saxena et al., ours.
37. Coarse depth map estimation
Conclusions
We have presented
• A supervised learning approach to segment an image
according to certain depth categories.
• Our algorithm uses a reduced number of low-level visual features based on monocular pictorial cues.
Our results show
• Monocular cues are useful for depth estimation.
• Close and distant regions are well-segmented by our approach.
• Regions at medium distances are more difficult to segment.
• On average, our method outperforms that of Saxena et al.
39. Egomotion estimation
Motivation
Egomotion estimation
Estimating the vehicle's position is a key component of many advanced driver assistance systems (ADAS):
Autonomous navigation
Adaptive cruise control
Lane change assistance
40. Egomotion estimation
Problem definition
Egomotion problem
Determining the changes in the 3D rigid camera position and
orientation.
• Camera motion is described as a 3D rigid motion:
p_t = R_t p_0 + t_t
• Six degrees of freedom (DOF).
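The decoupling exploited later in this section follows directly from this rigid-motion model: under a pinhole camera, translation barely moves the projection of distant points, while rotation moves near and far points alike. A small numerical sketch (the 800 px focal length and the point coordinates are arbitrary choices, not values from the thesis):

```python
import math

def project(p, f=800.0):
    """Pinhole projection of a camera-frame 3D point (pixel coordinates)."""
    x, y, z = p
    return (f * x / z, f * y / z)

def rigid_motion(p, yaw_deg=0.0, t=(0.0, 0.0, 0.0)):
    """p' = R p + t, with R a rotation about the vertical (yaw) axis."""
    x, y, z = p
    a = math.radians(yaw_deg)
    return (math.cos(a) * x + math.sin(a) * z + t[0],
            y + t[1],
            -math.sin(a) * x + math.cos(a) * z + t[2])

near_pt, far_pt = (2.0, 0.0, 10.0), (2.0, 0.0, 200.0)
shift = lambda p, **kw: abs(project(rigid_motion(p, **kw))[0] - project(p)[0])

# Moving the camera 1 m forward shifts the near point a lot, the far one barely:
near_shift_t = shift(near_pt, t=(0.0, 0.0, -1.0))
far_shift_t = shift(far_pt, t=(0.0, 0.0, -1.0))
print(round(near_shift_t, 2), round(far_shift_t, 2))  # 17.78 0.04

# A 1-degree rotation shifts both points by a similar amount:
near_shift_r = shift(near_pt, yaw_deg=1.0)
far_shift_r = shift(far_pt, yaw_deg=1.0)
print(round(near_shift_r, 2), round(far_shift_r, 2))
```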
44. Egomotion estimation
Goal
Distant regions behave as a plane at infinity
Properties
• They remain at the same image coordinates under camera translation
• They are affected only by camera rotation
Goal
• Identify distant regions in the image to estimate vehicle rotation decoupled from vehicle translation.
45. Egomotion estimation
Algorithm overview
Egomotion estimation based on distant points / regions
× Distant points are hard to track since they lie in low-textured regions.
The distant-region algorithm makes maximal use of distant information.
53. Egomotion estimation
Experimental results
Datasets
• Karlsruhe dataset: 8 sequences
• More than 8000 frames (∼3 km).
• Ground truth from an INS sensor.
• Stereo depth maps
55. Egomotion estimation
Experimental results
Comparison with other approaches
• The five-point algorithm (5pts) by Nistér.
• The Burschka et al. method (RANSAC).
• The stereo-based algorithm by Kitt et al. (as a baseline).
59. Egomotion estimation
Conclusions
In this section, we have
• Proposed two novel monocular egomotion methods based on
tracking distant points and distant regions.
Our results show
• Rotations are accurately estimated, since distant regions
provide strong indicators of camera rotation.
• Our approach outperforms the other state-of-the-art methods considered.
• Comparable performance with respect to the considered stereo
algorithm.
61. Background estimation
Problem definition
Background estimation
Automatically remove transient and moving objects from a set of
images with the aim of obtaining an occlusion-free background
image of the scene.
Background model
• Represents objects whose distance to the camera is maximal.
• Background objects are stationary.
Goal
• Identify close regions to penalize deviations from our background
model.
63. Background estimation
Method
Energy function
E(f) = Σ_{p∈P} D_p(f_p) + Σ_{(p,q)∈N} V_{p,q}(f_p, f_q)
(data term + smoothness term)
Data term: penalizes deviations from our background model, taking into account color, motion, and depth.
D_p(f_p) = α·D_p^S(f_p) + β·D_p^M(f_p) + γ·D_p^P(f_p)
• Color variations over short time intervals
• Moving objects, detected using motion boundaries
• Close objects, detected using our approach
Smoothness term: penalizes intensity differences between neighboring regions, giving a higher cost when images do not match well.
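For a given labeling, the energy above is just a sum of per-region data costs and pairwise smoothness costs. A toy evaluation (the cost values and the Potts smoothness potential are illustrative; the actual minimization is done with graph cuts):

```python
def energy(labels, data_cost, neighbors, smooth_cost):
    """E(f) = sum_p D_p(f_p) + sum_{(p,q) in N} V_pq(f_p, f_q)."""
    data = sum(data_cost[p][labels[p]] for p in range(len(labels)))
    smooth = sum(smooth_cost(labels[p], labels[q]) for p, q in neighbors)
    return data + smooth

# Toy example: 3 regions, 2 candidate source images (labels 0 and 1).
data_cost = [[1.0, 4.0], [3.0, 0.5], [2.0, 2.0]]  # D_p per label
neighbors = [(0, 1), (1, 2)]                      # neighboring region pairs
potts = lambda a, b: 0.0 if a == b else 1.5       # Potts smoothness potential

print(energy([0, 0, 0], data_cost, neighbors, potts))  # 1 + 3 + 2 = 6.0
print(energy([0, 1, 1], data_cost, neighbors, potts))  # 1 + 0.5 + 2 + 1.5 = 5.0
```

Even though region 1 prefers label 1, switching it alone would pay two smoothness penalties; the energy trades off both terms exactly as the slide describes.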
67. Background estimation
Conclusions
In this section,
• We have presented a method for background estimation in scenes containing moving/transient objects.
• This method uses depth information for such purpose by
penalizing close regions in a cost function.
Our results show that
• Our method significantly outperforms the median filter.
• Our approach is comparable to the method of Agarwala et al., without requiring any user intervention.
69. Pedestrian candidate generation
Problem definition
Pedestrian candidate generation
Generating hypotheses to be evaluated by a pedestrian classifier.
[Gerónimo 2010]
Goal
Exploiting geometric and depth information available in single images
to reduce the number of windows to be further processed.
71. Pedestrian candidate generation
Method
Overview
Pipeline: (a) original image → (b) geometric information + (c) depth information → fusion → (d) pedestrian candidate windows.
72. Pedestrian candidate generation
Method
Agglomerative clustering schema
• Regions over the ground surface
• Agglomerating regions while maintaining size coherence w.r.t. depth
(a) Geometric, depth, and spatial information computed over superpixels
(b) Superpixels merged by hierarchical clustering using gravity, depth, and size
(c) Bounding boxes surrounding the resulting regions
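The size-coherence check relies on the pinhole relation h = f·H/Z: a pedestrian of height H at depth Z should occupy about f·H/Z image rows. A sketch of the expected candidate-window size (the focal length, pedestrian height, and 0.41 aspect ratio are assumed values, not taken from the thesis):

```python
def pedestrian_window(depth_m, f_px=800.0, ped_height_m=1.7, aspect=0.41):
    """Expected pedestrian bounding-box size (pixels) at a given depth,
    from the pinhole relation h = f * H / Z."""
    h = f_px * ped_height_m / depth_m
    return round(aspect * h), round(h)  # (width, height)

for z in (10, 25, 50):
    print(z, pedestrian_window(z))  # windows shrink with depth
```

A merged region whose bounding box strongly disagrees with this expected size at its depth can be discarded, which is what keeps the candidate count low.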
73. Pedestrian candidate generation
Experimental results
Dataset
• CVC Pedestrian dataset.
• 15 sequences taken from a stereo rig rigidly mounted on a car driving in an urban scenario (4364 frames).
• 7983 manually annotated pedestrians visible at less than 50 meters.
Performance measures
• Number of pedestrian candidates generated.
• True positive rate: TPR = TP / (TP + FN)
75. Pedestrian candidate generation
Experimental results
Lost pedestrians by distance: 4 % at 0–10 m, 18 % at 10–25 m, and 78 % at > 25 m (bar chart: lost pedestrians vs. distance in meters).
76. Pedestrian candidate generation
Conclusions
In this section, we have presented:
• A novel monocular method for generating pedestrian candidates.
• It is based on geometric relationships and depth.
Our results show that:
• Our method outperforms all considered methods, significantly reducing the number of candidates.
• It achieves a high TPR.
78. Conclusions and future work
Conclusions
• We have proposed a supervised learning approach to classify the pixels of outdoor images into just four categories (near, medium, far, and very-far) based on monocular pictorial cues.
• Compared against a more complex depth map estimation method, our method achieves better performance while using computationally inexpensive techniques.
• We have demonstrated the usefulness of our coarse depth maps in improving the results of egomotion estimation, background estimation, and pedestrian candidate generation. In each application, we have contributed novel methods based on the use of coarse depth.
79. Conclusions and future work
Future work
• Extend our approach to consider more monocular depth cues, such as occlusions and relative and familiar size, which could improve our coarse estimation.
• Explore other applications of depth information (tracking, initializing 3D reconstruction algorithms, learning pedestrian classifiers according to depth, etc.).
• Integrate our depth estimation method in different ADAS modules.
80. Conclusions and future work
Publications
This thesis is based on the following publications:
Conference papers
• Camera Egomotion Estimation in the ADAS Context, D. Cheda, D. Ponsa and A. M. López, IEEE Conf. Intell. Transp. Syst., 2010.
• Monocular Egomotion Estimation based on Image Matching, D. Cheda, D. Ponsa and A. M. López, Int. Conf. Pattern Recognit. Appl. and Methods, 2012.
• Monocular Depth-based Background Estimation, D. Cheda, D. Ponsa and A. M. López, Int. Conf. Comput. Vision Theory Appl., 2012.
• Pedestrian Candidates Generation using Monocular Cues, D. Cheda, D. Ponsa and A. M. López, IEEE Intell. Vehicles Symposium, 2012.
Journal papers under review
• Monocular Multilayer Depth Segmentation and Applications, D. Cheda, D. Ponsa and A. M. López, submitted to IJCV, Springer.
• Monocular Visual Odometry Boosted by Monocular Depth Cues, D. Cheda, D. Ponsa and A. M. López, submitted to ITS, IEEE.