ICPRAM 2012

Object Recognition In Probabilistic 3-D
Volumetric Scenes

Maria Isabel Restrepo
Brandon A. Mayer
Joseph L. Mundy

Goal: Automated Scene Description

Maria Isabel Restrepo. February 7, 2012 2


Related Work: 3-d Object Retrieval

[Figures from the cited papers]
A. M. Bronstein, et al., 2011: Scale-Invariant HKS (SI-HKS)
J. Knopp, et al., 2010: Hough Transforms and 3D SURF for robust three dimensional classification
R. Toldo, et al., 2009

Related Work: Scene Description In LIDAR
[Paper excerpts shown on the slide]
T. Korah, S. Medasani, and Y. Owechko, 2011: segmentation of LIDAR data from urban scenes with a Strip Histogram Grid representation; almost a billion points spanning 3.3 km² processed in less than an hour on a regular desktop
A. Patterson and P. Mordohai, 2008: Object Detection from Large-Scale 3D Datasets
A. Golovinskiy, et al., 2009

Challenges Of Multi-View Stereo


Scene Ambiguity:

Scene Uncertainty:

[Figure panels (a) to (e)]

Probabilistic 3-d Volumetric Model: PVM
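Probabilistic representation of 3-d scenes based on volumetric units (voxels).

[Figure: camera center C, projection ray R_X, image I with pixel intensity I_X, voxel volume, V, surface S, voxel X', and the density P(I_X | V = X') plotted over intensity.]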

Pollard and Mundy, 2007


Probabilistic 3-d Volumetric Modeling

[Figure: projection ray R_X through the voxel volume, with the density P(I_X | V = X') plotted over intensity.]

Surface probability is given by on-line Bayesian learning:

\[
P^{N+1}(X \in S \mid I_X^{N+1}) = P^{N}(X \in S)\,\frac{p^{N}(I_X^{N+1} \mid X \in S)}{p^{N}(I_X^{N+1})}
\]

Probabilistic 3-d Volumetric Modeling

Update using information along a projection ray

\[
P^{N+1}(X \in S) = P^{N}(X \in S)\,\frac{p^{N}(I_X^{N+1} \mid X \in S)}{p^{N}(I_X^{N+1})} \tag{3}
\]

\[
P^{N+1}(X \in S) = P^{N}(X \in S)\,
\frac{\sum_{X' \in R_X} p^{N}(I_X^{N+1} \mid V = X')\,P^{N}(V = X' \mid X \in S)}
     {\sum_{X' \in R_X} p^{N}(I_X^{N+1} \mid V = X')\,P^{N}(V = X')} \tag{4}
\]

[...] the Gaussian mixture (1) at that voxel explains the intensity observed in the N+1 image better than any other voxel along the projection ray.

To make the PVM representation clear, a term-by-term explanation of the update equation (4) is outlined.
• The term p^N(I_X^{N+1} | V = X') is computed using the mixture of Gaussians model stored at the voxel X'.
• The probability of a voxel X' producing the color in the image is interpreted geometrically: a voxel produces the intensity seen in the image if it is a surface element and it is not occluded by other voxels along the ray. Thus,

\[
P^{N}(V = X') = P^{N}(X' \in S)\,P^{N}(X' \text{ is not occluded}) \tag{5}
\]

Probabilistic 3-d Volumetric Modeling

Every voxel contains appearance information

Each voxel is modeled with a Gaussian mixture over the intensity I:

\[
p(I) = \sum_{k=1}^{3} \frac{w_k}{W}\,\frac{1}{\sqrt{2\pi\sigma_k^{2}}}\,
\exp\!\left(-\frac{(I-\mu_k)^{2}}{2\sigma_k^{2}}\right) \tag{1}
\]

where W is the sum of w_k over all mixture components. The mixture parameters are learned using a modified expectation-maximization (EM) algorithm [45].

p^N(I_X^{N+1} | V = X'): probability of the observed intensity, given that the voxel X' produced the color seen in the image.
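As a rough illustration of the per-voxel appearance model, the sketch below evaluates the weighted Gaussian mixture of Eq. (1) at a single intensity. The function name `mixture_density` and the list-based parameter layout are assumptions made for this example; they are not taken from the authors' implementation.

```python
import math


def mixture_density(intensity, weights, means, sigmas):
    """Evaluate the weighted Gaussian mixture of Eq. (1) at one intensity.

    weights, means and sigmas are hypothetical parallel lists holding
    w_k, mu_k and sigma_k for each mixture component.
    """
    total_weight = sum(weights)  # W in Eq. (1)
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
        exponent = -((intensity - mu) ** 2) / (2.0 * sigma ** 2)
        density += (w / total_weight) * norm * math.exp(exponent)
    return density
```

Because each weight is divided by W, rescaling all weights by the same factor leaves the density unchanged.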

Probabilistic 3-d Volumetric Modeling

P^N(V = X'): probability that a voxel X' produced the color seen in the image.

The probability that X' is not occluded is defined as the probability that all voxels between X' and the sensor are empty, namely:

\[
P^{N}(X' \text{ is not occluded}) = \prod_{X'' < X'} \left(1 - P^{N}(X'' \in S)\right) \tag{6}
\]

• The term P^N(V = X' | X ∈ S) is computed analogously to P^N(V = X'). However, any instances of P^N(X ∈ S) [...]
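The ray update of Eqs. (4) to (6) can be sketched as follows. Two things are assumed here and are not confirmed by the slides: voxels are ordered from the sensor outward, and the conditional term P^N(V = X' | X ∈ S) is obtained by setting P^N(X ∈ S) to 1, so that voxels behind X contribute nothing. The function names are hypothetical.

```python
def visibility(surface_probs):
    """P(X' is not occluded) per voxel, ordered from the sensor outward (Eq. 6)."""
    vis = []
    running = 1.0
    for p in surface_probs:
        vis.append(running)
        running *= 1.0 - p
    return vis


def update_ray(surface_probs, likelihoods):
    """One update of P(X in S) for every voxel on a ray (Eq. 4).

    likelihoods[i] stands for p^N(I^{N+1} | V = X_i), i.e. the value of
    voxel i's appearance model at the newly observed intensity.
    """
    vis = visibility(surface_probs)
    # terms[j] = p(I | V = X_j) * P(V = X_j), with P(V = X_j) from Eq. (5)
    terms = [lik * p * v for lik, p, v in zip(likelihoods, surface_probs, vis)]
    denominator = sum(terms)
    if denominator == 0.0:
        return list(surface_probs)  # no voxel explains the pixel; keep the prior
    updated = []
    in_front = 0.0  # sum of terms for voxels closer to the sensor
    for i, p in enumerate(surface_probs):
        # Assumption: given X_i is a surface, voxels behind it are fully occluded.
        numerator = in_front + likelihoods[i] * vis[i]
        updated.append(p * numerator / denominator)
        in_front += terms[i]
    return updated
```

For example, `update_ray([0.5, 0.5], [2.0, 0.0])` raises the first voxel's surface probability to 1.0 and leaves the second at 0.5, since only the first voxel's appearance model explains the observation.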
(I 2012 ( )

Spatial Optimization: Octree

[Figure: regions labeled "empty space" and "surface", each with a plot of p(intensity) against intensity.]

Crispell, Mundy and Taubin 2011
Miller, Jain and Mundy 2011



Demo:
https://vimeo.com/43729866


Geometry And Appearance

Demo:


Expected Appearance Volume Model: EVM

Voxel's Expected Appearance = E(I_X | V = X') P(X' ∈ S)
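A minimal sketch of the per-voxel EVM value, assuming that E(I_X | V = X') is the mean of the voxel's Gaussian mixture (the slides do not say how the expectation is evaluated). The function name and argument layout are hypothetical.

```python
def expected_appearance(weights, means, surface_prob):
    """Expected appearance of one voxel: E(I | V = X') * P(X' in S).

    Assumes the expectation is the weighted mean of the mixture components.
    """
    total_weight = sum(weights)
    mean_intensity = sum(w * m for w, m in zip(weights, means)) / total_weight
    return mean_intensity * surface_prob
```

Voxels in empty space have a surface probability near zero, so under this reading their expected appearance is near zero regardless of the stored mixture.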


Object Categorization: Bag Of Volumetric Words

Parking
Car
Plane

Building
House

Input: EVM
Feature sampling: Dense
Descriptor: Taylor, PCA
Volumetric Vocabulary: K-means
Classifier: Naive Bayes


Experiments: Data Collection

http://vision.lems.brown.edu/project_desc/Object-Recognition-in-Probabilistic-3D-Scenes


Experiments: Train And Test Sites

Site 1 Site 2 Site 3 Site 5 Site 6



Site 25 Site 26 Site 27

http://vision.lems.brown.edu/project_desc/Object-Recognition-in-Probabilistic-3D-Scenes

Experiments: The Input

Camera matrices were recovered using Bundler: Snavely, N. and Seitz, S. (2006). Photo tourism: exploring photo collections in
3D. ACM Transactions on Graphics.

Feature Description
Global Features
Spherical Harmonics: D. Saupe and D. V. Vranić, 2001
3-d Zernike Moments: M. Novotni and R. Klein, 2003

Local Features
Spin Images: Johnson and Hebert, 1999
3-D SURF: Knopp, et al., 2010
Shape Context: Frome, et al., 2004

Feature Formation
Volumetric Form of Voxel Neighborhoods
Vector Form of Voxel Neighborhoods

\[
E(I_X \mid V = X')\,P(X' \in S)
\]


Feature Description: PCA Features

EVD on the sample scatter matrix S

The principal axes are obtained by the eigenvalue decomposition of S. In the PCA space, every neighborhood (represented by a d-dimensional feature vector x) can be exactly expressed as

\[
x = \bar{x} + \sum_{i=1}^{d} a_i e_i
\]

where e_i are the principal axes associated with the d eigenvalues, and a_i are the corresponding coefficients.

1-dimensional space: x ≈ x̄ + a_1 e_1

k-dimensional approximation, for k < d: a k-dimensional approximation of the neighborhoods can be obtained by using the first k principal components, i.e.

\[
\tilde{x} = \bar{x} + \sum_{i=1}^{k} a_i e_i
\]

Section V presents a detailed analysis of the reconstruction error of local neighborhoods, namely |x − x̃|², as a function of dimension and training set size. In the remainder of this paper, the vector arrangement of projection coefficients in the PCA space is referred to as a PCA feature.
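The PCA feature described above can be sketched with NumPy as follows. The function names and the row-per-neighborhood layout of `X` are assumptions for illustration; the sketch uses `numpy.linalg.eigh` on the sample scatter matrix and is not the authors' code.

```python
import numpy as np


def fit_pca(X):
    """Principal axes from the eigenvalue decomposition of the scatter matrix.

    X holds one d-dimensional neighborhood vector per row.
    """
    mean = X.mean(axis=0)
    centered = X - mean
    scatter = centered.T @ centered            # sample scatter matrix S
    eigvals, eigvecs = np.linalg.eigh(scatter)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    return mean, eigvecs[:, order], eigvals[order]


def pca_feature(x, mean, axes, k):
    """Projection coefficients a_1..a_k of one neighborhood (the PCA feature)."""
    return axes[:, :k].T @ (x - mean)


def reconstruct(coeffs, mean, axes):
    """k-dimensional approximation: mean plus the weighted first k axes."""
    k = coeffs.shape[0]
    return mean + axes[:, :k] @ coeffs
```

With k = d the reconstruction is exact; for k < d the squared error |x − x̃|² can only grow as fewer components are kept.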

Feature Description: Taylor Features

The computation of derivatives in the expectation volume model, EVM, can be expressed as a least square error minimization of the following energy function.

Minimize:

\[
E = \sum_{i=-n_i}^{n_i} \sum_{j=-n_j}^{n_j} \sum_{k=-n_k}^{n_k} \left( V(i,j,k) - \tilde{V}(i,j,k) \right)^{2} \tag{6}
\]

where Ṽ(i, j, k) is the Taylor series approximation of the expected 3-d appearance of a volume V centered on the point (i, j, k). Using the second degree Taylor expansion of V about (0, 0, 0), (6) becomes

\[
E = \sum_{x} \left[ V(x) - V_0 - x^{T} G - \frac{1}{2!}\, x^{T} H x \right]^{2} \tag{7}
\]

where V_0, G, H are the zeroth derivative, the gradient vector and the Hessian matrix of the volume of expected 3-d appearances about the point (0, 0, 0), respectively. The coefficients for 3-d derivative operators can be found by minimizing (7) with respect to the zeroth, first and second order derivatives. The computed derivative operators are applied algebraically to neighborhoods in the EVM.
Learning The Codebook

Learn Volumetric Vocabulary using K-Means Clustering:
✤ Determine the best number of means: Heuristically
✤ Convergence depends on initialization: P. S. Bradley
and U. M. Fayyad. 1998
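A minimal k-means (Lloyd's algorithm) sketch for learning a vocabulary from feature vectors. It initializes the centers by sampling k features at random, which is a simplification and not the Bradley and Fayyad refinement cited on the slide. The function name and defaults are assumptions.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Cluster feature vectors (one per row) into k volumetric words."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(features), size=k, replace=False)
    centers = features[picks].astype(float)
    for _ in range(n_iter):
        # Squared distance from every feature to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = features[labels == c]
            if len(members):  # leave empty clusters where they are
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the converged centers.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)
```

Because the result depends on the initial centers, running the sketch with several seeds and comparing the within-cluster distances is a simple way to see the sensitivity the slide refers to.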


Vocabulary: Twenty Volumetric Words

PCA based

Taylor based


Learning Class Distributions

Let the set of all objects be defined as O = ∪_{l=1}^{N_c} O_l, where O_l contains the objects with class label l and N_c is the number of categories. Let the vocabulary of 3-d expected appearance patterns be defined as V = ∪_{i=1}^{k} v_i, where k is the number of cluster centers in the vocabulary. From the quantization step a count c_ij is obtained of the number of times a cluster center v_i occurs in object o_j.

The likelihoods of the independent entries of the vocabulary, P(v_j | C_l), are estimated during learning:

\[
P(v_j \mid C_l) = \frac{\sum_{m:\,o_m \in O_l} c_{jm}}{\sum_{n=1}^{k} \sum_{m:\,o_m \in O_l} c_{nm}}
\]

Classification: Bayes Classifier

Using Bayes formula, the a posteriori class probability is given by:

\[
P(C_l \mid o_i) \propto P(o_i \mid C_l)\,P(C_l) \tag{8}
\]

The likelihood of an object is given by the product of the likelihoods of the independent entries of the vocabulary, P(v_j | C_l), which are estimated during learning. The full expression for the class posterior becomes:

\[
P(C_l \mid o_i) \propto P(C_l) \prod_{j=1}^{k} P(v_j \mid C_l)^{c_{ji}} \tag{9}
\]

\[
\propto P(C_l) \prod_{j=1}^{k} \left( \frac{\sum_{m:\,o_m \in O_l} c_{jm}}{\sum_{n=1}^{k} \sum_{m:\,o_m \in O_l} c_{nm}} \right)^{c_{ji}} \tag{10}
\]

[Figure: histogram with a "frequency" axis.]
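The learning and classification steps of Eqs. (8) to (10) can be sketched in log space as below. The dictionary layout, the function names, and the additive smoothing parameter `alpha` are assumptions introduced for this example. Eq. (10) as written has no smoothing, which corresponds to `alpha=0`.

```python
import math


def learn_word_likelihoods(counts_by_class, alpha=1.0):
    """Estimate P(v_j | C_l) from word counts of the training objects.

    counts_by_class maps a class label to a list of per-object histograms,
    each of length k. alpha is additive smoothing (not part of Eq. 10).
    """
    likelihoods = {}
    for label, histograms in counts_by_class.items():
        k = len(histograms[0])
        totals = [sum(h[j] for h in histograms) + alpha for j in range(k)]
        norm = sum(totals)
        likelihoods[label] = [t / norm for t in totals]
    return likelihoods


def classify(histogram, likelihoods, priors):
    """Return the class maximizing log P(C_l) + sum_j c_j log P(v_j | C_l)."""
    best_label, best_score = None, -math.inf
    for label, probs in likelihoods.items():
        score = math.log(priors[label])
        for count, p in zip(histogram, probs):
            if count:  # words absent from the object contribute a factor of 1
                score += count * math.log(p) if p > 0 else -math.inf
        if best_label is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working with log probabilities avoids numerical underflow when an object contains many quantized features.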

Learning Class Distributions

[Figure: frequency histograms, labeled Train and Test.]

Results: PCA Classes
Buildings

Planes


Results: Taylor Classes
Buildings

Planes


Experiments: Number Of Objects
Table 2: Number of objects in every category.

        Planes  Cars  Houses  Buildings  Parking Lots
Train   18      54    61      24         27
Test    16      29    45      15         17

Two measurements were used to evaluate the classification performance: (i) classifier accuracy (i.e., the fraction of correctly classified objects), and (ii) the confusion matrix. During classification experiments, the number of clusters in the codebook was varied from k = 2 to k = 100. Figure 4 presents classification accuracy as a function of the number of clusters. For both Taylor-based features and PCA-based features,

18 Probabilistic Sites
Results: Classification Accuracy

[Figure: classification accuracy as a function of the number of clusters.]

Results: Confusion Matrix

(a) PCA

True Class:   Plane  House  Building  Car   Parking Lot
Plane         0.86   0.02   0.00      0.03  0.00
House         0.00   0.67   0.27      0.00  0.12
Building      0.00   0.31   0.67      0.00  0.00
Car           0.00   0.00   0.07      0.93  0.00
Parking Lot   0.14   0.00   0.00      0.03  0.88

(b) Taylor

True Class:   Plane  House  Building  Car   Parking Lot
Plane         0.86   0.02   0.00      0.03  0.00
House         0.00   0.64   0.27      0.00  0.12
Building      0.00   0.33   0.67      0.00  0.00
Car           0.00   0.00   0.07      0.86  0.00
Parking Lot   0.14   0.00   0.00      0.10  0.88

Fig. 9. Confusion matrix for a 20-keyword codebook of PCA based features on the left and Taylor based features on the right.
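The two evaluation measures can be computed as in the sketch below. It assumes that columns index the true class and rows the assigned class, with each column normalized to sum to one, which is how the matrices in Fig. 9 appear to be laid out. The function names are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified objects."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_matrix(true_labels, predicted_labels, classes):
    """Column-normalized confusion matrix: rows = assigned, columns = true."""
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[p]][index[t]] += 1
    col_totals = [sum(counts[r][c] for r in range(n)) for c in range(n)]
    return [
        [counts[r][c] / col_totals[c] if col_totals[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]
```

With this layout the diagonal entry of each column is the fraction of objects of that true class that were classified correctly.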

Future Work

✴ Evaluate the effectiveness of the EVM by performing classification
tasks on different underlying 3-d reconstruction algorithms.

✴ Performance evaluation of additional feature descriptors.

✴ Explore algorithms for detection.


Effectiveness Of Probabilistic Volumetric Learning

Y. Furukawa and J. Ponce, 2010

[Figure panels: Probabilistic 3-d Modeling; Threshold Based 3-d Modeling]
