1) The document presents a graphical model framework for improving geo-tagging location estimation by leveraging correlations between images with common tags.
2) It models locations as random variables and introduces a joint probability distribution over all locations given all tags to capture these correlations, in contrast to traditional approaches that only consider individual tag-location distributions.
3) The joint distribution is approximated using a pairwise potential function indicating whether two locations are the same, given they share at least one common tag. An iterative belief propagation algorithm is used to estimate the optimal marginal distributions.
The 2012 ICSI/Berkeley Video Location Estimation System
1. The 2012 ICSI / Berkeley
Location Estimation System
Jaeyoung Choi, Venkatesan Ekambaram,
Gerald Friedland and Kannan Ramchandran
ICSI / UC Berkeley, USA
October 4th, 2012
Thursday, October 4, 12 1
2. Agenda
• Baseline Approach
• Drawbacks
• Graphical Model Framework
• Result
3. Baseline Approach
• Investigate the ‘spatial variance’ of each feature:
  – spatial variance is small: feature is likely location-indicative
  – spatial variance is large: feature is likely not location-indicative
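A minimal sketch of how a tag's spatial variance might be computed. The slides do not define the formula or its units, so this assumes the mean squared great-circle distance (in km²) from the centroid of the tag's training locations. The names `haversine_km` and `spatial_variance` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def spatial_variance(points):
    """Mean squared distance to the centroid; None if the tag has no matches."""
    if not points:
        return None
    # Assumption: averaging lat/lon is an acceptable centroid for this sketch.
    centroid = (sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points))
    return sum(haversine_km(p, centroid) ** 2 for p in points) / len(points)

# Hypothetical training locations for two tags.
tight = [(37.8719, -122.2585), (37.8721, -122.2590)]   # clustered on one campus
spread = [(37.87, -122.26), (48.85, 2.35)]             # two distant cities
print(spatial_variance(tight), spatial_variance(spread))
```

Under this assumed definition, a tag with no training matches has no variance at all, which matches the "N/A" entries in the example that follows.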
4. Example
Tag               Matches in training set   Spatial variance
pavement          2                         5.739
ucberkeley        4                         0.132
berkeley          14                        68.138
greek             0                         N/A
greektheatre      0                         N/A
spitonastranger   0                         N/A
live              91                        6453.109
video             2967                      6735.844
5. Problem:
Sparsity coming from biased dataset
11. Interpreting traditional approaches
Locations are random variables: $\{x_1, x_2, \ldots, x_N\}$
Probability of location given tags. Traditional approaches estimate:
$p(x_i \mid \{t_i^k\}) \propto \prod_k p(x_i \mid t_i^k)$
where $p(x_i \mid t_i^k)$ is obtained from the training set.
Example: the distribution for the tag “washington” is depicted here.
[Figure: distribution for the tag “washington”]
Location estimate: $\int x_i \, p(x_i \mid \{t_i^k\}) \, dx_i$
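A minimal sketch of the traditional estimate: multiply the per-tag distributions, renormalize, and take the mean as the location estimate. It assumes each per-tag distribution is discretized over a shared set of (lat, lon) cells and stored as a dict. The function names and the toy probabilities are hypothetical, not values from the training set.

```python
import math

def combine_tag_distributions(tag_dists):
    """Multiply per-tag distributions p(x | t_k) cell by cell and renormalize."""
    if not tag_dists:
        raise ValueError("need at least one tag distribution")
    # Cells missing from any distribution have probability zero in the product.
    cells = set(tag_dists[0])
    for dist in tag_dists[1:]:
        cells &= set(dist)
    product = {c: math.prod(d[c] for d in tag_dists) for c in cells}
    total = sum(product.values())
    if total == 0:
        raise ValueError("tag distributions have no overlapping support")
    return {c: p / total for c, p in product.items()}

def expected_location(dist):
    """Mean of a discrete distribution over (lat, lon) cells."""
    lat = sum(p * c[0] for c, p in dist.items())
    lon = sum(p * c[1] for c, p in dist.items())
    return lat, lon

# Toy per-tag distributions over two candidate cells (made-up numbers).
p_berkeley = {(37.87, -122.26): 0.6, (37.80, -122.27): 0.4}
p_campanile = {(37.87, -122.26): 0.9, (37.80, -122.27): 0.1}
posterior = combine_tag_distributions([p_berkeley, p_campanile])
print(posterior, expected_location(posterior))
```

The `ValueError` for empty overlap illustrates the sparsity problem: a tag that was never seen in training contributes no usable distribution.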
12. Drawbacks
Data sparsity: Not all tags in the test set are available in the training set. Hence the estimate of $p(x_i \mid t_i^k)$ can be bad.
Sub-optimality: The approaches are suboptimal given the data.
What we ideally want:
$p(x_1, x_2, \ldots, x_N \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$
The mean of the above distribution gives the best estimate of the locations, i.e. for each image we want
$p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$
Traditional algorithms only give:
$p(x_i \mid \{t_i^k\})$
13. Bayesian graphical framework
Node: Geolocation of the image, with node potential $p(x_i \mid \{t_i^k\})$
Edge: Correlated locations (e.g. common tag)
Edge potential: Strength of an edge (e.g. posterior distribution of locations given common tags), $p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$
[Figure: graph whose nodes are images with the tag sets {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile} and {campanile, haas}]
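A minimal sketch of how the graph structure could be built: one node per image, and an edge between any two images that share at least one tag. The grouping of tags into four nodes is one reading of the slide's figure, and `build_tag_graph` is a hypothetical helper.

```python
from itertools import combinations

def build_tag_graph(tag_sets):
    """Return {(i, j): common_tags} for every pair of images sharing a tag."""
    edges = {}
    for i, j in combinations(range(len(tag_sets)), 2):
        common = tag_sets[i] & tag_sets[j]
        if common:  # no edge when the two images share no tag
            edges[(i, j)] = common
    return edges

# Tag sets as read from the figure (assumed grouping).
images = [
    {"berkeley", "sathergate", "campanile"},
    {"berkeley", "haas"},
    {"campanile"},
    {"campanile", "haas"},
]
edges = build_tag_graph(images)
for pair, common in sorted(edges.items()):
    print(pair, sorted(common))
```

Keeping the shared tags on each edge leaves room to weight the edge later, for example by the spatial variance of the common tag.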
17. Cooperative geo-tagging
Intuition: Images in the training set having common tags have correlated geo-locations, captured by the joint distribution.
Joint probability modeling:
$p(x_1, x_2, \ldots, x_N \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\}) \propto \prod_i p(x_i \mid \{t_i^k\}) \prod_{(i,j)} p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$
where the second product is the pairwise distribution given at least one common tag.
$p(x_i \mid \{t_i^k\})$ is obtained from the training set as before.
$p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$ is modeled as an indicator function $I(x_i = x_j)$ if the common tag has low spatial variance or occurs infrequently, e.g. if the common tag is “haas”, it's very likely the locations are the same.
Question: How to estimate the optimal marginal distribution $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$?
18. Belief propagation updates
Iterative algorithm to approximate the posterior distribution $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$
Gaussian modeling: $p(x_i \mid \{t_i^k\}) \sim \mathcal{N}(\mu_i, \sigma_i^2)$
At iteration 0, each node calculates $(\mu_i, \sigma_i^2)$.
At iteration t, each node updates its location as a weighted mean of its previous location and those of its neighbors:
$\mu_i^{(t)} = (\sigma_i^{(t)})^2 \left[ \frac{\mu_i^{(t-1)}}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{\mu_k^{(t-1)}}{(\sigma_k^{(t-1)})^2} \right]$
$\frac{1}{(\sigma_i^{(t)})^2} = \frac{1}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{1}{(\sigma_k^{(t-1)})^2}$
The weights reflect the confidence in the measurements, i.e. the higher the spatial variance, the lower the weight.
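A minimal sketch of one inverse-variance weighted update step. It assumes scalar locations (apply it per coordinate otherwise), synchronous updates, and that a node's new precision is its previous precision plus the sum of its neighbors' previous precisions. `bp_step` and the toy values are hypothetical.

```python
def bp_step(mu, var, neighbors):
    """One synchronous update of every node's mean and variance.

    mu, var   -- lists of current means and variances, one entry per node
    neighbors -- neighbors[i] is the list of node indices adjacent to node i
    """
    new_mu, new_var = [], []
    for i in range(len(mu)):
        # Precisions (inverse variances) of the node and its neighbors add up.
        precision = 1.0 / var[i] + sum(1.0 / var[k] for k in neighbors[i])
        # Each mean is weighted by its precision, so uncertain nodes count less.
        weighted = mu[i] / var[i] + sum(mu[k] / var[k] for k in neighbors[i])
        new_var.append(1.0 / precision)
        new_mu.append(weighted / precision)
    return new_mu, new_var

# Two connected nodes: a confident one at 0.0 and an uncertain one at 10.0.
mu, var = [0.0, 10.0], [1.0, 4.0]
mu, var = bp_step(mu, var, neighbors=[[1], [0]])
print(mu, var)
```

In this sketch the variances shrink at every step, so the number of iterations to run is left as an assumption rather than taken from the slides.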
19. Belief propagation
[Figure: graph with three nodes labeled $(\mu_1, \sigma_1^2)$, $(\mu_2, \sigma_2^2)$ and $(\mu_3, \sigma_3^2)$, the posterior mean and variance assuming Gaussian beliefs]
Audio-visual features are incorporated in modeling the edge and node potentials.
20. Incorporating Audio-Visual features
• GIST features are extracted for the images.
• MFCC features are extracted for the audio.
• These are now incorporated into the node and edge potentials as exponential distributions.
$p(x_i, x_j \mid a_i, a_j) \propto \exp\left( -\frac{\|x_i - x_j\|}{\|a_i - a_j\|} \right)$
$a_i$ are the audio features associated with image i.
The intuition is that the closer the audio features are, the higher the probability that the geo-locations are close. Similarly, this can be included in the node potentials, and the same applies to the visual features.
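A minimal sketch of the exponential edge potential, assuming Euclidean distances for both the locations and the audio feature vectors and a negative exponent. The small `eps` guard against division by zero is an addition of this sketch, and `audio_edge_potential` is a hypothetical name.

```python
import math

def audio_edge_potential(x_i, x_j, a_i, a_j, eps=1e-9):
    """Unnormalized potential exp(-||x_i - x_j|| / ||a_i - a_j||)."""
    loc_dist = math.dist(x_i, x_j)    # distance between candidate locations
    feat_dist = math.dist(a_i, a_j)   # distance between audio feature vectors
    return math.exp(-loc_dist / (feat_dist + eps))

# Same pair of candidate locations, scored with similar and dissimilar audio.
similar_audio = audio_edge_potential((0.0, 0.0), (3.0, 4.0), (0.0, 0.0), (1.0, 0.0))
different_audio = audio_edge_potential((0.0, 0.0), (3.0, 4.0), (0.0, 0.0), (10.0, 0.0))
print(similar_audio, different_audio)
```

With similar audio features, the potential for a far-apart pair of locations drops sharply, which pushes the two estimates toward each other.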
21. Result
• Percentage of test videos (out of 4182 videos) correctly estimated under distances in the top row from the ground-truth location.
  – run1 - baseline approach without using gazetteer
  – run2 - graphical model based approach with gazetteer
  – run3 - baseline approach with gazetteer
  – run4 - k-NN with GIST visual feature
• Graphical model approach with gazetteer outperforms baseline approaches in range above 1km.
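A minimal sketch of the evaluation metric described above: the percentage of test videos whose estimate falls within each distance threshold of the ground truth. The thresholds and error values below are hypothetical placeholders, since the distances of the results table are not reproduced here, and `accuracy_within` is a hypothetical name.

```python
def accuracy_within(errors_km, thresholds_km):
    """Percentage of videos whose error is at most each threshold (in km)."""
    if not errors_km:
        raise ValueError("need at least one error value")
    return {
        r: 100.0 * sum(err <= r for err in errors_km) / len(errors_km)
        for r in thresholds_km
    }

# Made-up errors (km) between estimated and ground-truth locations.
errors_km = [0.4, 3.0, 75.0, 1200.0]
scores = accuracy_within(errors_km, thresholds_km=[1, 10, 100, 1000])
print(scores)
```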
22. Conclusion
• graphical model framework can achieve performance improvement over baseline approach by incorporating results from test data
• various issues remain to be explored
  – the modeling of edge potential
    • text: hard threshold (current) --> soft
    • visual/audio features
  – assumption of conditional independence of location distribution given multiple tags
23. Thank You!
Questions?
http://mmle.icsi.berkeley.edu
Work together with:
Venkatesan Ekambaram, Kannan Ramchandran, Giulia Fanti,
Howard Lei, Adam Janin, and Gerald Friedland