Improving Spatiotemporal
Stability for Object Detection
and Classification
Albert Y. C. Chen, Ph.D.
Computer Scientist @ Tandent Vision
2015/03/27
Videos, lots of them.
[Chart: hours of video uploaded to YouTube every minute, 2007 to 2012; vertical axis 0 to 80.]
Goal: automatically analyze,
organize, and archive videos.
Typical Approaches:
Classifiers, classifiers, classifiers
•Video nouns, e.g., sky, tree, building, car, etc.
•Video noun structures, e.g., horizontal flat surfaces,
vertical surfaces, non-support surfaces, etc.
•Video verbs, e.g., diving, bench press, punch.
Results are far from perfect
for example, in
Joint Segmentation and Classification
(multiple semantic class pixel labeling)
[Figures: example annotations showing the image, its object segmentation, and its class segmentation, with difficult objects masked.]
State-of-the-art results from
PascalVOC 2012
Segmentation Challenge
[Figures: four sets of example segmentations, each showing the image, the ground truth, and the outputs of the NUS_DET_SPR_GC_SP and BONN_O2PCPMC_FGT_SEGM entries.]
Apply these object classifiers
to videos, frame by frame?
[Figure: frames 00001TP_008820, 00001TP_008850, and 00001TP_008880; rows show the input frame, ground truth labels, 2D MRF results, and VGS results.]
Markov Random Field (MRF) for
modeling Spatiotemporal Priors
[Figure: an MRF with hidden labels and observed noisy labels, connected by spatial and temporal links; the panels show a first order spatial neighborhood, a higher order spatial neighborhood, and a temporal neighborhood.]
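Generic MRF Formulation for
classification tasks
$$E\left[\{m_\mu : \mu \in G\}\right] = \sum_{\mu \in G} E_1\left(I(S[\mu]), m_\mu\right) + \sum_{\langle \mu, \nu \rangle} E_2\left(m_\mu, m_\nu\right)$$
$$E_1\left(I(S[\mu]), m_\mu\right) = -\log P\left(m_\mu \mid I(S[\mu])\right)$$
$$E_2\left(m_\mu, m_\nu\right) = 1 - \delta\left(m_\mu, m_\nu\right)$$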
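A minimal sketch of this energy on a small 4-connected grid may help make the two terms concrete. The grid layout, the `prob` table, and the function name `mrf_energy` are illustrative assumptions, not part of the original slides.

```python
import math


def mrf_energy(prob, labels):
    """Energy of a labeling on a 4-connected grid.

    prob[r][c][m] is the classifier's P(m | local image evidence) at site (r, c);
    labels[r][c] is the label m assigned to that site.
    """
    rows, cols = len(labels), len(labels[0])
    unary = 0.0
    pairwise = 0.0
    for r in range(rows):
        for c in range(cols):
            m = labels[r][c]
            # E1: negative log-probability of the chosen label.
            unary += -math.log(prob[r][c][m])
            # E2: 1 - delta(m_mu, m_nu), counted once per neighboring pair
            # by looking only at the right and lower neighbors.
            if c + 1 < cols and labels[r][c + 1] != m:
                pairwise += 1.0
            if r + 1 < rows and labels[r + 1][c] != m:
                pairwise += 1.0
    return unary + pairwise


# Hypothetical 2x2 example with two classes (0 and 1).
prob = [
    [[0.9, 0.1], [0.6, 0.4]],
    [[0.8, 0.2], [0.3, 0.7]],
]
smooth = [[0, 0], [0, 0]]
noisy = [[0, 0], [0, 1]]
energy_smooth = mrf_energy(prob, smooth)
energy_noisy = mrf_energy(prob, noisy)
```

In this toy case the isolated label in `noisy` is better supported by the classifier, yet the two broken neighbor links make its total energy higher than that of `smooth`, which is the smoothing effect the prior is meant to provide.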
Major technical contributions, MRF
for modeling Spatiotemporal Priors
Name | Application | Description
Bilayer MRF | Video label propagation | An additional layer of hidden variables to model the motion vs. appearance model weights.
Higher Order Proxy Neighborhood | Joint segmentation and classification | Longer range spatial smoothness with traditional 1st order neighborhood.
Video Graph-Shifts | Joint segmentation and classification in videos | Simultaneously estimate the motion priors while doing multiple semantic class labeling.
Subproblem 1
Bootstrapping the Classifier
Training process by using
Hierarchical Supervoxels
The inconsistent and time
consuming task of pixel labeling
[Figure: frames Seq05VD_f02340, Seq05VD_f02370, and Seq05VD_f02400 from the Cambridge Video Driving Dataset; rows show the input frame and the semantic object label (road, sidewalk, sign).]
Video pixel label propagation
[Figure: a subset of pixels is labeled FG or BG; traditional spatial propagation fills in the pixel label map within a frame, and spatio-temporal propagation carries it forward over time.]
Optical flow should do it?
[Figure: bidirectional optical flow at frame 20 and frame 60, computed with Black & Anandan and with Classic+NL.]
Maybe a different optical flow
algorithm?
Why optical flow alone fails
[Figure: frames t and t+1 under forward flow, where a hole occurs and some pixels receive multiple incoming flows, and under reverse flow, which produces the dragging effect.]
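The forward-flow failure modes can be reproduced on a single scanline. The following is a minimal sketch under simplifying assumptions (integer flow, one dimension, last writer wins); the function name `forward_warp_labels` and the example data are hypothetical.

```python
def forward_warp_labels(labels, flow):
    """Push each label of frame t along its flow vector into frame t+1.

    labels: 1-D list of labels for frame t (one scanline, for brevity).
    flow:   integer displacement of each pixel from t to t+1.
    Returns (warped, hits), where hits[x] counts how many source pixels
    landed on target pixel x.
    """
    n = len(labels)
    warped = [None] * n
    hits = [0] * n
    for x, (label, dx) in enumerate(zip(labels, flow)):
        target = x + dx
        if 0 <= target < n:
            warped[target] = label  # later arrivals overwrite earlier ones
            hits[target] += 1
    return warped, hits


# Hypothetical scanline: a foreground object ("FG") moves 2 px to the right
# while the flow of the background ("BG") is zero everywhere.
labels = ["BG", "FG", "FG", "BG", "BG", "BG"]
flow = [0, 2, 2, 0, 0, 0]
warped, hits = forward_warp_labels(labels, flow)
holes = [x for x, h in enumerate(hits) if h == 0]        # no incoming flow
collisions = [x for x, h in enumerate(hits) if h > 1]    # multiple incoming flows
```

Here pixels 1 and 2 receive no label at all, while pixels 3 and 4 receive two competing labels, so flow alone cannot decide the label map of frame t+1.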
Train an appearance model on
the user annotated frame?
[Chart: Average Per-Region Accuracy Over the Sequence, frames 1 to 71, vertical axis 0 to 100; series X: do-nothing, M: forward-flow, A: patch.]
Try again?
[Chart: motion-only propagation vs. appearance-only propagation; Average Per-Region Accuracy Over the Sequence, frames 1 to 81, vertical axis 50 to 100; series X: do-nothing, M: forward-flow, A: patch.]
Maybe we should do
something like this?
[Figure: image regions marked by which cue to trust: app., flow, or both.]
Turns out to be an optical flow
reliability estimation problem
How good are our Motion vs
Appearance (MvA) weights?
[Figure: the Container sequence, frames 40 and 80; columns show the input image, GT label, our method, o. flow only, and app. only.]
[Figure: the Garden sequence, frames 40 and 80; same columns.]
Well, there are still problems (1)
How to weigh between motion and appearance?
[Chart: frames 1 to 71, vertical axis 0.4 to 1; series: fixed weight for all pixels, naïve cross-correlation, occlusion-aware cross-correlation, bidirectional flow consistency.]
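One of the weighting options listed above, bidirectional flow consistency, can be sketched as a round-trip check. This is an illustrative simplification (one scanline, integer flow, binary weights, a hypothetical `tolerance` parameter); the slides do not specify the actual weighting function.

```python
def flow_consistency_weights(forward, backward, tolerance=1):
    """Per-pixel motion weight from a forward-backward flow check.

    forward[x]:  integer displacement of pixel x from frame t to t+1.
    backward[y]: integer displacement of pixel y from frame t+1 back to t.
    A pixel gets weight 1.0 (trust motion) when following the forward flow
    and then the backward flow returns to within `tolerance` pixels of the
    start, and 0.0 (fall back to appearance) otherwise.
    """
    n = len(forward)
    weights = []
    for x in range(n):
        y = x + forward[x]
        if not 0 <= y < n:
            weights.append(0.0)  # flow leaves the frame: treat as unreliable
            continue
        round_trip_error = abs(y + backward[y] - x)
        weights.append(1.0 if round_trip_error <= tolerance else 0.0)
    return weights


# Hypothetical scanline: pixels 1 and 2 move right by 2. The backward flow
# agrees at pixel 3 (-2) but is wrong at pixel 4 (0 instead of -2).
forward = [0, 2, 2, 0, 0, 0]
backward = [0, 0, 0, -2, 0, 0]
weights = flow_consistency_weights(forward, backward)
```

Pixels 2 and 3 fail the round trip and would be handed to the appearance model; a noisy per-pixel map like this is the kind of input a smoothing step would then have to clean up.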
Well, there are still problems (2)
[Figure: bus and soccer sequences; for each, the target frame for propagation, the ground truth label, the initial noisy WvA weight map, and the optimized WvA map with our bilayer MRF.]
Our bilayer MRF for Label
Propagation
[Figure: observed noisy values feed two coupled layers, the 1st layer of MRF (hidden true pixel labels) and the 2nd layer of MRF (hidden true WvA weights). Annotation: a label change at [symbol lost] causes [symbol lost] to change, as well as causing the WvA layer's energy to change.]
Our proposed Bilayer MRF for
Video Pixel Label Propagation
Results
[Figure: the stefan sequence at frames 1, 25, 50, and 75.]
[Chart: Stefan (tennis) Sequence, frames 1 to 71, vertical axis 0 to 0.7; series: appearance uni-model, appearance multi-model, do nothing, bidirectional flow, bad (fixed) WvA weights, our method.]
Results
[Figure: the soccer sequence at frames 1, 25, 50, and 75.]
[Chart: Soccer Sequence, frames 1 to 61, vertical axis 0 to 1; series: appearance uni-model, appearance multi-model, do nothing, bidirectional flow, our method, bad WvA weights.]
Results
[Figure: the bus sequence at frames 1, 25, 50, and 75.]
[Chart: Bus Sequence, frames 1 to 71, vertical axis 0 to 1; series: do nothing, bidirectional flow, appearance uni-model, bad (fixed) MvA weight, our method.]
Good, but far from perfect
• Overall accuracy still low
• Object Boundaries crossed
• Optical flow reliability estimation still noisy
Hierarchical Supervoxel Fusion
(HSF) for Pixel Labeling
[Figure: HSF-based pixel-label propagation; the input video yields a supervoxel hierarchy, which feeds the self-augmented appearance model (classifier) and the supervoxel flow.]
What does HSF buy us?
• 100x more data for the appearance model.
• Supervoxel-level correspondences instead
of just pixel-level optical flow.
• State-of-the-art pixel label propagation
performance.
Supervoxel Hierarchy and the
“right scale”
The HSF Process
[Figure: the input video as an x, y, t volume; its supervoxel hierarchy; and, after hierarchical supervoxel fusion, label consistency maps for classes such as vehicle, flower, and tree.]
Automatic Selection of the
Maximum Hierarchy Height
Table 3.2: Automatic Hierarchy Height Selection by computing the Supervoxel Boundary Error on the user annotated frame. The shaded levels are discarded since too many of the supervoxels violate the user-defined boundaries.
Seq \ Lv | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
Bus | 4.22% | 6.11% | 8.93% | 9.44% | 10.71% | 18.57% | 22.00% | 27.55% | 35.96% | 47.36%
Container | 0.08% | 0.07% | 0.16% | 0.44% | 0.86% | 2.37% | 3.28% | 6.69% | 14.11% | 21.75%
Garden | 0.83% | 1.74% | 2.66% | 3.90% | 6.21% | 11.37% | 20.12% | 29.74% | 30.43% | 50.68%
Ice | 0.11% | 0.28% | 0.89% | 1.54% | 1.99% | 2.21% | 2.32% | 2.32% | 2.41% | 27.04%
Paris | 0.38% | 0.46% | 0.73% | 1.30% | 2.02% | 3.68% | 9.02% | 9.48% | 11.32% | 13.93%
Salesman | 0.31% | 0.46% | 0.66% | 1.58% | 4.00% | 7.18% | 10.23% | 20.99% | 24.17% | 25.01%
Soccer | 0.29% | 0.49% | 0.61% | 1.31% | 1.57% | 1.70% | 5.43% | 19.12% | 33.89% | 38.57%
Stefan | 0.42% | 0.74% | 1.10% | 1.38% | 1.69% | 1.91% | 2.45% | 3.97% | 6.73% | 39.70%
Camvid (seven levels reported; level numbers not recoverable): 1.72%, 3.55%, 6.23%, 7.51%, 11.06%, 18.45%, 25.84%
Supervoxel boundary error on the user annotated frame.
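The selection rule can be sketched as follows. The slides do not give the exact definition of the supervoxel boundary error or the cutoff used, so both are assumptions here: the error is taken to be the fraction of annotated pixels that disagree with the majority label of their supervoxel, and `threshold` is a hypothetical parameter.

```python
from collections import Counter, defaultdict


def boundary_error(supervoxel_ids, labels):
    """Fraction of annotated pixels that disagree with the majority label
    of the supervoxel they fall in (an assumed definition of the error)."""
    by_supervoxel = defaultdict(Counter)
    for sv, label in zip(supervoxel_ids, labels):
        by_supervoxel[sv][label] += 1
    minority = sum(sum(c.values()) - max(c.values()) for c in by_supervoxel.values())
    return minority / len(labels)


def max_usable_level(hierarchy, labels, threshold):
    """Highest (coarsest) level whose error stays within the threshold,
    scanning from fine to coarse and stopping at the first violation."""
    best = None
    for level in sorted(hierarchy):
        if boundary_error(hierarchy[level], labels) > threshold:
            break
        best = level
    return best


# Hypothetical annotated frame flattened to 8 pixels, and a 3-level hierarchy
# mapping each pixel to a supervoxel id at that level.
labels = ["road", "road", "road", "road", "sign", "sign", "sky", "sky"]
hierarchy = {
    1: [0, 0, 1, 1, 2, 2, 3, 3],   # fine: every supervoxel is pure
    2: [0, 0, 0, 0, 1, 1, 1, 2],   # one sky pixel merged into the sign region
    3: [0, 0, 0, 0, 0, 0, 0, 0],   # everything merged into one supervoxel
}
errors = {lv: boundary_error(ids, labels) for lv, ids in hierarchy.items()}
chosen = max_usable_level(hierarchy, labels, threshold=0.2)
```

With these toy inputs the errors grow with the level, as in Table 3.2, and the coarsest level that still respects the annotated boundaries well enough is level 2.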
The Self-augmented
Appearance Model
Table 3.1: Increase in training set size of the self-augmented training set (done through Hierarchical Supervoxel Fusion) over the original training set.
Bus: tree 24x, horse 3x, car 48x, flower 33x, sign 8x, road 18x
Container: bldg 91x, grass 109x, tree 93x, sky 100x, water 90x, road 116x, boat 89x
Garden: bldg 96x, tree 54x, sky 31x, flower 60x
Ice: face 37x, sign 22x, road 89x, body 65x
Paris: tree 113x, face 127x, book 105x, body 44x
Salesman: tree 111x, face 102x, book 84x
Soccer: grass 66x, tree 83x, face 14x, sign 28x, dog 15x, body 62x
Stefan: grass 83x, face 1x, sign 75x, chair 1x, body 83x
Camvid: bldg 6x, tree/grass 25x, sky 1170x, road 176x, pavemt. 76x, concr. 20x, roadmk. 1756x
Increase in the number of pixels available for
training the appearance model.
[Charts: per-frame curves over frames 1 to 71 for the Bus, Container, Garden, and Ice sequences; series: Our AG-SV, OR-PA, OR-SP, and OR-MM (no OR-MM series for Container).]
Supervoxel flow propagation
performance-1
[Charts: per-frame curves for the Bus, Garden, Ice, and Container sequences; series: HD-OF, SF-BI-OF, SVXL-flow.]
Supervoxel flow propagation
performance-2
[Charts: per-frame curves for the Paris, Soccer, Salesman, and Stefan sequences; series: HD-OF, SF-BI-OF, SVXL-flow.]
Finally, putting everything
together, our Hierarchical
Supervoxel Fusion-based Pixel
Label Propagation
Subproblem 2
Random Field Priors for
Improving the Spatiotemporal
Robustness of Classifiers
Problems with Traditional First
Order Neighborhood
[Figure: nodes µ and their first-order neighbors ν.]
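Higher-order Proxy Neighbors
[Figure: a node µ and its neighbors ν.]
$$E\left[\{m_{\mu^n} : \mu^n \in G^n\}\right] = \lambda_1 \sum_{\mu^n \in G^n} E_1\left(\mu^n, m_{\mu^n}\right) + \lambda_2 \sum_{\mu^n \in G^n} \Bigg( \omega\left(\mu^n, m_{\mu^n}\right) \sum_{\langle \mu^n, \nu^n \rangle} E_2\left(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}\right) + \lambda'_2 \sum_{\nu^n \in G^n} \omega\left(\nu^n, m_{\mu^n}\right) \sum_{\substack{\langle \nu^n, \tau^n \rangle \\ \langle \mu^n, \nu^n \rangle}} E_2\left(\nu^n, \tau^n, m_{\mu^n}, m_{\tau^n}\right) \Bigg)$$
(Here ω, and φ further down, stand in for two symbols whose glyphs were lost in the slide export.)
Energy Minimization via the
Graph-Shifts Algorithm
[Figure: a shift between neighboring nodes µ and ν and their parents P(µ) and P(ν), shown before and after.]
Recursive Computation of the Energy
$$E_1\left(\mu^n, m_{\mu^n}\right) = \begin{cases} E_1\left(I(S[\mu^n]), m_{\mu^n}\right) & \text{if } n = 0 \\ \sum_{\mu^{n-1} \in C(\mu^n)} E_1\left(\mu^{n-1}, m_{\mu^{n-1}}\right) & \text{otherwise} \end{cases}$$
$$E_2\left(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}\right) = \begin{cases} E_2\left(m_{\mu^n}, m_{\nu^n}\right) & \text{if } n = 0 \\ \sum_{\substack{\mu^{n-1} \in C(\mu^n) \\ \nu^{n-1} \in C(\nu^n) \\ \langle \mu^{n-1}, \nu^{n-1} \rangle}} E_2\left(\mu^{n-1}, \nu^{n-1}, m_{\mu^{n-1}}, m_{\nu^{n-1}}\right) & \text{otherwise} \end{cases}$$
The overall energy, specified for level 0, is computed at any level by:
$$E\left[\{m_{\mu^n} : \mu^n \in G^n\}\right] = \lambda_1 \sum_{\mu^n \in G^n} E_1\left(\mu^n, m_{\mu^n}\right) + \lambda_2 \sum_{\mu^n \in G^n} \omega\left(\mu^n, m_{\mu^n}\right) \sum_{\langle \mu^n, \nu^n \rangle} E_2\left(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}\right)$$
where
$$\omega\left(\mu^n, m_{\mu^n}\right) = \frac{D^0(\mu^n)}{\sum_{a \in D^0(\mu^n)} \sum_{\langle a, b \rangle} \varphi\left(A^n(a), A^n(b)\right)}$$
The Shift-Gradient is defined as
$$\Delta E\left(m_{\mu^n} \to \hat{m}_{\mu^n}\right) = E\left[\{\hat{m}_{\mu^n} : \mu^n \in G^n\}\right] - E\left[\{m_{\mu^n} : \mu^n \in G^n\}\right]$$
$$= \lambda_1 \left[E_1\left(\mu^n, \hat{m}_{\mu^n}\right) - E_1\left(\mu^n, m_{\mu^n}\right)\right] + \lambda_2 \Bigg\{ \sum_{\mu^n \in G^n} \omega\left(\mu^n, \hat{m}_{\mu^n}\right) \sum_{\langle \mu^n, \nu^n \rangle} E_2\left(\mu^n, \nu^n, \hat{m}_{\mu^n}, m_{\nu^n}\right) - \sum_{\mu^n \in G^n} \omega\left(\mu^n, m_{\mu^n}\right) \sum_{\langle \mu^n, \nu^n \rangle} E_2\left(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}\right) \Bigg\}$$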
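A flat, level-0 sketch may help show why the shift-gradient is cheap to evaluate: only the shifted node's unary term and the edges touching it can change. The sketch assumes a Potts pairwise term, leaves out the hierarchy and the normalization term, and uses hypothetical names (`total_energy`, `shift_gradient`, `lam1`, `lam2`); it is not the Graph-Shifts implementation itself.

```python
import math


def total_energy(prob, labels, edges, lam1=1.0, lam2=1.0):
    """lam1 * sum of unary terms + lam2 * sum of Potts terms over edges."""
    unary = sum(-math.log(prob[u][labels[u]]) for u in range(len(labels)))
    pairwise = sum(1.0 for u, v in edges if labels[u] != labels[v])
    return lam1 * unary + lam2 * pairwise


def shift_gradient(prob, labels, edges, node, new_label, lam1=1.0, lam2=1.0):
    """Energy change if `node` alone switches to `new_label`.

    Only the node's unary term and the edges touching it can change,
    so the rest of the graph is never visited.
    """
    old_label = labels[node]
    d_unary = -math.log(prob[node][new_label]) + math.log(prob[node][old_label])
    d_pair = 0.0
    for u, v in edges:
        if node in (u, v):
            other = v if u == node else u
            # Potts term after the change minus Potts term before it.
            d_pair += (new_label != labels[other]) - (old_label != labels[other])
    return lam1 * d_unary + lam2 * d_pair


# Hypothetical chain of 4 nodes with 2 classes; node 2 disagrees with its neighbors.
prob = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 0, 1, 0]
delta = shift_gradient(prob, labels, edges, node=2, new_label=0)
after = labels[:2] + [0] + labels[3:]
full_difference = total_energy(prob, after, edges) - total_energy(prob, labels, edges)
```

The locally computed `delta` matches the difference of the two full energies, and its negative sign marks the relabeling as an energy-reducing shift.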
Visualizing the Graph-Shifts
Process and Hierarchy
[Figure: The Hierarchy; the input image and lv. 1 to lv. 6.]
[Figure: The Energy Minimization Process; input, label, and the labeling at shift #0, shift #20, shift #40, and shift #60.]
Efficiency Improvements of
using HOPS
[Figures: two examples, each showing the input, ground truth, classifier only, first-order, and HOPS labelings, together with the probability maps output by the classifier, which are shared by the first-order and HOPS E1 terms. Legend: void, sky, water, road, grass, tree(s), mountain, animal/man, building, bridge, vehicle, coastline.]
Example 1: first-order 4830 shifts, HOPS 3769 shifts (-22%)
Example 2: first-order 2042 shifts, HOPS 1868 shifts (-8.6%)
Qualitative Results of HOPS
on the MSRC-21 dataset
[Figure: rows show the image, labels, classifier only, first order, and HOPS; columns group examples of HOPS outperforming first order neighborhood models and examples of mislabeling by HOPS. Legend: void, building, grass, tree, cow, horse, sheep, sky, mountain, aeroplane, water, face, car, bicycle, flower, sign, bird, book, chair, road, cat, dog, body, boat.]
Qualitative Results of HOPS
on the LHI dataset
[Figure: same layout. Legend: void, sky, water, road, grass, tree(s), mountain, animal/man, building, bridge, vehicle, coastline.]
Quantitative Results on the
MSRC-21 and LHI datasets
Table 4.1: Comparison of overall accuracy rate on the LHI dataset
Measure | Classifier-Only | First Order | HOPS
Overall accuracy | 59.71 | 72.42 | 73.48
Improvement over classifier-only overall accuracy | | 12.71 | 13.77
Percentage gained over first-order neighborhood’s improvement | | | 8.34%
Table 4.2: Comparison of overall accuracy rate on the MSRC dataset
Measure | Classifier-Only | First Order | HOPS
Overall accuracy | 55.87 | 74.73 | 75.04
Improvement over classifier-only overall accuracy | | 18.86 | 19.17
Percentage gained over first-order neighborhood’s improvement | | | 1.64%
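The last row of each table can be reproduced from the accuracy figures: it is the additional improvement of HOPS expressed relative to the first-order improvement. A short check, with the helper name `relative_gain` chosen here for illustration:

```python
def relative_gain(classifier_only, first_order, hops):
    """Extra improvement of HOPS relative to the first-order improvement, in %."""
    first_order_gain = first_order - classifier_only
    hops_gain = hops - classifier_only
    return 100.0 * (hops_gain - first_order_gain) / first_order_gain


lhi = relative_gain(59.71, 72.42, 73.48)    # Table 4.1
msrc = relative_gain(55.87, 74.73, 75.04)   # Table 4.2
print(f"LHI: {lhi:.2f}%  MSRC: {msrc:.2f}%")
```

For LHI this is (13.77 - 12.71) / 12.71, which rounds to 8.34%, and for MSRC (19.17 - 18.86) / 18.86, which rounds to 1.64%, matching the tables.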
The optimum weights for the energy models are estimated (learned) during the train-
LHI
MSRC-21
Problems with existing ways of
modeling temporal priors
• Doesn't model object motion
• Requires pre-computing of optical flow: overkill, computationally expensive
[Figure: our video graph-shifts algorithm; between frame t-1 and frame t, a shift turns the initial temporal link into an energy-reduced temporal link.]
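Temporally Consistent Energy
Model
$$E\left[\{m_\mu : \mu \in D\}\right] = \lambda_1 \sum_{\mu \in D} E_1\left(I(S[\mu]), m_\mu\right) + \lambda_2 \sum_{\langle \mu, \nu \rangle} E_2\left(m_\mu, m_\nu\right) + \lambda_3 \sum_{\mu \in D} E_t\left(m_\mu, m_\rho\right)$$
$$E_t\left(m_\mu, m_\rho\right) = 1 - \kappa(\mu, \rho), \qquad \rho = \arg\min_{\xi} \lVert X_\mu - X_\xi \rVert_p, \quad \xi \in \{\cup\, \eta : \langle \xi', \eta \rangle\}$$
$$\kappa(\mu, \rho) = \begin{cases} 0 & \text{if } m_\mu \neq m_\rho \\ \exp\left(-\alpha \lVert X_\mu - X_\rho \rVert_p\right) & \text{otherwise} \end{cases}$$
(κ and ξ stand in for symbols whose glyphs were lost in the slide export.)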
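The temporal term can be sketched for a single node as follows. The feature vectors, the candidate list, and the name `temporal_energy` are illustrative assumptions; in particular, how the candidate set in frame t-1 is built is not recoverable from the slide and is simply passed in here.

```python
import math


def temporal_energy(node, candidates, alpha=1.0, p=2):
    """Temporal term for one node in frame t.

    node and each candidate are (features, label) pairs, where features is
    a tuple such as mean color or position. The temporal correspondent is
    the candidate in frame t-1 that is closest to the node in feature space.
    """
    features, label = node

    def distance(other_features):
        # p-norm of the feature difference.
        return sum(abs(a - b) ** p for a, b in zip(features, other_features)) ** (1.0 / p)

    rho_features, rho_label = min(candidates, key=lambda c: distance(c[0]))
    if label != rho_label:
        similarity = 0.0
    else:
        similarity = math.exp(-alpha * distance(rho_features))
    return 1.0 - similarity


# Hypothetical 2-D features for two regions in frame t-1.
candidates = [((0.5, 0.6), "car"), ((0.9, 0.9), "road")]
e_same = temporal_energy(((0.5, 0.5), "car"), candidates)
e_diff = temporal_energy(((0.5, 0.5), "road"), candidates)
```

A node that agrees with its nearest correspondent pays a small penalty that grows with feature distance, while a node that disagrees pays the full penalty of 1, which is what pulls labels toward temporal consistency.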
Overview of the Video Graph-Shifts Process
[Figure: layers n and n+1 of the hierarchy in frame t-1 and frame t; a shift at node µ is accompanied by a temporal correspondent change.]
Experiments--The Buffalo
wintry driving dataset
[Figure: frames mjrd5_00003, mjrd5_00004, and mjrd5_00005; rows show the input frame, ground truth labels, results without temporal links, and results with our dynamic temporal links. Legend: sky (others), obstacles, road.]
Results on the Camvid dataset
[Figure: frames 00001TP_008820, 00001TP_008850, and 00001TP_008880; rows show the input frame, ground truth labels, results without temporal links, and results with our dynamic temporal links. Legend: void, building, tree, sky, car, sign, road, pedestrian, fence, pole, sidewalk, bicyclist.]
[Figure: frames Seq05VD_f01200, Seq05VD_f01230, and Seq05VD_f01260; same rows and legend.]
VGS
building 72.2 20.4 0.5 2.0 0.5 0.3 4.0
tree 16.4 79.9 1.4 1.0 0.3 0.9
sky 1.1 5.7 92.0 0.4 0.8
car 0.5 0.1 68.8 29.7 0.2 0.1 0.6
sign 80.3 7.1 12.6
road 2.4 93.0 4.6
pedestrian 9.9 16.7 0.1 18.9 1.1 25.0 0.4 12.9 12.8 2.2
fence 43.4 2.2 23.0 15.6 3.8 5.2 6.2 0.6
column 12.4 37.5 18.4 7.3 1.2 0.6 20.4 1.9 0.3
sidewalk 2.6 59.7 37.5 0.1
bicyclist 7.6 6.7 23.5 20.3 14.2 11.8 6.2 9.7
2D MRF
building 71.6 20.7 0.5 1.9 0.9 0.3 4.2
tree 17.0 78.3 1.5 0.9 0.7 0.1 1.5
sky 1.2 5.6 91.9 0.4 0.9
car 0.3 0.1 62.6 35.9 0.4 0.1 0.6 0.1
sign 76.0 8.2 15.8
road 1.6 93.7 4.7
pedestrian 12.5 23.3 0.1 19.4 2.3 17.7 0.8 6.6 16.3 0.8
fence 47.0 2.1 22.6 17.8 4.3 0.1 6.1
column 13.0 36.4 19.0 6.7 0.1 1.2 0.9 20.8 1.9
sidewalk 1.6 62.0 36.3
bicyclist 9.4 4.6 24.8 20.5 27.7 1.8 6.0 5.3
Results
on the
Camvid
dataset
Subproblem 3
Adapting the Learned
Classifiers to work in new
Domains
Motivation
• Similar images often share the same
parameter configuration for many
computer vision algorithms.
• Utilize this knowledge to develop meta-
classifiers (classifiers for classifiers).
• Utilize the local smoothness priors to
speed up the parameter space exploration,
as well as aid the adaptation process.
Objective function projection
[Figure: points xa, xb, xc, xd in the parameter space are projected to f(xa), f(xb), f(xc), f(xd) in the objective space.]
f(x) = [f1(x), f2(x), …, fj(x)]: unknown, non-linear, non-convex function
Optimal Config. Exploration
[Figure: x1, x2, and x3 in the parameter space are projected by f() to f(x1), f(x2), and f(x3) in the objective space, near the Pareto front; f ’ lies between f(x1) and f(x2).]
x3 = w1x1 + w2x2
f ’ = w1f(x1) + w2f(x2)
1. Given two points f(x1), f(x2) in the objective space, determine whether the unknown projection function f() is locally linear by performing our SPEA2-LLP algorithm.
2. If Dist(f ’, f(x3)) is large, f() is non-linear between f(x1) and f(x2). Break the interval into smaller intervals and repeat SPEA2-LLP until convergence.
3. If Dist(f ’, f(x3)) is small, sample a few more points before concluding that f() is linear between f(x1) and f(x2).
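The linearity test in steps 1 to 3 can be sketched in a few lines. This covers only the Dist(f ’, f(x3)) check, not the SPEA2-LLP search itself; the function names, the Euclidean distance, the `tolerance`, and the number of extra `samples` are assumptions made for illustration.

```python
def local_linearity_gap(f, x1, x2, w1=0.5, w2=0.5):
    """Distance between f at the blended point and the blend of f at the endpoints."""
    x3 = [w1 * a + w2 * b for a, b in zip(x1, x2)]
    f1, f2, f3 = f(x1), f(x2), f(x3)
    f_prime = [w1 * a + w2 * b for a, b in zip(f1, f2)]
    return sum((a - b) ** 2 for a, b in zip(f_prime, f3)) ** 0.5


def is_locally_linear(f, x1, x2, tolerance=1e-3, samples=3):
    """Accept linearity only if several interior points all pass the test."""
    for i in range(1, samples + 1):
        w1 = i / (samples + 1)
        if local_linearity_gap(f, x1, x2, w1, 1.0 - w1) > tolerance:
            return False
    return True


# Hypothetical objective functions on a 2-D parameter space.
def linear_f(x):
    return [2.0 * x[0] + x[1], x[0] - x[1]]


def curved_f(x):
    return [x[0] ** 2, x[1]]


x1, x2 = [0.0, 0.0], [1.0, 1.0]
linear_ok = is_locally_linear(linear_f, x1, x2)
curved_ok = is_locally_linear(curved_f, x1, x2)
midpoint_gap = local_linearity_gap(curved_f, x1, x2)
```

When the test fails, as for `curved_f`, the interval between x1 and x2 would be split and each half tested again; when it passes at several interior points, objective values in between can be interpolated instead of evaluated.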
Earlier results-binarization
Test Image: DIBCO 2009, P04
PIE result: using PIE to automatically determine the binarization param. in a sliding window, using the previously learned optimal parameter configuration for every location. (PIE trained on a different, randomly selected set separate from DIBCO2011)
Hand picked fixed parameter result: one of the hand-picked fixed-parameter binarization results. It cannot adapt to the changing background intensity.
[Chart: precision-recall of PIE (blue ◊) vs. different fixed param. (red □).]
Earlier results-binarization
Test Image: DIBCO 2009, H04
Binarization Result Comparison (prior to post-processing & noise removal)
PIE result: using PIE to automatically determine the binarization param. in a sliding window. (PIE trained on a different, randomly selected set separate from DIBCO2011)
Hand picked fixed parameter result: one of the hand-picked fixed-parameter binarization results. It cannot adapt to the changing background intensity.
[Chart: precision-recall of PIE (blue ◊) vs. different fixed param. (red □).]
Earlier results
Segmentation on BSDS-500
[Figure: input image, groundtruth (one of the many), our result, bad inference, and the PFF default param. result (σ=0.8, k=300). Parameter settings shown on the panels: σ=1.2, k=500, min=100; σ=0.22, k=688, min=167; σ=0.88, k=442, min=100; σ=0.6, k=500, min=600; σ=0.88, k=442, min=100; σ=0.5, k=500, min=800.]
Additional Results from using
the Parameter Inference Engine
(PIE) on other problems
Segmentation on the
Weizmann Horse Dataset
PIE as an Ensemble Combiner
Table 6.1: Results 1. 4 class subset from Caltech 101, 15 training per class. PIE values are given for 100 / 10,000 initial points.
Class | PIE per-class precision | PIE overall average accuracy | Equal weights per-class precision | Equal weights overall average accuracy
Bass | 70.97/76.67 | 80.56/82.41 | 58.82 | 74.07
Grand Piano | 88.89/94.74 | 80.56/82.41 | 76.47 | 74.07
Minaret | 100/100 | 79.63/82.41 | 96.43 | 74.07
Soccer Ball | 83.33/80.77 | 81.48/83.33 | 68.97 | 74.07
Average | 85.80/88.04 | 80.56/82.64 | 75.17 | 74.07
Average PIE improvements (%) | 14.13/17.12 | 8.75/11.56 | |
• Random forest with 100 randomized trees, binary test at each node, and learned by maximum information gain on a dictionary of 1024 quantized SIFT feature vectors.
PIE as an Ensemble Combiner
Table 6.2: Results 2. 12 class subset from Caltech 101, 15 training per class. PIE values are given for 100 / 10,000 initial points.
Class | PIE per-class precision | PIE overall average accuracy | Equal weights per-class precision | Equal weights overall average accuracy
Faces | 71.33/71.67 | 60.82/60.60 | 70.71 | 58.83
airplanes | 74.88/73.36 | 60.49/60.60 | 68.38 | 58.83
anchor | 9.52/16.67 | 60.38/60.49 | 5.00 | 58.83
ant | 34.78/50.00 | 60.26/60.15 | 28.57 | 58.83
barrel | 35.71/63.64 | 60.71/60.60 | 18.19 | 58.83
bass | 31.82/23.33 | 60.49/60.38 | 16.13 | 58.83
beaver | 20.69/23.53 | 60.93/60.26 | 18.37 | 58.83
binocular | 58.82/61.11 | 60.26/60.60 | 47.37 | 58.83
bonsai | 69.23/64.29 | 60.26/60.60 | 50.00 | 58.83
brain | 70.97/69.01 | 60.04/60.71 | 59.52 | 58.83
brontosaurus | 100/100 | 60.04/60.60 | 0.00 | 58.83
car side | 59.42/62.40 | 60.49/60.71 | 57.35 | 58.83
Average | 53.10/56.58 | 60.43/60.52 | 36.63 | 58.83
Avg. PIE improvements (%) | 44.95/54.47 | 2.72/2.88 | |
Conclusion
• Spatiotemporal priors for pixel label
propagation in space-time volumes: Bilayer
MRF and HSF based propagation.
• HOPS for longer range spatial modeling,
VGS for dynamic temporal modeling.
• PIE for utilizing the localness priors to
explore & adapt parameter configurations.
• Full potential of spatiotemporal priors still
frequently overlooked.
Publications
1. W. Wu, A. Y. C. Chen, L. Zhao, and J. J. Corso. Brain tumor detection and segmentation in a CRF framework with pixel-wise affinity and superpixel-level features. International Journal of Computer Assisted Radiology and Surgery, 2015.
2. S. N. Lim, A. Y. C. Chen, and X. Yang. Parameter Inference Engine (PIE) on the Pareto Front. In Proceedings of International Conference of Machine Learning, Auto ML Workshop, 2014.
3. A. Y. C. Chen, S. Whitt, C. Xu, and J. J. Corso. Hierarchical supervoxel fusion for robust pixel label propagation in videos. In Submission to ACM Multimedia, 2013.
4. A. Y. C. Chen and J. J. Corso. Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.
5. D. R. Schlegel, A. Y. C. Chen, C. Xiong, J. A. Delmerico, and J. J. Corso. Airtouch: Interacting with computer systems at a distance. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.
6. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. In Proceedings of International Symposium CompIMAGE, 2010.
7. A. Y. C. Chen and J. J. Corso. Propagating multi-class pixel labels throughout video frames. In Proceedings of IEEE Western New York Image Processing Workshop, 2010.
8. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. Computational Modeling of Objects Represented in Images, pages 275–286, 2010.
9. Y. Tao, L. Lu, M. Dewan, A. Y. C. Chen, J. J. Corso, J. Xuan, M. Salganicoff, and A. Krishnan. Multi-level ground glass nodule detection and segmentation in CT lung images. Medical Image Computing and Computer-Assisted Intervention, 2009.
10. A. Y. C. Chen, J. J. Corso, and L. Wang. HOPS: Efficient region labeling using higher order proxy neighborhoods. In Proceedings of IEEE International Conference on Pattern Recognition, 2008.

 
Big Video Data Revolution, Challenges Unresolved
Big Video Data Revolution, Challenges UnresolvedBig Video Data Revolution, Challenges Unresolved
Big Video Data Revolution, Challenges Unresolved
 
Machine Learning Foundations
Machine Learning FoundationsMachine Learning Foundations
Machine Learning Foundations
 
AI智慧服務推動經驗分享
AI智慧服務推動經驗分享AI智慧服務推動經驗分享
AI智慧服務推動經驗分享
 
AI gold rush, tool vendors and the next big thing
AI gold rush, tool vendors and the next big thingAI gold rush, tool vendors and the next big thing
AI gold rush, tool vendors and the next big thing
 
媒體、影視產業、AI新創
媒體、影視產業、AI新創媒體、影視產業、AI新創
媒體、影視產業、AI新創
 

Kürzlich hochgeladen

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Kürzlich hochgeladen (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Improving Spatiotemporal Stability for Object Detection and Classification

  • 1. Improving Spatiotemporal Stability for Object Detection and Classification Albert Y. C. Chen, Ph.D. Computer Scientist @ Tandent Vision 2015/03/27
  • 2. Videos, lots of them. Chart: hours of video uploaded to YouTube every minute, 2007 to 2012 (vertical axis 0 to 80).
  • 3. Goal: automatically analyze, organize, and archive videos. Typical approaches: classifiers, classifiers, classifiers. • Video nouns, e.g., sky, tree, building, car. • Video noun structures, e.g., horizontal flat surfaces, vertical surfaces, non-support surfaces. • Video verbs, e.g., diving, bench press, punch.
  • 4. Results are far from perfect, for example in joint segmentation and classification (multiple semantic class pixel labeling).
  • 5. Example annotations. Panels: image, object segmentation, class segmentation. Difficult objects are masked.
  • 6. Example annotations. Panels: image, object segmentation, class segmentation.
  • 7. State-of-the-art results from the PASCAL VOC 2012 Segmentation Challenge
  • 8. Example segmentations. Panels: image, ground truth, NUS_DET_SPR_GC_SP, BONN_O2PCPMC_FGT_SEGM.
  • 9. Example segmentations. Panels: image, ground truth, NUS_DET_SPR_GC_SP, BONN_O2PCPMC_FGT_SEGM.
  • 10. Example segmentations. Panels: image, ground truth, NUS_DET_SPR_GC_SP, BONN_O2PCPMC_FGT_SEGM.
  • 11. Example segmentations. Panels: image, ground truth, NUS_DET_SPR_GC_SP, BONN_O2PCPMC_FGT_SEGM.
  • 12. Apply these object classifiers to videos, frame by frame? Frames 00001TP_008820, 00001TP_008850, 00001TP_008880. Rows: input frame, ground truth labels, 2D MRF results, VGS results.
  • 13. Markov Random Field (MRF) for modeling spatiotemporal priors. Diagram labels: hidden labels, observed noisy labels, spatial and temporal links; first-order spatial neighborhood, higher-order spatial neighborhood, temporal neighborhood.
  • 14. Generic MRF formulation for classification tasks:
      $$E[\{m_\mu : \mu \in G\}] = \sum_{\mu \in G} E_1(I(S[\mu]), m_\mu) + \sum_{\langle \mu, \nu \rangle} E_2(m_\mu, m_\nu)$$
      $$E_1(I(S[\mu]), m_\mu) = -\log P(m_\mu \mid I(S[\mu]))$$
      $$E_2(m_\mu, m_\nu) = 1 - \delta(m_\mu, m_\nu)$$
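To make the two energy terms concrete, here is a minimal sketch that evaluates this kind of energy for a given label map. It assumes a 4-connected pixel grid, a unary term equal to the negative log of classifier posteriors, and a Potts-style pairwise term that charges 1 for each neighboring pair with different labels. The function name `mrf_energy` and the array layout are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def mrf_energy(prob, labels):
    """Energy of a labeling on a 4-connected grid (illustrative sketch).

    prob:   (H, W, K) per-pixel class posteriors from some classifier (assumed given)
    labels: (H, W) integer label map with values in [0, K)
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Unary term: -log P(label | image) summed over all pixels.
    unary = -np.log(prob[rows, cols, labels] + 1e-12).sum()
    # Pairwise term: 1 for every horizontal or vertical neighbor pair that disagrees.
    pairwise = (labels[:, 1:] != labels[:, :-1]).sum() \
             + (labels[1:, :] != labels[:-1, :]).sum()
    return float(unary + pairwise)

if __name__ == "__main__":
    prob = np.full((2, 2, 2), 0.5)
    print(mrf_energy(prob, np.array([[0, 0], [0, 0]])))  # unary only
    print(mrf_energy(prob, np.array([[0, 1], [0, 1]])))  # unary + 2 disagreeing pairs
```

A smoother labeling has lower energy whenever the classifier is indifferent, which is the behavior the pairwise prior is meant to encode.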
  • 15. Major technical contributions: MRF for modeling spatiotemporal priors
      Name | Application | Description
      Bilayer MRF | Video label propagation | An additional layer of hidden variables to model the motion vs. appearance model weights.
      Higher Order Proxy Neighborhood | Joint segmentation and classification | Longer-range spatial smoothness with traditional 1st order neighborhood.
      Video Graph-Shifts | Joint segmentation and classification in videos | Simultaneously estimate the motion priors while doing multiple semantic class labeling.
  • 16. Subproblem 1: Bootstrapping the classifier training process by using hierarchical supervoxels
  • 17. The inconsistent and time-consuming task of pixel labeling. Frames Seq05VD_f02340, Seq05VD_f02370, Seq05VD_f02400: input frame and semantic object label (road, sidewalk, sign). From the Cambridge Video Driving Dataset.
  • 18. Video pixel label propagation. Traditional spatial propagation: label a subset of pixels (FG, BG) to obtain the pixel label map. Spatio-temporal propagation: the same idea along the time axis.
  • 20. Bidirectional optical flow at frame 20 and frame 60: Black & Anandan vs. Classic+NL. Maybe a different optical flow algorithm?
  • 21. Why optical flow alone fails: a hole occurs; the dragging effect; multiple incoming flows. Diagrams: forward flow and reverse flow between frames t and t+1.
  • 22. Train an appearance model on the user-annotated frame? Plot: average per-region accuracy over the sequence. Curves: X: do-nothing, M: forward-flow, A: patch.
  • 23. Try again? Motion-only propagation vs. appearance-only propagation. Plot: average per-region accuracy over the sequence. Curves: X: do-nothing, M: forward-flow, A: patch.
  • 24. Maybe we should do something like this? Diagram with regions marked app., flow, or both.
  • 25. Turns out to be an optical flow reliability estimation problem
  • 26. How good are our Motion vs. Appearance (MvA) weights? The Container sequence and the Garden sequence, frames 40 and 80. Panels: input image, GT label, optical flow only, appearance only, our method.
  • 27. Well, there are still problems (1): how to weigh between motion and appearance? Plot comparing: fixed weight for all pixels, naïve cross-correlation, occlusion-aware cross-correlation, bidirectional flow consistency.
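One of the weighting strategies named on this slide, bidirectional flow consistency, can be sketched as a forward-backward round-trip check: follow the forward flow from frame t to t+1, then the backward flow from there, and see how far the pixel lands from where it started. The slides do not give the exact formula, so the Gaussian mapping from round-trip error to a weight in [0, 1], the `sigma` parameter, and the nearest-neighbor lookup below are all assumptions for illustration.

```python
import numpy as np

def flow_consistency_weight(fwd, bwd, sigma=1.0):
    """Per-pixel reliability of optical flow from a round-trip check (sketch).

    fwd: (H, W, 2) flow from frame t to t+1, stored as (dx, dy)
    bwd: (H, W, 2) flow from frame t+1 back to t, stored as (dx, dy)
    Returns an (H, W) map in (0, 1]; 1 means perfectly consistent flow.
    """
    h, w, _ = fwd.shape
    ys, xs = np.indices((h, w))
    # Where each pixel of frame t lands in frame t+1 (nearest pixel, clipped to the image).
    xt = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, h - 1)
    # A consistent pair of flows cancels out: fwd(x) + bwd(x + fwd(x)) is close to 0.
    round_trip = fwd + bwd[yt, xt]
    err = np.linalg.norm(round_trip, axis=-1)
    # Assumed mapping from error to weight; any decreasing function would do.
    return np.exp(-(err ** 2) / (sigma ** 2))

if __name__ == "__main__":
    fwd = np.zeros((4, 4, 2)); fwd[..., 0] = 1.0   # everything moves 1 px right
    bwd = np.zeros((4, 4, 2)); bwd[..., 0] = -1.0  # and 1 px left on the way back
    print(flow_consistency_weight(fwd, bwd).min())
```

Pixels with a low weight are the ones where an appearance model would be trusted more than the motion estimate.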
  • 28. Well, there are still problems (2). Sequences: bus, soccer. Panels: target frame for propagation, ground truth label, initial noisy WvA weight map, optimized WvA map with our bilayer MRF.
  • 29. Our bilayer MRF for label propagation. Diagram labels: observed noisy values, hidden true pixel labels, hidden true WvA weights, 1st layer of MRF, 2nd layer of MRF. label change at causes to change as well as causing the WvA layer's energy to change. Our proposed bilayer MRF for video pixel label propagation.
  • 30. Results: Stefan (tennis) sequence, frames 1, 25, 50, 75. Plot curves: appearance uni-model, appearance multi-model, do nothing, bidirectional flow, bad (fixed) WvA weights, our method.
  • 31. Results: Soccer sequence, frames 1, 25, 50, 75. Plot curves: appearance uni-model, appearance multi-model, do nothing, bidirectional flow, bad WvA weights, our method.
  • 32. Results: Bus sequence, frames 1, 25, 50, 75. Plot curves: do nothing, bidirectional flow, appearance uni-model, bad (fixed) MvA weight, our method.
  • 33. Good, but far from perfect • Overall accuracy still low • Object Boundaries crossed • Optical flow reliability estimation still noisy
  • 34. Hierarchical Supervoxel Fusion (HSF) for pixel labeling. Diagram components: input video, supervoxel hierarchy, self-augmented appearance model, supervoxel flow, classifier, HSF-based pixel-label propagation.
  • 35. What does HSF buy us? • 100x more data for the appearance model. • Supervoxel-level correspondences instead of just pixel-level optical flow. • State-of-the-art pixel label propagation performance.
  • 36. Supervoxel Hierarchy and the “right scale”
  • 37.
  • 38. The HSF process. Diagram components, each drawn as an x-y-t volume: input video, supervoxel hierarchy, hierarchical supervoxel fusion, label consistency maps (vehicle, flower, tree).
  • 39. Automatic selection of the maximum hierarchy height. Supervoxel boundary error on the user-annotated frame.
      Table 3.2: Automatic hierarchy height selection by computing the supervoxel boundary error on the user-annotated frame. The shaded levels are discarded since too many of the supervoxels violate the user-defined boundaries.
      Seq \ Lv    6       7       8       9       10      11      12      13      14      15
      Bus         4.22%   6.11%   8.93%   9.44%   10.71%  18.57%  22.00%  27.55%  35.96%  47.36%
      Container   0.08%   0.07%   0.16%   0.44%   0.86%   2.37%   3.28%   6.69%   14.11%  21.75%
      Garden      0.83%   1.74%   2.66%   3.90%   6.21%   11.37%  20.12%  29.74%  30.43%  50.68%
      Ice         0.11%   0.28%   0.89%   1.54%   1.99%   2.21%   2.32%   2.32%   2.41%   27.04%
      Paris       0.38%   0.46%   0.73%   1.30%   2.02%   3.68%   9.02%   9.48%   11.32%  13.93%
      Salesman    0.31%   0.46%   0.66%   1.58%   4.00%   7.18%   10.23%  20.99%  24.17%  25.01%
      Soccer      0.29%   0.49%   0.61%   1.31%   1.57%   1.70%   5.43%   19.12%  33.89%  38.57%
      Stefan      0.42%   0.74%   1.10%   1.38%   1.69%   1.91%   2.45%   3.97%   6.73%   39.70%
      Camvid      1.72%   3.55%   6.23%   7.51%   11.06%  18.45%  25.84%
  • 40. The self-augmented appearance model. Increase in the number of pixels available for training the appearance model.
      Table 3.1: Increase in training set size of the self-augmented training set (done through Hierarchical Supervoxel Fusion) over the original training set.
      Bus:       tree 24x, horse 3x, car 48x, flower 33x, sign 8x, road 18x
      Container: bldg 91x, grass 109x, tree 93x, sky 100x, water 90x, road 116x, boat 89x
      Garden:    bldg 96x, tree 54x, sky 31x, flower 60x
      Ice:       face 37x, sign 22x, road 89x, body 65x
      Paris:     tree 113x, face 127x, book 105x, body 44x
      Salesman:  tree 111x, face 102x, book 84x
      Soccer:    grass 66x, tree 83x, face 14x, sign 28x, dog 15x, body 62x
      Stefan:    grass 83x, face 1x, sign 75x, chair 1x, body 83x
      Camvid:    bldg 6x, tree/grass 25x, sky 1170x, road 176x, pavemt. 76x, concr. 20x, roadmk. 1756x
  • 41. The self-augmented appearance model. Plots for the Bus, Container, Garden, and Ice sequences. Curves: Our AG-SV, OR-PA, OR-SP, OR-MM (no OR-MM curve for Container).
  • 42. Supervoxel flow propagation performance (1). Plots for the Bus, Garden, Ice, and Container sequences. Curves: HD-OF, SF-BI-OF, SVXL-flow.
  • 43. Supervoxel flow propagation performance (2). Plots for the Paris, Soccer, Salesman, and Stefan sequences. Curves: HD-OF, SF-BI-OF, SVXL-flow.
  • 44. Finally, putting everything together, our Hierarchical Supervoxel Fusion-based Pixel Label Propagation
  • 45.
  • 46. Subproblem 2: Random field priors for improving the spatiotemporal robustness of classifiers
  • 47. Problems with the traditional first-order neighborhood. Diagram: nodes µ and their first-order neighbors ν.
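  • 48. Higher-order proxy neighbors. Diagram: node µ and its neighbors ν.
      $$E[\{m_{\mu^n} : \mu^n \in G^n\}] = \lambda_1 \sum_{\mu^n \in G^n} E_1(\mu^n, m_{\mu^n}) + \lambda_2 \sum_{\mu^n \in G^n} \Big( \eta(\mu^n, m_{\mu^n}) \sum_{\langle \mu^n, \nu^n \rangle} E_2(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}) + \lambda'_2 \sum_{\nu^n \in G^n} \eta(\nu^n, m_{\mu^n}) \sum_{\substack{\langle \nu^n, \tau^n \rangle \\ \langle \mu^n, \nu^n \rangle}} E_2(\nu^n, \tau^n, m_{\mu^n}, m_{\tau^n}) \Big)$$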
  • 49. Energy minimization via the graph-shifts algorithm. Diagram: a shift between nodes µ and ν, with parents P(µ) and P(ν), shown before and after.
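  • 50. Recursive computation of the energy
      $$E_1(\mu^n, m_{\mu^n}) = \begin{cases} E_1(I(S[\mu^n]), m_{\mu^n}) & \text{if } n = 0 \\ \sum_{\mu^{n-1} \in C(\mu^n)} E_1(\mu^{n-1}, m_{\mu^{n-1}}) & \text{otherwise} \end{cases}$$
      $$E_2(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}) = \begin{cases} E_2(m_{\mu^n}, m_{\nu^n}) & \text{if } n = 0 \\ \sum_{\substack{\mu^{n-1} \in C(\mu^n),\ \nu^{n-1} \in C(\nu^n) \\ \langle \mu^{n-1}, \nu^{n-1} \rangle}} E_2(\mu^{n-1}, \nu^{n-1}, m_{\mu^{n-1}}, m_{\nu^{n-1}}) & \text{otherwise} \end{cases}$$
      The overall energy, specified for level 0, is computed at any level by:
      $$E[\{m_{\mu^n} : \mu^n \in G^n\}] = \lambda_1 \sum_{\mu^n \in G^n} E_1(\mu^n, m_{\mu^n}) + \lambda_2 \sum_{\mu^n \in G^n} \eta(\mu^n, m_{\mu^n}) \sum_{\langle \mu^n, \nu^n \rangle} E_2(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n})$$
      where
      $$\eta(\mu^n, m_{\mu^n}) = \frac{D^0(\mu^n)}{\sum_{a \in D^0(\mu^n)} \sum_{\langle a, b \rangle} \delta\big(A^n(a), A^n(b)\big)}$$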
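  • 51. The shift-gradient is defined as
      $$\Delta E(m_{\mu^n} \to \hat{m}_{\mu^n}) = E[\{\hat{m}_{\mu^n} : \mu^n \in G^n\}] - E[\{m_{\mu^n} : \mu^n \in G^n\}]$$
      $$= \lambda_1 \big[ E_1(\mu^n, \hat{m}_{\mu^n}) - E_1(\mu^n, m_{\mu^n}) \big] + \lambda_2 \Big( \sum_{\mu^n \in G^n} \eta(\mu^n, \hat{m}_{\mu^n}) \sum_{\langle \mu^n, \nu^n \rangle} E_2(\mu^n, \nu^n, \hat{m}_{\mu^n}, m_{\nu^n}) - \sum_{\mu^n \in G^n} \eta(\mu^n, m_{\mu^n}) \sum_{\langle \mu^n, \nu^n \rangle} E_2(\mu^n, \nu^n, m_{\mu^n}, m_{\nu^n}) \Big).$$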
  • 52. Visualizing the graph-shifts process and hierarchy. The hierarchy: input image, lv. 1 to lv. 6. The energy minimization process: input label, shift #0, shift #20, shift #40, shift #60.
  • 53. Efficiency improvements of using HOPS. Panels: input, ground truth, classifier only, first-order, HOPS. The probability maps output by the classifier are shared by the E1 terms of the first-order model and HOPS. First-order: 4830 shifts; HOPS: 3769 shifts (-22%). Legend: void, sky, water, road, grass, tree(s), mountain, animal/man, building, bridge, vehicle, coastline.
  • 54. Efficiency improvements of using HOPS. Panels: input, ground truth, classifier only, first-order, HOPS. The probability maps output by the classifier are shared by the E1 terms of the first-order model and HOPS. First-order: 2042 shifts; HOPS: 1868 shifts (-8.6%). Legend: void, sky, water, road, grass, tree(s), mountain, animal/man, building, bridge, vehicle, coastline.
  • 55. Qualitative results of HOPS on the MSRC-21 dataset. Columns: image, labels, classifier only, first order, HOPS. Groups: examples of HOPS outperforming first-order neighborhood models; mislabeling by HOPS. Legend: void, building, grass, tree, cow, horse, sheep, sky, mountain, aeroplane, water, face, car, bicycle, flower, sign, bird, book, chair, road, cat, dog, body, boat.
  • 56. Qualitative results of HOPS on the LHI dataset. Columns: image, labels, classifier only, first order, HOPS. Groups: examples of HOPS outperforming first-order neighborhood models; mislabeling by HOPS. Legend: void, sky, water, road, grass, tree(s), mountain, animal/man, building, bridge, vehicle, coastline.
  • 57. Quantitative results on the MSRC-21 and LHI datasets
      Table 4.1: Comparison of overall accuracy rate on the LHI dataset
                                                                       Classifier-only   First order   HOPS
      Overall accuracy                                                 59.71             72.42         73.48
      Improvement over classifier-only overall accuracy                                  12.71         13.77
      Percentage gained over first-order neighborhood's improvement                                    8.34%
      Table 4.2: Comparison of overall accuracy rate on the MSRC dataset
                                                                       Classifier-only   First order   HOPS
      Overall accuracy                                                 55.87             74.73         75.04
      Improvement over classifier-only overall accuracy                                  18.86         19.17
      Percentage gained over first-order neighborhood's improvement                                    1.64%
      The optimum weights for the energy models are estimated (learned) during the train-
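The "percentage gained over first-order neighborhood's improvement" rows can be reproduced from the accuracy figures on this slide. The short check below assumes the row is defined as the extra improvement of HOPS relative to the first-order improvement, both measured against the classifier-only accuracy; the helper name `relative_gain` is illustrative.

```python
def relative_gain(classifier_only, first_order, hops):
    """Extra improvement of HOPS, as a percentage of the first-order improvement."""
    first_order_gain = first_order - classifier_only
    hops_gain = hops - classifier_only
    return 100.0 * (hops_gain - first_order_gain) / first_order_gain

if __name__ == "__main__":
    lhi = relative_gain(59.71, 72.42, 73.48)   # (13.77 - 12.71) / 12.71
    msrc = relative_gain(55.87, 74.73, 75.04)  # (19.17 - 18.86) / 18.86
    print(f"LHI: {lhi:.2f}%  MSRC: {msrc:.2f}%")
```

Under that definition the two values come out at about 8.34% and 1.64%, matching the tables.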
  • 58. Problems with existing ways of modeling temporal priors. Diagrams of links between frame t-1 and frame t: one approach doesn't model object motion; another requires pre-computing of optical flow, which is overkill and computationally expensive. Our video graph-shifts algorithm: initial temporal link, shift, energy-reduced temporal link.
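  • 59. Temporally consistent energy model
      $$E[\{m_\mu : \mu \in D\}] = \lambda_1 \sum_{\mu \in D} E_1(I(S[\mu]), m_\mu) + \lambda_2 \sum_{\langle \mu, \nu \rangle} E_2(m_\mu, m_\nu) + \lambda_3 \sum_{\mu \in D} E_t(m_\mu, m_\rho)$$
      $$E_t(m_\mu, m_\rho) = 1 - \theta(\mu, \rho), \qquad \rho = \arg\min_{\kappa} \|X_\mu - X_\kappa\|_p, \quad \kappa \in \{\cup\, \eta : \langle \kappa', \eta \rangle\}$$
      $$\theta(\mu, \rho) = \begin{cases} 0 & \text{if } m_\mu \neq m_\rho \\ \exp(-\alpha \|X_\mu - X_\rho\|_p) & \text{otherwise} \end{cases}$$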
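The temporal term on this slide pairs each node with the candidate in the other frame whose feature vector is closest, then charges a cost that is 1 when the two labels differ and shrinks toward 0 as same-label features get closer. The following sketch illustrates that reading under stated assumptions: the candidate set is passed in explicitly (how it is built is not reproduced here), features are plain vectors, and the names `temporal_energy`, `cand_x`, and `cand_m` are hypothetical.

```python
import numpy as np

def temporal_energy(x_mu, m_mu, cand_x, cand_m, alpha=1.0, p=2):
    """Pick the temporal correspondent and return (index, temporal energy).

    x_mu:   (F,) feature vector of the current node
    m_mu:   label of the current node
    cand_x: (N, F) feature vectors of candidate correspondents (assumed given)
    cand_m: length-N labels of those candidates
    """
    # Correspondent = candidate with the smallest p-norm feature distance.
    dist = np.linalg.norm(cand_x - x_mu, ord=p, axis=1)
    rho = int(np.argmin(dist))
    # Affinity is 0 for different labels, exp(-alpha * distance) otherwise.
    affinity = 0.0 if cand_m[rho] != m_mu else float(np.exp(-alpha * dist[rho]))
    return rho, 1.0 - affinity

if __name__ == "__main__":
    cand_x = np.array([[3.0, 4.0], [0.0, 1.0]])
    print(temporal_energy(np.zeros(2), 2, cand_x, [1, 2]))  # same label as nearest candidate
    print(temporal_energy(np.zeros(2), 1, cand_x, [1, 2]))  # label differs, energy is 1
```

Because the correspondent is re-selected from feature distances rather than fixed in advance, the link can follow a moving object without a separate optical flow pass.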
  • 60. Overview of the Video Graph-Shifts process. Diagram: layers n and n+1 in frame t-1 and frame t; a shift at µ and the corresponding change of temporal correspondent.
  • 63. Results. Frames mjrd5_00003, mjrd5_00004, mjrd5_00005. Rows: input frame, ground truth labels, results without temporal links, results with our dynamic temporal links. Legend: sky, road, obstacles, (others).
  • 64. Results on the Camvid dataset. Frames 00001TP_008820, 00001TP_008850, 00001TP_008880. Rows: input frame, ground truth labels, results without temporal links, results with our dynamic temporal links. Legend: void, building, tree, sky, car, sign, road, pedestrian, fence, pole, sidewalk, bicyclist.
  • 65. Results on the Camvid dataset. Frames Seq05VD_f01200, Seq05VD_f01230, Seq05VD_f01260. Rows: input frame, ground truth labels, results without temporal links, results with our dynamic temporal links. Legend: void, building, tree, sky, car, sign, road, pedestrian, fence, pole, sidewalk, bicyclist.
  • 66. Results on the Camvid dataset: confusion matrices (%) for VGS and 2D MRF. Each row lists its non-empty entries in order; empty cells are omitted.
      VGS
      building:   72.2 20.4 0.5 2.0 0.5 0.3 4.0
      tree:       16.4 79.9 1.4 1.0 0.3 0.9
      sky:        1.1 5.7 92.0 0.4 0.8
      car:        0.5 0.1 68.8 29.7 0.2 0.1 0.6
      sign:       80.3 7.1 12.6
      road:       2.4 93.0 4.6
      pedestrian: 9.9 16.7 0.1 18.9 1.1 25.0 0.4 12.9 12.8 2.2
      fence:      43.4 2.2 23.0 15.6 3.8 5.2 6.2 0.6
      column:     12.4 37.5 18.4 7.3 1.2 0.6 20.4 1.9 0.3
      sidewalk:   2.6 59.7 37.5 0.1
      bicyclist:  7.6 6.7 23.5 20.3 14.2 11.8 6.2 9.7
      2D MRF
      building:   71.6 20.7 0.5 1.9 0.9 0.3 4.2
      tree:       17.0 78.3 1.5 0.9 0.7 0.1 1.5
      sky:        1.2 5.6 91.9 0.4 0.9
      car:        0.3 0.1 62.6 35.9 0.4 0.1 0.6 0.1
      sign:       76.0 8.2 15.8
      road:       1.6 93.7 4.7
      pedestrian: 12.5 23.3 0.1 19.4 2.3 17.7 0.8 6.6 16.3 0.8
      fence:      47.0 2.1 22.6 17.8 4.3 0.1 6.1
      column:     13.0 36.4 19.0 6.7 0.1 1.2 0.9 20.8 1.9
      sidewalk:   1.6 62.0 36.3
      bicyclist:  9.4 4.6 24.8 20.5 27.7 1.8 6.0 5.3
  • 67. Subproblem 3: Adapting the learned classifiers to work in new domains
  • 68. Motivation • Similar images often share the same parameter configuration for many computer vision algorithms. • Utilize this knowledge to develop meta-classifiers (classifiers for classifiers). • Utilize the local smoothness priors to speed up the parameter space exploration, as well as aid the adaptation process.
  • 69. Objective function projection. Parameter space: xa, xb, xc, xd. Objective space: f(xa), f(xb), f(xc), f(xd). f(x) = [f1(x), f2(x), …, fj(x)]: unknown, non-linear, non-convex function.
  • 70. Optimal Config. Exploration. Diagrams: points x1, x2, x3 in the parameter space and f(x1), f(x2), f(x3) on the Pareto front in the objective space, with x3 = w1x1 + w2x2 and f′ = w1f(x1) + w2f(x2).
      1. Given two points f(x1), f(x2) in the objective space, determine whether the unknown projection function f() is locally linear by performing our SPEA2-LLP algorithm.
      2. If Dist(f′, f(x3)) is large, f() is non-linear between f(x1) and f(x2). Break into smaller intervals and run SPEA2-LLP until convergence.
      3. If Dist(f′, f(x3)) is small, sample a few more points before concluding that f() is linear between f(x1) and f(x2).
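The linearity test at the core of these steps can be sketched in isolation: interpolate two parameter points, compare the interpolated objective f′ with the true objective at the interpolated point, and subdivide when they disagree. This is only an illustration of that test, not the SPEA2-LLP algorithm itself. It checks a single interior point per interval (step 3 asks for a few more samples), and the tolerance, depth limit, and function names are assumptions.

```python
import numpy as np

def is_locally_linear(f, x1, x2, w1=0.5, tol=1e-2):
    """Compare f' = w1*f(x1) + w2*f(x2) with f(x3), where x3 = w1*x1 + w2*x2."""
    w2 = 1.0 - w1
    x3 = w1 * x1 + w2 * x2
    f_interp = w1 * f(x1) + w2 * f(x2)
    return float(np.linalg.norm(f_interp - f(x3))) <= tol

def linear_intervals(f, x1, x2, tol=1e-2, depth=0, max_depth=6):
    """Recursively split [x1, x2] until f passes the linearity test on each piece."""
    if depth >= max_depth or is_locally_linear(f, x1, x2, tol=tol):
        return [(x1, x2)]
    mid = 0.5 * (x1 + x2)
    return (linear_intervals(f, x1, mid, tol, depth + 1, max_depth)
            + linear_intervals(f, mid, x2, tol, depth + 1, max_depth))

if __name__ == "__main__":
    # Toy two-objective function of one parameter; purely hypothetical.
    curved = lambda x: np.array([x[0] ** 2, x[0]])
    pieces = linear_intervals(curved, np.array([0.0]), np.array([1.0]))
    print(len(pieces))
```

Within each returned interval, objective values for new parameter settings can be approximated by interpolation instead of being evaluated, which is where the speed-up in parameter space exploration would come from.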
  • 71. Earlier results: binarization. Test image: DIBCO 2009, P04. PIE result: PIE automatically determines the binarization parameters in a sliding window, using the previously learned optimal parameter configuration for every location (PIE trained on a different, randomly selected set separate from DIBCO2011). Hand-picked fixed parameter result: one of the hand-picked fixed-parameter binarization results; it cannot adapt to the changing background intensity. Plot: precision-recall of PIE (blue ◊) vs. different fixed parameters (red □).
  • 72. Earlier results: binarization. Test image: DIBCO 2009, H04. Binarization result comparison (prior to post-processing and noise removal). PIE automatically determines the binarization parameters in a sliding window (PIE trained on a different, randomly selected set separate from DIBCO2011). One of the hand-picked fixed-parameter binarization results; it cannot adapt to the changing background intensity. Plot: precision-recall of PIE (blue ◊) vs. different fixed parameters (red □).
  • 73. Earlier results: segmentation on BSDS-500. Panels: input image, ground truth (one of the many), PFF default param. (σ=0.8, k=300), bad inference, our result. Parameter settings shown: σ=1.2, k=500, min=100; σ=0.22, k=688, min=167; σ=0.88, k=442, min=100; σ=0.6, k=500, min=600; σ=0.88, k=442, min=100; σ=0.5, k=500, min=800.
  • 74. Additional Results from using the Parameter Inference Engine (PIE) on other problems
  • 77. PIE as an ensemble combiner • Random forest with 100 randomized trees, binary test at each node, and learned by maximum information gain on a dictionary of 1024 quantized SIFT feature vectors.
      Table 6.1: Results 1. 4-class subset from Caltech 101, 15 training per class. PIE values are given for 100 / 10,000 initial points.
      Class                          PIE per-class precision   PIE overall avg. accuracy   Equal weights per-class precision   Equal weights overall avg. accuracy
      Bass                           70.97/76.67               80.56/82.41                 58.82                               74.07
      Grand Piano                    88.89/94.74               80.56/82.41                 76.47                               74.07
      Minaret                        100/100                   79.63/82.41                 96.43                               74.07
      Soccer Ball                    83.33/80.77               81.48/83.33                 68.97                               74.07
      Average                        85.80/88.04               80.56/82.64                 75.17                               74.07
      Average PIE improvements (%)   14.13/17.12               8.75/11.56
  • 78. PIE as an Ensemble Combiner
    • 12-class subset from Caltech 101, 15 training images per class.
    Table 6.2: Results 2. PIE values are given for 100 / 10,000 initial points.
    | Class | PIE per-class precision | PIE overall average accuracy | Equal Weights per-class precision | Equal Weights overall average accuracy |
    | Faces | 71.33/71.67 | 60.82/60.60 | 70.71 | 58.83 |
    | airplanes | 74.88/73.36 | 60.49/60.60 | 68.38 | 58.83 |
    | anchor | 9.52/16.67 | 60.38/60.49 | 5.00 | 58.83 |
    | ant | 34.78/50.00 | 60.26/60.15 | 28.57 | 58.83 |
    | barrel | 35.71/63.64 | 60.71/60.60 | 18.19 | 58.83 |
    | bass | 31.82/23.33 | 60.49/60.38 | 16.13 | 58.83 |
    | beaver | 20.69/23.53 | 60.93/60.26 | 18.37 | 58.83 |
    | binocular | 58.82/61.11 | 60.26/60.60 | 47.37 | 58.83 |
    | bonsai | 69.23/64.29 | 60.26/60.60 | 50.00 | 58.83 |
    | brain | 70.97/69.01 | 60.04/60.71 | 59.52 | 58.83 |
    | brontosaurus | 100/100 | 60.04/60.60 | 0.00 | 58.83 |
    | car side | 59.42/62.40 | 60.49/60.71 | 57.35 | 58.83 |
    | Average | 53.10/56.58 | 60.43/60.52 | 36.63 | 58.83 |
    | Avg. PIE improvements (%) | 44.95/54.47 | 2.72/2.88 | | |
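The "Avg. PIE improvements (%)" rows appear to be relative gains of the PIE averages over the Equal Weights averages. The slides do not state the formula, so this is an assumption; the sketch below recomputes the figures from the rounded averages in the two tables and matches the reported values to within rounding.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline` (assumed formula)."""
    if baseline == 0:
        # A zero baseline (e.g. the brontosaurus row under Equal Weights)
        # has no defined relative gain, so only averages are compared.
        raise ValueError("relative improvement is undefined for baseline 0")
    return 100.0 * (new - baseline) / baseline


# Table 6.2 averages: PIE (100 / 10,000 initial points) vs. Equal Weights.
precision_gain_100 = relative_improvement(53.10, 36.63)
precision_gain_10k = relative_improvement(56.58, 36.63)
accuracy_gain_100 = relative_improvement(60.43, 58.83)

# Table 6.1 averages, 100 initial points.
precision_gain_4class = relative_improvement(85.80, 75.17)
```

Small differences in the last digit are expected because the table averages are themselves rounded to two decimals.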
  • 79. Conclusion
    • Spatiotemporal priors for pixel label propagation in space-time volumes: Bilayer MRF and HSF-based propagation.
    • HOPS for longer-range spatial modeling; VGS for dynamic temporal modeling.
    • PIE for using localness priors to explore and adapt parameter configurations.
    • The full potential of spatiotemporal priors is still frequently overlooked.
  • 80. Publications
    1. W. Wu, A. Y. C. Chen, L. Zhao, and J. J. Corso. Brain tumor detection and segmentation in a CRF framework with pixel-wise affinity and superpixel-level features. International Journal of Computer Assisted Radiology and Surgery, 2015.
    2. S. N. Lim, A. Y. C. Chen, and X. Yang. Parameter Inference Engine (PIE) on the Pareto Front. In Proceedings of International Conference on Machine Learning, AutoML Workshop, 2014.
    3. A. Y. C. Chen, S. Whitt, C. Xu, and J. J. Corso. Hierarchical supervoxel fusion for robust pixel label propagation in videos. In Submission to ACM Multimedia, 2013.
    4. A. Y. C. Chen and J. J. Corso. Temporally consistent multi-class video-object segmentation with the video graph-shifts algorithm. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.
    5. D. R. Schlegel, A. Y. C. Chen, C. Xiong, J. A. Delmerico, and J. J. Corso. Airtouch: Interacting with computer systems at a distance. In Proceedings of IEEE Workshop on Applications of Computer Vision, 2011.
    6. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. In Proceedings of International Symposium CompIMAGE, 2010.
    7. A. Y. C. Chen and J. J. Corso. Propagating multi-class pixel labels throughout video frames. In Proceedings of IEEE Western New York Image Processing Workshop, 2010.
    8. A. Y. C. Chen and J. J. Corso. On the effects of normalization in adaptive MRF Hierarchies. Computational Modeling of Objects Represented in Images, pages 275–286, 2010.
    9. Y. Tao, L. Lu, M. Dewan, A. Y. C. Chen, J. J. Corso, J. Xuan, M. Salganicoff, and A. Krishnan. Multi-level ground glass nodule detection and segmentation in CT lung images. Medical Image Computing and Computer-Assisted Intervention, 2009.
    10. A. Y. C. Chen, J. J. Corso, and L. Wang. HOPS: Efficient region labeling using higher order proxy neighborhoods. In Proceedings of IEEE International Conference on Pattern Recognition, 2008.