CVPR 2010: Higher-Order Models in Computer Vision (Parts 1 and 2)
1. Tractable Higher-Order Models in Computer Vision
Carsten Rother
Sebastian Nowozin
Microsoft Research Cambridge
2. Schedule
8:30-9:00 Introduction
9:00-10:00 Models: small cliques and special potentials
10:00-10:30 Tea break
10:30-12:00 Inference: relaxation techniques: LP, Lagrangian, dual decomposition
12:00-12:30 Models: global potentials and global parameters + discussion
3. A gentle intro to MRFs
Goal: given an image z ∈ (R,G,B)^n, infer the unknown (latent) variables x ∈ {0,1}^n:
P(x|z) = P(z|x) P(x) / P(z) ∝ P(z|x) P(x)
posterior ∝ likelihood (data-dependent) × prior (data-independent)
Maximum a posteriori (MAP): x* = argmax_x P(x|z)
4. Likelihood P(x|z) ∝ P(z|x) P(x)
[Figure: foreground/background color models, shown in red and green]
5. Likelihood P(x|z) ∝ P(z|x) P(x)
[Figure: log-likelihoods log P(zi|xi=0) and log P(zi|xi=1)]
Maximum likelihood: x* = argmax_x P(z|x) = argmax_x ∏_i P(zi|xi)
6. Prior P(x|z) ∝ P(z|x) P(x)
P(x) = 1/f ∏_{i,j ∈ N} θij(xi,xj)
f = ∑_x ∏_{i,j ∈ N} θij(xi,xj)   "partition function"
θij(xi,xj) = exp{-|xi-xj|}   "Ising prior"
(exp{-1} ≈ 0.37; exp{0} = 1)
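To make the prior concrete, here is a small illustrative sketch (my own, not from the slides): it enumerates all labellings of a tiny binary grid, computes the Ising prior, and checks that the partition function f normalizes it. The 2×3 grid size is an arbitrary choice that keeps exhaustive enumeration cheap.

```python
import itertools
import math

H, W = 2, 3  # tiny grid so exhaustive enumeration stays feasible

# 4-connected neighbour pairs (i, j) over an HxW grid, pixels indexed row-major
pairs = [(r * W + c, r * W + c + 1) for r in range(H) for c in range(W - 1)] + \
        [(r * W + c, (r + 1) * W + c) for r in range(H - 1) for c in range(W)]

def unnormalized(x):
    # product over neighbours of exp{-|xi - xj|} (Ising prior)
    return math.exp(-sum(abs(x[i] - x[j]) for i, j in pairs))

labellings = list(itertools.product((0, 1), repeat=H * W))
f = sum(unnormalized(x) for x in labellings)    # partition function

P = {x: unnormalized(x) / f for x in labellings}
assert abs(sum(P.values()) - 1.0) < 1e-12       # P is a proper distribution

# the constant labellings are the modes, as slide 7 notes
mode = max(P, key=P.get)
print(mode, P[mode])   # e.g. (0, 0, 0, 0, 0, 0)
```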
7. Prior
Pure prior model: P(x) = 1/f ∏_{i,j ∈ N} exp{-|xi-xj|}
[Figure: the solution with highest probability (the mode) vs. fair samples; P(x) = 0.011, 0.012, 0.012]
A smoothness prior alone is not enough: it needs the likelihood.
8. Weighting prior and likelihood
E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j ∈ N} θij(xi,xj)
[Figure: segmentations for w = 0, 10, 40, 200]
10. Energy minimization
P(x|z) = 1/f(z,w) exp{-E(x,z,w)}
f(z,w) = ∑_x exp{-E(x,z,w)}
-log P(x|z) = -log(1/f(z,w)) + E(x,z,w)
x* = argmin_x E(x,z,w): the MAP solution is the same as the minimum-energy solution.
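A minimal sketch (again my own; the observations z and the weight w are made up) checking this equivalence numerically: the labelling that maximizes P(x|z) under the Gibbs distribution is exactly the one that minimizes E(x,z,w).

```python
import itertools
import math

# 2x2 grid, pixels indexed 0..3 row-major; 4-connected neighbour pairs
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
z = [0.9, 0.8, 0.2, 0.1]     # made-up observations in [0,1]
w = 0.5                      # made-up smoothness weight

def energy(x):
    unary = sum((z[i] - x[i]) ** 2 for i in range(4))   # θi(xi, zi)
    pair = sum(abs(x[i] - x[j]) for i, j in pairs)      # Ising θij
    return unary + w * pair

labellings = list(itertools.product((0, 1), repeat=4))
f = sum(math.exp(-energy(x)) for x in labellings)       # partition function

map_x = max(labellings, key=lambda x: math.exp(-energy(x)) / f)
min_x = min(labellings, key=energy)
assert map_x == min_x        # MAP labelling == minimum-energy labelling
print(map_x)                 # (1, 1, 0, 0) for these numbers
```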
11. Random Field Models for Computer Vision
Model:
• Variables: discrete or continuous?
• If discrete: how many labels?
• Space: discrete or continuous?
• Dependencies between variables?
• How many variables?
• …
Inference:
• Combinatorial optimization: e.g. graph cut
• Message passing: e.g. BP, TRW
• Iterated conditional modes (ICM)
• LP relaxation: e.g. cutting plane
• Problem decomposition + subgradient
• …
Applications:
• 2D/3D image segmentation
• Object recognition
• 3D reconstruction
• Stereo / optical flow
• Image denoising
• Texture synthesis
• Pose estimation
• Panoramic stitching
• …
Learning:
• Exhaustive search (grid search)
• Pseudo-likelihood approximation
• Training in pieces
• Max-margin
• …
12. Introducing Factor Graphs
Write probability distributions as graphical models:
- Directed graphical models
- Undirected graphical models: "traditionally used for MRFs"
- Factor graphs: "best way to visualize the underlying energy"
References:
- Pattern Recognition and Machine Learning [Bishop '08, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
13. Factor Graphs
P(x) ∝ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)   "4 factors"
P(x) ∝ exp{-E(x)}   (Gibbs distribution)
E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)
[Factor graph: unobserved variables x1,…,x5; a factor connects the variables that appear in it]
14. Definition "Order"
Definition "order": the arity (number of variables) of the largest factor.
P(x) ∝ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5) is a factor graph of order 3 (one arity-3 factor, three arity-2 factors).
[Figures: the factor graph, and the corresponding undirected model with a triple clique]
Extras:
• "Factor" and "clique" are used interchangeably here; this is not fully correct, since a clique potential may or may not decompose into factors.
• The definition of "order" is the same for cliques and factors.
• Markov property: a random field with low-order factors/cliques.
15. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
16. 4-connected: Segmentation
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj)
[Factor graph: observed variables zi; unobserved variables xi, xj]
17. 4-connected: Segmentation (CRF)
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj,zi,zj)
[Factor graphs: MRF vs. Conditional Random Field (CRF); observed variables zi, zj; unobserved (latent) variables xi, xj]
A CRF has no pure prior.
18. 4-connected: Stereo matching
[Figures: left image (a); right image (b); ground-truth depth. [Boykov et al. '01]]
• Images rectified
• Ignore occlusion for now
Energy: E(d): {0,…,D-1}^n → R
Labels: di (depth/shift)
19. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
20. Highly-connected: Discretization artefacts
[Figures: boundary length under 4-connected vs. Euclidean, and 8-connected vs. Euclidean metrics]
Higher connectivity can model the true Euclidean length. [Boykov et al. '03, '05]
22. Stereo with occlusion
E(d): {1,…,D}^{2n} → R
Each pixel is connected to D pixels in the other image.
[Figures: ground truth; stereo with occlusion [Kolmogorov et al. '02]; stereo without occlusion [Boykov et al. '01]]
23. Texture De-noising
[Figures: training images; test image; test image (60% noise); MRF results for 4-connected (neighbours) and 9-connected (7 attractive, 2 repulsive) models]
24. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
25. Reason 4: Use non-local parameters:
Interactive Segmentation (GrabCut)
[Figures: interactive segmentation [Boykov and Jolly '01]; GrabCut [Rother et al. '04]]
26. MRF with Global parameters:
Interactive Segmentation (GrabCut)
Model segmentation and color model w jointly:
E(x,w): {0,1}^n × {GMMs} → R
E(x,w) = ∑_i θi(xi,w) + ∑_{i,j ∈ N4} θij(xi,xj)
An object is a compact set of colors. [Rother et al., SIGGRAPH '04]
27. Latent/Hidden CRFs
• ObjCut [Kumar et al. '05]; Deformable Part Model [Felzenszwalb et al., CVPR '08]; PoseCut [Bray et al. '06]; LayoutCRF [Winn et al. '06]
• Hidden variables are never observed (in either training or test), e.g. parts
• Maximizing over hidden variables
• In ML: Deep Belief Networks, Restricted Boltzmann Machines [Hinton et al.] (often sampling is done)
[Figure: instance, instance label, parts. From LayoutCRF [Winn et al. '06]]
28. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
29. First Example
[Figures: user input; standard MRF (Ising prior: smooths boundary, removes noise); with connectivity (Ising prior: smooths boundary)]
E(x) = P(x) + h(x) with h(x) = { ∞ if x is not 4-connected; 0 otherwise }
This tutorial:
1. What higher-order models have been used in vision?
2. How is MAP inference done for those models?
3. What is the relationship between higher-order MRFs and MRFs with global variables?
30. Inference (very brief summary)
• Message-passing techniques (BP, TRW, TRW-S)
– Defined on the factor graph ([Potetz '07, Lan '06])
– Can in theory be applied to any model (higher order, multi-label)
• LP relaxation (… more in part III)
– Relax the original problem ({0,1} to [0,1]) and solve with existing techniques (e.g. sub-gradient)
– Can be applied to any model (depending on the solver used)
– Connections to TRW (message passing)
31. Inference (very brief summary)
• Dual/problem decomposition (… more in part III)
– Decompose the (NP-)hard problem into tractable ones; solve with sub-gradient
– Can be applied to any model (depending on the solver used)
• Combinatorial optimization (… more in part II)
– Binary, pairwise MRF: graph cut, BHS (QPBO)
– Multi-label, pairwise: move-making; transformation
– Binary, higher-order factors: transformation
– Multi-label, higher-order factors: move-making + transformation
32. Inference for higher-order models (very brief summary)
• Arbitrary potentials are only tractable for order < 7 (memory, computation time)
• For order ≥ 7, potentials need some structure that can be exploited to make them tractable
33. Forthcoming book!
Advances in Markov Random Fields for Computer Vision (Blake, Kohli, Rother)
• MIT Press (probably end of 2010)
• Most topics of this course and much, much more
• Contributors: the usual suspects: the editors + Boykov, Kolmogorov, Weiss, Freeman, Komodakis, …
34. Schedule
8:30-9:00 Introduction
9:00-10:00 Models: small cliques and special potentials
10:00-10:30 Tea break
10:30-12:00 Inference: relaxation techniques: LP, Lagrangian, dual decomposition
12:00-12:30 Models: global potentials and discussion
36. Optimization: Binary, Pairwise
E(x) = ∑_i θi(xi) + ∑_{i,j ∈ N} θij(xi,xj),   xi ∈ {0,1}
Submodular:
• θij(0,0) + θij(1,1) ≤ θij(0,1) + θij(1,0)
• The condition holds naturally for many vision problems (e.g. segmentation: |xi-xj|)
• Graph cut computes the global optimum efficiently (~0.5 sec for 1 MPixel [Boykov, Kolmogorov '01])
Non-submodular:
• BHS algorithm (also called QPBO) ([Boros, Hammer, and Sun '91; Kolmogorov et al. '07]): graph cut on a special graph; output in {0, 1, '?'}
• Partial optimality (various solutions for the '?' nodes)
• Solves the underlying LP relaxation
• Quality depends on the application (see [Rother et al., CVPR '07])
• Extensions exist: QPBOP, QPBOI (see [Rother et al., CVPR '07; Woodford et al. '08])
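A small helper (my illustration, not tutorial code) that tests the submodularity condition for a binary pairwise potential given as a table; the Ising potential |xi-xj| from the segmentation setting passes, while a "repulsive" potential rewarding disagreement does not.

```python
def is_submodular(theta):
    """theta[(xi, xj)] -> cost, for binary xi, xj."""
    return theta[(0, 0)] + theta[(1, 1)] <= theta[(0, 1)] + theta[(1, 0)]

# Ising/segmentation potential |xi - xj| is submodular:
ising = {(a, b): abs(a - b) for a in (0, 1) for b in (0, 1)}
assert is_submodular(ising)

# a repulsive potential rewarding disagreement is not:
repulsive = {(a, b): 1 - abs(a - b) for a in (0, 1) for b in (0, 1)}
assert not is_submodular(repulsive)
```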
37. Optimization: Binary, Pairwise
f(x1,x2) = θ11 x1x2 + θ10 x1(1-x2) + θ01 (1-x1)x2 + θ00 (1-x1)(1-x2)
f(x1,x2) = a x1x2 + b x1 + c x2 + d
Quadratic pseudo-Boolean optimization (QPBO): B² → R
Reminder: encoding for graph cut: a cut gives a labelling (and its energy); all weights are positive if the function is submodular (after re-parameterization to normal form).
[Graph: source s, sink t, nodes x1, x2; edge weights θ11, θ01 - θ00, θ10 - θ11, θ00]
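The two forms of f are related by a change of basis. The sketch below (my own; the coefficient formulas follow from matching f at the four assignments) recovers a, b, c, d from the table θ and checks that the submodularity condition is equivalent to a ≤ 0.

```python
import itertools

def multilinear(theta):
    # match f at the four assignments: f = a*x1*x2 + b*x1 + c*x2 + d
    d = theta[(0, 0)]
    b = theta[(1, 0)] - d
    c = theta[(0, 1)] - d
    a = theta[(1, 1)] - b - c - d
    return a, b, c, d

theta = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.5}  # made-up table
a, b, c, d = multilinear(theta)

for x1, x2 in itertools.product((0, 1), repeat=2):
    assert abs(a*x1*x2 + b*x1 + c*x2 + d - theta[(x1, x2)]) < 1e-12

# submodular  <=>  theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0)  <=>  a <= 0
assert (a <= 0) == (theta[(0,0)] + theta[(1,1)] <= theta[(0,1)] + theta[(1,0)])
```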
38. Optimization: binary, higher-order
f(x1,x2,x3) = θ111 x1x2x3 + θ110 x1x2(1-x3) + θ101 x1(1-x2)x3 + …
f(x1,x2,x3) = a x1x2x3 + b x1x2 + c x2x3 + …
Quadratic polynomials can be handled (previous slide).
Idea: transform terms of order > 2 into 2nd-order terms.
Methods:
1. transformation by "substitution"
2. transformation by "min-selection"
39. Transformation by "substitution"
[Rosenberg '75; Boros and Hammer '02; Ali et al., ECCV '08]
f(x1,x2,x3) = x1x2x3 + x1x2 + x2
Auxiliary function: D(x1,x2,z) = x1x2 - 2x1z - 2x2z + 3z,   z ∈ {0,1}
It satisfies:
D(x1,x2,z) = 0 if x1x2 = z
D(x1,x2,z) > 0 if x1x2 ≠ z
"Substitution": f(x1,x2,x3) = min_z g(x1,x2,x3,z) = min_z [z·x3 + z + x2 + K·D(x1,x2,z)]
Since K is very large, x1x2 = z at the minimum. Apply it recursively…
Problems:
• Doesn't work in practice [Ishikawa, CVPR '09]
• The function D is non-submodular, and "K enforces this strongly"
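A brute-force check of the substitution (my sketch; K = 10 stands in for "sufficiently large"):

```python
import itertools

def D(x1, x2, z):
    return x1*x2 - 2*x1*z - 2*x2*z + 3*z

def f(x1, x2, x3):
    return x1*x2*x3 + x1*x2 + x2

K = 10  # any sufficiently large constant

for x1, x2, x3 in itertools.product((0, 1), repeat=3):
    # D vanishes exactly when z equals the product x1*x2, and is positive otherwise
    for z in (0, 1):
        assert (D(x1, x2, z) == 0) == (x1*x2 == z)
        assert D(x1, x2, z) >= 0
    # minimizing over z recovers f, with x1*x2 replaced by z
    g = min(z*x3 + z + x2 + K * D(x1, x2, z) for z in (0, 1))
    assert g == f(x1, x2, x3)
print("substitution identity verified")
```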
40. Transformation by "min-selection"
[Freedman and Drineas '05; Kolmogorov and Zabih '04; Ishikawa '09]
f(x1,x2,x3) = a x1x2x3
Useful identity: -x1x2x3 = min_z -z(x1+x2+x3-2),   z ∈ {0,1}
Check: if x1 = x2 = x3 = 1 then z = 1; otherwise z = 0.
Transform:
Case a < 0: f(x1,x2,x3) = min_z a·z·(x1+x2+x3-2)
Case a > 0: f(x1,x2,x3) = min_z a·{z(x1+x2+x3-1) + (x1x2+x2x3+x3x1) - (x1+x2+x3) + 1}   (similar trick)
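A brute-force check of both cases over all eight assignments (my sketch; both formulas are written above with the signs that make the identities exact, which the asserts confirm):

```python
import itertools

def check(a):
    for x1, x2, x3 in itertools.product((0, 1), repeat=3):
        target = a * x1 * x2 * x3
        if a < 0:
            got = min(a * z * (x1 + x2 + x3 - 2) for z in (0, 1))
        else:
            got = min(a * (z * (x1 + x2 + x3 - 1)
                           + (x1*x2 + x2*x3 + x3*x1)
                           - (x1 + x2 + x3) + 1) for z in (0, 1))
        assert got == target, ((x1, x2, x3), got, target)

check(-3)   # negative-coefficient case
check(+5)   # positive-coefficient case
print("min-selection identities verified")
```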
41. Transformation by "min-selection"
The general case: a term of degree d needs nd = ⌊(d-1)/2⌋ new variables w.
[General formula from [Ishikawa, PAMI '09]]
42. Full Procedure
[Ishikawa '09]
General 5th-order potential:
f(x1,…,x5) = a x1x2x3x4x5 + b x1x2x3x4 + c x1x2x3x5 + d x1x2x4x5 + …
… transform all terms of degree > 2 until only degree-2 terms remain.
• Worst case exponential: a potential of order 5 gives up to 15 new variables.
• Probably tractable for up to order 6
• May get very hard to solve (non-submodular)
• Code available online on Ishikawa's webpage
43. Application 1: De-noising with Field-of-Experts
[Roth and Black '05; Ishikawa '09]
E(x) = ∑_i (zi-xi)²/2σ² + ∑_c ∑_k αk log(1 + 0.5(Jk·xc)²)
(unary likelihood + FoE prior)
xc: the n×n patches (here 2×2); Jk: a set of filters.
A non-convex optimization problem. How to handle continuous labels in a discrete MRF?
[Figures from [Ishikawa, PAMI '09; Roth et al. '05]]
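To make the energy concrete, a hedged sketch (illustrative only: random filters stand in for the learned FoE filters Jk, the weights αk and σ are made up, and a real system would minimize over x rather than just evaluate):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, sigma = 8, 8, 0.1
z = rng.random((H, W))            # noisy observation (made up)
x = z.copy()                      # candidate reconstruction

# stand-ins for the learned 2x2 FoE filters Jk and their weights alpha_k
J = [rng.standard_normal(4) for _ in range(3)]
alpha = [0.5, 0.4, 0.3]

def foe_energy(x, z):
    unary = np.sum((z - x) ** 2) / (2 * sigma ** 2)     # likelihood term
    prior = 0.0
    for r in range(H - 1):                              # all 2x2 cliques c
        for c in range(W - 1):
            patch = x[r:r+2, c:c+2].ravel()
            for Jk, ak in zip(J, alpha):
                prior += ak * np.log1p(0.5 * (Jk @ patch) ** 2)
    return unary + prior

print(foe_energy(x, z))
```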
44. Solve with fusion move
[Lempitsky et al., ICCV '07, '08, '10; Woodford et al. '08]
Fusion-move optimization:
1. x = arbitrary initial labelling
2. Fuse x with a proposal labelling x': minimize a binary MRF E' that chooses, per pixel, the label from x or from x' (use the BHS algorithm)
3. Go to 2) if the energy went down
"Alpha expansion" is the special case where the proposal is a constant labelling.
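The control flow as a sketch (my own): on a tiny chain the binary fusion step is solved exactly by enumeration, standing in for a BHS/QPBO solve; the energy and the proposal generator are made up. The assert checks that fusion never increases E.

```python
import itertools
import random

random.seed(0)
n = 8

def energy(x):
    # made-up chain energy: unary pulls toward a target, pairwise smooths
    target = [0, 0, 1, 1, 1, 0, 0, 1]
    return sum((x[i] - target[i]) ** 2 for i in range(n)) + \
           sum(abs(x[i] - x[i + 1]) for i in range(n - 1))

def fuse(x, proposal):
    # exact binary fusion by enumeration (stand-in for a BHS/QPBO solve):
    # b[i] = 0 keeps x[i], b[i] = 1 takes proposal[i]
    best = min(itertools.product((0, 1), repeat=n),
               key=lambda b: energy([proposal[i] if b[i] else x[i]
                                     for i in range(n)]))
    return [proposal[i] if best[i] else x[i] for i in range(n)]

x = [random.randint(0, 3) for _ in range(n)]        # arbitrary initial labelling
for _ in range(20):
    proposal = [random.randint(0, 3) for _ in range(n)]
    x_new = fuse(x, proposal)
    assert energy(x_new) <= energy(x)               # fusion never increases E
    x = x_new
print(x, energy(x))
```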
45. Application 1: De-noising with Field-of-Experts
[Lempitsky et al., ICCV '07, '08, '10; Woodford et al. '08]
Properties of the fusion move:
1. Performance depends on the performance of the BHS algorithm (how many nodes get labelled)
2. Guarantee: E does not go up
3. In practice many nodes get labelled, because θij(0,0) + θij(1,1) ≤ θij(0,1) + θij(1,0) often has low cost
46. Results
[Figures: original; noisy; pairwise model; pairwise model "factor graph, BP-based"; TV-norm (continuous model); FoE. Results given as PSNR/E. From [Ishikawa, PAMI '09]]
48. Application 2: Curvature in stereo
[Woodford et al., CVPR '08]
f(x1,x2,x3) = x1 - 2x2 + x3, where xi ∈ {0,…,D} (depth)
Example: a slanted plane: f(1,2,3) = 0
[Figures: image; pairwise terms; triple terms. From [Woodford et al., CVPR '08]]
49. Application 3: Higher-Order Likelihood for optical flow
[Glocker et al., ECCV '10]
• Pairwise MRF: likelihood in the unaries as an NCC cost: approximation error!
• Higher-order likelihood: done with triple cliques (ideally higher)
• Also use 3rd/4th-order terms so as not to penalize any affine motion
[Figures: image 1, image 2; one image, bi-layer triangulation, optical flow]
51. Label-Cost Potential
[Hoiem et al. '07; Delong et al. '10; Bleyer et al. '10]
[Figures: image; GrabCut-style result; result with a cost for each new label [Delong et al. '10]; label cost = 10c vs. label cost = 4c]
(Same function as [Zhu and Yuille '96])
E(x) = P(x) + ∑_{l ∈ L} cl·[∃p: xp = l],   E: {1,…,L}^n → R
("pairwise MRF" + "label cost")
Basic idea: penalize the complexity of the model
• Minimum description length (MDL)
• Bayesian information criterion (BIC)
• Akaike information criterion (AIC)
From [Delong et al. '10]
52. How is it done …
In an alpha-expansion step from the current labelling (a b b c b) with expansion label a, x'p = 1 means pixel p takes label a and x'p = 0 means it keeps its old label:
example 1: x' = (1 0 1 1 1) gives (a b a a a): cost for b is cb
example 2: x' = (0 1 1 0 1) gives (a a a c a): cost for b is 0
Formally: E(x') = P(x') + ∑_{l ∈ L} (cl - cl ∏_{p ∈ Pl} x'p)
Case a < 0: a ∏_i xi = min_w a·w·(∑_i xi - |Pl| + 1)   Submodular!
From [Delong et al. '10]
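A brute-force check of that submodular rewriting (my sketch; here a plays the role of -cl and n = |Pl|):

```python
import itertools
import math

def check(a, n):
    assert a < 0
    for x in itertools.product((0, 1), repeat=n):
        lhs = a * math.prod(x)                               # a · Π_p x'p
        rhs = min(a * w * (sum(x) - n + 1) for w in (0, 1))  # min over auxiliary w
        assert lhs == rhs, (x, lhs, rhs)

check(-2.5, 4)   # e.g. coefficient a = -cl for a clique Pl of 4 pixels
print("label-cost product identity verified")
```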
54. Example: surface-based stereo
[Bleyer et al. '10]
A 3D scene explained by a small set of 3D surfaces.
[Figures: left image; surfaces and depth without the label-cost prior; surfaces and depth with the label-cost prior]
56. Robust (Soft) Pn Potts model
[Kohli et al., CVPR '07, '08; PAMI '08; IJCV '09]
57. Image Segmentation
E: {0,1}^n → R,   n = number of pixels,   0 → fg, 1 → bg
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj|
[Figures: image; unary cost; segmentation. [Boykov and Jolly '01; Blake et al. '04; Rother et al. '04]]
58. Pn Potts Potentials
Patch dictionary (tree):
h(xp) = { 0 if xi = 0 for all i ∈ p;  Cmax otherwise }
[Figure: patch p; costs Cmax and 0]
[slide credits: Kohli]
59. Pn Potts Potentials
E: {0,1}^n → R,   n = number of pixels,   0 → fg, 1 → bg
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj| + ∑_p hp(xp)
h(xp) = { 0 if xi = 0 for all i ∈ p;  Cmax otherwise }
[slide credits: Kohli]
60. Image Segmentation
E: {0,1}^n → R,   n = number of pixels,   0 → fg, 1 → bg
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj| + ∑_p hp(xp)
[Figures: image; pairwise segmentation; final segmentation]
[slide credits: Kohli]
61. Application: Recognition and Segmentation
[Figures: image; one superpixelization; another superpixelization; unaries only (TextonBoost [Shotton et al. '06]); pairwise CRF only [Shotton et al. '06]; Pn Potts. From [Kohli et al. '08]]
62. Robust (soft) Pn Potts model
h(xp) = { 0 if xi = 0 for all i ∈ p;  f(∑_{i ∈ p} xi) otherwise }
[Figures: cost curves for Pn Potts vs. robust Pn Potts. From [Kohli et al. '08]]
63. Application: Recognition and Segmentation
[Figures: image; one superpixelization; another superpixelization; unaries only (TextonBoost [Shotton et al. '06]); pairwise CRF only [Shotton et al. '06]; Pn Potts; robust Pn Potts; robust Pn Potts (different f). From [Kohli et al. '08]]
64. Same idea for surface-based stereo
[Bleyer '10]
[Figures: one input image; ground-truth depth; stereo with hard segmentation; stereo with robust Pn Potts]
This approach gets the best result on the Middlebury Teddy image pair.
65. How is it done…
The most general binary function handled: H(x) = F(∑_i xi) with F concave.
[Plot: concave H(x) as a function of ∑ xi]
The transformation is to a submodular pairwise MRF, hence the optimization is globally optimal.
[slide credits: Kohli]
66. Higher order to Quadratic
• Start with the Pn Potts model:
f(x) = { 0 if all xi = 0;  C1 otherwise },   x ∈ {0,1}^n
min_x f(x) = min_{x, a ∈ {0,1}} [C1·a + C1·(1-a)·∑_i xi]
(higher-order function → quadratic submodular function)
∑xi = 0 → f(x) = 0, a = 0
∑xi > 0 → f(x) = C1, a = 1
[slide credits: Kohli]
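A brute-force check (my sketch; C1 and n are arbitrary) that minimizing the quadratic form over the auxiliary variable a reproduces the Pn Potts cost for every x:

```python
import itertools

C1, n = 4.0, 5

def pn_potts(x):
    return 0.0 if sum(x) == 0 else C1

for x in itertools.product((0, 1), repeat=n):
    quad = min(C1 * a + C1 * (1 - a) * sum(x) for a in (0, 1))
    assert quad == pn_potts(x), x
print("Pn Potts equals the minimum of the quadratic form over a")
```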
67. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [C1·a + C1·(1-a)·∑_i xi]
(higher-order function → quadratic submodular function)
[Plot: the lines C1·∑xi and C1 against ∑xi = 1, 2, 3, …]
[slide credits: Kohli]
68. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [C1·a + C1·(1-a)·∑_i xi]
(higher-order submodular function → quadratic submodular function)
[Plot: the a = 0 line C1·∑xi and the a = 1 line C1; the lower envelope of concave functions is concave]
[slide credits: Kohli]
69. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [f1(x)·a + f2(x)·(1-a)]
(higher-order submodular function → quadratic submodular function)
[Plot: linear functions f1(x) and f2(x); the lower envelope of concave functions is concave]
[slide credits: Kohli]
70. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [f1(x)·a + f2(x)·(1-a)]
(higher-order submodular function → quadratic submodular function)
[Plot: f1(x) and f2(x) with their a = 0 / a = 1 regions; the lower envelope of concave functions is concave]
Arbitrary concave functions: sum the potentials up (each breakpoint adds a new binary variable) [Vicente et al. '09] [slide credits: Kohli]
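To illustrate the construction (my own sketch, not the paper's): a made-up concave piecewise-linear F(∑xi) is written as a weighted sum of one-breakpoint terms min(s, b), each of which needs exactly one auxiliary binary variable, as in the Pn Potts case above.

```python
# a made-up concave piecewise-linear function of s = sum(x):
def F(s):
    return min(3 * s, s + 4, 7)   # slopes 3 > 1 > 0, breakpoints at s = 2 and s = 3

# per the slide: write F as a SUM of one-breakpoint concave terms; each term
# min(s, b) = min over a binary a of [a*b + (1-a)*s] costs one auxiliary variable
def F_decomposed(s):
    t1 = min(a * 2 + (1 - a) * s for a in (0, 1))   # min(s, 2), weight 3 - 1 = 2
    t2 = min(a * 3 + (1 - a) * s for a in (0, 1))   # min(s, 3), weight 1 - 0 = 1
    return 2 * t1 + 1 * t2

for s in range(7):
    assert F(s) == F_decomposed(s)
print("concave F = weighted sum of one-breakpoint terms, one auxiliary variable each")
```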
71. Beyond Pn Potts … soft Pattern-based Potentials
[Rother et al. '08; Komodakis et al. '08]
Motivation: binary image de-noising.
[Figures: training image; test image with noise; result with pairwise MRF, 9-connected (7 attractive, 2 repulsive); result with higher-order MRF]
With the pairwise model the higher-order structure is not preserved.
72. Sparse higher-order functions
Minimize: E(x) = P(x) + ∑_c hc(xc)
where hc: {0,1}^{|c|} → R is a higher-order function (|c| = 10×10 = 100).
It assigns a cost to each of the 2^100 possible labellings!
Exploit the structure of the function to transform it into a pairwise function.
73. How this can be done…
One clique, one pattern P0:
hc(x) = { 0 if xc = P0;  k otherwise }
hc(x) = min_{a,b} [k·a + k·(1-b) - k·a·(1-b) + k ∑_{i ∈ S0(P0)} (1-a)·xi + k ∑_{i ∈ S1(P0)} b·(1-xi)]
Check it:
1. Pattern off => a = 1, b = 0 (cost k)
2. Pattern on => a = 0, b = 1 (cost 0)
[Graph: the corresponding pairwise construction with edge weights ±k]
Problems:
1. The term "kab" is non-submodular
2. Only BP and TRW worked for inference
General potential: add all the terms up.
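A brute-force check of the single-pattern transformation (my sketch; the 4-pixel pattern P0 and the cost k are made up; S0/S1 are the pixels where the pattern is 0/1):

```python
import itertools

k = 2.0
P0 = (0, 1, 1, 0)                       # a made-up 4-pixel pattern
S0 = [i for i, p in enumerate(P0) if p == 0]
S1 = [i for i, p in enumerate(P0) if p == 1]

def h_direct(x):
    return 0.0 if tuple(x) == P0 else k

def h_transformed(x):
    return min(k*a + k*(1 - b) - k*a*(1 - b)
               + k * sum((1 - a) * x[i] for i in S0)
               + k * sum(b * (1 - x[i]) for i in S1)
               for a in (0, 1) for b in (0, 1))

for x in itertools.product((0, 1), repeat=len(P0)):
    assert h_transformed(x) == h_direct(x), x
print("pattern-potential transformation verified")
```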
74. Soft multi-label Pattern-based
[Figure: patterns P1, P2, P3 with deviation weights w1, w2, w3]
P patterns (multi-label) with P soft deviation functions.
Function per clique:
hc(x) = min { min_{a ∈ {1,…,L}} [ka + ∑_i wia·[xi ≠ Pa(i)]],  kmax }
75. How it is done…
Function per clique:
hc(x) = min { min_{a ∈ {1,…,L}} [ka + ∑_i wia·[xi ≠ Pa(i)]],  kmax }
With a pattern-switching variable z:
hc(xc) = min_{z ∈ {1,…,L+1}} [f(z) + ∑_{i ∈ c} g(z,xi)]
f(z) = { ka if z = a;  kmax if z = L+1 }
g(z,xi) = { wia if z = a and xi ≠ Pa(i);  0 if z = L+1 }
BP is used for optimization, since the construction is non-submodular and other solvers were inferior.
76. Results: Multi-label
[Figures: training (256 labels); test (256 labels); test + noise (256 labels)]
[Results: pairwise, 15 labels (5.6 sec; BP, 10 iter.); 10 hard 10×10 patterns, 15 labels (48 sec; BP, 10 iter.); 10 soft 10×10 patterns, 15 labels (48 sec; BP, 10 iter.)]
77. Standard Patch-based MRFs
[Learning Low-Level Vision, Freeman, IJCV '04]
Multi-label: E: {1,…,L}^n → R
E(x) = U(x) + P(x), where the pairwise term P measures patch overlap
[Figure: grid of patch variables xi, xj, xk, xl]
Not all labels are possible (a comparison is still to be done).