CVPR 2010: Higher-Order Models in Computer Vision (Parts 1 and 2)
1. Tractable Higher-Order Models in Computer Vision
Carsten Rother
Sebastian Nowozin
Microsoft Research Cambridge
2. Schedule
8:30-9:00 Introduction
9:00-10:00 Models: small cliques and special potentials
10:00-10:30 Tea break
10:30-12:00 Inference: relaxation techniques: LP, Lagrangian, dual decomposition
12:00-12:30 Models: global potentials and global parameters + discussion
3. A gentle intro to MRFs
Goal: given an image z ∈ (R,G,B)^n, infer the unknown (latent) variables x ∈ {0,1}^n:
P(x|z) = P(z|x) P(x) / P(z) ∝ P(z|x) P(x)
posterior ∝ likelihood (data-dependent) × prior (data-independent)
Maximum a posteriori (MAP): x* = argmax_x P(x|z)
4. Likelihood P(x|z) ∝ P(z|x) P(x)
[Figure: foreground/background color models, shown in red and green]
5. Likelihood P(x|z) ∝ P(z|x) P(x)
[Figure: log-likelihoods log P(zi|xi=0) and log P(zi|xi=1)]
Maximum likelihood: x* = argmax_x P(z|x) = argmax_x ∏_i P(zi|xi)
6. Prior P(x|z) ∝ P(z|x) P(x)
P(x) = 1/f ∏_{i,j ∈ N} θij(xi,xj)
f = ∑_x ∏_{i,j ∈ N} θij(xi,xj)   "partition function"
θij(xi,xj) = exp{-|xi-xj|}   "Ising prior"
(exp{-1} ≈ 0.37; exp{0} = 1)
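To make the prior concrete, here is a small illustrative sketch (my own, not from the slides): it enumerates all labellings of a tiny binary grid, computes the Ising prior, and checks that the partition function f normalizes it. The 2×3 grid size is an arbitrary choice that keeps exhaustive enumeration cheap.

```python
import itertools
import math

H, W = 2, 3  # tiny grid so exhaustive enumeration stays feasible

# 4-connected neighbour pairs (i, j) over an HxW grid, pixels indexed row-major
pairs = [(r * W + c, r * W + c + 1) for r in range(H) for c in range(W - 1)] + \
        [(r * W + c, (r + 1) * W + c) for r in range(H - 1) for c in range(W)]

def unnormalized(x):
    # product over neighbours of exp{-|xi - xj|} (Ising prior)
    return math.exp(-sum(abs(x[i] - x[j]) for i, j in pairs))

labellings = list(itertools.product((0, 1), repeat=H * W))
f = sum(unnormalized(x) for x in labellings)    # partition function

P = {x: unnormalized(x) / f for x in labellings}
assert abs(sum(P.values()) - 1.0) < 1e-12       # P is a proper distribution

# the constant labellings are the modes, as slide 7 notes
mode = max(P, key=P.get)
print(mode, P[mode])   # e.g. (0, 0, 0, 0, 0, 0)
```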
7. Prior
Pure prior model: P(x) = 1/f ∏_{i,j ∈ N} exp{-|xi-xj|}
[Figure: the solution with highest probability (the mode) vs. fair samples; P(x) = 0.011, 0.012, 0.012]
A smoothness prior alone is not enough: it needs the likelihood.
8. Weighting prior and likelihood
E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j ∈ N} θij(xi,xj)
[Figure: segmentations for w = 0, 10, 40, 200]
10. Energy minimization
P(x|z) = 1/f(z,w) exp{-E(x,z,w)}
f(z,w) = ∑_x exp{-E(x,z,w)}
-log P(x|z) = -log(1/f(z,w)) + E(x,z,w)
x* = argmin_x E(x,z,w): the MAP solution is the same as the minimum-energy solution.
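A minimal sketch (again my own; the observations z and the weight w are made up) checking this equivalence numerically: the labelling that maximizes P(x|z) under the Gibbs distribution is exactly the one that minimizes E(x,z,w).

```python
import itertools
import math

# 2x2 grid, pixels indexed 0..3 row-major; 4-connected neighbour pairs
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]
z = [0.9, 0.8, 0.2, 0.1]     # made-up observations in [0,1]
w = 0.5                      # made-up smoothness weight

def energy(x):
    unary = sum((z[i] - x[i]) ** 2 for i in range(4))   # θi(xi, zi)
    pair = sum(abs(x[i] - x[j]) for i, j in pairs)      # Ising θij
    return unary + w * pair

labellings = list(itertools.product((0, 1), repeat=4))
f = sum(math.exp(-energy(x)) for x in labellings)       # partition function

map_x = max(labellings, key=lambda x: math.exp(-energy(x)) / f)
min_x = min(labellings, key=energy)
assert map_x == min_x        # MAP labelling == minimum-energy labelling
print(map_x)                 # (1, 1, 0, 0) for these numbers
```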
11. Random Field Models for Computer Vision
Model:
• Variables: discrete or continuous?
• If discrete: how many labels?
• Space: discrete or continuous?
• Dependencies between variables?
• How many variables?
• …
Inference:
• Combinatorial optimization: e.g. graph cut
• Message passing: e.g. BP, TRW
• Iterated conditional modes (ICM)
• LP relaxation: e.g. cutting plane
• Problem decomposition + subgradient
• …
Applications:
• 2D/3D image segmentation
• Object recognition
• 3D reconstruction
• Stereo / optical flow
• Image denoising
• Texture synthesis
• Pose estimation
• Panoramic stitching
• …
Learning:
• Exhaustive search (grid search)
• Pseudo-likelihood approximation
• Training in pieces
• Max-margin
• …
12. Introducing Factor Graphs
Write probability distributions as graphical models:
- Directed graphical models
- Undirected graphical models: "traditionally used for MRFs"
- Factor graphs: "best way to visualize the underlying energy"
References:
- Pattern Recognition and Machine Learning [Bishop '08, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
13. Factor Graphs
P(x) ∝ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)   "4 factors"
P(x) ∝ exp{-E(x)}   (Gibbs distribution)
E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)
[Factor graph: unobserved variables x1,…,x5; a factor connects the variables that appear in it]
14. Definition "Order"
Definition "order": the arity (number of variables) of the largest factor.
P(x) ∝ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5) is a factor graph of order 3 (one arity-3 factor, three arity-2 factors).
[Figures: the factor graph, and the corresponding undirected model with a triple clique]
Extras:
• "Factor" and "clique" are used interchangeably here; this is not fully correct, since a clique potential may or may not decompose into factors.
• The definition of "order" is the same for cliques and factors.
• Markov property: a random field with low-order factors/cliques.
15. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
16. 4-connected: Segmentation
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj)
[Factor graph: observed variables zi; unobserved variables xi, xj]
17. 4-connected: Segmentation (CRF)
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj,zi,zj)
[Factor graphs: MRF vs. Conditional Random Field (CRF); observed variables zi, zj; unobserved (latent) variables xi, xj]
A CRF has no pure prior.
18. 4-connected: Stereo matching
[Figures: left image (a); right image (b); ground-truth depth. [Boykov et al. '01]]
• Images rectified
• Ignore occlusion for now
Energy: E(d): {0,…,D-1}^n → R
Labels: di (depth/shift)
19. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
20. Highly-connected: Discretization artefacts
[Figures: boundary length under 4-connected vs. Euclidean, and 8-connected vs. Euclidean metrics]
Higher connectivity can model the true Euclidean length. [Boykov et al. '03, '05]
22. Stereo with occlusion
E(d): {1,…,D}^{2n} → R
Each pixel is connected to D pixels in the other image.
[Figures: ground truth; stereo with occlusion [Kolmogorov et al. '02]; stereo without occlusion [Boykov et al. '01]]
23. Texture De-noising
[Figures: training images; test image; test image (60% noise); MRF results for 4-connected (neighbours) and 9-connected (7 attractive, 2 repulsive) models]
24. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
25. Reason 4: Use non-local parameters:
Interactive Segmentation (GrabCut)
[Figures: interactive segmentation [Boykov and Jolly '01]; GrabCut [Rother et al. '04]]
26. MRF with Global parameters:
Interactive Segmentation (GrabCut)
Model segmentation and color model w jointly:
E(x,w): {0,1}^n × {GMMs} → R
E(x,w) = ∑_i θi(xi,w) + ∑_{i,j ∈ N4} θij(xi,xj)
An object is a compact set of colors. [Rother et al., SIGGRAPH '04]
27. Latent/Hidden CRFs
• ObjCut [Kumar et al. '05]; Deformable Part Model [Felzenszwalb et al., CVPR '08]; PoseCut [Bray et al. '06]; LayoutCRF [Winn et al. '06]
• Hidden variables are never observed (in either training or test), e.g. parts
• Maximizing over hidden variables
• In ML: Deep Belief Networks, Restricted Boltzmann Machines [Hinton et al.] (often sampling is done)
[Figure: instance, instance label, parts. From LayoutCRF [Winn et al. '06]]
28. Random Fields in Vision
• 4-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   (order 2)
• Higher(8)-connected, pairwise MRF: E(x) = ∑_{i,j ∈ N8} θij(xi,xj)   (order 2)
• MRF with global variables: E(x,w) = ∑_{i,j ∈ N8} θij(xi,xj), with a global variable w   (order 2)
• Higher-order MRF: E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   (order n)
29. First Example
[Figures: user input; standard MRF (Ising prior: smooths boundary, removes noise); with connectivity (Ising prior: smooths boundary)]
E(x) = P(x) + h(x) with h(x) = { ∞ if x is not 4-connected; 0 otherwise }
This tutorial:
1. What higher-order models have been used in vision?
2. How is MAP inference done for those models?
3. What is the relationship between higher-order MRFs and MRFs with global variables?
30. Inference (very brief summary)
• Message-passing techniques (BP, TRW, TRW-S)
– Defined on the factor graph ([Potetz '07, Lan '06])
– Can in theory be applied to any model (higher order, multi-label)
• LP relaxation (… more in part III)
– Relax the original problem ({0,1} to [0,1]) and solve with existing techniques (e.g. sub-gradient)
– Can be applied to any model (depending on the solver used)
– Connections to TRW (message passing)
31. Inference (very brief summary)
• Dual/problem decomposition (… more in part III)
– Decompose the (NP-)hard problem into tractable ones; solve with sub-gradient
– Can be applied to any model (depending on the solver used)
• Combinatorial optimization (… more in part II)
– Binary, pairwise MRF: graph cut, BHS (QPBO)
– Multi-label, pairwise: move-making; transformation
– Binary, higher-order factors: transformation
– Multi-label, higher-order factors: move-making + transformation
32. Inference for higher-order models (very brief summary)
• Arbitrary potentials are only tractable for order < 7 (memory, computation time)
• For order ≥ 7, potentials need some structure that can be exploited to make them tractable
33. Forthcoming book!
Advances in Markov Random Fields for Computer Vision (Blake, Kohli, Rother)
• MIT Press (probably end of 2010)
• Most topics of this course and much, much more
• Contributors: the usual suspects: the editors + Boykov, Kolmogorov, Weiss, Freeman, Komodakis, …
34. Schedule
8:30-9:00 Introduction
9:00-10:00 Models: small cliques and special potentials
10:00-10:30 Tea break
10:30-12:00 Inference: relaxation techniques: LP, Lagrangian, dual decomposition
12:00-12:30 Models: global potentials and discussion
36. Optimization: Binary, Pairwise
E(x) = ∑_i θi(xi) + ∑_{i,j ∈ N} θij(xi,xj),   xi ∈ {0,1}
Submodular:
• θij(0,0) + θij(1,1) ≤ θij(0,1) + θij(1,0)
• The condition holds naturally for many vision problems (e.g. segmentation: |xi-xj|)
• Graph cut computes the global optimum efficiently (~0.5 sec for 1 MPixel [Boykov, Kolmogorov '01])
Non-submodular:
• BHS algorithm (also called QPBO) ([Boros, Hammer, and Sun '91; Kolmogorov et al. '07]): graph cut on a special graph; output in {0, 1, '?'}
• Partial optimality (various solutions for the '?' nodes)
• Solves the underlying LP relaxation
• Quality depends on the application (see [Rother et al., CVPR '07])
• Extensions exist: QPBOP, QPBOI (see [Rother et al., CVPR '07; Woodford et al. '08])
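A small helper (my illustration, not tutorial code) that tests the submodularity condition for a binary pairwise potential given as a table; the Ising potential |xi-xj| from the segmentation setting passes, while a "repulsive" potential rewarding disagreement does not.

```python
def is_submodular(theta):
    """theta[(xi, xj)] -> cost, for binary xi, xj."""
    return theta[(0, 0)] + theta[(1, 1)] <= theta[(0, 1)] + theta[(1, 0)]

# Ising/segmentation potential |xi - xj| is submodular:
ising = {(a, b): abs(a - b) for a in (0, 1) for b in (0, 1)}
assert is_submodular(ising)

# a repulsive potential rewarding disagreement is not:
repulsive = {(a, b): 1 - abs(a - b) for a in (0, 1) for b in (0, 1)}
assert not is_submodular(repulsive)
```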
37. Optimization: Binary, Pairwise
f(x1,x2) = θ11 x1x2 + θ10 x1(1-x2) + θ01 (1-x1)x2 + θ00 (1-x1)(1-x2)
f(x1,x2) = a x1x2 + b x1 + c x2 + d
Quadratic pseudo-Boolean optimization (QPBO): B² → R
Reminder: encoding for graph cut: a cut gives a labelling (and its energy); all weights are positive if the function is submodular (after re-parameterization to normal form).
[Graph: source s, sink t, nodes x1, x2; edge weights θ11, θ01 - θ00, θ10 - θ11, θ00]
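The two forms of f are related by a change of basis. The sketch below (my own; the coefficient formulas follow from matching f at the four assignments) recovers a, b, c, d from the table θ and checks that the submodularity condition is equivalent to a ≤ 0.

```python
import itertools

def multilinear(theta):
    # match f at the four assignments: f = a*x1*x2 + b*x1 + c*x2 + d
    d = theta[(0, 0)]
    b = theta[(1, 0)] - d
    c = theta[(0, 1)] - d
    a = theta[(1, 1)] - b - c - d
    return a, b, c, d

theta = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.5}  # made-up table
a, b, c, d = multilinear(theta)

for x1, x2 in itertools.product((0, 1), repeat=2):
    assert abs(a*x1*x2 + b*x1 + c*x2 + d - theta[(x1, x2)]) < 1e-12

# submodular  <=>  theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0)  <=>  a <= 0
assert (a <= 0) == (theta[(0,0)] + theta[(1,1)] <= theta[(0,1)] + theta[(1,0)])
```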
38. Optimization: binary, higher-order
f(x1,x2,x3) = θ111 x1x2x3 + θ110 x1x2(1-x3) + θ101 x1(1-x2)x3 + …
f(x1,x2,x3) = a x1x2x3 + b x1x2 + c x2x3 + …
Quadratic polynomials can be handled (previous slide).
Idea: transform terms of order > 2 into 2nd-order terms.
Methods:
1. transformation by "substitution"
2. transformation by "min-selection"
39. Transformation by "substitution"
[Rosenberg '75; Boros and Hammer '02; Ali et al., ECCV '08]
f(x1,x2,x3) = x1x2x3 + x1x2 + x2
Auxiliary function: D(x1,x2,z) = x1x2 - 2x1z - 2x2z + 3z,   z ∈ {0,1}
It satisfies:
D(x1,x2,z) = 0 if x1x2 = z
D(x1,x2,z) > 0 if x1x2 ≠ z
"Substitution": f(x1,x2,x3) = min_z g(x1,x2,x3,z) = min_z [z·x3 + z + x2 + K·D(x1,x2,z)]
Since K is very large, x1x2 = z at the minimum. Apply it recursively…
Problems:
• Doesn't work in practice [Ishikawa, CVPR '09]
• The function D is non-submodular, and "K enforces this strongly"
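A brute-force check of the substitution (my sketch; K = 10 stands in for "sufficiently large"):

```python
import itertools

def D(x1, x2, z):
    return x1*x2 - 2*x1*z - 2*x2*z + 3*z

def f(x1, x2, x3):
    return x1*x2*x3 + x1*x2 + x2

K = 10  # any sufficiently large constant

for x1, x2, x3 in itertools.product((0, 1), repeat=3):
    # D vanishes exactly when z equals the product x1*x2, and is positive otherwise
    for z in (0, 1):
        assert (D(x1, x2, z) == 0) == (x1*x2 == z)
        assert D(x1, x2, z) >= 0
    # minimizing over z recovers f, with x1*x2 replaced by z
    g = min(z*x3 + z + x2 + K * D(x1, x2, z) for z in (0, 1))
    assert g == f(x1, x2, x3)
print("substitution identity verified")
```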
40. Transformation by "min-selection"
[Freedman and Drineas '05; Kolmogorov and Zabih '04; Ishikawa '09]
f(x1,x2,x3) = a x1x2x3
Useful identity: -x1x2x3 = min_z -z(x1+x2+x3-2),   z ∈ {0,1}
Check: if x1 = x2 = x3 = 1 then z = 1; otherwise z = 0.
Transform:
Case a < 0: f(x1,x2,x3) = min_z a·z·(x1+x2+x3-2)
Case a > 0: f(x1,x2,x3) = min_z a·{z(x1+x2+x3-1) + (x1x2+x2x3+x3x1) - (x1+x2+x3) + 1}   (similar trick)
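A brute-force check of both cases over all eight assignments (my sketch; both formulas are written above with the signs that make the identities exact, which the asserts confirm):

```python
import itertools

def check(a):
    for x1, x2, x3 in itertools.product((0, 1), repeat=3):
        target = a * x1 * x2 * x3
        if a < 0:
            got = min(a * z * (x1 + x2 + x3 - 2) for z in (0, 1))
        else:
            got = min(a * (z * (x1 + x2 + x3 - 1)
                           + (x1*x2 + x2*x3 + x3*x1)
                           - (x1 + x2 + x3) + 1) for z in (0, 1))
        assert got == target, ((x1, x2, x3), got, target)

check(-3)   # negative-coefficient case
check(+5)   # positive-coefficient case
print("min-selection identities verified")
```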
41. Transformation by "min-selection"
The general case: a term of degree d needs nd = ⌊(d-1)/2⌋ new variables w.
[General formula from [Ishikawa, PAMI '09]]
42. Full Procedure
[Ishikawa '09]
General 5th-order potential:
f(x1,…,x5) = a x1x2x3x4x5 + b x1x2x3x4 + c x1x2x3x5 + d x1x2x4x5 + …
… transform all terms of degree > 2 until only degree-2 terms remain.
• Worst case exponential: a potential of order 5 gives up to 15 new variables.
• Probably tractable for up to order 6
• May get very hard to solve (non-submodular)
• Code available online on Ishikawa's webpage
43. Application 1: De-noising with Field-of-Experts
[Roth and Black '05; Ishikawa '09]
E(x) = ∑_i (zi-xi)²/2σ² + ∑_c ∑_k αk log(1 + 0.5(Jk·xc)²)
(unary likelihood + FoE prior)
xc: the n×n patches (here 2×2); Jk: a set of filters.
A non-convex optimization problem. How to handle continuous labels in a discrete MRF?
[Figures from [Ishikawa, PAMI '09; Roth et al. '05]]
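To make the energy concrete, a hedged sketch (illustrative only: random filters stand in for the learned FoE filters Jk, the weights αk and σ are made up, and a real system would minimize over x rather than just evaluate):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, sigma = 8, 8, 0.1
z = rng.random((H, W))            # noisy observation (made up)
x = z.copy()                      # candidate reconstruction

# stand-ins for the learned 2x2 FoE filters Jk and their weights alpha_k
J = [rng.standard_normal(4) for _ in range(3)]
alpha = [0.5, 0.4, 0.3]

def foe_energy(x, z):
    unary = np.sum((z - x) ** 2) / (2 * sigma ** 2)     # likelihood term
    prior = 0.0
    for r in range(H - 1):                              # all 2x2 cliques c
        for c in range(W - 1):
            patch = x[r:r+2, c:c+2].ravel()
            for Jk, ak in zip(J, alpha):
                prior += ak * np.log1p(0.5 * (Jk @ patch) ** 2)
    return unary + prior

print(foe_energy(x, z))
```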
44. Solve with fusion move
[Lempitsky et al., ICCV '07, '08, '10; Woodford et al. '08]
Fusion-move optimization:
1. x = arbitrary initial labelling
2. Fuse x with a proposal labelling x': minimize a binary MRF E' that chooses, per pixel, the label from x or from x' (use the BHS algorithm)
3. Go to 2) if the energy went down
"Alpha expansion" is the special case where the proposal is a constant labelling.
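The control flow as a sketch (my own): on a tiny chain the binary fusion step is solved exactly by enumeration, standing in for a BHS/QPBO solve; the energy and the proposal generator are made up. The assert checks that fusion never increases E.

```python
import itertools
import random

random.seed(0)
n = 8

def energy(x):
    # made-up chain energy: unary pulls toward a target, pairwise smooths
    target = [0, 0, 1, 1, 1, 0, 0, 1]
    return sum((x[i] - target[i]) ** 2 for i in range(n)) + \
           sum(abs(x[i] - x[i + 1]) for i in range(n - 1))

def fuse(x, proposal):
    # exact binary fusion by enumeration (stand-in for a BHS/QPBO solve):
    # b[i] = 0 keeps x[i], b[i] = 1 takes proposal[i]
    best = min(itertools.product((0, 1), repeat=n),
               key=lambda b: energy([proposal[i] if b[i] else x[i]
                                     for i in range(n)]))
    return [proposal[i] if best[i] else x[i] for i in range(n)]

x = [random.randint(0, 3) for _ in range(n)]        # arbitrary initial labelling
for _ in range(20):
    proposal = [random.randint(0, 3) for _ in range(n)]
    x_new = fuse(x, proposal)
    assert energy(x_new) <= energy(x)               # fusion never increases E
    x = x_new
print(x, energy(x))
```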
45. Application 1: De-noising with Field-of-Experts
[Lempitsky et al., ICCV '07, '08, '10; Woodford et al. '08]
Properties of the fusion move:
1. Performance depends on the performance of the BHS algorithm (how many nodes get labelled)
2. Guarantee: E does not go up
3. In practice many nodes get labelled, because θij(0,0) + θij(1,1) ≤ θij(0,1) + θij(1,0) often has low cost
46. Results
[Figures: original; noisy; pairwise model; pairwise model "factor graph, BP-based"; TV-norm (continuous model); FoE. Results given as PSNR/E. From [Ishikawa, PAMI '09]]
48. Application 2: Curvature in stereo
[Woodford et al., CVPR '08]
f(x1,x2,x3) = x1 - 2x2 + x3, where xi ∈ {0,…,D} (depth)
Example: a slanted plane: f(1,2,3) = 0
[Figures: image; pairwise terms; triple terms. From [Woodford et al., CVPR '08]]
49. Application 3: Higher-Order Likelihood for optical flow
[Glocker et al., ECCV '10]
• Pairwise MRF: likelihood in the unaries as an NCC cost: approximation error!
• Higher-order likelihood: done with triple cliques (ideally higher)
• Also use 3rd/4th-order terms so as not to penalize any affine motion
[Figures: image 1, image 2; one image, bi-layer triangulation, optical flow]
51. Label-Cost Potential
[Hoiem et al. '07; Delong et al. '10; Bleyer et al. '10]
[Figures: image; GrabCut-style result; result with a cost for each new label [Delong et al. '10]; label cost = 10c vs. label cost = 4c]
(Same function as [Zhu and Yuille '96])
E(x) = P(x) + ∑_{l ∈ L} cl·[∃p: xp = l],   E: {1,…,L}^n → R
("pairwise MRF" + "label cost")
Basic idea: penalize the complexity of the model
• Minimum description length (MDL)
• Bayesian information criterion (BIC)
• Akaike information criterion (AIC)
From [Delong et al. '10]
52. How is it done …
In an alpha-expansion step from the current labelling (a b b c b) with expansion label a, x'p = 1 means pixel p takes label a and x'p = 0 means it keeps its old label:
example 1: x' = (1 0 1 1 1) gives (a b a a a): cost for b is cb
example 2: x' = (0 1 1 0 1) gives (a a a c a): cost for b is 0
Formally: E(x') = P(x') + ∑_{l ∈ L} (cl - cl ∏_{p ∈ Pl} x'p)
Case a < 0: a ∏_i xi = min_w a·w·(∑_i xi - |Pl| + 1)   Submodular!
From [Delong et al. '10]
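A brute-force check of that submodular rewriting (my sketch; here a plays the role of -cl and n = |Pl|):

```python
import itertools
import math

def check(a, n):
    assert a < 0
    for x in itertools.product((0, 1), repeat=n):
        lhs = a * math.prod(x)                               # a · Π_p x'p
        rhs = min(a * w * (sum(x) - n + 1) for w in (0, 1))  # min over auxiliary w
        assert lhs == rhs, (x, lhs, rhs)

check(-2.5, 4)   # e.g. coefficient a = -cl for a clique Pl of 4 pixels
print("label-cost product identity verified")
```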
54. Example: surface-based stereo
[Bleyer et al. '10]
A 3D scene explained by a small set of 3D surfaces.
[Figures: left image; surfaces and depth without the label-cost prior; surfaces and depth with the label-cost prior]
56. Robust (Soft) Pn Potts model
[Kohli et al., CVPR '07, '08; PAMI '08; IJCV '09]
57. Image Segmentation
E: {0,1}^n → R,   n = number of pixels,   0 → fg, 1 → bg
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj|
[Figures: image; unary cost; segmentation. [Boykov and Jolly '01; Blake et al. '04; Rother et al. '04]]
58. Pn Potts Potentials
Patch dictionary (tree):
h(xp) = { 0 if xi = 0 for all i ∈ p;  Cmax otherwise }
[Figure: patch p; costs Cmax and 0]
[slide credits: Kohli]
59. Pn Potts Potentials
E: {0,1}^n → R,   n = number of pixels,   0 → fg, 1 → bg
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj| + ∑_p hp(xp)
h(xp) = { 0 if xi = 0 for all i ∈ p;  Cmax otherwise }
[slide credits: Kohli]
60. Image Segmentation
E: {0,1}^n → R,   n = number of pixels,   0 → fg, 1 → bg
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi-xj| + ∑_p hp(xp)
[Figures: image; pairwise segmentation; final segmentation]
[slide credits: Kohli]
61. Application: Recognition and Segmentation
[Figures: image; one superpixelization; another superpixelization; unaries only (TextonBoost [Shotton et al. '06]); pairwise CRF only [Shotton et al. '06]; Pn Potts. From [Kohli et al. '08]]
62. Robust (soft) Pn Potts model
h(xp) = { 0 if xi = 0 for all i ∈ p;  f(∑_{i ∈ p} xi) otherwise }
[Figures: cost curves for Pn Potts vs. robust Pn Potts. From [Kohli et al. '08]]
63. Application: Recognition and Segmentation
[Figures: image; one superpixelization; another superpixelization; unaries only (TextonBoost [Shotton et al. '06]); pairwise CRF only [Shotton et al. '06]; Pn Potts; robust Pn Potts; robust Pn Potts (different f). From [Kohli et al. '08]]
64. Same idea for surface-based stereo
[Bleyer '10]
[Figures: one input image; ground-truth depth; stereo with hard segmentation; stereo with robust Pn Potts]
This approach gets the best result on the Middlebury Teddy image pair.
65. How is it done…
The most general binary function handled: H(x) = F(∑_i xi) with F concave.
[Plot: concave H(x) as a function of ∑ xi]
The transformation is to a submodular pairwise MRF, hence the optimization is globally optimal.
[slide credits: Kohli]
66. Higher order to Quadratic
• Start with the Pn Potts model:
f(x) = { 0 if all xi = 0;  C1 otherwise },   x ∈ {0,1}^n
min_x f(x) = min_{x, a ∈ {0,1}} [C1·a + C1·(1-a)·∑_i xi]
(higher-order function → quadratic submodular function)
∑xi = 0 → f(x) = 0, a = 0
∑xi > 0 → f(x) = C1, a = 1
[slide credits: Kohli]
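A brute-force check (my sketch; C1 and n are arbitrary) that minimizing the quadratic form over the auxiliary variable a reproduces the Pn Potts cost for every x:

```python
import itertools

C1, n = 4.0, 5

def pn_potts(x):
    return 0.0 if sum(x) == 0 else C1

for x in itertools.product((0, 1), repeat=n):
    quad = min(C1 * a + C1 * (1 - a) * sum(x) for a in (0, 1))
    assert quad == pn_potts(x), x
print("Pn Potts equals the minimum of the quadratic form over a")
```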
67. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [C1·a + C1·(1-a)·∑_i xi]
(higher-order function → quadratic submodular function)
[Plot: the lines C1·∑xi and C1 against ∑xi = 1, 2, 3, …]
[slide credits: Kohli]
68. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [C1·a + C1·(1-a)·∑_i xi]
(higher-order submodular function → quadratic submodular function)
[Plot: the a = 0 line C1·∑xi and the a = 1 line C1; the lower envelope of concave functions is concave]
[slide credits: Kohli]
69. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [f1(x)·a + f2(x)·(1-a)]
(higher-order submodular function → quadratic submodular function)
[Plot: linear functions f1(x) and f2(x); the lower envelope of concave functions is concave]
[slide credits: Kohli]
70. Higher order to Quadratic
min_x f(x) = min_{x, a ∈ {0,1}} [f1(x)·a + f2(x)·(1-a)]
(higher-order submodular function → quadratic submodular function)
[Plot: f1(x) and f2(x) with their a = 0 / a = 1 regions; the lower envelope of concave functions is concave]
Arbitrary concave functions: sum the potentials up (each breakpoint adds a new binary variable) [Vicente et al. '09] [slide credits: Kohli]
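To illustrate the construction (my own sketch, not the paper's): a made-up concave piecewise-linear F(∑xi) is written as a weighted sum of one-breakpoint terms min(s, b), each of which needs exactly one auxiliary binary variable, as in the Pn Potts case above.

```python
# a made-up concave piecewise-linear function of s = sum(x):
def F(s):
    return min(3 * s, s + 4, 7)   # slopes 3 > 1 > 0, breakpoints at s = 2 and s = 3

# per the slide: write F as a SUM of one-breakpoint concave terms; each term
# min(s, b) = min over a binary a of [a*b + (1-a)*s] costs one auxiliary variable
def F_decomposed(s):
    t1 = min(a * 2 + (1 - a) * s for a in (0, 1))   # min(s, 2), weight 3 - 1 = 2
    t2 = min(a * 3 + (1 - a) * s for a in (0, 1))   # min(s, 3), weight 1 - 0 = 1
    return 2 * t1 + 1 * t2

for s in range(7):
    assert F(s) == F_decomposed(s)
print("concave F = weighted sum of one-breakpoint terms, one auxiliary variable each")
```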
71. Beyond Pn Potts … soft Pattern-based Potentials
[Rother et al. '08; Komodakis et al. '08]
Motivation: binary image de-noising.
[Figures: training image; test image with noise; result with pairwise MRF, 9-connected (7 attractive, 2 repulsive); result with higher-order MRF]
With the pairwise model the higher-order structure is not preserved.
72. Sparse higher-order functions
Minimize: E(x) = P(x) + ∑_c hc(xc)
where hc: {0,1}^{|c|} → R is a higher-order function (|c| = 10×10 = 100).
It assigns a cost to each of the 2^100 possible labellings!
Exploit the structure of the function to transform it into a pairwise function.
73. How this can be done…
One clique, one pattern P0:
hc(x) = { 0 if xc = P0;  k otherwise }
hc(x) = min_{a,b} [k·a + k·(1-b) - k·a·(1-b) + k ∑_{i ∈ S0(P0)} (1-a)·xi + k ∑_{i ∈ S1(P0)} b·(1-xi)]
Check it:
1. Pattern off => a = 1, b = 0 (cost k)
2. Pattern on => a = 0, b = 1 (cost 0)
[Graph: the corresponding pairwise construction with edge weights ±k]
Problems:
1. The term "kab" is non-submodular
2. Only BP and TRW worked for inference
General potential: add all the terms up.
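A brute-force check of the single-pattern transformation (my sketch; the 4-pixel pattern P0 and the cost k are made up; S0/S1 are the pixels where the pattern is 0/1):

```python
import itertools

k = 2.0
P0 = (0, 1, 1, 0)                       # a made-up 4-pixel pattern
S0 = [i for i, p in enumerate(P0) if p == 0]
S1 = [i for i, p in enumerate(P0) if p == 1]

def h_direct(x):
    return 0.0 if tuple(x) == P0 else k

def h_transformed(x):
    return min(k*a + k*(1 - b) - k*a*(1 - b)
               + k * sum((1 - a) * x[i] for i in S0)
               + k * sum(b * (1 - x[i]) for i in S1)
               for a in (0, 1) for b in (0, 1))

for x in itertools.product((0, 1), repeat=len(P0)):
    assert h_transformed(x) == h_direct(x), x
print("pattern-potential transformation verified")
```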
74. Soft multi-label Pattern-based
[Figure: patterns P1, P2, P3 with deviation weights w1, w2, w3]
P patterns (multi-label) with P soft deviation functions.
Function per clique:
hc(x) = min { min_{a ∈ {1,…,L}} [ka + ∑_i wia·[xi ≠ Pa(i)]],  kmax }
75. How it is done…
Function per clique:
hc(x) = min { min_{a ∈ {1,…,L}} [ka + ∑_i wia·[xi ≠ Pa(i)]],  kmax }
With a pattern-switching variable z:
hc(xc) = min_{z ∈ {1,…,L+1}} [f(z) + ∑_{i ∈ c} g(z,xi)]
f(z) = { ka if z = a;  kmax if z = L+1 }
g(z,xi) = { wia if z = a and xi ≠ Pa(i);  0 if z = L+1 }
BP is used for optimization, since the construction is non-submodular and other solvers were inferior.
76. Results: Multi-label
[Figures: training (256 labels); test (256 labels); test + noise (256 labels)]
[Results: pairwise, 15 labels (5.6 sec; BP, 10 iter.); 10 hard 10×10 patterns, 15 labels (48 sec; BP, 10 iter.); 10 soft 10×10 patterns, 15 labels (48 sec; BP, 10 iter.)]
77. Standard Patch-based MRFs
[Learning Low-Level Vision, Freeman, IJCV '04]
Multi-label: E: {1,…,L}^n → R
E(x) = U(x) + P(x), where the pairwise term P measures patch overlap
[Figure: grid of patch variables xi, xj, xk, xl]
Not all labels are possible (a comparison is still to be done).