8. Inverse Problem Regularization
Observations: $y = \Phi x_0 + w \in \mathbb{R}^P$.
Estimator: $x(y)$ depends only on
  - the observations $y$;
  - a parameter $\lambda$.
Example: variational methods
    $x(y) \in \operatorname{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2} \|y - \Phi x\|^2 + \lambda J(x)$
(data fidelity + regularity).
Choice of $\lambda$: tradeoff between the noise level $\|w\|$ and the regularity $J(x_0)$ of $x_0$.
No noise: $\lambda \to 0^+$, minimize
    $x^\star \in \operatorname{argmin}_{x \in \mathbb{R}^N, \, \Phi x = y} J(x)$
This course:
  - Performance analysis.
  - Fast computational scheme (see the sketch below).
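To make the variational scheme concrete, here is a minimal numerical sketch (an editorial addition, not part of the original deck) of the case $J = \|\cdot\|_1$, solved with ISTA, the standard proximal-gradient iteration for this problem; the Gaussian $\Phi$, the sparsity level and the value of $\lambda$ are illustrative choices.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for min_x 0.5 * ||y - Phi x||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2                 # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - Phi.T @ (Phi @ x - y) / L           # gradient step on the data fidelity
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft threshold = prox of (lam/L)||.||_1
    return x

# Toy usage: an s-sparse x0 observed through a Gaussian Phi with small noise.
rng = np.random.default_rng(0)
P, N, s = 50, 200, 5
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N)
x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)
x_hat = ista(Phi, y, lam=0.1)
```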
12. Union of Linear Models for Data Processing
Union of models: $T \in \mathcal{T}$, linear spaces.
Synthesis sparsity: sparse coefficients $x$, image synthesized from $x$.
Structured sparsity: coefficients $x$ sparse over blocks, model $T$.
Analysis sparsity: image $x$, sparse gradient $D^* x$.
Low-rank: image $x$ with few nonzero singular values.
Multi-spectral imaging: $x_{i,\cdot} = \sum_{j=1}^{r} A_{i,j} S_{j,\cdot}$ (each observed channel mixes $r$ source spectra $S_{j,\cdot}$).
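As a quick illustration of these models (an editorial addition, with toy sizes and signals chosen for illustration), the sketch below builds one small instance each of synthesis sparsity, analysis sparsity via finite differences, and the low-rank/multi-spectral mixing model.

```python
import numpy as np

# Synthesis sparsity: the object is described by a few active coefficients.
x = np.zeros(16)
x[[3, 11]] = [2.0, -1.0]                       # 2-sparse coefficient vector

# Analysis sparsity: a piecewise-constant signal has a sparse gradient D* u.
u = np.repeat([2.0, 5.0, 1.0], 5)
print(np.diff(u))                              # nonzero only at the two jumps

# Low-rank / multi-spectral mixing: x_{i,.} = sum_{j=1}^r A_{i,j} S_{j,.}
rng = np.random.default_rng(0)
r = 2
A = rng.standard_normal((20, r))               # mixing weights
S = rng.standard_normal((r, 30))               # source spectra
print(np.linalg.matrix_rank(A @ S))            # rank r = 2
```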
18. Gauges for Union of Linear Models
Gauge: $J : \mathbb{R}^N \to \mathbb{R}^+$ convex, with $\forall \alpha \in \mathbb{R}^+,\ J(\alpha x) = \alpha J(x)$.
Piecewise regular ball $\Leftrightarrow$ union of linear models $(T)_{T \in \mathcal{T}}$.
Examples (unit ball of $J$, model space $T$, and the model $T_0$ of $x_0$ shown in the figures):
  - $J(x) = \|x\|_1$: $T$ = sparse vectors.
  - $J(x) = |x_1| + \|x_{2,3}\|$: $T$ = block sparse vectors.
  - $J(x) = \|x\|_*$: $T$ = low-rank matrices.
  - $J(x) = \|x\|_\infty$: $T$ = antisparse vectors.
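The two defining properties of a gauge are easy to probe numerically. The sketch below (an editorial illustration, with arbitrary test vectors) verifies positive homogeneity and midpoint convexity for three of the gauges above.

```python
import numpy as np

rng = np.random.default_rng(1)
gauges = {
    "l1":      lambda v: np.abs(v).sum(),
    "linf":    lambda v: np.abs(v).max(),
    "nuclear": lambda v: np.linalg.svd(v.reshape(4, 4), compute_uv=False).sum(),
}
for name, J in gauges.items():
    v, z = rng.standard_normal(16), rng.standard_normal(16)
    alpha = rng.uniform(0.0, 5.0)
    assert np.isclose(J(alpha * v), alpha * J(v))           # positive homogeneity
    assert J((v + z) / 2) <= (J(v) + J(z)) / 2 + 1e-12      # midpoint convexity
    print(name, "ok")
```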
26. Examples
$\ell^1$ sparsity: $J(x) = \|x\|_1$, $e_x = \operatorname{sign}(x)$, $T_x = \{ z : \operatorname{supp}(z) \subset \operatorname{supp}(x) \}$.
Structured sparsity: $J(x) = \sum_b \|x_b\|$, $e_x = (N(x_b))_{b \in B}$ with $N(a) = a/\|a\|$, $T_x = \{ z : \operatorname{supp}(z) \subset \operatorname{supp}(x) \}$.
Nuclear norm: $J(x) = \|x\|_*$ with SVD $x = U \Lambda V^*$, $e_x = U V^*$, $T_x = \{ U A + B V^* : (A, B) \in (\mathbb{R}^{n \times n})^2 \}$.
Anti-sparsity: $J(x) = \|x\|_\infty$, $I = \{ i : |x_i| = \|x\|_\infty \}$, $e_x = |I|^{-1} \operatorname{sign}(x)$, $T_x = \{ y : y_I \propto \operatorname{sign}(x_I) \}$.
[Figures: the subdifferential $\partial J(x)$ at a model point $x$ for each gauge.]
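For later use (the pre-certificate constructions below rely on $e_x$ and on projecting onto $T_x$), here is a sketch (an editorial addition) of $e_x$ and $\operatorname{Proj}_{T_x}$ for the $\ell^1$ and nuclear-norm cases; the nuclear-norm projector uses the standard tangent-space formula $\operatorname{Proj}_T(z) = U U^* z + z V V^* - U U^* z V V^*$.

```python
import numpy as np

def l1_model(x):
    """e_x = sign(x); Proj onto T_x = {z : supp(z) in supp(x)} zeroes the off-support."""
    I = x != 0
    return np.sign(x), lambda z: np.where(I, z, 0.0)

def nuclear_model(x, tol=1e-10):
    """e_x = U V^T and Proj onto T_x = {U A + B V^T} from the compact SVD of x."""
    U, s, Vt = np.linalg.svd(x, full_matrices=False)
    r = int((s > tol).sum())
    U, V = U[:, :r], Vt[:r, :].T
    proj = lambda z: U @ U.T @ z + z @ V @ V.T - U @ U.T @ z @ V @ V.T
    return U @ V.T, proj

# Quick check: e_x belongs to the model space T_x.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank-2 matrix
e, proj = nuclear_model(x)
print(np.allclose(proj(e), e))                                  # True
```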
38. Compressed Sensing Setting
Random matrix: $\Phi \in \mathbb{R}^{P \times N}$, $\Phi_{i,j} \sim \mathcal{N}(0, 1)$ i.i.d.
Sparse vectors: $J = \|\cdot\|_1$. [Rudelson, Vershynin 2006] [Chandrasekaran et al. 2011]
Theorem: Let $s = \|x_0\|_0$. If
    $P > 2 s \log(N/s)$
then $\exists \, \eta \in \bar{D}(x_0)$ with high probability on $\Phi$.
Low-rank matrices: $J = \|\cdot\|_*$, $x_0 \in \mathbb{R}^{N_1 \times N_2}$. [Chandrasekaran et al. 2011]
Theorem: Let $r = \operatorname{rank}(x_0)$. If
    $P > 3 r (N_1 + N_2 - r)$
then $\exists \, \eta \in \bar{D}(x_0)$ with high probability on $\Phi$.
→ Similar results for $\|\cdot\|_{1,2}$, $\|\cdot\|_\infty$.
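These theorems can be probed empirically. The sketch below (an editorial addition with illustrative sizes; basis pursuit written as a linear program) estimates the probability of exact $\ell^1$ recovery as $P$ varies around the $2 s \log(N/s)$ threshold.

```python
import numpy as np
from scipy.optimize import linprog

def bp_recovers(P, N, s, rng):
    """One trial: does min ||x||_1 s.t. Phi x = Phi x0 recover an s-sparse x0?"""
    Phi = rng.standard_normal((P, N))
    x0 = np.zeros(N)
    x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
    # LP form: x = u - v with u, v >= 0, minimize sum(u + v).
    res = linprog(np.ones(2 * N), A_eq=np.hstack([Phi, -Phi]), b_eq=Phi @ x0,
                  bounds=(0, None), method="highs")
    x = res.x[:N] - res.x[N:]
    return np.linalg.norm(x - x0) < 1e-6 * np.linalg.norm(x0)

rng = np.random.default_rng(0)
N, s = 100, 5
for P in (20, 30, 40, 60):                     # 2 s log(N/s) ~ 30 here
    rate = np.mean([bp_recovers(P, N, s, rng) for _ in range(20)])
    print(f"P = {P}: empirical success rate {rate:.2f}")
```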
39. Phase Transitions
[Figure 2.2 of [Amelunxen et al. 2013], "The geometry of phase transitions in convex optimization": phase transitions for linear inverse problems. Left: $J = \|\cdot\|_1$, the empirical probability that $\ell^1$ minimization identifies a sparse vector $x_0 \in \mathbb{R}^{100}$ from random linear measurements, plotted over $s/N$ vs. $P/N$. Right: $J = \|\cdot\|_*$, the analogous low-rank experiment, plotted over $r/\sqrt{N}$ vs. $P/N$.]
44. Minimal-norm Certificate
$\eta \in D(x_0) \implies \eta = \Phi^* q$ with $\operatorname{Proj}_T(\eta) = e$, where $T = T_{x_0}$, $e = e_{x_0}$.
Minimal-norm pre-certificate:
    $\eta_0 = \underset{\eta = \Phi^* q, \ \eta_T = e}{\operatorname{argmin}} \ \|q\|$
Proposition: One has $\eta_0 = (\Phi_T^+ \Phi)^* e$ where $\Phi_T = \Phi \operatorname{Proj}_T$.
Theorem: If $\eta_0 \in \bar{D}(x_0)$ and $\lambda \sim \|w\|$, the unique solution $x^\star$ of $P_\lambda(y)$ for $y = \Phi x_0 + w$ satisfies
    $T_{x^\star} = T_{x_0}$ and $\|x^\star - x_0\| = O(\|w\|)$. [Vaiter et al. 2013]
Special cases: [Fuchs 2004]: $J = \|\cdot\|_1$; [Vaiter et al. 2011]: $J = \|D^* \cdot\|_1$; [Bach 2008]: $J = \|\cdot\|_{1,2}$ and $J = \|\cdot\|_*$.
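For $J = \|\cdot\|_1$, the proposition specializes to $\eta_0 = \Phi^* \Phi_I^{+,*} \operatorname{sign}(x_{0,I})$ with $I = \operatorname{supp}(x_0)$, which is straightforward to compute. A sketch (editorial addition, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, s = 60, 100, 5
Phi = rng.standard_normal((P, N))
I = rng.choice(N, s, replace=False)
x0 = np.zeros(N)
x0[I] = rng.standard_normal(s)

# Minimal-norm q with (Phi^* q)_I = sign(x0_I), then eta_0 = Phi^* q.
q0 = np.linalg.pinv(Phi[:, I]).T @ np.sign(x0[I])
eta0 = Phi.T @ q0
off = np.setdiff1d(np.arange(N), I)
print(np.allclose(eta0[I], np.sign(x0[I])))    # eta_0 agrees with e on T
print(np.abs(eta0[off]).max())                 # < 1 means eta_0 is non-degenerate
```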
48. Compressed Sensing Setting
Random matrix: $\Phi \in \mathbb{R}^{P \times N}$, $\Phi_{i,j} \sim \mathcal{N}(0, 1)$ i.i.d.
Sparse vectors: $J = \|\cdot\|_1$. [Wainwright 2009] [Dossal et al. 2011]
Theorem: Let $s = \|x_0\|_0$. If
    $P > 2 s \log(N)$
then $\eta_0 \in \bar{D}(x_0)$ with high probability on $\Phi$.
Phase transitions: $L^2$ stability, $P \sim 2 s \log(N/s)$, vs. model stability, $P \sim 2 s \log(N)$.
→ Similar results for $\|\cdot\|_{1,2}$, $\|\cdot\|_*$, $\|\cdot\|_\infty$.
→ Not using RIP techniques (non-uniform result on $x_0$).
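The gap between the two scalings can be seen numerically by testing how often $\eta_0$ is non-degenerate. A Monte Carlo sketch (editorial addition, illustrative sizes and trial counts):

```python
import numpy as np

def eta0_nondegenerate(P, N, s, rng):
    """Is the minimal-norm pre-certificate strictly below 1 off the support?"""
    Phi = rng.standard_normal((P, N))
    I = rng.choice(N, s, replace=False)
    eta0 = Phi.T @ (np.linalg.pinv(Phi[:, I]).T @ np.sign(rng.standard_normal(s)))
    off = np.setdiff1d(np.arange(N), I)
    return np.abs(eta0[off]).max() < 1

rng = np.random.default_rng(0)
N, s = 200, 5
for P in (30, 40, 55, 80, 120):                # 2 s log(N/s) ~ 37, 2 s log(N) ~ 53
    rate = np.mean([eta0_nondegenerate(P, N, s, rng) for _ in range(50)])
    print(f"P = {P}: eta_0 non-degenerate in {rate:.0%} of trials")
```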
54. Support Instability and Measures
When $N \to +\infty$, the support is not stable:
    $\|\eta_{0, I^c}\|_\infty \xrightarrow[N \to +\infty]{} c > 1$.
[Figure: $\|\eta_{0, I^c}\|_\infty$ as a function of $1/N$, converging to $c > 1$; below the level $1$ the support is stable, above it unstable.]
Intuition: spikes want to move laterally.
→ Use Radon measures $m \in \mathcal{M}(\mathbb{T})$, $\mathbb{T} = \mathbb{R}/\mathbb{Z}$.
Extension of $\ell^1$: the total variation
    $\|m\|_{TV} = \sup_{\|g\|_\infty \leq 1} \int_{\mathbb{T}} g(x) \, dm(x)$
Discrete measure: $m_{x,a} = \sum_i a_i \delta_{x_i}$. One has $\|m_{x,a}\|_{TV} = \|a\|_1$.
58. Sparse Measure Regularization
Measurements: $y = \Phi(m_0) + w$ where $m_0 \in \mathcal{M}(\mathbb{T})$, $\Phi : \mathcal{M}(\mathbb{T}) \to L^2(\mathbb{T})$, $w \in L^2(\mathbb{T})$.
Acquisition operator:
    $(\Phi m)(x) = \int_{\mathbb{T}} \varphi(x, x') \, dm(x')$ where $\varphi \in C^2(\mathbb{T} \times \mathbb{T})$.
Total-variation-over-measures regularization:
    $\min_{m \in \mathcal{M}(\mathbb{T})} \tfrac{1}{2} \|\Phi(m) - y\|^2 + \lambda \|m\|_{TV}$
→ Infinite-dimensional convex program.
→ If $\dim(\operatorname{Im}(\Phi)) < +\infty$, the dual is finite-dimensional.
→ If $\Phi$ is a filtering, the dual can be re-cast as an SDP program.
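A common way to approach this program numerically is to discretize: sample the measurements on a grid and restrict the measure to a grid of candidate spike locations, which turns $\Phi$ into a matrix and the TV norm into an $\ell^1$ norm (the "on a grid" problem of the next slide). The sketch below (an editorial addition) uses a periodized Gaussian as an illustrative stand-in for a generic $C^2$ kernel $\varphi$; grid sizes and $\lambda$ are arbitrary choices.

```python
import numpy as np

def phi(x, xp, sigma=0.05):
    """Periodized-Gaussian stand-in for a generic C^2 kernel on T = R/Z."""
    d = (x - xp + 0.5) % 1.0 - 0.5             # wrap-around distance on the torus
    return np.exp(-d ** 2 / (2 * sigma ** 2))

K, N = 64, 256                                  # measurement grid / recovery grid
xs = np.arange(K) / K
z = np.arange(N) / N
Phi = phi(xs[:, None], z[None, :])              # Phi[k, j] = phi(x_k, z_j)

# Observe m0 = 1.0 * delta_{0.31} - 0.7 * delta_{0.62}, then solve the grid Lasso by ISTA.
y = 1.0 * phi(xs, 0.31) - 0.7 * phi(xs, 0.62)
lam = 0.01
L = np.linalg.norm(Phi, 2) ** 2
a = np.zeros(N)
for _ in range(2000):
    g = a - Phi.T @ (Phi @ a - y) / L
    a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0)
print(np.nonzero(np.abs(a) > 1e-3)[0] / N)      # recovered spike locations on the grid
```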
64. Fuchs vs. Vanishing Pre-Certificates
Measures: $\min_{m \in \mathcal{M}} \tfrac{1}{2} \|\Phi m - y\|^2 + \lambda \|m\|_{TV}$
On a grid $z$: $\min_{a \in \mathbb{R}^N} \tfrac{1}{2} \|\Phi_z a - y\|^2 + \lambda \|a\|_1$
[Figure: the pre-certificates $\eta_F$ and $\eta_V$, constrained to $[-1, +1]$, plotted against the grid points $z_i$.]
For $m_0 = m_{z, a_0}$ with $\operatorname{supp}(m_0) = x_0$, $\operatorname{supp}(a_0) = I$:
    $\eta_F = \Phi^* \Phi_I^{*,+} \operatorname{sign}(a_{0,I})$
    $\eta_V = \Phi^* \Gamma_{x_0}^{+,*} (\operatorname{sign}(a_0), 0)$, where $\Gamma_x(a, b) = \sum_i a_i \varphi(\cdot, x_i) + b_i \varphi'(\cdot, x_i)$.
Theorem [Fuchs 2004]: If $\forall j \notin I$, $|\eta_F(x_j)| < 1$, then $\operatorname{supp}(a_\lambda) = \operatorname{supp}(a_0)$.
Theorem [Duval-Peyré 2013]: If $\forall t \notin x_0$, $|\eta_V(t)| < 1$, then $m_\lambda = m_{x_\lambda, a_\lambda}$ with $\|x_\lambda - x_0\|_\infty = O(\|w\|)$.
(Both hold for $\|w\|$ small enough and $\lambda \sim \|w\|$.)
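Both pre-certificates can be computed by least squares once $L^2(\mathbb{T})$ is discretized on a fine grid: $\eta_F$ interpolates $\operatorname{sign}(a_0)$ at the spikes, while $\eta_V$ additionally forces a vanishing derivative there. A sketch (an editorial addition; plain Euclidean discretization of $L^2$, and a periodized Gaussian standing in for a generic $C^2$ kernel):

```python
import numpy as np

S2 = 0.05 ** 2                                  # kernel bandwidth (illustrative)

def phi(x, t):
    d = (x - t + 0.5) % 1.0 - 0.5
    return np.exp(-d ** 2 / (2 * S2))

def dphi(x, t):                                 # derivative of phi in its second argument
    d = (x - t + 0.5) % 1.0 - 0.5
    return (d / S2) * np.exp(-d ** 2 / (2 * S2))

K = 512
xs = np.arange(K) / K                           # discretization of L^2(T)
x0 = np.array([0.3, 0.5])
sgn = np.array([1.0, -1.0])                     # sign(a_0)

# eta_F: minimal-norm interpolation of sgn at the spikes.
F = phi(xs[:, None], x0[None, :])
qF = np.linalg.pinv(F).T @ sgn
etaF = lambda t: phi(xs[:, None], np.atleast_1d(t)[None, :]).T @ qF

# eta_V: also forces eta'(x_i) = 0, i.e. uses Gamma_{x0} = [phi(., x_i), phi'(., x_i)].
G = np.hstack([F, dphi(xs[:, None], x0[None, :])])
qV = np.linalg.pinv(G).T @ np.concatenate([sgn, np.zeros(len(x0))])
etaV = lambda t: phi(xs[:, None], np.atleast_1d(t)[None, :]).T @ qV

t = np.linspace(0.0, 1.0, 1000)
print(np.abs(etaF(t)).max(), np.abs(etaV(t)).max())   # validity: sup |eta| <= 1
```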
67. Numerical Illustration
Ideal low-pass filter: $\varphi(x, x') = \dfrac{\sin((2 f_c + 1) \pi (x - x'))}{\sin(\pi (x - x'))}$, $f_c = 6$.
[Figure: $\eta_F$ and $\eta_V$, constrained to $[-1, +1]$, with a zoom near the spikes.]
Discrete → continuous:
Theorem [Duval-Peyré 2013]: If $\eta_V$ is valid, then $a_\lambda$ is supported on pairs of neighbors around $\operatorname{supp}(m_0)$.
[Figure: solution path $\lambda \mapsto a_\lambda$.]
(Holds for $\lambda \sim \|w\|$ small enough.)
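The pairs-of-neighbors phenomenon can be reproduced with the ideal low-pass filter above: measure two off-grid spikes, solve the grid Lasso, and inspect the recovered support. A sketch (an editorial addition; grid sizes, spike positions and $\lambda$ are illustrative, and the neighbor-pair support is the typical outcome rather than a guarantee):

```python
import numpy as np

def dirichlet(x, xp, fc=6):
    """phi(x, x') = sin((2 fc + 1) pi (x - x')) / sin(pi (x - x'))."""
    d = x - xp
    den = np.sin(np.pi * d)
    with np.errstate(invalid="ignore", divide="ignore"):
        out = np.sin((2 * fc + 1) * np.pi * d) / den
    return np.where(np.abs(den) < 1e-12, 2.0 * fc + 1.0, out)

K, N = 128, 64                                  # measurement grid / recovery grid
xs = np.arange(K) / K
z = np.arange(N) / N
Phi = dirichlet(xs[:, None], z[None, :])

# Two spikes placed OFF the recovery grid, measured through the low-pass filter.
y = 1.0 * dirichlet(xs, 0.2 + 0.3 / N) - 0.8 * dirichlet(xs, 0.6 + 0.4 / N)

lam = 1.0
L = np.linalg.norm(Phi, 2) ** 2
a = np.zeros(N)
for _ in range(5000):                           # ISTA on the grid Lasso
    g = a - Phi.T @ (Phi @ a - y) / L
    a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0)
print(np.nonzero(np.abs(a) > 1e-3)[0])          # typically pairs of adjacent indices
```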