A very wide spectrum of optimization problems can be solved efficiently with proximal gradient methods, which hinge on the celebrated forward-backward splitting (FBS) scheme. However, such first-order methods are only effective when low or medium accuracy is required, and they are known to be slow or even impractical for badly conditioned problems. Moreover, the straightforward introduction of second-order (Hessian) information has its own shortcomings: typically, at every iteration one needs to solve a non-separable optimization problem. In this talk we will follow a different route. We will recast non-smooth optimization problems as the minimization of a real-valued, continuously differentiable function known as the forward-backward envelope, and we will then employ a semismooth Newton method to solve this equivalent problem instead of the original one. We will apply the proposed semismooth Newton method to L1-regularized least-squares (LASSO) problems, motivated by an interesting application: recursive compressed sensing. Compressed sensing is a signal processing methodology for the reconstruction of sparsely sampled signals; it offers a new paradigm for sampling signals based on their innovation, that is, the minimum number of coefficients sufficient to represent them accurately in an appropriately selected basis. Compressed sensing leads to a lower sampling rate than theories built on a fixed basis and has many applications in image processing, medical imaging and MRI, photography, holography, facial recognition, radio astronomy, radar technology and more. The traditional compressed sensing approach is inherently offline, in that it amounts to sparsely sampling and reconstructing a given dataset. Recently, an online algorithm for performing compressed sensing on streaming data was proposed; the scheme uses recursive sampling of the input stream and recursive decompression to accurately estimate stream entries from the acquired noisy measurements. We will see how the forward-backward Newton method can be tailored to solve recursive compressed sensing problems in roughly one tenth of the time required by algorithms such as ISTA, FISTA, ADMM and interior-point methods (L1LS).
1. Recursive Compressed Sensing
Pantelis Sopasakis∗
Presentation at ICTEAM – UC Louvain, Belgium
joint work with N. Freris† and P. Patrinos‡
∗ IMT Institute for Advanced Studies Lucca, Italy
† NYU, Abu Dhabi, United Arab Emirates
‡ ESAT, KU Leuven, Belgium
April 7, 2016
6. Forward-Backward Splitting
Problem structure
minimize ϕ(x) = f(x) + g(x)
where
1. f, g : Rn → R̄ are proper, closed, convex,
2. f has an L-Lipschitz gradient,
3. g is prox-friendly, i.e., its proximal operator
   proxγg(v) := arg min_z { g(z) + (1/2γ)‖v − z‖² }
   is easily computable[1].
[1] Parikh & Boyd, 2014; Combettes & Pesquet, 2010.
7. Example #1
Constrained QPs
minimize ½ xᵀQx + qᵀx + δ(x | B),
with f(x) = ½ xᵀQx + qᵀx and g(x) = δ(x | B),
where B is a set onto which projections are easy to compute and
δ(x | B) = 0 if x ∈ B, +∞ otherwise.
Then proxγg(x) = proj(x | B).
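As an illustration of the prox-as-projection identity above, here is a minimal NumPy sketch for the common case where B is a box [lo, hi]; the function name and the box shape are illustrative choices.

```python
import numpy as np

def prox_box_indicator(x, lo, hi):
    """prox of gamma * delta(. | B) for the box B = [lo, hi]^n: the Euclidean
    projection onto B (the step size gamma plays no role for indicator functions)."""
    # e.g. prox_box_indicator(np.array([-2.0, 0.3, 5.0]), 0.0, 1.0) -> [0.0, 0.3, 1.0]
    return np.clip(x, lo, hi)
```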
8. Example #2
LASSO problems
minimize ½‖Ax − b‖² + λ‖x‖₁,
with f(x) = ½‖Ax − b‖² and g(x) = λ‖x‖₁.
Indeed,
1. f is continuously differentiable with ∇f(x) = Aᵀ(Ax − b),
2. g is prox-friendly.
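For this LASSO splitting, both ingredients have simple closed forms; a small NumPy sketch (function names are illustrative):

```python
import numpy as np

def grad_f(x, A, b):
    """Gradient of f(x) = 0.5 * ||Ax - b||^2."""
    return A.T @ (A @ x - b)

def prox_l1(z, t):
    """prox of t * ||.||_1 (with t = gamma * lambda): componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
```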
9. Other examples
Constrained optimal control
Elastic net
Sparse log-logistic regression
Matrix completion
Subspace identification
Support vector machines
10. Forward-Backward Splitting
FBS offers a generic framework for solving such problems using the
iteration
x^{k+1} = proxγg(x^k − γ∇f(x^k)) =: Tγ(x^k),
for γ < 2/L.
Features:
1. ϕ(x^k) − ϕ* ∈ O(1/k)
2. with Nesterov's extrapolation, ϕ(x^k) − ϕ* ∈ O(1/k²)
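Putting the forward and backward steps together gives the basic FBS (ISTA) loop for the LASSO; a minimal sketch, with γ chosen from ‖A‖² as on the later slides (stopping rule and iteration cap are illustrative):

```python
import numpy as np

def fbs_lasso(A, b, lam, x0, max_iter=5000, tol=1e-8):
    """Forward-backward splitting (ISTA) for 0.5*||Ax - b||^2 + lam*||x||_1."""
    gamma = 0.95 / np.linalg.norm(A, 2) ** 2      # gamma < 2/L with L = ||A||^2
    x = x0.copy()
    for _ in range(max_iter):
        z = x - gamma * (A.T @ (A @ x - b))                            # forward (gradient) step
        x_next = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0) # backward (prox) step
        if np.linalg.norm(x - x_next) <= tol:                          # small fixed-point residual
            return x_next
        x = x_next
    return x
```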
11. Forward-Backward Splitting
The iteration
x^{k+1} = proxγg(x^k − γ∇f(x^k))
can be written as[2]
x^{k+1} = arg min_z { Q^f_γ(z, x^k) + g(z) },
where
Q^f_γ(z, x^k) := f(x^k) + ⟨∇f(x^k), z − x^k⟩ + (1/2γ)‖z − x^k‖²
serves as a quadratic model for f[3].
[2] Beck and Teboulle, 2010.
[3] Q^f_γ(·, x^k) is the linearization of f at x^k plus a quadratic term; moreover, for γ ≤ 1/L, Q^f_γ(z, x^k) ≥ f(z), and Q^f_γ(z, z) = f(z).
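Completing the square shows why this quadratic-model view and the prox-gradient step coincide (a short check using the definitions above):

```latex
\begin{aligned}
Q^f_\gamma(z, x^k) + g(z)
 &= f(x^k) + \langle \nabla f(x^k),\, z - x^k\rangle
    + \tfrac{1}{2\gamma}\|z - x^k\|^2 + g(z) \\
 &= \tfrac{1}{2\gamma}\,\bigl\|z - \bigl(x^k - \gamma\nabla f(x^k)\bigr)\bigr\|^2 + g(z)
    + f(x^k) - \tfrac{\gamma}{2}\|\nabla f(x^k)\|^2 ,
\end{aligned}
```

so the minimizer over z is exactly proxγg(x^k − γ∇f(x^k)), and the optimal value of this subproblem is the forward-backward envelope ϕγ(x^k) that appears later in the talk.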
18. Overview
Generic convex optimization problem
minimize f(x) + g(x).
The generic iteration
x^{k+1} = proxγg(x^k − γ∇f(x^k))
is a fixed-point iteration for the optimality condition
x* = proxγg(x* − γ∇f(x*)).
19. Overview
It generalizes several other methods
x^{k+1} =
    x^k − γ∇f(x^k)            gradient method, g = 0
    ΠC(x^k − γ∇f(x^k))        gradient projection, g = δ(· | C)
    proxγg(x^k)               proximal point algorithm, f = 0
There are several flavors of proximal gradient algorithms[4].
[4] Nesterov's accelerated method, FISTA (Beck & Teboulle), etc.
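One such flavor is FISTA: the same forward-backward step applied at an extrapolated point, which yields the O(1/k²) rate quoted earlier. A minimal sketch (step size and iteration cap are illustrative):

```python
import numpy as np

def fista_lasso(A, b, lam, x0, max_iter=5000):
    """FISTA sketch for 0.5*||Ax - b||^2 + lam*||x||_1 (Beck & Teboulle)."""
    gamma = 0.95 / np.linalg.norm(A, 2) ** 2      # gamma <= 1/L
    x_prev, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(max_iter):
        z = y - gamma * (A.T @ (A @ y - b))
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x + ((t - 1.0) / t_next) * (x - x_prev)    # Nesterov extrapolation
        x_prev, t = x, t_next
    return x_prev
```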
20. Shortcomings
FBS methods are first-order methods; therefore, they can be slow!
Overhaul. Use a better quadratic model for f[5]:
Q^f_{γ,B}(z, x^k) = f(x^k) + ⟨∇f(x^k), z − x^k⟩ + (1/2γ)‖z − x^k‖²_{Bk},
where Bk is (an approximation of) ∇²f(x^k).
Drawback. No closed-form solution of the inner problem.
[5] As in Becker & Fadili, 2012; Lee et al., 2012; Tran-Dinh et al., 2013.
27. Properties of FBE
Ergo: minimizing ϕ is equivalent to minimizing its FBE ϕγ:
inf ϕ = inf ϕγ,
arg min ϕ = arg min ϕγ.
However, unlike ϕ, the FBE ϕγ is continuously differentiable[6] whenever f ∈ C².
[6] More about the FBE: P. Patrinos, L. Stella and A. Bemporad, 2014.
28. FBE is C1
The FBE can be written as
ϕγ(x) = f(x) − (γ/2)‖∇f(x)‖² + g^γ(x − γ∇f(x)),
where g^γ is the Moreau envelope of g,
g^γ(v) = min_z { g(z) + (1/2γ)‖z − v‖² }.
g^γ is a smooth approximation of g with ∇g^γ(x) = γ⁻¹(x − proxγg(x)). If f ∈ C², then
∇ϕγ(x) = γ⁻¹(I − γ∇²f(x)) Rγ(x),
where Rγ(x) := x − proxγg(x − γ∇f(x)) is the fixed-point residual. Therefore,
arg min ϕ = arg min ϕγ = zer ∇ϕγ.
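To make the formulas concrete, a NumPy sketch evaluating the FBE and its gradient for the LASSO (here ∇²f(x) = AᵀA); with Rγ defined as above, the gradient carries the 1/γ factor:

```python
import numpy as np

def fbe_lasso(x, A, b, lam, gamma):
    """Forward-backward envelope of 0.5*||Ax - b||^2 + lam*||x||_1 and its gradient."""
    grad = A.T @ (A @ x - b)
    z = x - gamma * grad                                        # forward step
    p = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)   # prox_{gamma g}(z)
    moreau = lam * np.linalg.norm(p, 1) + np.linalg.norm(p - z) ** 2 / (2 * gamma)
    phi = (0.5 * np.linalg.norm(A @ x - b) ** 2
           - 0.5 * gamma * np.linalg.norm(grad) ** 2
           + moreau)                                            # phi_gamma(x)
    R = x - p                                                   # fixed-point residual R_gamma(x)
    grad_phi = (R - gamma * (A.T @ (A @ R))) / gamma            # (1/gamma)(I - gamma A'A) R
    return phi, grad_phi
```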
30. Forward-Backward Newton
Since ϕγ is C¹ but not C², we cannot apply a classical Newton method.
The FB Newton method is a semismooth Newton method for minimizing ϕγ,
using a notion of generalized differentiability.
The FBN iterations are
x^{k+1} = x^k + τk d^k,
where d^k is a Newton direction given by
Hk d^k = −∇ϕγ(x^k),   Hk ∈ ∂²_B ϕγ(x^k),
and ∂B is the so-called B-subdifferential (we will define it shortly).
32. Optimality conditions
LASSO problem
minimize ½‖Ax − b‖² + λ‖x‖₁,  with f(x) = ½‖Ax − b‖² and g(x) = λ‖x‖₁.
Optimality conditions:
−∇f(x*) ∈ ∂g(x*),
where ∇f(x) = Aᵀ(Ax − b), ∂g(x)i = {λ sign(xi)} for xi ≠ 0 and ∂g(x)i = [−λ, λ] otherwise, so
−∇if(x*) = λ sign(x*_i),  if x*_i ≠ 0,
|∇if(x*)| ≤ λ,            otherwise.
33. Optimality conditions
If we knew the sets
α = {i : x*_i ≠ 0},
β = {j : x*_j = 0},
we would be able to write down the optimality conditions as
AαᵀAα x*_α = Aαᵀb − λ sign(x*_α).
Goal. Devise a method to determine α efficiently.
34. Optimality conditions
We may write the optimality conditions as follows:
x* = proxγg(x* − γ∇f(x*)),
where
proxγg(z)i = sign(zi)(|zi| − γλ)₊.
ISTA and FISTA are methods for the iterative solution of these conditions. Instead, we look for a zero of the fixed-point residual operator
Rγ(x) = x − proxγg(x − γ∇f(x)).
35. B-subdifferential
For a function F : Rn → Rn which is almost everywhere differentiable, we
define its B-subdifferential as[7]
∂BF(x) := { B ∈ Rn×n : ∃ {x^ν}ν with x^ν → x, ∇F(x^ν) exists and ∇F(x^ν) → B }.
[7] See Facchinei & Pang, 2004.
36. Forward-Backward Newton
Rγ(x) is nonexpansive ⇒ Lipschitz ⇒ differentiable a.e. ⇒ B-subdifferentiable (∂BRγ(x)). The proposed algorithm takes the form
x^{k+1} = x^k − τk Hk⁻¹ Rγ(x^k),   with Hk ∈ ∂BRγ(x^k).
When close to the solution, all Hk are nonsingular. Take
Hk = I − Pk(I − γAᵀA),
where Pk is diagonal with (Pk)ii = 1 iff i ∈ αk, where
αk = {i : |x^k_i − γ∇if(x^k)| > γλ}.
The scalar τk is computed by a simple line search to ensure global convergence of the algorithm.
37. Forward-Backward Newton
The Forward-Backward Newton method can be concisely written as
x^{k+1} = x^k + τk d^k.
The Newton direction d^k is determined as follows, without the need to form Hk explicitly (write α = αk and let β = βk be its complement):
d^k_β = −(Rγ(x^k))_β,
γ AαᵀAα d^k_α = −(Rγ(x^k))_α − γ AαᵀAβ d^k_β.
For the method to converge globally, we compute τk so that the Armijo condition is satisfied for ϕγ:
ϕγ(x^k + τk d^k) ≤ ϕγ(x^k) + ζ τk ⟨∇ϕγ(x^k), d^k⟩.
38. Forward-Backward Newton
Require: A, y, λ, x0, ε
γ ← 0.95/‖A‖²
x ← x0
while ‖Rγ(x)‖ > ε do
    α ← {i : |xi − γ∇if(x)| > γλ}
    β ← {i : |xi − γ∇if(x)| ≤ γλ}
    dβ ← −xβ
    sα ← sign(xα − γ∇αf(x))
    Solve AαᵀAα(xα + dα) = Aαᵀy − λ sα for dα
    τ ← 1
    while ϕγ(x + τd) > ϕγ(x) + ζτ⟨∇ϕγ(x), d⟩ do
        τ ← τ/2
    end while
    x ← x + τd
end while
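A minimal NumPy transcription of the pseudocode above; the FBE serves as the merit function for the line search, and the tolerances, safeguards and function names are illustrative:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fbe(x, A, y, lam, gamma):
    """Forward-backward envelope of the LASSO cost (merit function)."""
    grad = A.T @ (A @ x - y)
    z = x - gamma * grad
    p = soft_threshold(z, gamma * lam)
    return (0.5 * np.linalg.norm(A @ x - y) ** 2
            - 0.5 * gamma * np.linalg.norm(grad) ** 2
            + lam * np.linalg.norm(p, 1)
            + np.linalg.norm(p - z) ** 2 / (2 * gamma))

def fbn_lasso(A, y, lam, x0, eps=1e-10, zeta=1e-4, max_iter=100):
    """Forward-backward Newton for 0.5*||Ax - y||^2 + lam*||x||_1 (sketch)."""
    gamma = 0.95 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(max_iter):
        grad = A.T @ (A @ x - y)
        z = x - gamma * grad
        R = x - soft_threshold(z, gamma * lam)           # fixed-point residual R_gamma(x)
        if np.linalg.norm(R) <= eps:
            break
        alpha = np.abs(z) > gamma * lam                  # tentative support
        d = np.zeros_like(x)
        d[~alpha] = -x[~alpha]                           # d_beta = -x_beta
        if alpha.any():
            Aa = A[:, alpha]
            s = np.sign(z[alpha])
            # A_a' A_a (x_a + d_a) = A_a' y - lam * s
            d[alpha] = np.linalg.solve(Aa.T @ Aa, Aa.T @ y - lam * s) - x[alpha]
        # Armijo backtracking on the FBE; slope = <grad phi_gamma(x), d>
        slope = (R @ d - gamma * (A @ R) @ (A @ d)) / gamma
        tau, phi_x = 1.0, fbe(x, A, y, lam, gamma)
        while tau > 1e-12 and fbe(x + tau * d, A, y, lam, gamma) > phi_x + zeta * tau * slope:
            tau *= 0.5
        x = x + tau * d
    return x
```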
43. Speeding up FBN by Continuation
1. In applications of LASSO we have ‖x*‖₀ ≤ m ≪ n[8]
2. If λ ≥ λ0 := ‖∇f(x0)‖∞, then supp(x*) = ∅
3. We relax the optimization problem, solving
   P(λ̄) : minimize ½‖Ax − y‖² + λ̄‖x‖₁
4. Once we have approximately solved P(λ̄), we update λ̄ as
   λ̄ ← max{ηλ̄, λ},
   until eventually λ̄ = λ.
5. This way we enforce that (i) |αk| increases smoothly, (ii) |αk| < m, and (iii) AαkᵀAαk always remains positive definite.
[8] The zero-norm of x, ‖x‖₀, is the number of its nonzeros.
44. Speeding up FBN by Continuation
Require: A, y, λ, x0, η ∈ (0, 1), ε
λ̄ ← max{λ, ‖∇f(x0)‖∞},  ε̄ ← ε
while λ̄ > λ or ‖Rγ(x^k; λ̄)‖ > ε do
    x^{k+1} ← x^k + τk d^k   (d^k: Newton direction, τk: line search)
    if ‖Rγ(x^k; λ̄)‖ ≤ λ ε̄ then
        λ̄ ← max{λ, ηλ̄}
        ε̄ ← ηε̄
    end if
end while
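A possible warm-started wrapper implementing the continuation idea, reusing the `fbn_lasso` sketch above; the tolerance schedule here is illustrative rather than taken from the slides:

```python
import numpy as np
# assumes fbn_lasso(A, y, lam, x0, eps=...) from the previous sketch is in scope

def fbn_continuation(A, y, lam, x0, eta=0.5, eps=1e-10):
    """Solve a sequence of LASSO problems P(lam_bar) with decreasing lam_bar,
    warm-starting each one, so that the active set grows gradually."""
    lam_bar = max(lam, np.linalg.norm(A.T @ (A @ x0 - y), np.inf))  # ||grad f(x0)||_inf
    eps_bar = max(eps, 1e-3)            # loose tolerance for the early subproblems
    x = x0.copy()
    while lam_bar > lam:
        x = fbn_lasso(A, y, lam_bar, x, eps=eps_bar)   # a few Newton steps at lam_bar
        lam_bar = max(lam, eta * lam_bar)
        eps_bar = eta * eps_bar
    return fbn_lasso(A, y, lam, x, eps=eps)            # final solve at the target lam
```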
45. Further speed up
When AαᵀAα is positive definite[9], we may compute a Cholesky factorization of Aα₀ᵀAα₀ once and then update the Cholesky factorization of Aαk+1ᵀAαk+1 using the factorization of AαkᵀAαk as the active set changes.
[9] In practice, always (when the continuation heuristic is used). Furthermore, α0 = ∅.
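One way to realize the update when a single index enters the active set is to append a row/column to the existing lower-triangular factor (removing an index would require a corresponding downdate, not shown). A sketch assuming SciPy is available and the current active set is nonempty:

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_append(L, A_alpha, a_new):
    """Given the lower Cholesky factor L of A_alpha' A_alpha, return the factor
    after appending the column a_new to A_alpha."""
    b = A_alpha.T @ a_new
    w = solve_triangular(L, b, lower=True)        # solve L w = b
    d = np.sqrt(a_new @ a_new - w @ w)            # > 0 while the matrix stays PD
    k = L.shape[0]
    L_new = np.zeros((k + 1, k + 1))
    L_new[:k, :k] = L
    L_new[k, :k] = w
    L_new[k, k] = d
    return L_new
```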
48. Overview
Why FBN?
Fast convergence
Very fast convergence when close to the solution
Few, inexpensive iterations
The FBE serves as a merit function ensuring global convergence
50. Introduction
We say that a vector x ∈ Rn is s-sparse if it has at most s nonzero entries.
Assume that a sparsely sampled signal y ∈ Rm (m ≪ n) is produced as
y = Ax
from an s-sparse vector x and a sampling matrix A. In reality, however,
measurements will be noisy:
y = Ax + w.
52. Sparse Sampling
We require that A satisfies the restricted isometry property[10], that is,
(1 − δs)‖x‖² ≤ ‖Ax‖² ≤ (1 + δs)‖x‖²   for all s-sparse x.
A typical choice is a random matrix A with entries drawn from N(0, 1/m), with m = 4s.
[10] This can be established using the Johnson–Lindenstrauss lemma.
53. Decompression
Assuming that
w ∼ N(0, σ²I),
that the smallest element of |x| is not too small (> 8σ√(2 ln n)), and that
λ = 4σ√(2 ln n),
the LASSO recovers the support of x[11]; that is,
x* = arg min ½‖Ax − y‖² + λ‖x‖₁
has the same support as the actual x.
[11] Candès & Plan, 2009.
55. Recursive Compressed Sensing
Define
x^(i) := (xi, xi+1, . . . , xi+n−1).
Then x^(i) produces the measured signal
y^(i) = A^(i) x^(i) + w^(i).
Sampling is performed with a constant matrix A[12] and
A^(0) = A,
A^(i+1) = A^(i) P,
where P is a permutation matrix which shifts the columns of A leftwards.
[12] For details see: N. Freris, O. Öçal and M. Vetterli, 2014.
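The sampling recursion is just a column permutation and, accordingly, the previous estimate shifted along with the window is a natural warm start. A small sketch of these mechanics, assuming the cyclic left-shift convention stated on the slide:

```python
import numpy as np

def shift_sensing_matrix(A_i):
    """A^(i+1) = A^(i) P: cyclically shift the columns of A one place to the left."""
    return np.roll(A_i, -1, axis=1)

def warm_start(x_i):
    """Shift the previous estimate along with the window; the entry that just
    entered the window is initialized at zero."""
    x0 = np.roll(x_i, -1)
    x0[-1] = 0.0
    return x0
```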
59. Recursive Compressed Sensing
Require: Stream of observations, window size n, sparsity s
λ ← 4σ√(2 ln n) and m ← 4s
Construct A ∈ Rm×n with entries from N(0, 1/m)
A^(0) ← A, x°^(0) ← 0
for i = 0, 1, . . . do
    1. Sample y^(i) ∈ Rm
    2. Support estimation (using the initial guess x°^(i)):
       x^(i) = arg min_x ½‖A^(i) x − y^(i)‖² + λ‖x‖₁
    3. Perform debiasing
    4. x°^(i+1) ← Pᵀ x^(i)
    5. A^(i+1) ← A^(i) P
end for
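A compact NumPy sketch of this loop, using the `fbn_lasso` sketch from earlier as the LASSO solver; the seeding, stream handling and least-squares debiasing step are illustrative:

```python
import numpy as np
# assumes fbn_lasso(A, y, lam, x0) from the earlier sketch is in scope

def rcs_stream(stream, n, s, sigma, num_windows):
    """Recursive compressed sensing: sample overlapping length-n windows with a
    column-shifted Gaussian matrix, reconstruct by warm-started LASSO, debias."""
    lam = 4.0 * sigma * np.sqrt(2.0 * np.log(n))
    m = 4 * s
    rng = np.random.default_rng(0)
    A_i = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # entries ~ N(0, 1/m)
    x0, estimates = np.zeros(n), []
    for i in range(num_windows):
        y_i = A_i @ stream[i:i + n] + sigma * rng.normal(size=m)  # noisy measurements
        x_i = fbn_lasso(A_i, y_i, lam, x0)                        # support estimation
        supp = np.flatnonzero(x_i)
        if supp.size:                                             # debiasing: LS on the support
            sol, *_ = np.linalg.lstsq(A_i[:, supp], y_i, rcond=None)
            x_i = np.zeros(n)
            x_i[supp] = sol
        estimates.append(x_i)
        x0 = np.roll(x_i, -1); x0[-1] = 0.0                       # warm start for window i+1
        A_i = np.roll(A_i, -1, axis=1)                            # A^(i+1) = A^(i) P
    return estimates
```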
63. Simulations
For n = 5000 varying the stream sparsity
[Figure: average runtime in seconds (log scale) versus stream sparsity in %, comparing FBN, FISTA, ADMM and L1LS.]
64. References
1. S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An interior-point method for large-scale ℓ1-regularized least squares," IEEE J. Sel. Top. Signal Process., 1(4), pp. 606–617, 2007.
2. A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imaging Sci., 2(1), pp. 183–202, 2009.
3. S. Becker and M. J. Fadili, "A quasi-Newton proximal splitting method," in Advances in Neural Information Processing Systems, vol. 1, pp. 2618–2626, 2012.
4. P. Patrinos, L. Stella and A. Bemporad, "Forward-backward truncated Newton methods for convex composite optimization," arXiv:1402.6655, 2014.
5. P. Sopasakis, N. Freris and P. Patrinos, "Accelerated reconstruction of a compressively sampled data stream," 24th European Signal Processing Conference (EUSIPCO), submitted, 2016.
6. N. Freris, O. Öçal and M. Vetterli, "Recursive Compressed Sensing," arXiv:1312.4895, 2013.