1. Online Dictionary Learning for Sparse Coding
Julien Mairal¹ Francis Bach¹ Jean Ponce² Guillermo Sapiro³
¹ INRIA–Willow project  ² École Normale Supérieure  ³ University of Minnesota
ICML, Montréal, June 2009
2. What this talk is about
Efficiently learning dictionaries (basis sets) for sparse coding.
Solving a large-scale matrix factorization problem.
Making some large-scale image processing problems
tractable.
Proposing an algorithm that extends to NMF, sparse PCA, ...
3. 1 The Dictionary Learning Problem
2 Online Dictionary Learning
3 Extensions
6. The Dictionary Learning Problem
[Elad & Aharon (’06)]
Solving the denoising problem
Extract all overlapping 8 × 8 patches x_i.
Solve a matrix factorization problem:
\min_{\alpha_i,\, D \in C} \sum_{i=1}^{n} \frac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1
(reconstruction term + sparsity term), with n > 100,000.
Average the reconstructions of each patch.
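The patch pipeline above (extract every overlapping 8 × 8 patch, reconstruct each one, average the overlaps back into the image) can be sketched in NumPy. This is a minimal sketch: `extract_patches` and `average_patches` are hypothetical helper names, and the per-patch sparse reconstruction itself is left out.

```python
import numpy as np

def extract_patches(img, p=8):
    """Collect all overlapping p x p patches as columns of a matrix X."""
    h, w = img.shape
    patches = [img[i:i + p, j:j + p].ravel()
               for i in range(h - p + 1)
               for j in range(w - p + 1)]
    return np.stack(patches, axis=1)  # shape (p*p, n); n easily exceeds 100,000

def average_patches(patches, shape, p=8):
    """Put each (reconstructed) patch back in place and average the overlaps."""
    h, w = shape
    out = np.zeros(shape)
    counts = np.zeros(shape)
    k = 0
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            out[i:i + p, j:j + p] += patches[:, k].reshape(p, p)
            counts[i:i + p, j:j + p] += 1
            k += 1
    return out / counts  # every pixel is covered by at least one patch
```

Extracting and re-averaging without any denoising in between is an exact round trip, which is a convenient sanity check for the bookkeeping.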
7. The Dictionary Learning Problem
[Mairal, Bach, Ponce, Sapiro & Zisserman (’09)]
Denoising result
10. The Dictionary Learning Problem
\min_{\alpha \in \mathbb{R}^{k \times n},\, D \in C} \sum_{i=1}^{n} \frac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1
C = \{D \in \mathbb{R}^{m \times k} \ \text{s.t.} \ \forall j = 1, \dots, k, \ \|d_j\|_2 \le 1\}.
Classical optimization alternates between D and α.
Good results, but very slow!
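The classical alternating scheme can be sketched as follows. This is a minimal sketch under stated assumptions: ISTA stands in as the lasso solver, projected gradient handles the constrained dictionary step, and the iteration counts and λ are illustrative, none of which is fixed by the slides.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(X, D, lam, n_iter=100):
    """Lasso step via ISTA (a simple stand-in for a dedicated solver)."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def update_dictionary(X, A, D, n_iter=50):
    """Projected gradient on D, keeping each column in the unit ball (the set C)."""
    L = np.linalg.norm(A, 2) ** 2 + 1e-8
    for _ in range(n_iter):
        D = D - (D @ A - X) @ A.T / L
        norms = np.maximum(np.linalg.norm(D, axis=0), 1.0)
        D = D / norms  # project onto ||d_j||_2 <= 1
    return D

def batch_dictionary_learning(X, k, lam=0.1, n_outer=20, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], k))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):          # alternate between alpha and D
        A = sparse_code(X, D, lam)
        D = update_dictionary(X, A, D)
    return D, A
```

Each outer iteration re-solves a full lasso over all n samples, which is exactly why this batch approach becomes very slow at large scale.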
11. 1 The Dictionary Learning Problem
2 Online Dictionary Learning
3 Extensions
12. Online Dictionary Learning
Classical formulation of dictionary learning
\min_{D \in C} f_n(D) = \min_{D \in C} \frac{1}{n} \sum_{i=1}^{n} l(x_i, D),
where
l(x, D) = \min_{\alpha \in \mathbb{R}^k} \frac{1}{2}\|x - D\alpha\|_2^2 + \lambda\|\alpha\|_1.
13. Online Dictionary Learning
Which formulation are we interested in?
\min_{D \in C} f(D) = \mathbb{E}_x[l(x, D)] \approx \lim_{n \to +\infty} \frac{1}{n} \sum_{i=1}^{n} l(x_i, D)
14. Online Dictionary Learning
Online learning can
handle potentially infinite datasets,
adapt to dynamic training sets,
be dramatically faster than batch
algorithms [Bottou & Bousquet (’08)].
15. Online Dictionary Learning
Proposed approach
1: for t = 1, \dots, T do
2:   Draw x_t
3:   Sparse coding:
     \alpha_t \leftarrow \arg\min_{\alpha \in \mathbb{R}^k} \frac{1}{2}\|x_t - D_{t-1}\alpha\|_2^2 + \lambda\|\alpha\|_1
4:   Dictionary update:
     D_t \leftarrow \arg\min_{D \in C} \frac{1}{t} \sum_{i=1}^{t} \frac{1}{2}\|x_i - D\alpha_i\|_2^2 + \lambda\|\alpha_i\|_1
5: end for
16. Online Dictionary Learning
Proposed approach
Implementation details
Use LARS for the sparse coding step,
Use a block-coordinate approach for the
dictionary update, with warm restart,
Use a mini-batch.
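Putting the loop and the implementation details together, a minimal NumPy sketch of the online scheme might look as follows. ISTA stands in for LARS, the batch size is one for simplicity, and the dictionary step is a column-wise (block-coordinate) update warm-started from the previous D, driven by the running sums A = Σ α_tα_tᵀ and B = Σ x_tα_tᵀ; the function names and hyperparameters are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(x, D, lam, n_iter=100):
    """Sparse-coding step (ISTA here; the talk uses LARS)."""
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - D.T @ (D @ a - x) / L, lam / L)
    return a

def online_dictionary_learning(stream, m, k, lam=0.1, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, k))   # running sum of alpha_t alpha_t^T
    B = np.zeros((m, k))   # running sum of x_t alpha_t^T
    for x in stream:                        # draw x_t
        a = lasso_ista(x, D, lam)           # sparse coding w.r.t. D_{t-1}
        A += np.outer(a, a)
        B += np.outer(x, a)
        # dictionary update: one pass of block-coordinate descent over the
        # columns of D, warm-started from the previous dictionary
        for j in range(k):
            if A[j, j] > 1e-10:
                u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                D[:, j] = u / max(np.linalg.norm(u), 1.0)  # project onto C
    return D
```

The per-step cost depends only on k and m, not on how many samples have been seen, which is what makes the approach viable for very large or streaming datasets.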
17. Online Dictionary Learning
Proposed approach
Which guarantees do we have?
Under a few reasonable assumptions, we build a surrogate function \hat{f}_t of the expected cost f verifying
\lim_{t \to +\infty} \hat{f}_t(D_t) - f(D_t) = 0,
and D_t is asymptotically close to a stationary point.
34. Conclusion
Take-home message
Online techniques are well suited to the dictionary learning problem.
Our method makes some large-scale image processing tasks tractable, and extends to various matrix factorization problems.