14–19. Compression by Transform-coding

Forward transform: a[m] = ⟨f, ψ_m⟩ ∈ ℝ.
Quantization (bin width T): q[m] = sign(a[m]) ⌊|a[m]| / T⌋ ∈ ℤ.
Entropic coding: use statistical redundancy (many 0's).
Dequantization: ã[m] = sign(q[m]) (|q[m]| + 1/2) T.
Backward transform: f_R = Σ_{m ∈ I_T} ã[m] ψ_m, with I_T = {m : |a[m]| ≥ T}.

"Theorem": ‖f − f_M‖² = O(M^{−α}) ⟹ ‖f − f_R‖² = O(log^α(R) R^{−α}).

[Figure: quantization bins of width T (central bin of width 2T) on the a[m] axis; image f, zoom on f, quantized q[m], and decoded f_R at R = 0.2 bit/pixel.]
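The quantization and dequantization formulas above can be sketched in NumPy (a minimal illustration; the function names and test values are mine, not from the slides):

```python
import numpy as np

def quantize(a, T):
    # q[m] = sign(a[m]) * floor(|a[m]| / T): coefficients with |a[m]| < T map to 0
    return np.sign(a) * np.floor(np.abs(a) / T)

def dequantize(q, T):
    # a~[m] = sign(q[m]) * (|q[m]| + 1/2) * T: reconstruct at the bin center,
    # keeping a~[m] = 0 whenever q[m] = 0
    return np.sign(q) * (np.abs(q) + 0.5) * T

T = 0.1
a = np.array([0.04, -0.26, 0.73, -0.05])
q = quantize(a, T)            # [0., -2., 7., -0.]
a_tilde = dequantize(q, T)    # [0., -0.25, 0.75, -0.]

kept = q != 0
assert np.all(np.abs(a_tilde - a)[kept] <= T / 2)  # kept coefficients: error <= T/2
assert np.all(np.abs(a)[~kept] < T)                # discarded coefficients were below T
```

Reconstructing at the bin center caps the error of every kept coefficient at T/2; coefficients below T are set to zero, which is what makes the entropic coding stage effective.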
20. Wavelets: JPEG-2000 vs. JPEG

JPEG2k: exploit the statistical redundancy of coefficients:
neighboring coefficients are not independent;
chunks of large coefficients;
+ embedded coder.

[Figure: image f; JPEG compression at R = 0.19 bit/pixel; JPEG2k (EZW-style) compression at R = 0.15 bit/pixel.]
25. Piecewise Regular Functions in 1D

Theorem: if f is C^α outside a finite set of discontinuities,
‖f − f_M‖² = O(M^{−1}) (Fourier),
‖f − f_M‖² = O(M^{−2α}) (wavelets).
For Fourier, linear ≈ non-linear: sub-optimal.
For wavelets, non-linear improves on linear: optimal.
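A minimal numerical check of the non-linear wavelet behavior, using a hand-rolled orthonormal Haar transform on a piecewise-smooth signal (the signal and transform are my own illustration, not from the slides; by Parseval, keeping the M largest coefficients leaves an error equal to the discarded energy):

```python
import numpy as np

def haar(x):
    # orthonormal Haar transform of a length-2^J signal (coarse coefficient first)
    x = x.astype(float).copy()
    out = []
    while len(x) > 1:
        s = (x[0::2] + x[1::2]) / np.sqrt(2)   # local averages
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # local details
        out.append(d)
        x = s
    out.append(x)
    return np.concatenate(out[::-1])

# piecewise-smooth test signal with one jump
N = 1024
t = np.linspace(0, 1, N, endpoint=False)
f = np.sin(4 * np.pi * t) + (t > 0.37)

c = haar(f)
energy = np.sum(f ** 2)
assert np.isclose(np.sum(c ** 2), energy)      # orthonormality: Parseval holds

errs = []
for M in (32, 64, 128):
    kept = np.sort(np.abs(c))[::-1][:M]        # M largest coefficients (non-linear)
    errs.append(energy - np.sum(kept ** 2))    # error = discarded energy
assert errs[0] > errs[1] > errs[2] >= 0        # error decays as M grows
```

The few large coefficients created by the jump are kept by the non-linear selection, which is why wavelets escape the Fourier rate here.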
26–27. Piecewise Regular Functions in 2D

Theorem: if f is C^α outside a set of edge curves of finite length,
‖f − f_M‖² = O(M^{−1/2}) (Fourier),
‖f − f_M‖² = O(M^{−1}) (wavelets).
Wavelets: same result for BV functions (optimal).
Regular C^α edges: sub-optimal (exploiting them requires anisotropy).

[Figure: Fourier vs. wavelet approximations.]

28. Geometrically Regular Images

Geometric image model: f is C^α outside a set of C^α edge curves.
BV image: level sets have finite length.
Geometric image: level sets are regular.

[Figure: |∇f|; geometry = cartoon image; sharp vs. smoothed edges.]
32–34. Approximation with Triangulation

Triangulation (V, F):
vertices V = {v_i}_{i=1}^M,
faces F ⊂ {1, …, M}³.

Piecewise linear approximation: f_M = Σ_{m=1}^M λ_m φ_m,
λ = argmin_μ ‖f − Σ_m μ_m φ_m‖,
where φ_m(v_i) = δ_{m,i} and φ_m is affine on each face of F.

Theorem: there exists (V, F) such that ‖f − f_M‖² ≤ C_f M^{−2}.
Regular areas: ≈ M/2 equilateral triangles of width ∼ M^{−1/2}.
Singular areas: ≈ M/2 anisotropic triangles along the edges.

Optimal (V, F): NP-hard.
Provably good greedy schemes: [Mirebeau, Cohen, 2009].
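Once (V, F) is fixed, computing λ = argmin_μ ‖f − Σ_m μ_m φ_m‖ is an ordinary linear least-squares problem. A 1D analogue with hat functions (the vertices, signal, and helper names are my own illustration):

```python
import numpy as np

def hat_basis(t, v):
    # phi_m(v_i) = delta_{m,i}, affine between consecutive vertices
    B = np.zeros((len(t), len(v)))
    for m in range(len(v)):
        e = np.zeros(len(v))
        e[m] = 1.0
        B[:, m] = np.interp(t, v, e)   # the hat is exactly piecewise-linear interpolation
    return B

t = np.linspace(0, 1, 400)
f = np.sin(2 * np.pi * t) ** 2
v = np.linspace(0, 1, 14)                    # fixed vertices (uniform here)
B = hat_basis(t, v)
lam, *_ = np.linalg.lstsq(B, f, rcond=None)  # argmin_mu ||f - sum_m mu_m phi_m||
fM = B @ lam
rel_err = np.linalg.norm(f - fM) / np.linalg.norm(f)
assert rel_err < 0.1
```

The hard part in 2D is the joint choice of (V, F), not this linear fit: adapting the vertex density and anisotropy to the edges is what makes the problem NP-hard and what the greedy schemes approximate.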
41–44. Image Representation

Dictionary D = {d_m}_{m=0}^{Q−1} of atoms d_m ∈ ℝ^N.
Image decomposition: f = Σ_{m=0}^{Q−1} x_m d_m = Dx.
Image approximation: f ≈ Dx.

Orthogonal dictionary: N = Q ⟹ x_m = ⟨f, d_m⟩.
Redundant dictionary: N < Q ⟹ x is not unique.
Examples: translation-invariant wavelets, curvelets, …

[Figure: f = Dx as a matrix product; atom d_m is a column of D, with coefficient x_m.]
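The orthogonal vs. redundant cases can be checked directly in NumPy (a sketch; the random dictionaries are my own stand-ins for wavelets or curvelets):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
# orthogonal dictionary: N = Q orthonormal columns, e.g. from a QR factorization
D, _ = np.linalg.qr(rng.standard_normal((N, N)))

f = rng.standard_normal(N)
x = D.T @ f                     # x_m = <f, d_m>
assert np.allclose(D @ x, f)    # exact decomposition f = D x

# redundant dictionary: Q > N columns, so x is not unique
Q = 2 * N
Dr = rng.standard_normal((N, Q))
x1, *_ = np.linalg.lstsq(Dr, f, rcond=None)   # minimum-norm representation
# add any null-space direction of Dr to get a second representation
null = np.linalg.svd(Dr)[2][N:].T @ rng.standard_normal(Q - N)
x2 = x1 + null
assert np.allclose(Dr @ x1, f) and np.allclose(Dr @ x2, f)
assert not np.allclose(x1, x2)
```

The non-uniqueness in the redundant case is exactly what sparse coding exploits: among all x with Dx = f, pick the sparsest.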
45–47. Sparsity

Decomposition: f = Σ_{m=0}^{Q−1} x_m d_m = Dx.
Sparsity: most x_m are small. Example: wavelet transform.
Ideal sparsity: most x_m are zero; J_0(x) = # {m : x_m ≠ 0}.
Approximate sparsity (compressibility): ‖f − Dx‖ is small with J_0(x) ≤ M.

[Figure: image f and its wavelet coefficients x.]
48–51. Sparse Coding

Redundant dictionary D = {d_m}_{m=0}^{Q−1}, Q ≥ N ⟹ non-unique representation f = Dx.
Sparsest decomposition: min_{f=Dx} J_0(x).
Sparsest approximation: min_x (1/2)‖f − Dx‖² + λ J_0(x).
Equivalence (for suitable (λ, M, ε)):
min_{J_0(x) ≤ M} ‖f − Dx‖ and min_{‖f−Dx‖ ≤ ε} J_0(x).

Ortho-basis D: hard thresholding,
x_m = ⟨f, d_m⟩ if |⟨f, d_m⟩| ≥ √(2λ), 0 otherwise
⟺ pick the M largest coefficients in {⟨f, d_m⟩}_m.

General redundant dictionary: NP-hard.
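For an ortho-basis, the M-term rule can be sketched directly (a minimal illustration; the synthetic sparse signal and function name are mine):

```python
import numpy as np

def best_m_term(f, D, M):
    # Orthogonal dictionary: the sparsest M-term approximation keeps the
    # M largest inner products <f, d_m> (hard thresholding)
    x = D.T @ f
    keep = np.argsort(np.abs(x))[::-1][:M]
    xM = np.zeros_like(x)
    xM[keep] = x[keep]
    return xM

rng = np.random.default_rng(1)
N = 128
D, _ = np.linalg.qr(rng.standard_normal((N, N)))

# synthesize a signal that is exactly 5-sparse in D, plus small noise
x0 = np.zeros(N)
x0[:5] = [5, -4, 3, -2, 1]
f = D @ x0 + 0.01 * rng.standard_normal(N)

xM = best_m_term(f, D, 5)
assert np.count_nonzero(xM) == 5                    # J0(xM) = M
assert set(np.nonzero(xM)[0]) == {0, 1, 2, 3, 4}    # recovers the true support
```

Keeping the M largest coefficients is the same as thresholding at the level √(2λ) for the matching λ; it is only in a general redundant dictionary that this selection becomes NP-hard.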
58–59. Regularized Inversion

Denoising/compression: y = f_0 + w ∈ ℝ^N.
Sparse approximation: f = Dx★ where
x★ ∈ argmin_x (1/2)‖y − Dx‖² + λ‖x‖₁
(fidelity + regularization).

Inverse problems: y = Φ f_0 + w ∈ ℝ^P. Replace D by ΦD:
x★ ∈ argmin_x (1/2)‖y − ΦDx‖² + λ‖x‖₁.

Numerical solvers: proximal splitting schemes.
www.numerical-tours.com
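The simplest proximal splitting scheme for the ℓ¹ problem above is ISTA (forward-backward splitting); this is my own generic sketch, with A standing in for D or ΦD:

```python
import numpy as np

def soft_threshold(u, s):
    return np.sign(u) * np.maximum(np.abs(u) - s, 0.0)

def ista(y, A, lam, n_iter=500):
    # forward-backward splitting for min_x 0.5*||y - A x||^2 + lam*||x||_1:
    # gradient step on the fidelity term, proximal (soft-thresholding) step on l1
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    return x

rng = np.random.default_rng(2)
P, Q = 40, 100                             # hypothetical sizes
A = rng.standard_normal((P, Q)) / np.sqrt(P)
x0 = np.zeros(Q)
x0[rng.choice(Q, 5, replace=False)] = 3.0  # 5-sparse ground truth
y = A @ x0 + 0.01 * rng.standard_normal(P)

x = ista(y, A, lam=0.05)
obj = 0.5 * np.sum((y - A @ x) ** 2) + 0.05 * np.abs(x).sum()
assert obj < 0.5 * np.sum(y ** 2)          # objective improved over x = 0
```

With step 1/L the iteration decreases the objective monotonically; accelerated variants (FISTA) and other splittings are covered on numerical-tours.com.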
62–66. Dictionary Learning: MAP Energy

Set of (noisy) exemplars {y_k}_k.
Sparse approximation: min_{x_k} (1/2)‖y_k − D x_k‖² + λ‖x_k‖₁.
Dictionary learning: min_{D ∈ C} Σ_k min_{x_k} (1/2)‖y_k − D x_k‖² + λ‖x_k‖₁.

Constraint: C = {D = (d_m)_m : ∀ m, ‖d_m‖ ≤ 1}.
Otherwise ‖D‖ → +∞, ‖X‖ → 0 (scale ambiguity).

Matrix formulation:
min_{X ∈ ℝ^{Q×K}, D ∈ C ⊂ ℝ^{N×Q}} f(X, D) = (1/2)‖Y − DX‖² + λ‖X‖₁.

Convex with respect to X.
Convex with respect to D.
Non-convex with respect to (X, D) ⟹ local minima.
68–70. Dictionary Learning: Algorithm

Step 1: ∀ k, minimization on x_k:
min_{x_k} (1/2)‖y_k − D x_k‖² + λ‖x_k‖₁
⟹ convex sparse coding.

Step 2: minimization on D:
min_{D ∈ C} ‖Y − DX‖²
⟹ convex constrained minimization, e.g. projected gradient descent:
D^{(ℓ+1)} = Proj_C( D^{(ℓ)} − τ (D^{(ℓ)} X − Y) Xᵀ ).

Convergence: toward a stationary point of f(X, D).

[Figure: alternating-minimization path from the initialization of D.]
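The two alternating steps can be sketched end to end (my own minimal implementation: ISTA for Step 1, projected gradient with step 1/‖X‖² for Step 2; all names and sizes are mine):

```python
import numpy as np

def soft_threshold(u, s):
    return np.sign(u) * np.maximum(np.abs(u) - s, 0.0)

def proj_C(D):
    # project each atom onto the unit ball: ||d_m|| <= 1
    return D / np.maximum(np.linalg.norm(D, axis=0), 1.0)

def learn_dictionary(Y, Q, lam=0.05, outer=30, inner=50, seed=0):
    rng = np.random.default_rng(seed)
    N, K = Y.shape
    D = proj_C(rng.standard_normal((N, Q)))
    X = np.zeros((Q, K))
    for _ in range(outer):
        # Step 1: sparse coding of X (ISTA on 0.5*||Y - DX||^2 + lam*||X||_1)
        L = np.linalg.norm(D, 2) ** 2 + 1e-10
        for _ in range(inner):
            X = soft_threshold(X - D.T @ (D @ X - Y) / L, lam / L)
        # Step 2: projected gradient descent on D for 0.5*||Y - DX||^2
        Lx = np.linalg.norm(X, 2) ** 2 + 1e-10
        for _ in range(inner):
            D = proj_C(D - (D @ X - Y) @ X.T / Lx)
    return D, X

rng = np.random.default_rng(3)
N, Q, K = 16, 24, 200
D0 = proj_C(rng.standard_normal((N, Q)))
X0 = rng.standard_normal((Q, K)) * (rng.random((Q, K)) < 0.1)   # sparse codes
Y = D0 @ X0

D, X = learn_dictionary(Y, Q)
obj = 0.5 * np.linalg.norm(Y - D @ X) ** 2 + 0.05 * np.abs(X).sum()
assert obj < 0.5 * np.linalg.norm(Y) ** 2        # energy decreased from X = 0
assert np.all(np.linalg.norm(D, axis=0) <= 1 + 1e-9)   # D stays in C
```

Both inner loops decrease f(X, D) monotonically, so the alternation converges to a stationary point, not necessarily a global minimum.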
71–72. Patch-based Learning

Learning D from exemplar patches y_k [Olshausen, Fields 1997].
State-of-the-art denoising [Elad et al. 2006].
Sparse texture synthesis, inpainting [Peyré 2008].

[Figure: exemplar patches y_k and the learned dictionary D.]
76–77. Patch-based Denoising

Noisy image: f = f_0 + w.
Step 1: extract patches y_k(·) = f(z_k + ·).
Step 2: dictionary learning:
min_{D, (x_k)_k} Σ_k (1/2)‖y_k − D x_k‖² + λ‖x_k‖₁.
Step 3: patch averaging: ỹ_k = D x_k, f̃(·) ≈ Σ_k ỹ_k(· − z_k).

[Aharon & Elad 2006]
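Steps 1 and 3 can be sketched as patch extraction and averaging (a minimal illustration; helper names are mine, and overlapping contributions are normalized by the per-pixel patch count):

```python
import numpy as np

def extract_patches(f, w, step):
    # y_k(.) = f(z_k + .): overlapping w-by-w patches on a grid with given step
    zs = [(i, j) for i in range(0, f.shape[0] - w + 1, step)
                 for j in range(0, f.shape[1] - w + 1, step)]
    Y = np.stack([f[i:i + w, j:j + w].ravel() for i, j in zs], axis=1)
    return Y, zs

def average_patches(Y, zs, shape, w):
    # f~(.) ~ sum_k y~_k(. - z_k), normalized by how many patches cover each pixel
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for k, (i, j) in enumerate(zs):
        acc[i:i + w, j:j + w] += Y[:, k].reshape(w, w)
        cnt[i:i + w, j:j + w] += 1
    return acc / np.maximum(cnt, 1)

f = np.arange(64, dtype=float).reshape(8, 8)
Y, zs = extract_patches(f, w=4, step=2)
g = average_patches(Y, zs, f.shape, w=4)
assert np.allclose(g, f)   # without modifying the patches, averaging reconstructs f
```

In the denoiser, the columns of Y are replaced by their sparse approximations ỹ_k = D x_k before the averaging step.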
78–81. Learning with Missing Data

Inverse problem: y = Φ f_0 + w.
min_{f, (x_k)_k, D ∈ C} (1/2)‖y − Φ f‖² + Σ_k (1/2)‖p_k(f) − D x_k‖² + λ‖x_k‖₁.
Patch extractor: p_k(f) = f(z_k + ·).

Step 1: ∀ k, minimization on x_k ⟹ convex sparse coding.
Step 2: minimization on D ⟹ constrained quadratic.
Step 3: minimization on f ⟹ quadratic.

[Figure, from "Learning Multiscale and Sparse Representations": (a) original f_0; (b) damaged observations y.]
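For Step 3, when Φ is a binary inpainting mask, both ΦᵀΦ and Σ_k p_kᵀ p_k are diagonal, so the quadratic minimization over f is solved pixel by pixel (a sketch under that masking assumption; helper names are mine, and every pixel is assumed covered by at least one patch):

```python
import numpy as np

def update_f(y, mask, patches_rec, zs, shape, w):
    # Minimize 0.5*||y - Phi f||^2 + sum_k 0.5*||p_k(f) - D x_k||^2 over f
    # for a masking Phi: the normal equations are diagonal, giving
    #   f = (mask + cover)^-1 (mask * y + sum_k p_k^T (D x_k)),
    # where cover counts how many patches contain each pixel.
    num = mask * y                    # Phi^T y (observed pixels)
    den = mask.astype(float)          # Phi^T Phi (diagonal)
    for k, (i, j) in enumerate(zs):
        num[i:i + w, j:j + w] += patches_rec[:, k].reshape(w, w)
        den[i:i + w, j:j + w] += 1.0  # patch coverage count
    return num / den

f0 = np.arange(64, dtype=float).reshape(8, 8)
mask = np.random.default_rng(4).random((8, 8)) > 0.5
y = mask * f0
w = 4
zs = [(i, j) for i in range(0, 5, 2) for j in range(0, 5, 2)]
patches = np.stack([f0[i:i + w, j:j + w].ravel() for i, j in zs], axis=1)

f = update_f(y, mask, patches, zs, f0.shape, w)
assert np.allclose(f, f0)   # consistent patches + observations reproduce f0
```

In the full algorithm, patches_rec holds the current sparse reconstructions D x_k, so this update blends observed pixels with the patch estimates.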
82. Inpainting Example

[Figure, from Mairal et al. 2008: image f_0; observations y = Φ f_0 + w; regularized f. Inpainting using N = 2 and n = 16 × 16 patches (bottom right), or N = 1 and n = 8 × 8 (bottom left). J = 100 iterations were performed, producing an adaptive dictionary; 50% of the patches were used during learning. A sparsity factor L = 10 was used during learning and L = 25 for the final reconstruction. The damaged image was created by removing 75% of the data from the original; the initial PSNR is 6.13 dB. The resulting PSNR is 33.97 dB for N = 2 and 31.75 dB for N = 1.]
83. Adaptive Inpainting and Separation

[Figure: inpainting/separation with wavelets, local DCT, and a learned dictionary.]
[Peyré, Fadili, Starck 2010]
84–85. Higher Dimensional Learning

Learning D on color patches; inpainting. [Mairal et al., "Sparse Representation for Color Image Restoration"; "Online Learning for Matrix Factorization and Sparse Coding"]

[Figure residue, recoverable captions: dictionaries with 256 atoms learned on a generic database of natural images, for 5 × 5 × 3 and 8 × 8 × 3 patches (note the large number of color-less atoms; since atoms can have negative values, they are shown scaled and shifted to the [0, 255] range per channel). A dictionary learned on the image itself is more colored than the global one. Color artifacts (a bias in the colors, a piecewise-constant sky) are reduced with the proposed metric; both images are denoised with the same global dictionary. Denoising comparisons against K-SVD applied on each channel separately (8 × 8 atoms) and the "3 × 3 model" of McAuley et al.; the adaptive approach (20 iterations) consistently produces the best results.]
87–89. Facial Image Compression

[Elad et al. 2009]
Image registration.
Non-overlapping patches (f_k)_k.
Dictionary learning (D_k)_k: one dictionary per patch location.

[Background residue: pages of O. Bryt, M. Elad, J. Vis. Commun. Image R. 19 (2008) 270–282, shown as illustration. The paper learns K-SVD dictionaries per patch location on a database of about 6000 registered facial images, with hardly any entropy coding stage, and reports compression superior to JPEG2000, a VQ-based scheme [17], and a PCA approach.]
stage.
straight-forward algorithm with hardly any entropy coding the resulting images and the overall simulation time. squared error) per patch for the images in
Every obtained dictionary contains 512 patches of size different bit-rates. It can be seen that more
Yet, itit is shown to be superior to several competing algorithms: as atoms. In Fig. 6 we can see the dictionary that
Yet, is shown to be superior to several competing algorithms:pixels
15 Â 15 to patches containing the facial details (h
(i) the JPEG2000, (ii) the VQ-based algorithm presented in [17],
(i) the JPEG2000, (ii) the VQ-based algorithm presented in [17],
2
and (iii) AA Principal Component Analysis (PCA) approach.2
and (iii) Principal Component Analysis (PCA) approach. Fig. 1.1. (Left) Piece-wise affine warping of the image by triangulation. (Right) A
Fig. (Left) Piece-wise affine warping of the image by triangulation. (Right) A
In the next section we provide some background material for
In the next section we provide some background material for uniform slicing toto disjoint square patches for coding purposes.
uniform slicing disjoint square patches for coding purposes.
this work: we start by presenting the details of the compression
this work: we start by presenting the details of the compression
algorithm developed in [17], as their scheme isis the one we embark
algorithm developed in [17], as their scheme the one we embark K-Means) per each patch separately, using patches taken from the
K-Means) per each patch separately, using patches taken from the
from in the development of ours. We also describe the topic of
from in the development of ours. We also describe the topic of same location from 5000 training images. This way, each VQ isis
same location from 5000 training images. This way, each VQ
sparse and redundant representations and the K-SVD, that are
sparse and redundant representations and the K-SVD, that are adapted to the expected local content, and thus the high perfor-
adapted to the expected local content, and thus the high perfor-
the foundations for our algorithm. In Section 3 3 we turn to present
the foundations for our algorithm. In Section we turn to present mance presented by this algorithm. The number of code-words
mance presented by this algorithm. The number of code-words
the proposed algorithm in details, showing its various steps, and
the proposed algorithm in details, showing its various steps, and in the VQ isisa afunction of the bit-allocation for the patches. As
in the VQ function of the bit-allocation for the patches. As
discussing its computational/memory complexities. Section 4 4
discussing its computational/memory complexities. Section we argue in the next section, VQ coding isis limited by the available
we argue in the next section, VQ coding limited by the available
presents results of our method, demonstrating the claimed
presents results of our method, demonstrating the claimed number of examples and the desired rate, forcing relatively small
number of examples and the desired rate, forcing relatively small
superiority. We conclude in Section 5 5 with a list of future activities
superiority. We conclude in Section with a list of future activities patch sizes. This, in turn, leads to a a loss of some redundancy be-
patch sizes. This, in turn, leads to loss of some redundancy be-
that can further improve over the proposed scheme.
that can further improve over the proposed scheme. tween adjacent patches, and thus loss of potential compression.
tween adjacent patches, and thus loss of potential compression.
2. Background material
2. Background material
AnotherThe Dictionary obtained by K-SVD for Patch No. 80 (the that partlyOMPcompensates
Another ingredient in this algorithmleft eye) using the compensates
Fig. 6. ingredient in this algorithm that partly method with L ¼ 4.
for the above-described shortcoming isis a a multi-scale coding
for the above-described shortcoming multi-scale coding
Dk
scheme. The image isisscaled down and VQ-coded using patches
scheme. The image scaled down and VQ-coded using patches
2.1. VQ-based image compression
2.1. VQ-based image compression of size 8 8 Â 8. Then it is interpolated back to the original resolution,
of size  8. Then it is interpolated back to the original resolution,
and the residual isiscoded using VQ on 8 8 Â 8pixel patches once
and the residual coded using VQ on  8 pixel patches once
Among the thousands of papers that study still image
Among the thousands of papers that study still image again. This method can be applied on a a Laplacian pyramid of the
again. This method can be applied on Laplacian pyramid of the
compression algorithms, there are relatively few that consider
compression algorithms, there are relatively few that consider original (warped) image with several scales [33].
original (warped) image with several scales [33].
the treatment of facial images [2–17]. Among those, the most
the treatment of facial images [2–17]. Among those, the most As already mentioned above, the results shown in [17] surpass
As already mentioned above, the results shown in [17] surpass
recent and the best performing algorithm isis the one reported in
recent and the best performing algorithm the one reported in those obtained by JPEG2000, both visually and in Peak-Signal-to-
those obtained by JPEG2000, both visually and in Peak-Signal-to-
[17]. That paper also provides a athorough literature survey that
[17]. That paper also provides thorough literature survey that Noise Ratio (PSNR) quantitative comparisons. In our work we pro-
Noise Ratio (PSNR) quantitative comparisons. In our work we pro-
compares the various methods and discusses similarities and
compares the various methods and discusses similarities and pose to replace the coding stage from VQ to sparse and redundant
pose to replace the coding stage from VQ to sparse and redundant
differences between them. Therefore, rather than repeating such
differences between them. Therefore, rather than repeating such representations—this leads us to the next subsection, were we de-
representations—this leads us to the next subsection, were we de-
a asurvey here, we refer the interested reader to [17]. In this
survey here, we refer the interested reader to [17]. In this scribe the principles behind this coding strategy.
scribe the principles behind this coding strategy.
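The atom-allocation experiment just described (one shared representation-error threshold for all patches, plus a cap on the number of atoms spent per patch at compression time) can be sketched with a greedy OMP-style coder. This is a minimal illustration under those assumptions, not the authors' implementation; the function and parameter names are hypothetical:

```python
import numpy as np

def omp_code_patch(patch, D, err_rmse, max_atoms):
    """Greedy (OMP-style) sparse coding of one vectorized patch.

    D holds unit-norm atoms as columns (pixels x atoms, redundant).
    Stops when the residual RMSE falls below err_rmse (the shared
    error threshold) or when max_atoms atoms have been spent.
    Returns the coefficient vector and the number of atoms used.
    """
    patch = patch.astype(float)
    residual = patch.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    x = np.zeros(0)
    while len(support) < max_atoms and np.sqrt(np.mean(residual ** 2)) > err_rmse:
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:  # residual is orthogonal to all new choices; stop
            break
        support.append(idx)
        # Jointly re-fit all selected atoms by least squares.
        x, *_ = np.linalg.lstsq(D[:, support], patch, rcond=None)
        residual = patch - D[:, support] @ x
    coeffs[support] = x
    return coeffs, len(support)
```

Running this over all patches with a fixed `err_rmse` and recording the returned atom counts gives exactly the kind of per-patch allocation map discussed above.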
2.2. Sparse and redundant representations
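Concretely, this subsection is about approximating each vectorized patch f over a redundant dictionary D (more atoms than pixels) using as few atoms as possible; in generic notation (the symbols x and ε here are illustrative, not taken from the paper):

```latex
\min_{x}\; \|x\|_{0}
\quad \text{subject to} \quad
\|f - D x\|_{2} \le \varepsilon
```

Here ‖x‖₀ counts the nonzero entries of x. Solving this exactly is intractable in general; greedy pursuit methods such as OMP approximate it, and the K-SVD algorithm adapts the dictionary D so that a set of training patches admits such sparse representations.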