Directed Research Report:
Reconstructing Images Using Singular Value Decomposition
Ryen Krusinga
rkruser@umich.edu
Igor L. Markov
imarkov@eecs.umich.edu
University of Michigan, Ann Arbor, MI 48109-2121
September 3, 2014
1 Introduction
Matrix completion is a problem with a broad range of practical applications. Given a matrix with some entries
missing, the goal is to infer the missing values by assuming that they are related to present values by some
structural properties of the matrix. Techniques for solving this problem play critical roles in machine learning
and recommendation systems. Here, we explore the use of such techniques in image processing. As digital
images are naturally represented with matrices of pixel values, existing matrix completion algorithms can be
easily applied. A large class of these algorithms make use of the singular value decomposition (SVD) of a matrix;
in this paper we explore the efficacy of an SVD technique known as the Alternating Least Squares (ALS) algorithm
in reconstructing incomplete images.
It is, of course, impossible to fill in the missing entries of a completely random matrix: the known entries pro-
vide no information about the unknown ones. The project of matrix completion therefore requires the occurrence
of nonrandom patterns. Images are amenable to reconstruction because pixels tend to be highly correlated. The
local neighborhood of a pixel is generally a very good indication of the pixel’s value; patterns may also repeat,
leading to many similar pixels that are spread throughout the image. Features such as the edges of objects are
nonrandom in shape: they tend to be straight lines or smooth curves. Such correlation means that images are
highly redundant: the essential content can be deduced from far fewer pixels. This property of images is the
motivation behind denoising techniques such as the non-local means algorithm [2], which seeks to smooth out
random noise by assigning pixel values based on a weighted average of “similar” pixels. On the more theoretical
side, high local correlation means that image matrices are “low-rank” or nearly so: we can find a linear map
from a lower-dimensional space that approximates the column space of the matrix. The prospects of matrix
completion given the low-rank assumption are considered by Candès and Tao [3] and Recht [6], who seek to establish
bounds on how much information is needed to reconstruct a low-rank matrix.
Matrix completion is especially important to businesses like Amazon or Netflix. In the case of Amazon,
it is used to guess which products users might buy, and in the case of Netflix, to guess how users would rate
movies that they have not seen. In this context, matrix completion solely using past user/product ratings is
called collaborative filtering [7, 5]. Collaborative filtering was heavily researched and utilized in the Netflix Prize
Challenge, which tasked contestants with lowering the error score of Netflix's recommendation algorithm [5, 1].
Many of the developed techniques exploited singular value decomposition, where the matrix in question is factored
in a specific way. The factors are computed based on known data, and then multiplied to produce the complete
matrix of predictions. The alternating least squares algorithm is a means of computing the factors by minimizing
the least squared error between their predictions and the known data. It was one of the most effective single
algorithms used in the Netflix Prize Challenge [7, 1, 5]. This is the algorithm that we implemented as a means of
image reconstruction. Other variations are also available, such as finding the factors by gradient descent [5]. It
is worth investigating such variations; for our purposes, however, ALS proved effective and easy to implement.
This paper is organized as follows. In Section 2 we briefly discuss the theory behind SVD, and in Section 3
we describe the version of ALS that we implemented. In Section 4 we provide some empirical outcomes of ALS
on real images. In Section 5 we give a more quantitative investigation of the algorithm, and in Section 6 we
compare ALS to other methods. Finally, in Section 7 we draw conclusions and invite further inquiry.
2 Theoretical Background: SVD
The following paraphrases Section 5 of Feuerverger, He, and Khatri [5].
Every real m × n matrix A can be factored as

    A = U D V^T    (1)

where U is m × m and comprised of orthonormal eigenvectors of A A^T, V is n × n and comprised of orthonormal
eigenvectors of A^T A, and D is a nonnegative m × n diagonal matrix. The diagonal entries of D are called the
“singular values” of A, and the factorization in Equation 1 is called a Singular Value Decomposition of A. We
can use the SVD to approximate A using lower-dimensional matrices. For 1 ≤ k ≤ min(m, n), we have

    A ≈ U^{(k)} D^{(k)} (V^{(k)})^T    (2)

where U^{(k)} is m × k and consists of the first k columns of U, V^{(k)} is n × k and consists of the first k columns of
V, and D^{(k)} is the upper left k × k block of D. This is called the rank k reconstruction of A and approximates
A in the sense that the square of the Frobenius norm of the difference,

    ||A − U^{(k)} D^{(k)} (V^{(k)})^T||^2    (3)

is minimal. (Unless otherwise stated, || · || will refer to the Frobenius norm.)
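When A is fully known, the rank k approximation of Equation 2 can be computed directly; the following is a minimal MATLAB sketch (k chosen by the user):

    % Rank-k approximation of a fully known matrix A (Equation 2).
    [U, D, V] = svd(A);                          % A = U*D*V'
    Ak = U(:, 1:k) * D(1:k, 1:k) * V(:, 1:k)';   % keep only the first k singular values/vectors
    frob_err = norm(A - Ak, 'fro')^2;            % squared Frobenius error (Equation 3)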
Suppose now that our m × n matrix A has many missing entries whose values we would like to infer. We
proceed by choosing some k and trying to approximate the rank k reconstruction of A. We start with Equation 2.
For our purposes, the middle matrix is unnecessary, as U^{(k)} D^{(k)} has the same dimensions as U^{(k)}. Thus we
can recast the problem as finding an m × k matrix U and an n × k matrix V such that A ≈ U V^T, i.e., the cost
function

    ||A − U V^T||^2    (4)

is minimal. Since A has missing values, we minimize the norm over only the known values. That is, if we let
K be the set of coordinates of the known values of A, and let M(i, j) be the (i, j)th entry of a matrix M, the cost becomes

    \sum_{(i,j) \in K} (A(i, j) − (U V^T)(i, j))^2    (5)

However, it turns out that Equation 4 works poorly in practice: it can be numerically unstable and overfit U and V
to the existing data. To prevent this, we impose a regularization term so that our cost function is now

    ||A − U V^T||^2 + λ(||U||^2 + ||V||^2)    (6)

where λ is a small positive constant. The question now is how to determine the entries of U and V subject to
this constraint.
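For illustration, a minimal MATLAB sketch of this cost function, assuming the missing entries of A are marked with NaN (a convention we also use in later sketches) and that candidate factors U and V are given:

    % Regularized cost of Equation 6, evaluated over the known entries only.
    function J = masked_cost(A, U, V, lambda)
        P   = U * V';                  % current prediction of the full matrix
        K   = ~isnan(A);               % logical mask of known entries
        res = A(K) - P(K);             % residuals over known entries (Equation 5)
        J   = sum(res.^2) + lambda * (sum(U(:).^2) + sum(V(:).^2));
    end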
3 The ALS Algorithm
The alternating least squares algorithm provides a method of finding the U and V from Equation 6. Our
discussion and implementation of this algorithm in this section is based on Zhou et al. [7] and Feuerverger et al.
[5]. Solving for the two matrices simultaneously is a hard problem, so instead we fix one of them and choose the
other to minimize Equation 6. Then we swap the matrix being fixed with the matrix being solved for and repeat
the process. For example, suppose we randomly initialize U. Then we would minimize V over the cost function,
then fix V and minimize U over the cost function, etc.; this process is repeated until a maximum number of
iterations is reached or a convergence criterion is met.
We introduce some notation. Suppose M is a matrix in which some of the entries are unknown, and N is
another matrix with the same number of rows. Let M(i, :) denote the ith row of M, M(:, i) denote the ith
column of M, and known(M(:, i)) denote a column vector consisting of only the known entries of the ith column
of M. Then N(known(M(:, i))) will be the matrix consisting of only the rows of N corresponding to the rows of
known entries of the ith column of M. An example:

    M = [ 1  2  ? ]        N = [ 1  2 ]
        [ ?  5  6 ]            [ 3  4 ]
        [ 7  ?  9 ]            [ 5  6 ]

    M(1, :) = [ 1  2  ? ]      M(:, 1) = [ 1 ]
                                         [ ? ]
                                         [ 7 ]

    known(M(:, 1)) = [ 1 ]     N(known(M(:, 1))) = [ 1  2 ]
                     [ 7 ]                         [ 5  6 ]
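In MATLAB, this notation corresponds to simple logical indexing; a small sketch using the example above, with unknown entries stored as NaN:

    M = [1 2 NaN; NaN 5 6; 7 NaN 9];
    N = [1 2; 3 4; 5 6];
    idx = ~isnan(M(:, 1));   % rows where column 1 of M is known: [true; false; true]
    C   = M(idx, 1);         % known(M(:, 1))      -> [1; 7]
    Nc  = N(idx, :);         % N(known(M(:, 1)))   -> [1 2; 5 6]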
Next, we derive the basis for the ALS algorithm. Given a fixed U, how do we solve for V? We note that
the square of the Frobenius norm of a matrix is equal to the sum of the squares of the Frobenius norms of its
columns:

    ||M||^2 = \sum_j ||M(:, j)||^2    (7)

Also note that each row V(j, :) of V determines the column A(:, j) of A. Thus Equation 6 is equal to

    λ||U||^2 + \sum_j ( ||A(:, j) − U V(j, :)^T||^2 + λ||V(j, :)||^2 )    (8)

so we can minimize the cost of each row of V separately given a fixed U. Solving for rows individually is
necessary, as different columns of A possess different locations of known entries. Let C = known(A(:, j)),
U_c = U(known(A(:, j))), and x = V(j, :)^T. Then for each j we are solving C = U_c x subject to minimizing
||C − U_c x||^2 + λ||x||^2. Taking derivatives and removing the resulting factor of 2, we see that this is equivalent
to exactly solving the normal equation U_c^T C = (U_c^T U_c + λI) x for x. The Matlab mldivide function efficiently yields a solution.
When solving for the entries of U, we observe that A^T = V U^T and proceed analogously.
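A minimal MATLAB sketch of this per-column solve (again marking missing entries of A with NaN; lambda is the regularization constant):

    % Solve for row j of V given a fixed U.
    idx = ~isnan(A(:, j));                            % rows with known entries in column j
    C   = A(idx, j);                                  % known(A(:, j))
    Uc  = U(idx, :);                                  % U(known(A(:, j)))
    k   = size(U, 2);
    x   = (Uc' * Uc + lambda * eye(k)) \ (Uc' * C);   % regularized normal equation via mldivide
    V(j, :) = x';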
The following outlines our implementation of ALS, where A is a sparse m × n matrix with missing entries,
and U and V are feature matrices of size m × k and n × k respectively (k is chosen to suit the problem, subject
to 1 ≤ k ≤ min(m, n)).

iterations ← 0;
Randomly initialize U;
while not converged and iterations < max_iter do
    for 1 ≤ i ≤ n do
        C ← known(A(:, i));
        Uc ← U(known(A(:, i)));
        Solve the regularized least squares regression C = Uc x for x;
        V(i, :) ← x^T;
    end
    for 1 ≤ j ≤ m do
        D ← known(A(j, :)^T);
        Vd ← V(known(A(j, :)^T));
        Solve the regularized least squares regression D = Vd x for x;
        U(j, :) ← x^T;
    end
    iterations ← iterations + 1;
end
The resulting U and V approximate the rank k reconstruction of A, and the missing entries of A are filled in
using the corresponding entries of U V^T. The rows of U and V are often called “features,” and in what follows
we will refer to the rank k reconstruction of an image as the k-feature reconstruction of the image. Many
solutions are possible depending on the random initial values of U, but in practice they are usually not so different as
to yield poor results.
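For illustration, a compact MATLAB sketch of the whole procedure, assuming missing entries of A are marked with NaN and using a fixed iteration count rather than a convergence test (matching our main experiments):

    function [U, V] = als_reconstruct(A, k, lambda, max_iter)
    % Alternating least squares for matrix completion (sketch).
    % A: m-by-n matrix with NaN marking missing entries; k: number of features.
        [m, n] = size(A);
        U = rand(m, k);
        V = zeros(n, k);
        for it = 1:max_iter
            for i = 1:n                              % update each row of V
                idx = ~isnan(A(:, i));
                Uc  = U(idx, :);
                V(i, :) = ((Uc' * Uc + lambda * eye(k)) \ (Uc' * A(idx, i)))';
            end
            for j = 1:m                              % update each row of U
                idx = ~isnan(A(j, :));
                Vd  = V(idx, :);
                U(j, :) = ((Vd' * Vd + lambda * eye(k)) \ (Vd' * A(j, idx)'))';
            end
        end
    end

A typical call mirroring the settings used below would be [U, V] = als_reconstruct(Asparse, 25, 0.05, 30); the completed matrix is then B = U*V', with only the missing pixels of the original replaced by the corresponding entries of B.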
In our implementation, U was randomly initialized using Matlab’s rand() function, which produces uniformly
distributed numbers in the interval (0, 1). As in [7], we set the first column of U equal to the means of the rows of
A, although we later found that this did not make a noticeable difference in the quality of image reconstruction.
Regarding the convergence criterion, we opted not to use one for our main results, instead only setting a maximum
number of iterations. One possible convergence criterion, as described in [7], stops the algorithm when the root
mean squared error (RMSE) between the predicted and known values decreases by less than some epsilon between
iterations. Other possible criteria could be the convergence of the values of U and V , or the reduction of the
RMSE to below some value. If we have an original image A of size m × n and its reconstruction B = U V^T from
some level of sparsity, the RMSE is calculated as

    RMSE = \sqrt{MSE} = \sqrt{ \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} (A(i, j) − B(i, j))^2 }    (9)
If the image has R, G, and B components, the formula is easily extended by averaging the mean squared error
(MSE) of the three and then taking the square root.
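A minimal MATLAB sketch of this error measure for a grayscale image (the RGB case averages the three per-channel MSE values before taking the square root):

    % RMSE between an original image A and its reconstruction B (Equation 9).
    A = double(A);  B = double(B);
    mse  = mean((A(:) - B(:)).^2);   % (1/mn) * sum of squared pixel differences
    rmse = sqrt(mse);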
4 Preliminary Investigation
We applied the ALS algorithm to reconstruct two different images after some percentage of their pixels had been
randomly deleted. We used Matlab R2013a on an HP Compaq Elite 8300 PC running Microsoft Windows 7
Enterprise, Version 6.1.7601, with 16 GB of RAM and an Intel Core i7-3770 CPU.
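The random deletion step can be sketched as follows (illustrative only; p is the target fraction of missing pixels, and deleted pixels are marked with NaN, the same convention as in Section 3):

    % Randomly delete a fraction p of the pixels of image I.
    I = double(I);
    mask = rand(size(I)) < p;        % each value deleted independently with probability p
    Asparse = I;
    Asparse(mask) = NaN;             % for color images, R, G, and B values are deleted independently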
On each sparse image we ran the algorithm with a regularization constant of λ = 0.05 for 30 iterations.
Figure 1 shows the results of an application of the algorithm to a black and white image of a beach. Figure 2 shows
the results applied to a color image of boats at a dock, with the R, G, and B matrices processed separately (note:
for the sparse color images, R, G, and B values were deleted independently). Each image was made 40% sparse
and then reconstructed from that state using 25 and 100 features respectively. Each image was then made
70% sparse and subsequently reconstructed using 25 features. These quantities of features were chosen somewhat
arbitrarily, the only criterion being that they yield reasonably good-looking results. The reconstructions from
the 40% sparse images are quite good. The 100-feature reconstruction in each case produced a sharper image,
but with more outliers; the 25 feature reconstruction produced a blurrier image with fewer outlying pixels. The
reconstructions from 70% sparse images are understandably fuzzy, but still quite good considering how much of
the image was missing.
We also investigated how well the algorithm could reconstruct a 90% sparse version of the beach image. The
results of this are shown in Figure 3. The 90% sparse image was reconstructed using 10, 50, and 500 features. The
10 feature and 500 feature reconstructions are similar in quality, whereas the 50 feature reconstruction produced
unintelligible results. This reduction in quality associated with medium-sized feature counts was also observed in
the 70% sparse image, which was better reconstructed with 25 features than with 50 features, although the effect
was less pronounced. It would seem that high and low feature counts tend to produce results of better quality,
whereas in-between feature quantities produce worse results.
The runtimes for all the reconstructed images are shown in Table 1. Figures 1 and 3 contain images of
dimension 500×750×1, while Figure 2 contains images of dimension 427×640×3. Factors affecting the runtime
include image size, image sparsity, the number of features, and the maximum number of iterations. Runtimes
may also vary slightly due to different pixels being randomly deleted when the image is made sparse; we found
that this usually changes the runtimes by a few hundredths of a second between otherwise identical trials. The
runtimes for Figure 2 are for the color image as a whole, which means that they include the processing of the R,
G, and B matrices together. Dividing each runtime by 3 gives the approximate runtime of each color component
individually.
Table 1 also shows the RMSE and the peak signal-to-noise ratio (PSNR) of each image. The PSNR measures
the ratio of the maximum possible noise to the existing noise, and is calculated using Equation 10 [4]:
    PSNR = 20 \log_{10} ( 255 / RMSE )    (10)
The number 255 in the equation is the maximum possible value of each pixel, and hence the maximum possible
noise. The lower the RMSE and the higher the PSNR, the better the quality of reconstruction. By this criterion,
Table 1: ALS Runtimes and Error Scores (Figures 1, 2, and 3)

Beach Image (500×750×1)       Time (sec)   RMSE      PSNR (dB)
40% Sparse, 25 Features       3.0029       8.0872    29.9749
40% Sparse, 100 Features      13.4765      9.6147    28.4721
70% Sparse, 25 Features       1.9152       13.0060   25.8479
90% Sparse, 10 Features       1.1134       27.1654   19.4505
90% Sparse, 50 Features       3.3784       54.7027   13.3706
90% Sparse, 500 Features      169.9606     28.3323   19.0852

Boat Image (427×640×3)        Time (sec)   RMSE      PSNR (dB)
40% Sparse, 25 Features       6.9534       24.2969   20.4198
40% Sparse, 100 Features      26.2246      24.2969   20.4198
70% Sparse, 25 Features       4.4849       33.5939   17.6056
the 40% sparse beach image reconstructed with 25 features is the best reconstruction of all the images, just
slightly ahead of the 100 feature reconstruction. Visually this appears reasonable, although one could say that
the 100 feature reconstruction looks better overall because it has sharper edges (PSNR is only an approximate
measure of the effective quality to the human eye). The values also confirm the dip in quality in the middle of the
three 90% sparse feature counts. The 10 and 500 feature reconstructions have PSNR values around 19, whereas
the 50 feature reconstruction has a PSNR close to 13, indicating that it is far inferior in quality; indeed, we
observe this to be the case. One thing that we must explain is why the PSNR values of the colored images in
Figure 2 are significantly lower than those of the black and white images of Figure 1 with the same sparsity and
feature counts, yet the visual quality is almost the same. We would attribute the lower values to the fact that
by processing the R, G, and B values independently, the accuracy for each one decreases because not all of the
information in the image is taken into account. However, we conjecture that the quality is similar because errors
in the R, G, and B values for a pixel cause less of an apparent change in color than errors in a black-and-white
image. The result is that while the numerical error is higher, the errors are less visible to the human eye.
We explored some small modifications to the algorithm, none of which produced dramatic changes. We found
that running the ALS algorithm for more than about 30 iterations conferred negligible additional quality (using
the RMSE convergence criterion, the algorithm often terminated and produced good results in fewer than 10
iterations). We also tried statistically normalizing the pixel matrices by subtracting off the mean and dividing
by the standard deviation before sending them through the algorithm. This only produced slight visual changes
in cases of more extreme sparsity, such as 70% and above.
All of these results, of course, are based on the two images that we were using, and what we visually judged
to be a good quality reconstruction. We see no reason that the tendencies found should not extend to other
images in general, although caution should be taken, as some image types will likely be more amenable to
reconstruction in this way than others. In particular, we would conjecture that this technique works best when
the main objects shown in the image take up many contiguous pixels, so as to provide a scaffolding around which
the reconstruction can be built; on the other hand, images with many fine details that are one or two pixels in size will not be
reconstructed well. The removed pixels must also be evenly distributed, for no simple algorithm can reasonably
fill in an image missing one large connected chunk, as the information is simply lost.
5 Quantitative Investigation
After the initial investigation, we performed a more thorough analysis of the optimal feature counts. First, we
ran the ALS algorithm at five different levels of sparsity, graphing RMSE versus feature count for each level.
The results are shown in Figure 4. We sampled feature counts up to 500, spacing the samples more widely
after 100 features due to the long runtimes of those trials. The graphs display some clear trends. First, in each
example the minimum RMSE occurs below 100 features. Second, the first four graphs appear quadratic
around the minimum, and this is illustrated on the graphs with quadratic curves fit to the portion of the data
that appears quadratic (see Table 2 for the exact equations). Third, as we found in the preliminary investigation,
the RMSE spikes sharply at feature counts in the mid-hundreds, and subsequently dips and levels off at a value slightly
above the minimum. Fourth, this pattern shifts to the left as sparsity increases, until the quadratic pattern has
Table 2: Equations of quadratic curves in Figure 4

Graph         Equation                             Number of points fit
20% Sparse    0.0003396x^2 − 0.07051x + 7.521      12
40% Sparse    0.001635x^2 − 0.1884x + 12.31        10
50% Sparse    0.002312x^2 − 0.2068x + 13.67        10
70% Sparse    0.01013x^2 − 0.4996x + 20.02         12
Table 3: Equations of best fit lines in Figure 5

Graph    Equation          Number of points fit
Beach    −1.22x + 107.9    18
Boats    −1.171x + 104     all
almost entirely shifted off the left of the graph in the 90% sparse image.
Based on this evidence, there appears to be no advantage to using more than 100 features, except in extreme
cases where sparsity is exceptionally low (<20%) or exceptionally high (>90%). We use this insight in Figure
5, where we graph the RMSE-minimizing (optimal) feature count versus the sparsity level for the beach image and
the boat image, restricting ourselves to 100 features or fewer. For the well-behaved, mid-level-sparse data in the
beach graph, the optimal feature count decreases approximately linearly as sparsity increases. This is illustrated by
a best-fit line which excludes the outlying data point in the upper right (see Table 3 for the equation). For
the less well-behaved sparsity levels, the optimal feature count for the beach image exceeds 100 when sparsity is
either less than 20% or greater than 90%. The graph for the boat image shows a similar decreasing trend that is
approximately linear, although there are fewer data points due to longer runtimes. The lines in both graphs are
almost the same, which is evidence that different images have similar optimal feature counts at similar sparsity
levels. This suggests that it is possible to generate an optimal feature count graph for a diverse set of images, and
then use the resulting trendline as a guide to determining the best feature count to use on new images.
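The fits reported in Tables 2 and 3 can be reproduced with MATLAB's polyfit; a small sketch, assuming vectors of sampled feature counts, measured RMSE values, sparsity levels, and optimal feature counts (variable names are ours):

    % Quadratic fit of RMSE versus feature count (as in Table 2).
    pq = polyfit(featureCounts, rmseValues, 2);        % coefficients [a b c] of a*x^2 + b*x + c

    % Linear fit of optimal feature count versus sparsity (as in Table 3).
    pl = polyfit(sparsityLevels, optimalFeatures, 1);  % coefficients [slope intercept]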
6 Comparison to Other Methods
We now put the SVD method in context by comparing it to other methods of image reconstruction. The simplest
such method is naive reconstruction - fill in missing pixels by averaging the values of the k nearest known pixels.
The results of this are shown in Figure 6. We made the beach image 40% and 70% sparse, and filled in the
missing pixels by averaging the 16 nearest existing pixels. Table 4 gives the runtimes and error scores for Figure
6. Comparing the values to Table 1, we see that at the 40% sparsity level, the nearest neighbor algorithm runs
faster - 2.9034 seconds as opposed to 3.0029 seconds - and produces a lower RMSE score, 6.9898 versus 8.0872
produced by ALS, indicating that the reconstruction erred less, on average, than the alternating least squares
algorithm reconstruction. On the other hand, at the 70% sparse level, the ALS algorithm finished in 1.9152
seconds, less than half the time of the nearest neighbors algorithm, which finished in 4.6869 seconds. Nearest
neighbors still produced a lower error score, 10.9828, versus 13.0060 produced by ALS.
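A sketch of this baseline in MATLAB, assuming the Statistics Toolbox function knnsearch is available and missing pixels are again marked with NaN (k = 16 in our trials):

    % Fill each missing pixel with the mean of its k nearest known pixels.
    k = 16;
    [r, c]   = find(~isnan(I));                       % coordinates of known pixels
    vals     = I(sub2ind(size(I), r, c));             % their values
    [rm, cm] = find(isnan(I));                        % coordinates of missing pixels
    idx      = knnsearch([r c], [rm cm], 'K', k);     % k nearest known pixels for each missing one
    I(sub2ind(size(I), rm, cm)) = mean(vals(idx), 2); % average and fill in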
What accounts for these differences? The first thing to explain is runtime. At a fixed number of features,
increasing sparsity will decrease the runtime of ALS, because it will then be operating on smaller matrices. On
the other hand, increasing sparsity will increase the runtime of the nearest neighbors algorithm, because each
unknown pixel must have its surroundings searched for neighbors. Next we account for error score. Naive nearest
neighbors seems to do better than ALS at both sparsity levels that we tested. We would conjecture that this is because nearest
neighbors exploits the high local correlation of pixels in an image more than SVD-based methods, which assign
“features” to rows and columns, so that the dot product of the features associated with row x and column y
yields pixel (x, y).
Table 4: Nearest Neighbor Runtimes and Error Scores for Figure 6

Beach Image (500×750×1)          Time (sec)   RMSE      PSNR (dB)
40% Sparse, 16 nearest pixels    2.9034       6.9898    31.2415
70% Sparse, 16 nearest pixels    4.6869       10.9828   27.3165
7 Conclusion and Further Questions
A main conclusion to draw is that even a simple SVD technique can produce quality image reconstructions, and
that this quality is robust to minor variations in the algorithm. We would attribute this to the fact that as
long as an image is approximately correct, the human brain can easily make sense out of it. A major topic for
further study, based on our results, is how to choose the optimal number of features for reconstructing an image
with SVD. Low numbers of features and high numbers of features both seemed to produce better images than
in-between quantities. Related but less impactful is the question of how to choose the regularization constant
optimally; we chose λ = 0.05 based on what worked in [7].
Another main conclusion is that SVD, by itself, is not an optimal method of image reconstruction - the nearest
neighbors reconstructions beat the error scores of the SVD reconstructions. However, there is a trade-off between
runtime and accuracy. In our trials, SVD ran in much less time on highly sparse images than nearest neighbors,
whereas for denser images, nearest neighbors won. This suggests using SVD for extremely sparse images, and
nearest neighbors for everything else. A promising area of research lies in combining the two methods - perhaps
nearest neighbors, or an algorithm like nonlocal means [2], could be used to smooth out an SVD reconstruction.
Moreover, as real-world images often have some level of noise, the relationship between inferring unknown pixels
and denoising erroneous pixels could be explored. For example: is it better to first reconstruct the missing pixels,
and then denoise, or to denoise the known pixels, and then reconstruct the image?
Other avenues for investigation abound. For example, if we can choose how to delete the pixels, what is the
optimal way to do so in order to preserve the most information? Given the choice of deleted pixels, it might
also be possible to significantly enhance the speed of ALS by solving for many features at once (this was not
possible in our implementation, as the missing pixels were random, forcing each row of features to be considered
independently). Another way to enhance performance could be to break up large images into small subimages
and apply ALS to them individually. Finally, the success of other SVD-based algorithms could be investigated,
as SVD has proven fruitful in the field of image processing.
References
[1] Robert M. Bell and Yehuda Koren. Lessons from the Netflix Prize challenge. SIGKDD Explorations, 9(2):75–
79, 2007.
[2] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A non-local algorithm for image denoising. Computer
Vision and Pattern Recognition, 2:60–65, 2005.
[3] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion.
arXiv 0903.1476v1, 2009.
[4] National Instruments Corporation. Peak signal-to-noise ratio as an image quality metric.
http://www.ni.com/white-paper/13306/en/, 2013.
[5] Andrey Feuerverger, Yu He, and Shashi Khatri. Statistical significance of the Netflix challenge. Statistical
Science, 27(2):202–231, 2012.
[6] Benjamin Recht. A simpler approach to matrix completion. arXiv 0910.0651v2, 2009.
[7] Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-scale parallel collaborative filtering
for the Netflix Prize challenge. Lecture Notes in Computer Science, 5034:337–348, 2008.
Figure 1: A black-and-white image of a beach (size 500×750×1), reconstructed using the ALS algorithm from
two different levels of sparsity.
[Image panels: Original; 40% sparse; 70% sparse; Reconstructed from 40% using 25 features; Reconstructed from 40% using 100 features; Reconstructed from 70% using 25 features.]
Figure 2: A color image of boats at a dock (size 427×640×3), reconstructed using ALS from two different levels
of sparsity.
[Image panels: Original; 40% sparse; 70% sparse; Reconstructed from 40% using 25 features; Reconstructed from 40% using 100 features; Reconstructed from 70% using 25 features.]
Figure 3: The beach image from Figure 1 reconstructed three different ways from 90% sparsity.
[Image panels: 90% sparse; Reconstructed using 10 features; Reconstructed using 50 features; Reconstructed using 500 features.]
Figure 4: Beach Image RMSE vs. Features at Fixed Sparsity Levels.
[Plot panels: 20% Sparse, 40% Sparse, 50% Sparse, 70% Sparse, and 90% Sparse; each plots RMSE (y-axis) against the number of features (x-axis, 0 to 500).]
Figure 5: Optimal Feature Count vs. Sparsity, to the Nearest 5 Features.
[Plot panels: Beach and Boats; each plots the RMSE-minimizing feature count (y-axis, 0 to 100) against sparsity (x-axis).]
Figure 6: The beach image from Figure 1 reconstructed using a naive nearest neighbor algorithm.
[Image panels: 40% sparse; 70% sparse; each filled in by averaging the 16 nearest pixels.]
13

Weitere ähnliche Inhalte

Was ist angesagt?

On observer design methods for a
On observer design methods for aOn observer design methods for a
On observer design methods for acsandit
 
Direct Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems OfDirect Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems OfMarcela Carrillo
 
Independent Component Analysis
Independent Component Analysis Independent Component Analysis
Independent Component Analysis Ibrahim Amer
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issuesAndres Mendez-Vazquez
 
Harvard_University_-_Linear_Al
Harvard_University_-_Linear_AlHarvard_University_-_Linear_Al
Harvard_University_-_Linear_Alramiljayureta
 
System of linear algebriac equations nsm
System of linear algebriac equations nsmSystem of linear algebriac equations nsm
System of linear algebriac equations nsmRahul Narang
 
Iterativos Methods
Iterativos MethodsIterativos Methods
Iterativos MethodsJeannie
 
NUMERICAL METHODS MULTIPLE CHOICE QUESTIONS
NUMERICAL METHODS MULTIPLE CHOICE QUESTIONSNUMERICAL METHODS MULTIPLE CHOICE QUESTIONS
NUMERICAL METHODS MULTIPLE CHOICE QUESTIONSnaveen kumar
 
PCA (Principal component analysis) Theory and Toolkits
PCA (Principal component analysis) Theory and ToolkitsPCA (Principal component analysis) Theory and Toolkits
PCA (Principal component analysis) Theory and ToolkitsHopeBay Technologies, Inc.
 
Numerical approach of riemann-liouville fractional derivative operator
Numerical approach of riemann-liouville fractional derivative operatorNumerical approach of riemann-liouville fractional derivative operator
Numerical approach of riemann-liouville fractional derivative operatorIJECEIAES
 
Crout s method for solving system of linear equations
Crout s method for solving system of linear equationsCrout s method for solving system of linear equations
Crout s method for solving system of linear equationsSugathan Velloth
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...TELKOMNIKA JOURNAL
 
Independent component analysis
Independent component analysisIndependent component analysis
Independent component analysisVanessa S
 

Was ist angesagt? (19)

On observer design methods for a
On observer design methods for aOn observer design methods for a
On observer design methods for a
 
Direct Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems OfDirect Methods For The Solution Of Systems Of
Direct Methods For The Solution Of Systems Of
 
Modeling the dynamics of molecular concentration during the diffusion procedure
Modeling the dynamics of molecular concentration during the  diffusion procedureModeling the dynamics of molecular concentration during the  diffusion procedure
Modeling the dynamics of molecular concentration during the diffusion procedure
 
Independent Component Analysis
Independent Component Analysis Independent Component Analysis
Independent Component Analysis
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
Harvard_University_-_Linear_Al
Harvard_University_-_Linear_AlHarvard_University_-_Linear_Al
Harvard_University_-_Linear_Al
 
System of linear algebriac equations nsm
System of linear algebriac equations nsmSystem of linear algebriac equations nsm
System of linear algebriac equations nsm
 
Iterativos Methods
Iterativos MethodsIterativos Methods
Iterativos Methods
 
NUMERICAL METHODS MULTIPLE CHOICE QUESTIONS
NUMERICAL METHODS MULTIPLE CHOICE QUESTIONSNUMERICAL METHODS MULTIPLE CHOICE QUESTIONS
NUMERICAL METHODS MULTIPLE CHOICE QUESTIONS
 
Ou3425912596
Ou3425912596Ou3425912596
Ou3425912596
 
PCA (Principal component analysis) Theory and Toolkits
PCA (Principal component analysis) Theory and ToolkitsPCA (Principal component analysis) Theory and Toolkits
PCA (Principal component analysis) Theory and Toolkits
 
Numerical approach of riemann-liouville fractional derivative operator
Numerical approach of riemann-liouville fractional derivative operatorNumerical approach of riemann-liouville fractional derivative operator
Numerical approach of riemann-liouville fractional derivative operator
 
Coueete project
Coueete projectCoueete project
Coueete project
 
Crout s method for solving system of linear equations
Crout s method for solving system of linear equationsCrout s method for solving system of linear equations
Crout s method for solving system of linear equations
 
Statistical Physics Assignment Help
Statistical Physics Assignment HelpStatistical Physics Assignment Help
Statistical Physics Assignment Help
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
 
Independent component analysis
Independent component analysisIndependent component analysis
Independent component analysis
 

Ähnlich wie directed-research-report

20070823
2007082320070823
20070823neostar
 
Setting linear algebra problems
Setting linear algebra problemsSetting linear algebra problems
Setting linear algebra problemsJB Online
 
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...ijdpsjournal
 
Quantum algorithm for solving linear systems of equations
 Quantum algorithm for solving linear systems of equations Quantum algorithm for solving linear systems of equations
Quantum algorithm for solving linear systems of equationsXequeMateShannon
 
tw1979 Exercise 1 Report
tw1979 Exercise 1 Reporttw1979 Exercise 1 Report
tw1979 Exercise 1 ReportThomas Wigg
 
A Condensation-Projection Method For The Generalized Eigenvalue Problem
A Condensation-Projection Method For The Generalized Eigenvalue ProblemA Condensation-Projection Method For The Generalized Eigenvalue Problem
A Condensation-Projection Method For The Generalized Eigenvalue ProblemScott Donald
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...ANIRBANMAJUMDAR18
 
MODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTION
MODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTIONMODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTION
MODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTIONijwmn
 
Regularized Compression of A Noisy Blurred Image
Regularized Compression of A Noisy Blurred Image Regularized Compression of A Noisy Blurred Image
Regularized Compression of A Noisy Blurred Image ijcsa
 
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...SSA KPI
 
Numerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportNumerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportShikhar Agarwal
 

Ähnlich wie directed-research-report (20)

20070823
2007082320070823
20070823
 
Setting linear algebra problems
Setting linear algebra problemsSetting linear algebra problems
Setting linear algebra problems
 
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
AN EFFICIENT PARALLEL ALGORITHM FOR COMPUTING DETERMINANT OF NON-SQUARE MATRI...
 
Quantum algorithm for solving linear systems of equations
 Quantum algorithm for solving linear systems of equations Quantum algorithm for solving linear systems of equations
Quantum algorithm for solving linear systems of equations
 
Chapter26
Chapter26Chapter26
Chapter26
 
N41049093
N41049093N41049093
N41049093
 
tw1979 Exercise 1 Report
tw1979 Exercise 1 Reporttw1979 Exercise 1 Report
tw1979 Exercise 1 Report
 
A Condensation-Projection Method For The Generalized Eigenvalue Problem
A Condensation-Projection Method For The Generalized Eigenvalue ProblemA Condensation-Projection Method For The Generalized Eigenvalue Problem
A Condensation-Projection Method For The Generalized Eigenvalue Problem
 
Ijetr021210
Ijetr021210Ijetr021210
Ijetr021210
 
Ijetr021210
Ijetr021210Ijetr021210
Ijetr021210
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...
 
MODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTION
MODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTIONMODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTION
MODIFIED LLL ALGORITHM WITH SHIFTED START COLUMN FOR COMPLEXITY REDUCTION
 
EE8120_Projecte_15
EE8120_Projecte_15EE8120_Projecte_15
EE8120_Projecte_15
 
Regularized Compression of A Noisy Blurred Image
Regularized Compression of A Noisy Blurred Image Regularized Compression of A Noisy Blurred Image
Regularized Compression of A Noisy Blurred Image
 
D034017022
D034017022D034017022
D034017022
 
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
 
Dycops2019
Dycops2019 Dycops2019
Dycops2019
 
Joint3DShapeMatching
Joint3DShapeMatchingJoint3DShapeMatching
Joint3DShapeMatching
 
overviewPCA
overviewPCAoverviewPCA
overviewPCA
 
Numerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportNumerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project Report
 

directed-research-report

  • 1. Directed Research Report: Reconstructing Images Using Singular Value Decomposition Ryen Krusinga rkruser@umich.edu Igor L. Markov imarkov@eecs.umich.edu University of Michigan, Ann Arbor, MI 48109-2121 September 3, 2014 1 Introduction Matrix completion is a problem with a broad range of practical applications. Given a matrix with some entries missing, the goal is to infer the missing values by assuming that they are related to present values by some structural properties of the matrix. Techniques for solving this problem play critical roles in machine learning and recommendation systems. Here, we explore the use of such techniques in image processing. As digital images are naturally represented with matrices of pixel values, existing matrix completion algorithms can be easily applied. A large class of these algorithms make use of the singular value decomposition (SVD) of a matrix; in this paper we explore the efficacy of an SVD technique known as the Alternating Least Squares (ALS) algorithm in reconstructing incomplete images. It is, of course, impossible to fill in the missing entries of a completely random matrix: the known entries pro- vide no information about the unknown ones. The project of matrix completion therefore requires the occurrence of nonrandom patterns. Images are amenable to reconstruction because pixels tend to be highly correlated. The local neighborhood of a pixel is generally a very good indication of the pixel’s value; patterns may also repeat, leading to many similar pixels that are spread throughout the image. Features such as the edges of objects are nonrandom in shape: they tend to be straight lines or smooth curves. Such correlation means that images are highly redundant: the essential content can be deduced from far fewer pixels. This property of images is the motivation behind denoising techniques such as the non-local means algorithm [2], which seeks to smooth out random noise by assigning pixel values based on a weighted average of “similar” pixels. On the more theoretical side, high local correlation means that image matrices are “low-rank” or nearly so: we can find a linear map from a lower-dimensional space that approximates that column space of the matrix. The prospects of matrix completion given the low-rank assumption are considered by Cand´es, Tao [3] and Recht [6], who seek to establish bounds on how much information is needed to reconstruct a low-rank matrix. Matrix completion is especially important to businesses like Amazon or Netflix. In the case of Amazon, it is used to guess which products users might buy, and in the case of Netflix, to guess how users would rate movies that they have not seen. In this context, matrix completion solely using past user/product ratings is called collaborative filtering [7, 5]. Collaborative filtering was heavily researched and utilized in the Netflix Prize Challenge, which tasked constestants with lowering the error score of Netflix’s recommendation algorithm [5, 1]. Many of the developed techniques exploited singular value decomposition, where the matrix in question is factored in a specific way. The factors are computed based on known data, and then multiplied to produce the complete matrix of predictions. The alternating least squares algorithm is a means of computing the factors by minimizing the least squared error between their predictions and the known data. It was one of the most effective single algorithms used in the Netflix prize challenge [7, 1, 5]. 
This is the algorithm that we implemented as a means of image reconstruction. Other variations are also available, such as finding the factors by gradient descent [5]. It is worth investigating such variations; for our purposes, however, ALS proved effective and easy to implement. This paper is organized as follows. In Section 2 we briefly discuss the theory behind SVD, and in Section 3 we describe the version of ALS that we implemented. In Section 4 we provide some empirical outcomes of ALS on real images. In Section 5 we give a more quantitative investigation of the algorithm, and in Section 6 we compare ALS to other methods. Finally, in Section 7 we draw up conclusions and invite further inquiry. 1
  • 2. 2 Theoretical Background: SVD The following paraphrases Section 5 of Feuerverger, He, and Khatri [5]. Every real m × n matrix A can be factored as A = UDV T (1) where U is m × m and comprised of orthonormal eigenvectors of AAT , V is n × n and comprised of orthonormal eigenvectors of AT A, and D is a nonnegative m × n diagonal matrix. The diagonal entries of D are called the “singular values” of A, and the factorization in Equation 1 is called a Singular Value Decomposition of A. We can use the SVD to approximate A using lower dimensional matrices. For 1 ≤ k ≤ min(m, n), we have A ≈ U(k) D(k) (V (k) )T (2) where U(k) is m × k and consists of the first k columns of U, V (k) is n × k and consists of the first k columns of V , and D(k) is the upper left k × k block of D. This is called the rank k reconstruction of A and approximates A in the sense that the square of the Frobenius norm of the difference ||A − U(k) D(k) (V (k) )T ||2 (3) is minimal. (Unless otherwise stated, || · || will refer to the Frobenius norm). Suppose now that our m × n matrix A has many missing entries whose values we would like to infer. We proceed by choosing some k and trying to approximate the rank k reconstruction of A. We start with Equation 2. For our purposes, the middle matrix is unnecessary, as U(k) D(k) has the same dimensions as U(k) . Thus we can recast the problem as finding an m × k matrix U and an n × k matrix V such that A ≈ UV T , i.e., the cost function ||A − UV T ||2 (4) is minimal. Since A has missing values, we minimize the norm over only the known values. That is, if we let K be the set of coordinates of the known values of A, and let M(i, j) be the (i, j)th entry of a matrix M, we compute 4 by (i,j)∈K (A(i, j) − (UV T )(i, j))2 (5) However, it turns that Equation 4 works poorly in practice: it can be numerically unstable and overfit U and V to the existing data. To prevent this, we impose a regularization term so that our cost function is now ||A − UV T ||2 + λ(||U||2 + ||V ||2 ) (6) where λ is a small positive constant. The question now is how to determine the entries of U and V subject to this constraint. 3 The ALS Algorithm The alternating least squares algorithm provides a method of finding the U and V from Equation 6. Our discussion and implementation of this algorithm in this section is based on Zhou et al. [7] and Feuerverger et al. [5]. Solving for the two matrices simultaneously is a hard problem, so instead we fix one of them and choose the other to minimize Equation 6. Then we swap the matrix being fixed with the matrix being solved for and repeat the process. For example, suppose we randomly initialize U. Then we would minimize V over the cost function, then fix V and minimize U over the cost function, etc.; this process is repeated until a maximum number of iterations is reached or a convergence criterion is met. We introduce some notation. Suppose M is a matrix in which some of the entries are unknown, and N is another matrix with the same number of rows. Let M(i, :) denote the ith row of M, M(:, i) denote the ith column of M, and known(M(:, i)) denote a column vector consisting of only the known entries of the ith column of M. Then N(known(M(:, i))) will be the matrix consisting of only the rows of N corresponding to the rows of 2
  • 3. known entries of the ith column of M. An example: M =   1 2 ? ? 5 6 7 ? 9   N =   1 2 3 4 5 6   M(1, :) = 1 2 ? M(:, 1) =   1 ? 7   known(M(:, 1)) = 1 7 N(known(M(:, 1))) = 1 2 5 6 Next, we derive the basis for the ALS algorithm. Given a fixed U, how do we solve for V ? We note that the square of the Frobenius norm of a matrix is equal to the sum of the squares of the Frobenius norms of its columns: ||M||2 = j ||M(:, j)||2 (7) Also note that each row V (j, :) of V determines the column A(:, j) of A. Thus Equation 6 is equal to λ||U||2 + j ||A(:, j) − UV (j, :)T ||2 + λ||V (j, :)||2 (8) so we can minimize the cost of each row of V separately given a fixed U. Solving for rows individually is necessary, as different columns of A possess different locations of known entries. Let C = known(A(:, j)), Uc = U(known(A(:, j))), and x = V (j, :). Then for each j we are solving C = Ucx subject to minimizing ||C − Ucx||2 + λ||x||2 . Taking derivatives and removing the resulting factor of 2, we see that this is equivalent to exactly solving (UT c Uc)C = (UT c Uc + λI)x for x. The Matlab mldivide function efficiently yields a solution. When solving for the entries of U, we observe that AT = V UT and proceed analogously. The following outlines our implementation of ALS, where A is a sparse m × n matrix with missing entries, and U and V are features matrices of size m × k and n × k respectively (k is chosen to suit the problem, subject to 1 ≤ k ≤ min(m, n)). iterations ←− 0 ; Randomly Initialize U ; while Not Converged or iterations < max iter do for 1 ≤ i ≤ n do C ←− known(A(:, i)) ; Uc ←− U(known(A(:, i))) ; Solve the regularized least squares regression C = Ucx for x ; V (i, :) ←− xT ; end for 1 ≤ j ≤ m do D ←− known(A(j, :)T ) ; Vd ←− V (known(A(j, :)T )) ; Solve the regularized least squares regression D = Vdx for x ; U(j, :) ←− xT ; end iterations ←− iterations + 1; end The resulting U and V approximate the rank k reconstruction of A, and the missing entries of A are filled in using the corresponding entries of UV T . The rows of U and V are often called “features,” and we will refer in future references to the rank k reconstruction of an image as the k-feature reconstruction of the image. Many solutions are possible depending on the random initial values of U, but in practice are usually not so different as to yield poor results. 3
  • 4. In our implementation, U was randomly initialized using Matlab’s rand() function, which produces uniformly distributed numbers in the interval (0, 1). As in [7], we set the first column of U equal to the means of the rows of A, although we later found that this did not make a noticeable difference on the quality of image reconstruction. Regarding the convergence criterion, we opted not to use one for our main results, instead only setting a maximum number of iterations. One possible convergence criterion, as described in [7], stops the algorithm when the root mean squared error (RMSE) between the predicted and known values decreases by less than some epsilon between iterations. Other possible criteria could be the convergence of the values of U and V , or the reduction of the RMSE to below some value. If we have an original image A of size m × n and its reconstruction B = UV T from some level of sparsity, the RMSE is calculated as RMSE = √ MSE = 1 mn m i=1 n j=1 (A(i, j) − B(i, j))2 (9) If the image has R, G, and B components, the formula is easily extended by averaging the mean squared error (MSE) of the three and then taking the square root. 4 Preliminary Investigation We applied the ALS algorithm to reconstruct two different images after some percentage of their pixels had been randomly deleted. We used Matlab R2013a on an HP Compaq Elite 8300 PC running Microsoft Windows 7 Enterprise, Version 6.1.7601, with 16 GB of RAM and an Intel Core i7-3770 CPU. On each sparse image we ran the algorithm with a regularization constant of λ = 0.05 for 30 iterations. Figure 1 shows the results of an application of the algorithm to a black and white image of a beach. Figure 2 shows the results applied to a color image of boats at a dock, with the R, G, and B matrices processed separately (note: for the sparse color images, R, G, and B values were deleted independently). Each image was made 40% sparse and then reconstructed from that state using 25 and 100 features respectively. Then each image was then made 70% sparse and subsequently reconstructed using 25 features. These quantities of features were chosen somewhat arbitrarily, the only criterion being that they yield reasonably good-looking results. The reconstructions from the 40% sparse images are quite good. The 100-feature reconstruction in each case produced a sharper image, but with more outliers; the 25 feature reconstruction produced a blurrier image with fewer outlying pixels. The reconstructions from 70% sparse images are understandably fuzzy, but still quite good considering how much of the image was missing. We also investigated how well the algorithm could reconstruct a 90% sparse version of the beach image. The results of this are shown in Figure 3. The 90% sparse image was reconstructed using 10, 50, and 500 features. The 10 feature and 500 feature reconstructions are similar in quality, whereas the 50 feature reconstruction produced unintellible results. This reduction in quality associated with medium-sized feature counts was also observed in the 70% sparse image, which was better reconstructed with 25 features than with 50 features, although the effect was less pronounced. It would seem that high and low feature counts tend to produce results of better quality, whereas in-between feature quantities produce worse results. The runtimes for all the reconstructed images are shown in Table 1. Figures 1 and 3 contain images of dimension 500×750×1, while Figure 2 contains images of dimension 427×640×3. 
Factors affecting the runtime include image size, image sparsity, the number of features, and the maximum number of iterations. Runtimes may also vary slightly due to different pixels being randomly deleted when the image is made sparse; we found that this usually changes the runtimes by a few hundredths of a second between otherwise identical trials. The runtimes for Figure 2 are for the color image as a whole, which means that they include the processing of the R, G, and B matrices together. Dividing each runtime by 3 gives the approximate runtime of each color component individually. Table 1 also shows the RMSE and the peak signal-to-noise ratio (PSNR) of each image. The PSNR measures the ratio of the maximum possible noise to the existing noise, and is calculated using Equation 10 [4]: PSNR = 20 log10 255 RMSE (10) The number 255 in the equation is the maximum possible value of each pixel, and hence the maximum possible noise. The lower the RMSE and the higher the PSNR, the better the quality of reconstruction. By this criterion, 4
  • 5. Table 1: ALS Runtimes and Error Scores (Figures 1, 2, and 3) Beach Image (500×750×1) Time (sec) RMSE PSNR (dB) 40% Sparse 25 Features 3.0029 8.0872 29.9749 40% Sparse 100 Features 13.4765 9.6147 28.4721 70% Sparse 25 Features 1.9152 13.0060 25.8479 90% Sparse 10 Features 1.1134 27.1654 19.4505 90% Sparse 50 Features 3.3784 54.7027 13.3706 90% Sparse 500 Features 169.9606 28.3323 19.0852 Boat Image (427×640×3) Time (sec) RMSE PSNR (dB) 40% Sparse 25 Features 6.9534 24.2969 20.4198 40% Sparse 100 Features 26.2246 24.2969 20.4198 70% Sparse 25 Features 4.4849 33.5939 17.6056 the 40% sparse beach image reconstructed with 25 features is the best reconstruction of all the images, just slightly ahead of the 100 feature reconstruction. Visually this appears reasonable, although one could say that the 100 feature reconstruction looks better overall because it has sharper edges (PSNR is only an approximate measure of the effective quality to the human eye). The values also confirm the dip in quality in the middle of the three 90% sparse feature counts. The 10 and 500 feature reconstructions have SNR values around 19, whereas the 50 feature reconstruction has an PSNR close to 13, indicating that it is far inferior in quality; indeed, we observe this to be the case. One thing that we must explain is why the PSNR values of the colored images in Figure 2 are significantly lower than those of the black and white images of Figure 1 with the same sparsity and feature counts, yet the visual quality is almost the same. We would attribute the lower values to the fact that by processing the R, G, and B values independently, the accuracy for each one decreases because not all of the information in the image is taken into account. However, we conjecture that the quality is similar because errors in the R, G, and B values for a pixel cause less of an apparent change in color than errors in a black-and-white image. The result is that while the error rate is higher, the image is more robust against change to the human eye. We explored some small modifications to the algorithm, none of which produced dramatic changes. We found that running the ALS algorithm for more than about 30 iterations conferred negligible additional quality (using the RMSE convergence criterion, the algorithm often terminated and produced good results in fewer than 10 iterations). We also tried statistically normalizing the pixel matrices by subtracting off the mean and dividing by the standard deviation before sending them through the algorithm. This only produced slight visual changes in cases of more extreme sparsity, such as 70% and above. All of these results, of course, are based on the two images that we were using, and what we visually judged to be a good quality reconstruction. We see no reason that the tendencies found should not extend to other images in general, although caution should be taken, as different image types will likely be more amenable to reconstruction in this way than others. In particular, we would conjecture that this technique works best when the main objects shown in the image take up many contiguous pixels, so as to provide a scaffolding around which to be reconstructed; on the other hand, images with many fine details that are one or two pixels in size will not be reconstructed well. The removed pixels must also be evenly distributed, for no simple algorithm can reasonably fill in an image missing one large connected chunk, as the information is simply lost. 
5 Quantitative Investigation

After the initial investigation, we performed a more thorough analysis of the optimal feature counts. First, we ran the ALS algorithm at five different levels of sparsity, graphing RMSE versus feature count for each level. The results are shown in Figure 4. We sampled feature counts up to 500, spacing the samples more widely after 100 features due to the long runtimes of those trials.

The graphs display some clear trends. First, in each example the minimum RMSE occurs below 100 features. Second, the first four graphs appear quadratic around the minimum; this is illustrated on the graphs with quadratic curves fit to the portion of the data that appears quadratic (see Table 2 for the exact equations). Third, as we found in the preliminary investigation, the RMSE spikes sharply at feature counts in the mid-hundreds, then dips and levels off at a value slightly above the minimum. Fourth, this pattern shifts to the left as sparsity increases, until the quadratic pattern has almost entirely shifted off the left of the graph in the 90% sparse image.
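A minimal sketch of this sweep and the subsequent quadratic fits follows, again in Python with NumPy and again assuming a placeholder `als_reconstruct` routine; the parameter name `num_features` and the scoring against the full original image are our own conventions for illustration.

```python
import numpy as np

def sweep_feature_counts(original, sparse_image, mask, als_reconstruct, feature_counts):
    """Return (feature_count, RMSE) pairs for one sparsity level, scored against the original."""
    scores = []
    for k in feature_counts:
        filled = als_reconstruct(sparse_image, mask, num_features=k)
        err = np.sqrt(np.mean((filled - original.astype(np.float64)) ** 2))
        scores.append((k, err))
    return scores

def fit_quadratic(scores):
    """Fit a*k^2 + b*k + c to the (feature_count, RMSE) points near the minimum."""
    k, err = np.array(scores, dtype=np.float64).T
    return np.polyfit(k, err, deg=2)  # returns [a, b, c]
```

In practice the quadratic would be fit only to the points around the minimum, as was done for the curves reported in Table 2.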
Table 2: Equations of quadratic curves in Figure 4

Graph        Equation                           Number of points fit
20% Sparse   0.0003396x² − 0.07051x + 7.521     12
40% Sparse   0.001635x² − 0.1884x + 12.31       10
50% Sparse   0.002312x² − 0.2068x + 13.67       10
70% Sparse   0.01013x² − 0.4996x + 20.02        12

Table 3: Equations of best-fit lines in Figure 5

Graph   Equation          Number of points fit
Beach   −1.22x + 107.9    18
Boats   −1.171x + 104     all

Based on this evidence, there appears to be no advantage to using more than 100 features, except in extreme cases where sparsity is exceptionally low (<20%) or exceptionally high (>90%). We use this insight in Figure 5, where we graph the RMSE-minimizing (optimal) feature count versus the sparsity level for the beach image and the boat image, restricting ourselves to 100 features or fewer. For the well-behaved, mid-level-sparse data in the beach graph, the optimal feature count decreases approximately linearly as sparsity increases. This is illustrated by a best-fit line that excludes the outlying data point in the upper right (see Table 3 for the equation). For the less well-behaved sparsity levels, the optimal feature count for the beach image exceeds 100 when sparsity is either less than 20% or greater than 90%. The graph for the boat image shows a similar, approximately linear decreasing trend, although there are fewer data points due to longer runtimes. The lines in the two graphs are almost the same, which is evidence that different images have similar optimal feature counts at similar sparsity levels. This suggests that it is possible to generate an optimal-feature-count graph for a diverse set of images, and then use the resulting trendline as a guide to choosing the feature count for new images.

6 Comparison to Other Methods

We now put the SVD method in context by comparing it to other methods of image reconstruction. The simplest such method is naive reconstruction: fill in each missing pixel by averaging the values of the k nearest known pixels. The results are shown in Figure 6. We made the beach image 40% and 70% sparse, and filled in the missing pixels by averaging the 16 nearest existing pixels. Table 4 gives the runtimes and error scores for Figure 6. Comparing these values to Table 1, we see that at the 40% sparsity level the nearest-neighbor algorithm runs faster (2.9034 seconds as opposed to 3.0029 seconds) and produces a lower RMSE, 6.9898 versus the 8.0872 produced by ALS, indicating that its reconstruction erred less, on average, than the alternating least squares reconstruction. On the other hand, at the 70% sparsity level the ALS algorithm finished in 1.9152 seconds, less than half the time of the nearest-neighbors algorithm, which finished in 4.6869 seconds. Nearest neighbors still produced a lower error score, 10.9828, versus the 13.0060 produced by ALS.

What accounts for these differences? The first thing to explain is runtime. At a fixed number of features, increasing sparsity decreases the runtime of ALS, because it then operates on smaller matrices. On the other hand, increasing sparsity increases the runtime of the nearest-neighbors algorithm, because each unknown pixel must have its surroundings searched for neighbors. Next we account for the error scores. Naive nearest neighbors did better than ALS at both sparsity levels we tested.
We conjecture that this is because nearest neighbors exploits the high local correlation of pixels in an image more directly than SVD-based methods, which assign "features" to rows and columns so that the dot product of the features associated with row x and column y yields pixel (x, y).

Table 4: Nearest-Neighbor Runtimes and Error Scores for Figure 6

Beach Image (500×750×1)           Time (sec)   RMSE      PSNR (dB)
40% Sparse, 16 nearest pixels     2.9034       6.9898    31.2415
70% Sparse, 16 nearest pixels     4.6869       10.9828   27.3165
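The naive nearest-neighbor baseline admits a very short sketch. The version below uses Python with NumPy and SciPy's cKDTree; our own implementation may have differed in details such as the neighbor search strategy, so treat this as one possible realization of the method rather than the exact code behind Table 4.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_fill(image, mask, k=16):
    """Fill missing pixels (mask == False) with the mean of the k nearest known pixels."""
    known = np.argwhere(mask)            # (row, col) coordinates of known pixels
    missing = np.argwhere(~mask)         # coordinates of missing pixels
    tree = cKDTree(known)                # spatial index over known pixel locations
    _, idx = tree.query(missing, k=k)    # indices of the k nearest known pixels
    neighbor_coords = known[idx]         # shape: (num_missing, k, 2)
    neighbor_vals = image[neighbor_coords[:, :, 0], neighbor_coords[:, :, 1]]
    filled = image.astype(np.float64).copy()
    filled[missing[:, 0], missing[:, 1]] = neighbor_vals.mean(axis=1)
    return filled
```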
7 Conclusion and Further Questions

A main conclusion to draw is that even a simple SVD technique can produce quality image reconstructions, and that this quality is robust to minor variations in the algorithm. We attribute this to the fact that as long as an image is approximately correct, the human brain can easily make sense of it. A major topic for further study, based on our results, is how to choose the optimal number of features for reconstructing an image with SVD. Low numbers of features and high numbers of features both seemed to produce better images than in-between quantities. Related but less impactful is the question of how to choose the regularization constant optimally; we chose λ = 0.05 based on what worked in [7].

Another main conclusion is that SVD, by itself, is not an optimal method of image reconstruction: the nearest-neighbor reconstructions beat the error scores of the SVD reconstructions. However, there is a trade-off between runtime and accuracy. In our trials, SVD ran in much less time on highly sparse images than nearest neighbors, whereas for denser images nearest neighbors won. This suggests using SVD for extremely sparse images and nearest neighbors for everything else. A promising area of research lies in combining the two methods; perhaps nearest neighbors, or an algorithm like non-local means [2], could be used to smooth out an SVD reconstruction. Moreover, as real-world images often have some level of noise, the relationship between inferring unknown pixels and denoising erroneous pixels could be explored. For example: is it better to first reconstruct the missing pixels and then denoise, or to denoise the known pixels and then reconstruct the image?

Other avenues for investigation abound. For example, if we can choose which pixels to delete, what is the optimal way to do so in order to preserve the most information? Given the choice of deleted pixels, it might also be possible to significantly enhance the speed of ALS by solving for many features at once (this was not possible in our implementation, as the missing pixels were random, forcing each row of features to be considered independently). Another way to enhance performance could be to break large images into small subimages and apply ALS to them individually. Finally, the success of other SVD-based algorithms could be investigated, as SVD has proven fruitful in the field of image processing.

References

[1] Robert M. Bell and Yehuda Koren. Lessons from the Netflix Prize challenge. SIGKDD Explorations, 9(2):75–79, 2007.
[2] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A non-local algorithm for image denoising. Computer Vision and Pattern Recognition, 2:60–65, 2005.
[3] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. arXiv:0903.1476v1, 2009.
[4] National Instruments Corporation. Peak signal-to-noise ratio as an image quality metric. http://www.ni.com/white-paper/13306/en/, 2013.
[5] Andrey Feuerverger, Yu He, and Shashi Khatri. Statistical significance of the Netflix Challenge. Statistical Science, 27(2):202–231, 2012.
[6] Benjamin Recht. A simpler approach to matrix completion. arXiv:0910.0651v2, 2009.
[7] Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-scale parallel collaborative filtering for the Netflix Prize challenge. Lecture Notes in Computer Science, 5034:337–348, 2008.
Figure 1: A black-and-white image of a beach (size 500×750×1), reconstructed using the ALS algorithm from two different levels of sparsity. (Panels: Original; 40% sparse; Reconstructed from 40% using 25 features; Reconstructed from 40% using 100 features; 70% sparse; Reconstructed from 70% using 25 features.)

Figure 2: A color image of boats at a dock (size 427×640×3), reconstructed using ALS from two different levels of sparsity. (Panels: Original; 40% sparse; Reconstructed from 40% using 25 features; Reconstructed from 40% using 100 features; 70% sparse; Reconstructed from 70% using 25 features.)

Figure 3: The beach image from Figure 1 reconstructed three different ways from 90% sparsity. (Panels: 90% sparse; Reconstructed using 10 features; Reconstructed using 50 features; Reconstructed using 500 features.)

Figure 4: Beach image RMSE vs. features at fixed sparsity levels. (Panels: 20% Sparse; 40% Sparse; 50% Sparse; 70% Sparse; 90% Sparse. Axes: Features vs. RMSE.)

Figure 5: Optimal feature count vs. sparsity, to the nearest 5 features. (Panels: Beach; Boats. Axes: Sparsity vs. RMSE-Minimizing Feature Count.)

Figure 6: The beach image from Figure 1 reconstructed using the naive nearest-neighbor algorithm. (Panels: 40% sparse; Averaging 16 nearest pixels; 70% sparse; Averaging 16 nearest pixels.)