This document summarizes a research paper on implementing super-resolution algorithms for video on GPUs. It introduces two super-resolution algorithms: Fast and General Super-Resolution (FGSR) and Fast and Robust Super-Resolution (FRSR). FGSR handles general video sequences using optical flow for alignment, while FRSR targets global-motion video. Bicubic interpolation is parallelized on the GPU, which speeds that step up by 6-10x. On the Bottle test video, FGSR and FRSR both achieve a higher PSNR than bilinear interpolation, with FRSR the highest; on the Text video, bilinear interpolation has the highest PSNR.
Effective Pixel Interpolation for Image Super Resolution
Final report
Super-resolution on video with GPU
Chien-hsin Hsueh∗ (CSIE, NTNU)
Hsing-Han Ho† (CSIE, NTU)
Kuen-Shiou Tsai‡ (CSIE, NTU)
∗ e-mail: 698470025@ntnu.edu.tw
† e-mail: r9892204@ntu.edu.tw
‡ e-mail: r98944017@ntu.edu.tw

Abstract

This work introduces a practical approach to super-resolution, the process of reconstructing a high-resolution image from low-resolution input images. Our emphasis is on super-resolving frames from dynamic video sequences and on improving efficiency with the GPU. We have implemented two super-resolution algorithms that reconstruct the high-resolution image for different types of frame motion. Because the quality of super-resolved images relies heavily on the correctness of image alignment between consecutive frames, we employ the macroblock optical flow method to accurately estimate motion between the image pair. An efficient and reliable GPU scheme is designed to improve the performance of our super-resolution algorithm. We also implement a video player to demonstrate our results. A number of complex and dynamic video sequences are tested to demonstrate the applicability and reliability of our algorithm.

Keywords: super-resolution, upsampling, image sequence, gpu, cuda

1 Introduction

The goal of Super-Resolution (SR) methods is to recover a high-resolution image from one or more low-resolution input images. In classical multi-image SR, a set of low-resolution images of the same scene is taken (at subpixel misalignments). Each low-resolution image imposes a set of linear constraints on the unknown high-resolution intensity values. If enough low-resolution images are available (at subpixel shifts), the set of equations becomes determined and can be solved to recover the high-resolution image.

Most of the proposed super-resolution algorithms are reconstruction-based algorithms, which build on sampling theorems. However, because of the constraints they place on the motion models of the input video sequences, reconstruction-based algorithms are difficult to apply. Most algorithms assume, implicitly or explicitly, that the image pairs are related by a global parametric transformation, which may not hold in dynamic video. It is challenging to design a super-resolution algorithm for arbitrary video sequences. Video frames in general cannot be related through a global parametric transformation because of the arbitrary individual pixel movement between image pairs. Hence local motion models, such as optical flow, need to be used for image alignment.

In this work, we have implemented two super-resolution algorithms to reconstruct the high-resolution image for different types of frame motion. The first is Fast and General Super-Resolution (FGSR), which handles general video with good performance. Because the quality of super-resolved images relies heavily on the correctness of image alignment between consecutive frames, we employ the macroblock optical flow method to accurately estimate motion between the image pair. The second is Fast and Robust Super-Resolution (FRSR), which reconstructs the high-resolution image from a global-motion-based video. We also design several efficient and reliable schemes for the GPU to improve the performance of our super-resolution algorithm.

This paper is organized as follows. In the next section we give a brief survey of existing work on this topic. In Section 3, we describe our implementation of the super-resolution algorithms, and in the following section we discuss the results obtained and compare them with different methods. Finally, in Section 5, we describe the drawbacks revealed in this method and summarize our conclusions.

2 Related work

Existing super-resolution algorithms can be roughly divided into two main categories: reconstruction-based algorithms and learning-based algorithms.

Reconstruction-based Super-Resolution. The basis of reconstruction-based super-resolution is uniform/non-uniform sampling theory. It assumes the original high-resolution signal (image) can be well predicted from the low-resolution input samples (images). Most super-resolution algorithms fall into this category. In most cases, the enforced smoothness constraint suppresses high-frequency components and hence the results are usually blurred. Regularization methods can be used when the scene is strongly rigid, as in the case of a binary text image. Super-resolution can also be performed simultaneously in time and in space.

Several refinements have been proposed to address the robustness issue of super-resolution algorithms. One approach handles moving objects by motion segmentation, so accurate motion segmentation is crucial. Unfortunately, accurate segmentation is hard to obtain in the presence of aliasing and noise. More recently, a robust median estimator has been used in an iterative super-resolution algorithm.

Learning-based Super-Resolution. This kind of algorithm creates high-frequency image details by using a generative model learned from a set of training images. Several algorithms have been proposed for specific types of scene, such as faces and text. However, learning-based super-resolution algorithms handle dynamic real-world video sequences awkwardly.

3 Algorithm

The input of our algorithm includes 1) multiple low-resolution video frames (the target frame and its neighboring frames) and 2) the desired magnification factor. The output is a high-resolution image reconstructed at the target frame.

In this work, we implement two super-resolution algorithms to reconstruct high-resolution images from several neighboring frames. We follow two papers, [Farsiu et al. 2004] and [Jiang et al. 2003], and simplify the original algorithms. Parts of the code have been accelerated with CUDA to reduce the execution time. Both are reconstruction-based algorithms: one is Fast and General Super-Resolution (FGSR) and the other is Fast and Robust Super-Resolution (FRSR). In the following, we describe the implementation of each algorithm in detail.
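As a rough orientation before the detailed descriptions, the core step of each algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CUDA implementation: frames are plain nested lists of grayscale values, the neighboring frames are assumed to be already up-sampled and aligned to the target frame, and the names `beta`, `fgsr_update` and `median_fusion` are hypothetical. For FGSR the sketch adds the weighted residual β(sk − fn) to the current estimate, with β = 1/(temporal distance + 1); applying the neighbors one after another is an assumption of the sketch. For FRSR it takes the pixel-wise median of the aligned frames as the initial estimate.

```python
from statistics import median


def beta(temporal_distance):
    """Weight of a neighboring frame: 1 / (temporal distance + 1)."""
    return 1.0 / (temporal_distance + 1)


def fgsr_update(f_n, aligned, distances):
    """One FGSR refinement pass: f <- f + beta * (s_k - f) for each s_k.

    f_n       -- current high-resolution estimate (list of rows)
    aligned   -- frames s_k, already up-sampled and aligned to the target
    distances -- temporal distance of each s_k from the target frame
    Assumption: neighbors are applied sequentially, one after another.
    """
    f = [row[:] for row in f_n]  # copy so the input estimate is untouched
    for s_k, d in zip(aligned, distances):
        b = beta(d)
        for y, row in enumerate(s_k):
            for x, value in enumerate(row):
                f[y][x] += b * (value - f[y][x])
    return f


def median_fusion(aligned):
    """FRSR initial estimate: pixel-wise median over the aligned frames."""
    height, width = len(aligned[0]), len(aligned[0][0])
    return [[median(frame[y][x] for frame in aligned) for x in range(width)]
            for y in range(height)]


# Toy 1x2 frames: a current estimate and two aligned neighbors.
f0 = [[10.0, 20.0]]
s = [[[14.0, 20.0]], [[10.0, 26.0]]]
f1 = fgsr_update(f0, s, distances=[1, 2])  # weights 1/2 and 1/3
fused = median_fusion([f0] + s)
```

In the toy call, the neighbor at temporal distance 1 pulls the estimate halfway toward its values, while the neighbor at distance 2 contributes only a third of its residual, which is the decaying influence that β is meant to express.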
3.1 FGSR

FGSR stands for Fast and General Super-Resolution, which can generate a super-resolved image from any dynamic video sequence. Before describing the details, let us first define some notation:
• x denotes the target low-resolution image
• f denotes the desired high-resolution image
• fn is the approximation of f obtained after n iterations
• gk denotes the k-th low-resolution image
• sk denotes the result of optical flow from the low-resolution image gk to the target image f

Figure 1: The procedure of FGSR. The difference between gk and fn is added to fn, weighted by β, to give fn+1; in this way the detail of the high-resolution image improves iteratively.

Figure 1 illustrates the basic procedure of the FGSR algorithm. It starts with an initial estimate f0 of the high-resolution image f, obtained by bicubic interpolation. After all g0,k have been up-sampled, the optical flow process (from gk to the target frame f) is carried out to obtain the simulated high-resolution images s0,k. If gk is aligned with f, the residual pixels sk − fn should improve the detail of fn. We can iteratively project sk − fn to refine the approximation. The weight β is defined as

$$\beta = \frac{1}{\text{temporal distance} + 1}$$

that is, the reciprocal of one plus the temporal distance between sk and the target frame x. β is the weight with which sk − fn is projected onto fn+1. A lower β indicates weaker alignment with the target frame and therefore less influence on the detail added to f.

In this work, we accelerate the bicubic interpolation on the GPU. Parallelizing the interpolation reduces its execution time by a factor of six to ten, depending on the image size. Figure 2 shows that the CPU time increases linearly while the GPU time stays roughly constant. The parallel processing potential of this part significantly increases the overall speed of execution.

Figure 2: The execution time on CPU and GPU. Depending on the input image size, the GPU accelerates bicubic interpolation by six to ten times.

3.2 FRSR

FRSR stands for Fast and Robust Super-Resolution, which is specific to global-motion-based video frames. The following notation is used in FRSR:
• x denotes the target low-resolution image
• f denotes the estimated high-resolution image
• fn is the approximation of f obtained after n iterations
• gk denotes the k-th low-resolution image
• mk denotes the mapping from the low-resolution image gk to the target image f

Figure 3: Median of g0,k. Each pixel value of f is estimated by the median of g0,k.

Median of the neighboring frames. At the beginning of the algorithm, we estimate the initial guess of the high-resolution image with a median operator. Figure 3 illustrates the procedure. First, we align all neighboring frames to the target frame. As shown by Zhao and Sawhney [Zhao and Sawhney 2002], accurate alignment is the key to the success of reconstruction-based super-resolution algorithms. We employ the macroblock optical flow algorithm in our work. Second, for each pixel of f0, we choose the median of all g0,k as the pixel value. The initial high-resolution estimate tends to be blurred, so the next step is to deblur it to enhance the detail.

Bilateral Non-Iterative Artifact Removal. We add a non-iterative outlier removal step, after data fusion and before the deblurring-interpolation step, using the bilateral filter. Our refinement method essentially calculates the correlation of different measurements (pixels from different frames) with each other and removes the inconsistent data. The computed correlation is based on the bilateral idea, so high-frequency (edge-information) data are differentiated from outliers. We assign a weight to each pixel in the measurements based on its bilateral correlation with the corresponding pixels in the data-fused image. After computing these weights, pixels with very small weights are removed from the data set. As pixels containing high-frequency information receive higher weights than the ones located in low-frequency areas, it is reasonable to compute and compare the penalty weights for blocks of pixels rather than for single pixels.

Robust Regularization. Super-resolution is an ill-posed problem [Nguyen et al. 2001] [Tekalp 1995]. For the underdetermined cases, there exist an infinite number of solutions. The solution for square and overdetermined cases is not stable, which means that a small amount of noise in the measurements results in large perturbations in the final solution. Therefore, considering regularization in a super-resolution algorithm as a means of picking a stable solution is very useful, if not necessary. Regularization can also help the algorithm remove artifacts from the final answer and improve the rate of convergence. Of the many possible regularization terms, we desire one that results in HR images with sharp edges and is easy to implement.

One of the most widely referenced regularization cost functions is the Tikhonov cost function [Elad and Feuer 1999]:

$$\gamma_T(X) = \|\Gamma X\|_2^2$$

where Γ is usually a high-pass operator such as a derivative or Laplacian, or even the identity matrix. The intuition behind this regularization method is to limit the total energy of the image (when Γ is the identity matrix) or to force spatial smoothness (for derivative or Laplacian choices of Γ). As noisy pixels and edge pixels both contain high-frequency energy, they will be removed in the regularization process and the resulting denoised image will not contain sharp edges. Certain types of regularization cost functions work efficiently for some special types of images but are not suitable for general images (such as maximum entropy regularization, which produces sharp reconstructions of point objects, such as star fields in astronomical images [Gibson and Bovik 2000]). One of the most successful regularization methods for denoising and deblurring is the Total Variation (TV) method [Rudin et al. 1992]. The total variation criterion penalizes the total amount of change in the image as measured by the L1 norm of the gradient and is defined as:

$$\gamma_{TV}(X) = \|\nabla X\|_1$$

where ∇ is the gradient operator. The most useful property of the total variation criterion is that it tends to preserve edges in the reconstruction [Gibson and Bovik 2000] [Rudin et al. 1992] [Chan et al. 2001], as it does not severely penalize steep local gradients. Building on the spirit of the total variation criterion and the bilateral filter, the regularizer called Bilateral-TV is computationally cheap to implement and preserves edges. The regularizing function is

$$\gamma_{BTV}(X) = \sum_{l=0}^{P} \sum_{m=0}^{P} \alpha^{m+l} \, \|X - S_x^l S_y^m X\|_1$$

where the matrices (operators) $S_x^l$ and $S_y^m$ shift X by l and m pixels in the horizontal and vertical directions respectively, presenting several scales of derivatives. The scalar weight α, 0 < α < 1, gives a spatially decaying effect to the summation of the regularization term.

4 Result

To verify our algorithm, we tested it with two video clips, namely Text (98 x 114, 30 fps, Figure 4) and Bottle (128 x 96, 30 fps, Figure 5(a)). All experiments and timing statistics were carried out and recorded on an Intel Core i5-750 2.66 GHz CPU with 2 GB of memory and an nVidia N240 GPU with 1 GB of video memory.

We first compare our first method, FGSR, with naive bilinear interpolation in Figure 4(c). The Text example (Figure 4(a)) shows the target frame in the video clip with panning motion. To generate the result, four neighboring low-resolution frames plus the target frame are used. The displacement between consecutive frames is almost 10 pixels in some cases. The super-resolved image is magnified two times, i.e. to 98x114 in resolution. The result from bilinear interpolation exhibits blocky artifacts (see Figure 4(c)) when compared with our result. Next, we apply our second SR method, FRSR, to the same target frame. In Figure 4(e), we find that the result generated by FRSR is slightly better than those of FGSR and bicubic interpolation.

In the Bottle example, we compare these two methods with the ground truth image. The low-resolution input frames are simulated by down-sampling the original frames (original resolution: 256x192) to 128x96. The ground truth of the target frame is shown in Figure 5(a). We blow up part of the image to highlight the difference between the images produced by bilinear interpolation (Figure 5(b)), FGSR (Figure 5(c)), and FRSR (Figure 5(d)). The experimental result (Table 1) shows that the image generated by the FGSR algorithm outperforms that of bilinear interpolation by 0.1927 dB in terms of peak signal-to-noise ratio (PSNR). The result generated by the FRSR algorithm also outperforms that of bilinear interpolation by 0.7007 dB and that of the FGSR algorithm by 0.508 dB.

Figure 4: The Text example. In this experiment, the low-resolution input frames are simulated by down-sampling the original frames (original resolution: 98x114) to 49x57. One target frame of the down-sampled version is shown in (a). We magnify the target frame two times and compare the results of bilinear interpolation (c), FGSR (d), and FRSR (e).

We also implement a video player (see Figure 6), which can play the video sequence in real time. The user can zoom in and out of the frame to change the resolution of the selected area, and switch the upsampling algorithm to compare the results. However, for lack of time, we have not yet integrated all of our super-resolution algorithms into it. So far, the user can only switch between the nearest-neighbor, bilinear, and bicubic modes.
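To make the three upsampling modes of the video player concrete (bilinear is also the baseline in our comparisons, and bicubic provides the initial estimate in FGSR), the following is a minimal one-dimensional sketch in Python. It is not the player's code: the function names are hypothetical, borders are handled by repeating the edge sample, the output grid places sample k at position k / factor, and the cubic kernel uses the common parameter a = −0.5, which the text above does not specify. A two-dimensional version would apply the same weights along rows and then along columns.

```python
import math


def _clamped(samples, i):
    """Read samples[i], repeating the border value outside the signal."""
    return samples[min(max(i, 0), len(samples) - 1)]


def nearest(samples, t):
    """Nearest-neighbor: take the sample closest to position t."""
    return _clamped(samples, math.floor(t + 0.5))


def linear(samples, t):
    """Linear: blend the two samples surrounding position t."""
    i = math.floor(t)
    a = t - i
    return (1 - a) * _clamped(samples, i) + a * _clamped(samples, i + 1)


def cubic_weight(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, common choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def cubic(samples, t):
    """Cubic: weighted sum of the four samples around position t."""
    i = math.floor(t)
    return sum(cubic_weight(t - j) * _clamped(samples, j)
               for j in range(i - 1, i + 3))


def upsample(samples, factor, method):
    """Resample a 1-D signal on a grid `factor` times denser."""
    n = len(samples) * factor
    return [method(samples, k / factor) for k in range(n)]


row = [0.0, 0.0, 1.0, 1.0]  # a step edge
for method in (nearest, linear, cubic):
    print(method.__name__, upsample(row, 2, method))
```

Every output sample depends only on a few neighboring input samples and on no other output sample, which is why this step maps well onto one GPU thread per output pixel. On the step edge, the cubic mode slightly undershoots and overshoots next to the edge, whereas the linear mode stays within the input range.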
Our implementation can handle mild motion blur and spatially varying blur in real-world video clips, and it efficiently removes noise and aliasing while generating the high-resolution image. However, severe blurring needs more effort.
Table 1: A comparison of PSNR between different super-resolution algorithms

Image    Bilinear      FGSR          FRSR
Bottle   14.4501 dB    14.6428 dB    15.1508 dB
Text     13.5860 dB    13.2931 dB    13.3693 dB
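For reference, a PSNR value such as those in Table 1 can be computed along the following lines. This is a minimal Python sketch rather than our measurement code: the function name `psnr` is hypothetical, images are nested lists of grayscale values, and the peak value of 255 assumes 8-bit images, which is not stated above.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diffs = [(r - t) ** 2
             for ref_row, test_row in zip(reference, test)
             for r, t in zip(ref_row, test_row)]
    mse = sum(diffs) / len(diffs)  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2x2 example: two pixels differ from the ground truth by 5 levels.
ground_truth = [[0, 255], [255, 0]]
estimate = [[0, 250], [255, 5]]
print(round(psnr(ground_truth, estimate), 2))
```

The gains quoted in the text are differences of such values; for Bottle, 14.6428 dB − 14.4501 dB = 0.1927 dB for FGSR over bilinear interpolation.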
Figure 5: Comparison with the ground truth. In this experiment, we magnify the target frame of Bottle two times in both dimensions. Panels: (a) ground truth, (b) bilinear interpolation, (c) FGSR algorithm result, (d) FRSR algorithm result.

Figure 6: The video player. The user can zoom in and out of the frame and switch the upsampling algorithm to compare the results. Panels: (a) nearest-neighbor, (b) bicubic interpolation.

5 Conclusion

In this work, we implement two practical super-resolution algorithms that are capable of reconstructing high-resolution images from complex and dynamic video sequences, which may contain mild motion blur. By integrating GPU acceleration into the iterative reconstruction process, the super-resolved images are generated in a short period of time. Two mild and dynamic video sequences are tested to demonstrate the applicability of these two algorithms. The performance of our algorithm depends on the variation of the global parametric transformations.

To further improve speed, it may be possible to find a new algorithm that avoids the iterative calculation. Because each iteration depends on the previous one, the iterative process is hard to accelerate on the GPU. Another direction is to reconstruct the high-resolution image with a learning-based algorithm. A learning-based algorithm needs to model correlations in image structure over extended neighborhoods. The modeling complexity can be reduced remarkably if we construct the prior model on image patches instead of full-size images, which can be easily parallelized on the GPU.

References

CHAN, T., OSHER, S., AND SHEN, J. 2001. The digital TV filter and nonlinear denoising. Image Processing, IEEE Transactions on 10, 2 (feb), 231–241.
ELAD, M., AND FEUER, A. 1999. Super-resolution reconstruction of continuous image sequences. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, vol. 3, 459–463.
FARSIU, S., ROBINSON, M., ELAD, M., AND MILANFAR, P. 2004. Fast and robust multiframe super resolution. Image Processing, IEEE Transactions on 13, 10 (oct.), 1327–1344.
GIBSON, J. D., AND BOVIK, A., Eds. 2000. Handbook of Image and Video Processing. Academic Press, Inc., Orlando, FL, USA.
JIANG, Z., WONG, T.-T., AND BAO, H. 2003. Practical super-resolution from dynamic video sequences. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 2, II–549 – II–554.
NGUYEN, N., MILANFAR, P., AND GOLUB, G. 2001. A computationally efficient superresolution image reconstruction algorithm. Image Processing, IEEE Transactions on 10, 4 (apr), 573–583.
RUDIN, L. I., OSHER, S., AND FATEMI, E. 1992. Nonlinear total variation based noise removal algorithms. In Proceedings of the eleventh annual international conference of the Center for Nonlinear Studies on Experimental mathematics: computational issues in nonlinear science, Elsevier North-Holland, Inc., Amsterdam, The Netherlands, 259–268.
TEKALP, A. M. 1995. Digital video processing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
ZHAO, W., AND SAWHNEY, H. S. 2002. Is super-resolution with optical flow feasible? In ECCV ’02: Proceedings of the 7th European Conference on Computer Vision-Part I, Springer-Verlag, London, UK, 599–613.