Self-similarity based super-resolution (SR) algorithms are able to produce visually pleasing results without extensive training on external databases. Such algorithms exploit the statistical prior that patches in a natural image tend to recur within and across scales of the same image. However, the internal dictionary obtained from the given image may not always be sufficiently expressive to cover the textural appearance variations in the scene. In this paper, we extend self-similarity based SR to overcome this drawback. We expand the internal patch search space by allowing geometric variations. We do so by explicitly localizing planes in the scene and using the detected perspective geometry to guide the patch search process. We also incorporate additional affine transformations to accommodate local shape variations. We propose a compositional model to simultaneously handle both types of transformations. We extensively evaluate the performance in both urban and natural scenes. Even without using any external training databases, we achieve significantly superior results on urban scenes, while maintaining comparable performance on natural scenes as other state-of-the-art SR algorithms.
http://bit.ly/selfexemplarsr
4. External Example-based Super-Resolution
Learning to map from low-res to high-res patches
• Nearest neighbor [Freeman et al. CG&A 02]
• Neighborhood embedding [Chang et al. CVPR 04]
• Sparse representation [Yang et al. TIP 10]
• Kernel ridge regression [Kim and Kwon PAMI 10]
• Locally-linear regression [Yang and Yang ICCV 13] [Timofte et al. ACCV 14]
• Convolutional neural network [Dong et al. ECCV 14]
• Random forest [Schulter et al. CVPR 15]
External dictionary
5. Internal Example-based Super-Resolution
Low-res and high-res example pairs from patch
recurrence across scale
• Non-local means with self-examples [Ebrahimi and Vrscay ICIRA 2007]
• Unified classical and example SR [Glasner et al. ICCV 2009]
• Local self-similarity [Freedman and Fattal TOG 2011]
• In-place regression [Yang et al. ICCV 2013]
• Nonparametric blind SR [Michaeli and Irani ICCV 2013]
• SR for noisy images [Singh et al. CVPR 2014]
• Sub-band self-similarity [Singh et al. ACCV 2014]
Internal dictionary
6. Motivation
• Internal dictionary
• More “relevant” patches
• Limited number of examples
• High-res patches are often available in the transformed domain
Symmetry | Surface orientation | Perspective distortion
10. Input low-res image
All-frequency band | Low-frequency band
Super-Resolution Scheme
Multi-scale version of [Freedman and Fattal TOG 2011]
11. Input low-res image
LR/HR example pairs
Super-Resolution Scheme
Multi-scale version of [Freedman and Fattal TOG 2011]
All-frequency band | Low-frequency band
60. Limitations – Blur Kernel Model
• Suffers from blur kernel mismatch
• Blind SR to estimate kernel
[Michaeli and Irani ICCV 2013]
[Efrat et al. ICCV 2013]
• With the ground truth kernel, we get a significant improvement
• External example-based methods would need to retrain their models
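The mismatch can be illustrated with the standard LR formation model y = (x * k) ↓ s. The 1-D sketch below (toy signal, hypothetical kernel widths; function names are mine, not the paper's) shows that the same scene downsampled under two different blur kernels yields different LR inputs, which is why an SR model tuned to one kernel degrades on another:

```python
import numpy as np

def gaussian_kernel(sigma, radius=4):
    """Normalized 1-D Gaussian blur kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def downsample_1d(signal, kernel, factor=2):
    """LR formation model y = (x * k) downarrow s, in 1-D."""
    blurred = np.convolve(signal, kernel, mode="same")
    return blurred[::factor]

rng = np.random.default_rng(0)
x = rng.random(64)                              # toy "scene"
lr_a = downsample_1d(x, gaussian_kernel(0.8))   # one assumed blur kernel
lr_b = downsample_1d(x, gaussian_kernel(2.0))   # a mismatched kernel
mismatch = np.abs(lr_a - lr_b).mean()           # nonzero: same scene, different LR inputs
```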
61. Limitations
• Slow computation time
• On average, 40 seconds to super-resolve an image 2x in the BSD100 dataset
on a 2.8 GHz, 12 GB RAM PC
SRF 4x comparison: Ground truth HR | Our result | A+ [Timofte et al. ACCV 14] | SRCNN [Dong et al. ECCV 14]
62. Conclusions
• Super-resolution based on transformed self-exemplars
• No training data, no feature extraction, no complicated learning algorithms
• Works particularly well on urban scenes
• On par with state-of-the-art on natural scenes
Code and data available: http://bit.ly/selfexemplarsr
See us on poster #82
Image super-resolution is a longstanding problem in computer vision that aims at recovering missing high-frequency components in images. Take this image as an example: sharpening can boost the available spatial frequencies to make the image appear clearer. In contrast, the goal of image super-resolution is to recover high-frequency content that is NOT present in the original image.
Super-resolution (SR) techniques can be broadly classified into two classes.
First, the classical multi-image approach can super-resolve a scene by combining images with subpixel misalignment.
Second, example-based approaches achieve super-resolution by learning the mapping from low to high resolution image patches.
One way to learn such a mapping is to build an external database of low-res/high-res patch pairs. One can then use a machine learning algorithm to learn the mapping.
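The simplest instance of such a mapping is a nearest-neighbor lookup in the external dictionary, in the spirit of the Freeman et al. line of work. A minimal sketch with toy data (function name and dictionary layout are illustrative, with flattened patches as rows):

```python
import numpy as np

def nn_superresolve(lr_patch, lr_dict, hr_dict):
    """Return the HR patch paired with the nearest LR dictionary entry.
    lr_dict and hr_dict are row-aligned: hr_dict[i] is the HR counterpart
    of lr_dict[i]."""
    dists = ((lr_dict - lr_patch) ** 2).sum(axis=1)  # SSD to every entry
    return hr_dict[np.argmin(dists)]

# Toy dictionary of three LR/HR pairs (flattened, hypothetical values)
lr_dict = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
hr_dict = np.array([[0.0], [10.0], [20.0]])
pred = nn_superresolve(np.array([0.9, 1.1]), lr_dict, hr_dict)
```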
On the other hand, it has been shown that patches in a natural image tend to recur many times within and across scales. These internal statistics provide a powerful image prior for SR.
While the internal dictionary contains more relevant patches, it has significantly fewer examples than an external dictionary. In this work, we propose to address this problem using transformed self-exemplars. The motivation is that many high-res patches are often available ONLY in a geometrically transformed domain.
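Searching in a transformed domain amounts to sampling a source patch through a geometric warp before comparing it to the target. A minimal sketch with a 3x3 homography and nearest-neighbor sampling (the function name and sampling scheme are mine; a real implementation would use bilinear or bicubic interpolation):

```python
import numpy as np

def warp_patch(img, H, out_shape):
    """Sample an out_shape patch from img under homography H, which maps
    target coordinates (x, y, 1) to source coordinates. Affine transforms
    are the special case where the last row of H is (0, 0, 1)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1).astype(float)
    src = H @ pts
    src /= src[2]                                           # perspective divide
    sx = np.clip(np.round(src[0]).astype(int), 0, img.shape[1] - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, img.shape[0] - 1)
    return img[sy, sx].reshape(h, w)                        # nearest-neighbor sample
```

With H equal to the identity this reduces to plain translational patch extraction, so the translational internal search is a special case of the transformed search.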
Here we show the comparisons of external, internal, and our approach.
Here is an image consisting of repetitive patterns. The red patch is the target patch we want to super-resolve. Using internal example-based methods, we match the low-res patch against all translated patches in the downscaled image. Here is the matching error. By selecting the patch with the lowest cost, we obtain an exemplar patch pair for predicting the missing high-frequency content. Because the texture is perspectively distorted, the prediction is not accurate. In contrast, matching in the transformed space gives an accurate prediction.
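The translational baseline described here is an exhaustive SSD search over the downscaled image. A minimal sketch (names are mine; a 2x2 box average stands in for the paper's bicubic downscaling):

```python
import numpy as np

def box_downscale(img):
    """2x downscale by 2x2 averaging (stand-in for bicubic)."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def best_translation(target, source, p):
    """Exhaustive SSD search over all translated p x p patches in source;
    returns the top-left corner and cost of the best match."""
    best, best_xy = np.inf, (0, 0)
    for y in range(source.shape[0] - p + 1):
        for x in range(source.shape[1] - p + 1):
            cost = ((source[y:y + p, x:x + p] - target) ** 2).sum()
            if cost < best:
                best, best_xy = cost, (y, x)
    return best_xy, best
```

Real systems replace this quadratic scan with an approximate search such as PatchMatch, but the cost being minimized is the same.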
Similarly, in this case, we can see that matching with an affine transformation achieves a better prediction.
Now we describe our super-resolution scheme. Given an input low-resolution image, we construct an image pyramid by successively downscaling the image. This set of images contains the all-frequency band at each spatial resolution. We then upsample each level to obtain the low-frequency band version of the pyramid.
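The two pyramids can be sketched as follows; this is a toy version with factor-2 box downscaling and nearest-neighbor upsampling standing in for the bicubic resampling the method actually uses:

```python
import numpy as np

def downscale2(img):
    """2x downscale by 2x2 averaging (bicubic stand-in)."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(img):
    """2x nearest-neighbor upsampling (bicubic stand-in)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramids(img, levels=3):
    """All-frequency pyramid: successive downscales of the input.
    Low-frequency pyramid: each level downscaled once more and upsampled
    back, discarding its high-frequency band. Corresponding locations in
    the two pyramids form LR/HR example pairs."""
    all_freq, low_freq = [], []
    cur = img
    for _ in range(levels):
        all_freq.append(cur)
        low_freq.append(upsample2(downscale2(cur)))
        cur = downscale2(cur)
    return all_freq, low_freq
```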
These two image pyramids form LR/HR example pairs.
To perform super-resolution, we first upsample the input image using bicubic interpolation. The task is then to predict the missing high-frequency content.
We cast SR as a patch-based optimization problem. For each overlapping patch in the low-frequency band, we search for its nearest neighbor in the transformed space. The estimated nearest-neighbor field can then be used to reconstruct the missing high-frequency content. To achieve a high super-resolution factor (SRF), we iteratively perform this operation in a coarse-to-fine fashion.
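Given a nearest-neighbor field, the reconstruction step adds each matched example pair's high-frequency band to the target's low-frequency patch and averages overlapping predictions. A minimal sketch (names and the simple uniform averaging are mine; translational matches only, no warping):

```python
import numpy as np

def reconstruct(low_freq, nnf, src_low, src_high, p):
    """Aggregate overlapping p x p patch predictions by averaging.
    nnf[y, x] = (sy, sx) gives the matched source location for the target
    patch at (y, x); the prediction adds the example pair's high-frequency
    band (src_high - src_low) to the target's low-frequency patch."""
    out = np.zeros_like(low_freq)
    cnt = np.zeros_like(low_freq)
    H, W = low_freq.shape
    for y in range(H - p + 1):
        for x in range(W - p + 1):
            sy, sx = nnf[y, x]
            hf = src_high[sy:sy + p, sx:sx + p] - src_low[sy:sy + p, sx:sx + p]
            out[y:y + p, x:x + p] += low_freq[y:y + p, x:x + p] + hf
            cnt[y:y + p, x:x + p] += 1
    return out / np.maximum(cnt, 1)
```

When the example pairs carry no high-frequency band, the output reduces to the low-frequency input, which is a useful sanity check.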
Our objective function for estimating the nearest-neighbor field consists of three terms. First, the appearance cost measures how similar the warped source patch is to the target patch; we use the L2 norm of RGB patches with bias correction. Second, we use the plane compatibility cost proposed in our previous work at SIGGRAPH 2014. This term encourages the search to stay within the correct planar region. The third cost encourages the search process to find source patches in deeper levels of the pyramid in order to obtain a better reconstruction at higher SRFs.
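The appearance term can be sketched as follows, assuming bias correction means subtracting each patch's per-channel mean before taking the L2 norm (the function name and that reading of the correction are my assumptions):

```python
import numpy as np

def appearance_cost(src_patch, tgt_patch):
    """L2 cost between RGB patches (H x W x 3) after per-channel mean
    subtraction, so a constant intensity offset between source and
    target does not penalize an otherwise identical match."""
    s = src_patch - src_patch.mean(axis=(0, 1), keepdims=True)
    t = tgt_patch - tgt_patch.mean(axis=(0, 1), keepdims=True)
    return ((s - t) ** 2).sum()
```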
We test our algorithm on two main datasets. BSD100 contains mostly natural scenes. To complement it, we introduce a new Urban dataset: we downloaded 100 urban images with structured scenes from Flickr using search keywords such as city and architecture.