Open Source meets Super Resolution, part 1

Marcus Edel
September 21, 2020

Despite their great upscaling performance, deep-learning-based Super-Resolution methods cannot be easily applied to real-world applications due to their heavy computational requirements. At Collabora, we have addressed this issue by introducing an accurate and light-weight deep network for video super-resolution, running on a completely open source software stack using Panfrost, the free and open-source graphics driver for Mali GPUs. Here's an overview of Super Resolution, its purpose for image and video upscaling, and how our model came about.

Internet streaming has experienced tremendous growth in the past few years, and continues to advance at a rapid pace. Streaming now accounts for over 60% of internet traffic and is expected to quadruple over the next five years.

Video delivery quality depends critically on available network bandwidth. Due to bandwidth limitations, most video sources are compressed, resulting in image artifacts, noise, and blur. Quality is also degraded by routine image upscaling, which is required to match the very high pixel density of newer mobile devices.

The upscaling community has provided us with many fundamental advances in video and image upscaling, including classic methods such as Nearest-Neighbor, Linear and Lanczos resampling. However, no fundamentally new methods have been introduced in over 20 years. In addition, traditional algorithm-based upscaling methods cannot restore fine detail or remove defects and compression artifacts.

All of this is changing thanks to the Deep Learning revolution. We now have a whole new class of techniques for state-of-the-art upscaling, called Deep Learning Super Resolution (DLSR).

Super Resolution

An image may lose detail either because its spatial resolution was lowered (for example, to save bandwidth) or because its quality was degraded, for example by blurring.

Super-resolution (SR) is a technique for constructing a high-resolution (HR) image from a collection of observed low-resolution (LR) images. SR restores high-frequency components and removes compression artifacts.

The HR and LR images are related via the equation:

LR = degradation(HR).

By applying the degradation function, we obtain the LR image from the HR image. If we knew the degradation function in advance and could invert it, we could apply its inverse to the LR image to recover the HR image. Unfortunately, we usually do not know the degradation function beforehand, and because degradation discards information, many different HR images can produce the same LR image. The problem is thus ill-posed, and the quality of the SR result is limited.
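To make the relation LR = degradation(HR) more tangible, here is a minimal sketch in Python. It assumes a deliberately simple degradation, averaging each 2x2 block of pixels, which blurs and downsamples in one step. The function name `degrade` and the choice of a box filter are illustrative assumptions; real degradations also involve compression and noise and are usually unknown.

```python
def degrade(hr, factor=2):
    """Hypothetical degradation: average each factor x factor block of pixels."""
    height, width = len(hr), len(hr[0])
    lr = []
    # Ignore trailing rows/columns that do not fill a complete block.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [hr[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            row.append(sum(block) / len(block))
        lr.append(row)
    return lr


# Two clearly different HR images (grayscale values 0..255) ...
hr_checker = [[0, 255], [255, 0]]
hr_flat = [[127.5, 127.5], [127.5, 127.5]]

# ... collapse to the same single LR pixel.
lr_a = degrade(hr_checker)
lr_b = degrade(hr_flat)
print(lr_a, lr_b)  # [[127.5]] [[127.5]]
```

Both inputs produce the same LR image, so no inverse function can tell them apart. This is the ill-posedness described above, and it is why a learned prior about what natural images look like is useful.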

DLSR solves this problem by learning image prior information from HR and/or LR example images, thereby improving the quality of the LR to HR transformation.

The key to DLSR success is the recent rapid development of deep convolutional neural networks (CNNs). Recent years have witnessed dramatic improvements in the design and training of CNN models used by Super-Resolution.

Upscaling

Upscaling can be achieved using different techniques, such as the aforementioned Nearest-Neighbor, Linear and Lanczos resampling methods. The group of images below demonstrates these different options.

First, the lower-resolution input image to be upscaled:

(Photo by Jon Tyson on Unsplash)

Then, the various methods can be applied. Click on the image below to get a closer look at each result, as well as the original image before it was downscaled.

The input image is upscaled by Nearest-Neighbour interpolation.
The input image is upscaled by Bi-linear interpolation (the most commonly used method).
The input image is upscaled by Lanczos' interpolation (one of the best standard methods).
The input image is upscaled and improved by our Deep Learning Super Resolution model.
The target image or ground truth, which was downscaled to create the lower resolution input.

The objective is to improve the quality of the LR image to approach the quality of the target, known as the ground truth. In this case, the ground truth is the original image which was downscaled to create the low-resolution image.
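As a rough illustration of how two of the classic methods work, and of how "approaching the ground truth" can be measured, the sketch below implements Nearest-Neighbour and Bi-linear upscaling in plain Python and scores each result with PSNR (peak signal-to-noise ratio). The tiny 4x4 gradient image, the function names and the choice of PSNR as the metric are assumptions made for the example. Lanczos is left out for brevity.

```python
import math


def upscale_nearest(img, factor):
    """Nearest-Neighbour: repeat each source pixel factor x factor times."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


def upscale_bilinear(img, factor):
    """Bi-linear: blend the four nearest source pixels by distance."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * factor):
        # Map the output pixel centre back into source coordinates,
        # clamping at the image border.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def psnr(result, target, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(result, target)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Ground truth: a smooth horizontal gradient. The low-resolution input is
# what remains after averaging each 2x2 block of the ground truth.
ground_truth = [[0, 60, 120, 180] for _ in range(4)]
low_res = [[30, 150], [30, 150]]

nearest = upscale_nearest(low_res, 2)
bilinear = upscale_bilinear(low_res, 2)
print("nearest :", round(psnr(nearest, ground_truth), 2), "dB")
print("bilinear:", round(psnr(bilinear, ground_truth), 2), "dB")
```

On this smooth gradient, Bi-linear interpolation scores higher than Nearest-Neighbour because it blends neighbouring pixels instead of producing blocks. Neither method can bring back detail that was lost in downscaling, which is the gap a learned model tries to close.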

Deep Learning Super Resolution

The standard approach to Super-Resolution using Deep Learning or Convolutional Neural Networks (CNNs) is to use a fully supervised approach where a low-resolution image is processed by a network comprising convolutional and up-sampling layers to produce a high-resolution image. This generated HR image is then matched against the original HR image using an appropriate loss function. This approach is commonly known as "paired setting" as it uses pairs of LR and corresponding HR images for training.
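The paired setting can be sketched with a toy training loop. This is not our network: the "model" below is only a Nearest-Neighbour up-sampling step followed by a single learnable gain and bias, trained with a mean-squared-error loss and plain gradient descent. The degradation used to build the pairs (block averaging plus halved brightness) and all names are assumptions chosen so that the example stays small and its result is easy to check.

```python
def upsample(lr, factor=2):
    """Stand-in up-sampling layer: Nearest-Neighbour pixel repetition."""
    return [[lr[y // factor][x // factor]
             for x in range(len(lr[0]) * factor)]
            for y in range(len(lr) * factor)]


def make_pair(hr, factor=2):
    """Hypothetical degradation: average each block, then halve brightness."""
    lr = [[0.5 * sum(hr[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)) / factor ** 2
           for x in range(0, len(hr[0]), factor)]
          for y in range(0, len(hr), factor)]
    return lr, hr


def forward(lr, gain, bias):
    """Toy 'network': up-sample, then apply one learnable gain and bias."""
    return [[gain * p + bias for p in row] for row in upsample(lr)]


def mse_loss(pred, target):
    """Mean squared error between the generated and the original HR image."""
    diffs = [(p - t) ** 2
             for row_p, row_t in zip(pred, target)
             for p, t in zip(row_p, row_t)]
    return sum(diffs) / len(diffs)


def train(pairs, steps=500, step_size=0.5):
    gain, bias = 0.0, 0.0
    losses = []
    for _ in range(steps):
        grad_gain = grad_bias = loss = 0.0
        for lr_img, hr_img in pairs:
            up = upsample(lr_img)
            pred = forward(lr_img, gain, bias)
            loss += mse_loss(pred, hr_img)
            count = len(hr_img) * len(hr_img[0])
            # Analytic gradients of the MSE with respect to gain and bias.
            for row_p, row_t, row_u in zip(pred, hr_img, up):
                for p, t, u in zip(row_p, row_t, row_u):
                    grad_gain += 2 * (p - t) * u / count
                    grad_bias += 2 * (p - t) / count
        gain -= step_size * grad_gain / len(pairs)
        bias -= step_size * grad_bias / len(pairs)
        losses.append(loss / len(pairs))
    return gain, bias, losses


# Two small HR training images (pixel values in 0..1) and their LR partners.
hr_images = [upsample([[0.2, 0.8], [0.6, 0.4]]),
             upsample([[1.0, 0.0], [0.5, 0.3]])]
pairs = [make_pair(hr) for hr in hr_images]

gain, bias, losses = train(pairs)
print("first loss:", losses[0], "last loss:", losses[-1])
print("learned gain:", round(gain, 3), "learned bias:", round(bias, 3))
```

Because the toy degradation halves the brightness, training should drive the gain towards 2 and the bias towards 0. A real Super-Resolution network replaces the two scalars with many convolutional filters, but the loop has the same shape: generate an HR image from the LR input, compare it with the original HR image, and update the parameters.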

More recently, following the introduction of generative adversarial networks (GANs), GANs have become one of the most widely used machine-learning architectures for Super-Resolution.

In generative adversarial networks, two networks train and compete against each other, resulting in mutual learning. The first network, called the generator, generates high-resolution images and tries to fool the second network, the discriminator, into accepting these as true high-quality inputs. The discriminator output predicts whether an input is a real high-quality image (similar to the training set) or a fake or badly upscaled image.
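The competition between the two networks shows up in their loss functions. The sketch below assumes the discriminator outputs a probability between 0 and 1 that an image is real, and uses the common binary cross-entropy formulation with the non-saturating generator loss. The function names and the example probabilities are made up for illustration, and this is not necessarily the exact objective used by our model.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one prediction; clipped to avoid log(0)."""
    p = min(max(prob, 1e-7), 1 - 1e-7)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def discriminator_loss(real_probs, fake_probs):
    """The discriminator wants real images scored 1 and generated ones 0."""
    real = sum(bce(p, 1) for p in real_probs) / len(real_probs)
    fake = sum(bce(p, 0) for p in fake_probs) / len(fake_probs)
    return real + fake


def generator_loss(fake_probs):
    """The generator wants its outputs to be scored as real (label 1)."""
    return sum(bce(p, 1) for p in fake_probs) / len(fake_probs)


# Discriminator scores for real HR images (assumed values).
real_scores = [0.9, 0.8]

# Case 1: the discriminator spots the generated images.
spotted = [0.1, 0.2]
# Case 2: the generator fools the discriminator.
fooled = [0.9, 0.8]

for name, fake_scores in [("spotted", spotted), ("fooled", fooled)]:
    d = discriminator_loss(real_scores, fake_scores)
    g = generator_loss(fake_scores)
    print(name, "discriminator loss:", round(d, 3), "generator loss:", round(g, 3))
```

When the generated images are spotted, the discriminator loss is low and the generator loss is high. When the discriminator is fooled, the situation reverses. Training alternates between updating each network to reduce its own loss.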

The technical details are considerably more complex but follow these general principles.

Examples

The following shows different examples of 4x upsampling using our trained Deep Learning Super Resolution model. You can click on each image to view its original size. We also list the output for Nearest Neighbour, Bi-linear and Lanczos interpolation for comparison.

1. Food

The model adds details to the vegetables, the plates and the background. Input, Nearest Neighbour, Bi-linear, Lanczos, Original.