SeeNN
Image Super Resolution
Title: SeeNN
People
• Wyatt Woodbery (wwoodbery)
• Arushi Kalpande (akalpand)
• Gus LeTourneau (gletourn)
• Indiamei Gold-Coren (igolcor)
Introduction/Purpose We have chosen to implement “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network” by Ledig et. al (2017), which presents SRGAN, a generative adversarial network used for image super-resolution purposes. Image super-resolution is a supervised learning problem that enhances low-resolution input images into high-resolution outputs. The paper aims to improve upon previous models such as SRCNN that simply minimized mean square error loss, as such models did not always produce the most perceptually better results. We chose this paper as we were interested in the various applications of image super-resolution, and found that this model produces high quality results while also having a straightforward architecture. Specifically, we were interested in the role super resolution could play in other image recognition tasks, such as in image captioning.
Related Work
• Image Super-Resolution Using Deep Convolutional Networks
• ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
• Generative Adversarial Nets
The paper we are choosing to implement is an improvement on earlier image super-resolution (SR) models. For example, an earlier SRCNN as described in this paper, produces a high-resolution image output from a low-resolution image input in a three part process including path extraction and representation, nonlinear mapping, and finally reconstruction. Our implementation will make use of a generative adversarial network to improve upon these earlier implementations. Implementations beyond the paper we are re-implementing, such as enhanced super-resolution generative adversarial networks (ESRGAN) further improve upon SRGAN by fine tuning different steps, such as with Residual-in-Residual Dense Block (RRDB).
• https://github.com/Lornatang/SRGAN-PyTorch
• https://github.com/leftthomas/SRGAN (PyTorch)
• https://github.com/tensorlayer/SRGAN (TensorLayerX)
Data
Subset of ImageNet dataset (~1000 images: 800 training, 200 testing)
Preprocessing: creating low resolution counterparts to the ImageNet images
Methodology Obtain a low resolution (LR) counterpart to the high resolution (HR) images in our dataset by applying a Gaussian filter to the high resolution image (IHR) followed by downscaling by a factor of 4
Goal: train a generator, G, to take in ILR and return a super resolved counterpart ISR which is an estimate of IHR
G has B residual blocks which each consist of two convolutional layers with 3x3 kernels and 64 feature maps (filters) followed by batch normalization and ParametricReLU (PReLU)
In order to improve G, we employ an adversarial discrimator, D, which is trained to determine whether or not its input image is real or fake (i.e. whether it is IHR or ISR).
We train these two opposing systems at the same time using a min-max problem. That is, the discrimator wants to maximize its ability to tell real images from fake ones and the generator wants to minimize this ability.
D consists of 8 convolutional layers each with an increasing number of 3x3 kernels going from 64 to 512. Each increment is given two convolutional layers one with stride 1 and the other with stride 2. At the end of the convolutional layers are two dense layers followed by a sigmoid activation function.
During training, we alternate updates to G and D. Our optimizer is Adam with a learning rate of 10^-4
Metrics The original paper we are working off of used a human ranking system (in particular, Mean Opinion Score test) of images from various models’ outputs. We were considering conducting a similar type of test in which we survey peoples’ ranking of our model’s output compared to the original image.
The notion of accuracy in our model is not as easy to apply as previous assignments in this course. In addition to taking Mean Opinion Score, we could potentially use other methods of comparing two similar images, such as feature matching or edge detection for a more objective metric. We are also considering the idea of inputting the original and model output images through some existing image classifier to see how often the classifier can correctly identify both the model and original.
Base goal: Score higher than bicubic on MOS
Target goal: Score around the same as the original paper on MOS
Stretch goal(s): improve image classification/captioning
Ethics Since we’re using a subset of a general dataset, one ethical consideration is making sure our dataset is still representative and does not disproportionately favor one class of images over another. Though the implications of doing so would not be too drastic in our general model, if we were to train this model to perform super-resolution on a specific type of image, such as faces, for example, proper representation of faces of different races, genders, ages, etc. would be very important in debiasing our model and ensuring good results.
The limitations of image super-resolution can also be an ethical consideration. For example, face image super-resolution will never be able to exactly reconstruct a face, and so in certain situations such as criminal investigations, the use of SR might be inappropriate. Similarly, text is often difficult for SR models to reconstruct exactly. If people aren’t familiar with these limitations, there is a potential for misuse.
Division of labor
Pre-processing: Gus
Implementing: All
Training: All
Testing: Arushi, Wyatt, Indiamei
Built With
- python
- tensorflow
Log in or sign up for Devpost to join the conversation.