Outcomes

Check-in #2 https://drive.google.com/file/d/10VuFeuNRABFL-yT5l57MPLGvugOKJ9mw/view?usp=sharing

Final Writeup https://drive.google.com/file/d/1IToD0aI_0bZDZN3Evu6Lm7r3RLsm1iQl/view?usp=sharing

Poster https://drive.google.com/file/d/1oZhzJRIQx0w9K4q72LYoF1Nlgx5uWpNd/view?usp=sharing

GitHub repo https://github.com/hlee183/transformer-prior

Title

Transformer Prior: Image Reconstruction without Learning

Who

Heejun Lee (hlee183)

Introduction

Imagine being asked to inpaint a masked image. We do not need to practice on pairs of masked and unmasked images first; we can fill in the missing region using our prior knowledge of what images look like. Deep Image Prior shows that ConvNets can serve as such prior knowledge. Instead of learning from a data set of corrupted and original images, the paper searches, for each single image, for a plausible image that the ConvNet can generate. Surprisingly, the images reconstructed by ConvNets without learning were comparable to those from other image reconstruction methods trained on data sets.
Transformers are also used in computer vision tasks. However, to the best of our knowledge, no previous work has applied the Deep Image Prior method to Transformers. In this project, we apply a Transformer with the Deep Image Prior method to image reconstruction tasks without learning. This will test whether the Transformer contains prior knowledge about image data.

Related Work

The code for the Deep Image Prior paper can be found here. The idea is simple. Instead of minimizing the average loss of a ConvNet over a large data set, this method minimizes the loss for a single image. Specifically, the network parameters are optimized separately for each image, so every image ends up with its own parameters.
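
As a reminder of what "minimizing the loss for a single image" means, the objective can be written as below. This is our reading of the Deep Image Prior formulation, not a new result. Here f_theta is the network, z is a fixed random input, x_0 is the corrupted image, and E is a task-specific data term.

```latex
\theta^{*} = \arg\min_{\theta} \; E\bigl(f_{\theta}(z);\, x_{0}\bigr),
\qquad
x^{*} = f_{\theta^{*}}(z)
```

The only change we plan is to replace the ConvNet f_theta with a Transformer; the optimization over theta for each image stays the same.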

Data

We use the ImageNet data set. We generate corrupted versions of the images by downsampling, masking, flashing, and adding random noise.
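
A minimal sketch of three of these corruptions is given below. It assumes a grayscale image stored as a nested list of floats in [0, 1]; the function names and parameters (`factor`, `mask`, `sigma`, `seed`) are illustrative choices, not the project's actual code. The flash corruption is left out because it is not a per-pixel operation of this kind.

```python
import random


def downsample(img, factor):
    """Keep every `factor`-th pixel along both axes (simple decimation)."""
    return [row[::factor] for row in img[::factor]]


def apply_mask(img, mask):
    """Zero out pixels where `mask` is 0; `mask` has the same shape as `img`."""
    return [[p * m for p, m in zip(row, mask_row)]
            for row, mask_row in zip(img, mask)]


def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise and clamp the result back to [0, 1]."""
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]
```

In practice the same operations would be applied to tensors; the list-based version only illustrates what each corruption does.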

Methodology & Metrics

As mentioned before, there is no training on a data set. The paper optimizes the ConvNet using only a single image, so the ConvNet is likely to eventually fit the corrupted image perfectly. For this reason, the paper stops the optimization after a suitable number of iterations. Instead, we will try to find the optimal number of iterations using training set data and apply the same number of iterations to the test set. The performance of Deep Image Prior will be measured by mean squared error (MSE).
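
The iteration search could look roughly like the sketch below. It assumes that, for each training image, we record the MSE between the network output and the clean image after every iteration; `best_iteration` and the `curves` layout are hypothetical names for illustration.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D images."""
    n = sum(len(row) for row in a)
    total = sum((x - y) ** 2
                for row_a, row_b in zip(a, b)
                for x, y in zip(row_a, row_b))
    return total / n


def best_iteration(curves):
    """Pick the iteration count with the lowest MSE averaged over images.

    curves[i][t] is the MSE for training image i after t + 1 iterations.
    Returns a 1-based iteration count.
    """
    n_iters = min(len(c) for c in curves)
    avg = [sum(c[t] for c in curves) / len(curves) for t in range(n_iters)]
    return min(range(n_iters), key=avg.__getitem__) + 1
```

The returned count would then be used as the fixed stopping point for every test image.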

We would also like to see whether we can automate the above process using regularizers. We will check whether performance improves when we use a Total-Variation regularizer.
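
For reference, a Total-Variation penalty can be sketched as follows. This uses the anisotropic variant (sum of absolute differences between neighboring pixels) on a grayscale image stored as a nested list; the variant and the weight `tv_weight` are assumptions, since the proposal does not fix either.

```python
def total_variation(img):
    """Anisotropic TV: absolute differences between adjacent pixels."""
    h, w = len(img), len(img[0])
    horiz = sum(abs(img[r][c + 1] - img[r][c])
                for r in range(h) for c in range(w - 1))
    vert = sum(abs(img[r + 1][c] - img[r][c])
               for r in range(h - 1) for c in range(w))
    return horiz + vert


def regularized_loss(output, target, tv_weight):
    """Mean squared data term plus a weighted TV penalty on the output."""
    n = sum(len(row) for row in output)
    data_term = sum((x - y) ** 2
                    for row_o, row_t in zip(output, target)
                    for x, y in zip(row_o, row_t)) / n
    return data_term + tv_weight * total_variation(output)
```

A smooth output has a low TV value while a noisy one has a high value, so the penalty discourages the network from fitting noise even when the optimization runs longer.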

We will experiment on various image reconstruction tasks: super-resolution, inpainting, flash/no-flash, and denoising.
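
Each task differs mainly in the data term that compares the network output x with the corrupted observation x_0. The forms below follow our understanding of the Deep Image Prior paper; m denotes a binary mask of known pixels and d a downsampling operator. Flash/no-flash is set up differently and is omitted here.

```latex
\begin{aligned}
\text{denoising:} \quad & E(x; x_0) = \lVert x - x_0 \rVert^2 \\
\text{inpainting:} \quad & E(x; x_0) = \lVert (x - x_0) \odot m \rVert^2 \\
\text{super-resolution:} \quad & E(x; x_0) = \lVert d(x) - x_0 \rVert^2
\end{aligned}
```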

The goal is to check whether Transformers contain proper prior knowledge about the structure of images. If Transformers perform worse than ConvNets, it would mean that the Transformer prior is not suitable for image reconstruction. So, there is no benchmark performance we aim to reach. We only compare the performance with that of ConvNets.

Ethics

What broader societal issues are relevant to your chosen problem space?

Deep learning requires a lot of data and a lot of computation. The problem is that collecting data and training models can be expensive. For this reason, many people can benefit from deep learning only by relying on the products of firms. Deep Image Prior, however, doesn't require any training data. It reconstructs the image based solely on the prior contained in the structure of the neural network. So, at least for image reconstruction tasks, we hope that people can get satisfactory outputs inexpensively.

Why is Deep Learning a good approach to this problem?

Priors contained in shallow models would be too parsimonious to represent the structure of images. Since deep learning has been successful in image classification tasks, it seems natural to examine the contribution of the prior in a deep learning model, in isolation from the information learned from images.

Division of Labor

Solo project

Open Sources

Deep Image Prior, Vision Transformer, Stand-Alone Self-Attention
