Outcomes
Check-in #2 https://drive.google.com/file/d/10VuFeuNRABFL-yT5l57MPLGvugOKJ9mw/view?usp=sharing
Final Writeup https://drive.google.com/file/d/1IToD0aI_0bZDZN3Evu6Lm7r3RLsm1iQl/view?usp=sharing
Poster https://drive.google.com/file/d/1oZhzJRIQx0w9K4q72LYoF1Nlgx5uWpNd/view?usp=sharing
GitHub repo https://github.com/hlee183/transformer-prior
Title
Transformer Prior: Image Reconstruction without Learning
Who
Heejun Lee (hlee183)
Introduction
Imagine that we are asked to inpaint a masked image. We do not need to practice on pairs of masked and unmasked images to do this; we can fill in the missing part using our prior knowledge of what images look like. Deep Image Prior shows that ConvNets can serve as such prior knowledge. Instead of learning from a data set of corrupted and original images, the paper searches for a plausible image that a ConvNet can generate for a single corrupted image. Surprisingly, the images reconstructed by ConvNets without learning were comparable to those from other image reconstruction methods trained on data sets.
Transformers are also used in computer vision tasks. However, to the best of our knowledge, no previous research has applied the Deep Image Prior method to a Transformer. In this project, we apply a Transformer with the Deep Image Prior method to image reconstruction tasks without learning. This will show whether the Transformer architecture contains prior knowledge about image data.
Related Work
The code for the Deep Image Prior paper can be found here. The idea is simple. Instead of minimizing the average loss of a ConvNet over a large data set, this method minimizes the loss for a single image. Specifically, a separate set of network parameters is optimized for each image.
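As we understand the paper, the single-image optimization can be written as below. Here f_θ is the network with parameters θ, z is a fixed random input, x_0 is the corrupted image, and E is a task-dependent data term; the squared-error form shown is the denoising case.

```latex
\theta^{*} = \arg\min_{\theta} E\bigl(f_{\theta}(z);\, x_{0}\bigr),
\qquad
x^{*} = f_{\theta^{*}}(z),
\qquad
E(x;\, x_{0}) = \lVert x - x_{0} \rVert^{2} \quad \text{(denoising)}
```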
Data
We use the ImageNet data set. We generate the corrupted data set by downsampling, masking, flashing, and adding random noise.
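A minimal sketch of three of these corruptions, assuming images are NumPy arrays of shape (H, W, C) scaled to [0, 1]. The function names, the average-pooling downsampler, the Bernoulli mask, and the Gaussian noise model are illustrative assumptions, not necessarily what the project uses. The flash corruption is left out because the text does not say how it is generated.

```python
import numpy as np

def downsample(image, factor):
    """Average-pool an (H, W, C) image by an integer factor."""
    h, w, c = image.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

def random_mask(image, keep_prob, rng):
    """Zero out pixels at random; return the masked image and the 0/1 mask."""
    mask = (rng.random(image.shape[:2]) < keep_prob).astype(image.dtype)
    return image * mask[..., None], mask

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))  # random stand-in for an ImageNet image
low_res = downsample(clean, 4)
masked, mask = random_mask(clean, 0.5, rng)
noisy = add_gaussian_noise(clean, 0.1, rng)
```

Returning the mask alongside the masked image matters for inpainting, since the reconstruction loss is then evaluated only on the pixels that were kept.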
Methodology & Metrics
As mentioned before, there is no training on a data set. The paper optimizes the ConvNet on a single image only, so the ConvNet will likely fit the corrupted image perfectly in the end. The paper therefore stops the optimization at a suitable number of iterations. Instead of choosing this number by hand, we will try to find the optimal number of iterations using training set data and apply the same number of iterations to the test set. The performance of Deep Image Prior will be measured by mean squared error (MSE).
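One way to choose the stopping point could look like the sketch below. It assumes the MSE against the clean image is recorded for each training image at a few checkpoints, and it picks the checkpoint with the lowest average. The helper names, the checkpoint values, and the numbers in the example are made up for illustration and are not results.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def select_checkpoint(mse_curves):
    """Return the index of the checkpoint with the lowest mean MSE.

    mse_curves has shape (n_images, n_checkpoints): entry [i, k] is the MSE
    between the clean image i and its reconstruction at checkpoint k.
    """
    mean_curve = np.asarray(mse_curves).mean(axis=0)
    return int(np.argmin(mean_curve))

# Hypothetical checkpoints and MSE values for two training images.
checkpoints = [500, 1000, 1500, 2000]
curves = [
    [0.050, 0.020, 0.030, 0.060],
    [0.040, 0.030, 0.025, 0.050],
]
best_iters = checkpoints[select_checkpoint(curves)]
```

The chosen iteration count is then held fixed for every test image, where the clean image is not available to guide stopping.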
We would also like to see whether we can automate the above process using regularizers. We will check whether performance improves when we use a Total-Variation regularizer.
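For reference, a minimal sketch of a Total-Variation penalty added to an MSE data term. It uses the anisotropic form (sum of absolute differences between neighboring pixels). The function names and the weight `tv_weight` are assumptions for illustration; a real implementation would compute this in the same framework as the network so that gradients flow through it.

```python
import numpy as np

def total_variation(image):
    """Anisotropic TV: sum of absolute differences between adjacent pixels."""
    vertical = np.abs(np.diff(image, axis=0)).sum()
    horizontal = np.abs(np.diff(image, axis=1)).sum()
    return float(vertical + horizontal)

def regularized_loss(output, target, tv_weight):
    """MSE data term plus a weighted TV penalty on the network output."""
    data_term = np.mean((output - target) ** 2)
    return float(data_term + tv_weight * total_variation(output))

flat = np.ones((4, 4))    # constant image: no variation
step = np.zeros((4, 4))
step[:, 2:] = 1.0         # one vertical edge: a unit jump in each of 4 rows
```

A constant image has zero penalty, while noise raises it sharply, which is why the penalty can discourage the network from fitting noise in the corrupted image.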
We will experiment on various image reconstruction tasks: super-resolution, inpainting, flash/no-flash, and denoising.
The goal is to check whether Transformers contain a proper prior on the structure of images. If Transformers perform worse than ConvNets, it means that the Transformer prior is not well suited to image reconstruction. So there is no benchmark performance that we aim to reach; we only compare the performance with that of ConvNets.
Ethics
What broader societal issues are relevant to your chosen problem space?
Deep learning requires a lot of data and a lot of computation. It is sometimes expensive to collect data and to train models, so many people can benefit from deep learning only by relying on the products of firms. Deep Image Prior, however, doesn't require any training data. It reconstructs the image solely based on the prior contained in the structure of the neural network. So, at least for image reconstruction tasks, we hope that people can get satisfactory outputs inexpensively.
Why is Deep Learning a good approach to this problem?
The priors of shallow models would be too parsimonious to represent the structure of images. Since deep learning has been successful at image classification, it seems natural to examine the contribution of the deep model's prior in isolation from the information learned from images.
Division of Labor
Solo project
Open Sources
Deep Image Prior, Vision Transformer, Stand-Alone Self-Attention