Final Deliverables
Quick Links
Final Writeup
https://drive.google.com/file/d/11K4noc6QNwTTFPWkojklPE53AwEmqyZa/view?usp=sharing
Poster (PDF)
https://drive.google.com/file/d/1b9LdoLmhmUKA11fDv94AZE2fKvdSqC-x/view?usp=sharing
Video Link
https://www.youtube.com/watch?v=_fq1ElZK0RM&feature=youtu.be
First Reflection
SharpieNet
Sharpie because we just watched a video about Sharpie, Net because the best deep learning models all have Net in their name.
Who
xchai1: Xinzhe Chai
ehong9: Elliot Jinoo Hong
Introduction
As our work for school clubs often requires us to make carefully designed posters, we often find ourselves stuck on one problem: we cannot find a high-resolution picture that is available for us to use. We therefore figured it would be useful to have a super-resolution generator that upscales images for us. Since this is a developing technology, we decided to base our implementation on an existing paper.
The paper we are trying to replicate is “Enhanced Deep Residual Networks for Single Image Super-Resolution” (EDSR) link to paper. We chose this paper because, of all the papers we looked at, it was the most thorough in its description of the neural network and does not use a GAN. We wanted to avoid a GAN because the ResNet alone felt hard enough, and a GAN would add too much complexity, especially for a two-person project. The paper's aim is to build a state-of-the-art super-resolution network by removing unnecessary modules from conventional ResNets. This removal simplifies the ResNet structure, which was another reason we chose to implement this paper over traditional ResNet papers. It also uses a post-upsampling design, where upsampling is done after the last convolution module, which we expect to be easier to implement than other forms of upsampling. The network is also expanded once learning stabilizes, which allows the model to potentially learn more shapes and patterns. Lastly, the paper demonstrates that training and testing can be done so that a single model can handle multiple scale factors.
This is a structured prediction deep learning project: it takes in a low-resolution photo and tries to predict its high-resolution version.
Related Work
J. Kim, J. Kwon Lee, and K. M. Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR 2016. Frequently cited in the EDSR paper, this work introduced residual learning to a very deep super-resolution network, mitigating vanishing gradients so that added depth yields better performance.
C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv:1609.04802, 2016. This paper introduces SRResNet, which resolves the time and memory bottlenecks that previously limited super-resolution networks.
Beyond Minds is a website with a good introduction to super-resolution upsampling, loss functions, and evaluation metrics.
Data
Since super-resolution is an established field, we were able to find a standard dataset, DIV2K. Its training set consists of 800 2K high-resolution (HR) images, along with the corresponding low-resolution (LR) images used as raw inputs. The paper suggests training models at several scaling factors but training only the 2x model from scratch; for the higher scaling factors, it is best to initialize from the pre-trained 2x weights, as the paper's empirical results show this greatly reduces convergence time. During preprocessing, we subtract the mean RGB value of the whole set from each individual image, and we use random rotations during training to avoid overfitting.
The dataset consists of diverse pictures (hence the name DIV2K), and the images were pre-selected to avoid obvious bias. The dataset can be downloaded at dataset link.
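The preprocessing steps above (mean subtraction plus random flips and rotations) can be sketched as follows. The mean RGB values shown are placeholders, not the actual DIV2K statistics; they should be computed from the real training set.

```python
import numpy as np

# Placeholder mean RGB of the DIV2K training set; compute the real
# values from the dataset before training.
DIV2K_MEAN_RGB = np.array([114.4, 111.5, 103.0], dtype=np.float32)

def preprocess(image, rng):
    """Subtract the dataset mean and apply a random flip/rotation."""
    image = image.astype(np.float32) - DIV2K_MEAN_RGB
    if rng.random() < 0.5:       # random horizontal flip
        image = image[:, ::-1, :]
    k = rng.integers(0, 4)       # random 90-degree rotation
    return np.rot90(image, k)
```

Because patches are square (48x48), the random rotation never changes the patch shape.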
Methodology
We aim to follow the paper’s training methodology as closely as possible. Inputs will be 48x48 RGB patches from the low-resolution images, with the corresponding high-resolution patches as training labels. We augment the training data by randomly flipping and rotating the images, then subtract the mean RGB value of the DIV2K dataset from all training examples. We will use the Adam optimizer with beta_1 = 0.9, beta_2 = 0.999, and epsilon = 10^(-8). We train with mini-batches of size 16; the learning rate starts at 10^(-4) and is halved after every 200,000 mini-batches. We will experiment with both L1 and L2 loss: while L2 loss is theoretically better, the paper found that L1 loss gave better empirical results.
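The optimizer, learning-rate schedule, and loss described above map directly onto Keras primitives. This is a sketch of that setup, not the paper's reference code:

```python
import tensorflow as tf

# Learning rate starts at 1e-4 and is halved every 200,000 mini-batches;
# `staircase=True` makes the decay happen in discrete steps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=200_000,
    decay_rate=0.5,
    staircase=True,
)

# Adam with the paper's hyperparameters.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# L1 loss between predicted and ground-truth high-resolution patches.
l1_loss = tf.keras.losses.MeanAbsoluteError()
```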
We will be training in two parts:
- training the x2, x3, and x4 models (x3 and x4 models begin training as a pre-trained x2 model.)
- training the multi-scale model.
The hardest part of implementing this paper will be saving the parameters between models so that they can learn from each other. This is a crucial part of the model, as it is how we will initialize the x3 and x4 upscaling models with an x2 pre-trained one.
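One way to initialize the x3/x4 models from the trained x2 model is to copy weights layer by layer, matching on layer names. This is a sketch under the assumption that the shared body layers are named identically across scales; only the scale-specific upsampling tail is left at its fresh initialization.

```python
import tensorflow as tf

def transfer_weights(pretrained_x2, target_model):
    """Copy weights from a trained x2 model into an x3/x4 model.

    Assumes the shared body layers (the residual blocks) have the same
    names in both models; layers with no match, or with mismatched
    shapes (e.g. the upsampling tail), are left as initialized.
    """
    source = {layer.name: layer for layer in pretrained_x2.layers}
    for layer in target_model.layers:
        match = source.get(layer.name)
        if match is None:
            continue  # scale-specific layer: no counterpart in x2 model
        try:
            layer.set_weights(match.get_weights())
        except ValueError:
            pass  # shape mismatch: keep this layer's fresh initialization
```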
Geometric self-ensemble is a new topic for both of us, so while it does not seem difficult to implement, we will need to spend extra time understanding the concept.
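The idea behind geometric self-ensemble is simple once written down: run the model on all eight flip/rotation variants of the input, undo each transform on the output, and average the results. A minimal sketch, where `predict` stands in for the trained model:

```python
import numpy as np

def self_ensemble(predict, lr_image):
    """Average the model's predictions over the 8 geometric variants
    (4 rotations x optional horizontal flip) of a square input patch."""
    outputs = []
    for flip in (False, True):
        img = lr_image[:, ::-1, :] if flip else lr_image
        for k in range(4):
            out = predict(np.rot90(img, k))
            out = np.rot90(out, -k)      # undo the rotation
            if flip:
                out = out[:, ::-1, :]    # undo the flip
            outputs.append(out)
    return np.mean(outputs, axis=0)
```

Note this sketch assumes square inputs; for rectangular images the rotated variants change shape and the outputs would need reshaping before averaging.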
Metrics
As described above, we will be experimenting between L1 and L2 loss and the depth of our neural network.
We aim to have both qualitative and quantitative metrics. For qualitative evaluation, we will randomly select 20 images and judge how the outputs look to the human eye. For quantitative evaluation, we will calculate the peak signal-to-noise ratio (PSNR) and/or structural similarity (SSIM). PSNR measures the reduction in noise by quantifying per-pixel differences between two images; SSIM quantifies their structural similarity. Overall accuracy will be a function of these two metrics.
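Both metrics are available directly in TensorFlow, so the quantitative evaluation can be as short as:

```python
import tensorflow as tf

def evaluate_pair(sr_image, hr_image, max_val=255.0):
    """Return (PSNR, SSIM) for one super-resolved / ground-truth pair.

    Both inputs are [H, W, C] float tensors with pixel values in
    [0, max_val].
    """
    psnr = tf.image.psnr(sr_image, hr_image, max_val=max_val)
    ssim = tf.image.ssim(sr_image, hr_image, max_val=max_val)
    return float(psnr), float(ssim)
```

PSNR is infinite for identical images, so comparisons are only meaningful between imperfect reconstructions.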
Goals:
- Base Goal: to successfully implement, train, and test the single-scale models with reasonable results.
- Target Goal: to successfully implement the multi-scale model outlined in this paper with reasonable results.
- Stretch Goal: to successfully implement the multi-scale model outlined in this paper with superior results.
Ethics
Deep learning is a great fit for this problem because it is traditionally unsolvable: how do we manually enlarge a picture while preserving the original clarity? An artist might redraw the picture on a larger canvas, but that is time-consuming and physically demanding. With deep neural networks, learned weights and biases can predict what color each new pixel should be, and the computational power of modern computers can enlarge a picture within seconds.
This project differs from many other deep learning projects in that its result is directly observable, and a failure would pose limited consequences to society. All the model does is take in a low-resolution image and output a higher-resolution version with the same clarity. If the model does a poor job, the user or the researchers learn about it immediately. However, as with most deep learning projects, we can never know exactly when or under what circumstances the model will fail. This could have unintended consequences when the technology is used in automated processes such as profile-picture or social-media image super-resolution. We believe that a dataset with diverse objects and environments will help us minimize our model's biases and reduce its error rate in real-world applications.
Division of Labor
Because we have a small team, a lot of the work will be shared. With that said, Jinoo, who has a slightly stronger math background, will focus on the loss functions and the more theoretical side of the model. Chai will be more involved in coding and validating the correctness of the architecture, as well as paying slightly more attention to the presentation parts of this project. We expect the collaboration to be seamless, and both of us will contribute equal amounts of time.
Second Reflection
Challenges
What has been the hardest part of the project you’ve encountered so far?
The hardest part of the project so far has been getting familiar with subclassing and TensorFlow's Functional API in Keras. Initializing 10+ ResNet blocks is also challenging, because Keras has no pre-built residual block readily available for us to use.
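Since Keras ships no residual block, one has to be written via subclassing. A minimal EDSR-style sketch (conv-ReLU-conv plus a skip connection, no batch normalization; the residual scaling value is our assumption, taken from the paper's large-model setting):

```python
import tensorflow as tf

class ResBlock(tf.keras.layers.Layer):
    """EDSR-style residual block: conv -> ReLU -> conv, added back to
    the input through a skip connection, with no batch normalization."""

    def __init__(self, filters=64, scaling=0.1, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = tf.keras.layers.Conv2D(
            filters, 3, padding="same", activation="relu")
        self.conv2 = tf.keras.layers.Conv2D(filters, 3, padding="same")
        self.scaling = scaling

    def call(self, x):
        residual = self.conv2(self.conv1(x))
        return x + self.scaling * residual

def make_body(num_blocks=16, filters=64):
    """Stack many residual blocks, since Keras has no pre-built one."""
    return tf.keras.Sequential(
        [ResBlock(filters) for _ in range(num_blocks)])
```

The skip connection requires the input channel count to match `filters`, which holds inside the body once the head convolution has projected the RGB input to `filters` channels.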
Insights
Are there any concrete results you can show at this point?
Not yet, but we are on track to finish the project ahead of time, and we have verified that our preprocessing, training, and testing pipelines are working.
How is your model performing compared with expectations?
We expect it to meet our expectations: users should be able to see a significant difference in clarity between the input and predicted images.
Plan
Are you on track with your project?
We are on track, as we have finished most of the coding. We are now debugging our model: because we handwrote the ResNet blocks, we have some incompatibility issues to resolve.
What do you need to dedicate more time to?
We need to spend more time on making sure our code works, and after that, fine-tuning our hyperparameters to achieve better results.
Built With
- python
- tensorflow