Final Writeup
Project Outline
By Aidan Cassel-Mace (acasselm) and Ezra Marks (emarks7)
Introduction
We are implementing the paper “Image-to-Image Translation with Conditional Adversarial Networks” by Isola et al., which investigates a general-purpose solution to generating output images from input images. Most relevant to us, the paper reports promising results on synthesizing images from label maps and edge maps. Our goal is to generate images of clouds from an input black-and-white mask, a task we hope will benefit from the model’s success at synthesizing images from segmentation data. If the model works as expected, we will be able to synthesize realistic cloud images of any shape.
Project Type:
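This is an image synthesis problem. Although GANs are often described as unsupervised, Pix2Pix trains on paired masks and photos, so our setup is closer to supervised learning.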
Related Work
Background Research
As we were exploring potential projects, we read the overview of the GAN architecture on Google Developers. We learned that GANs comprise two concurrently trained models: a discriminator and a generator. The discriminator solves an image classification problem, determining which inputs are genuine and which are fabricated. Meanwhile, the generator learns to create images to fool the discriminator, usually with noise as its input. The generator’s loss is determined by how well it is able to create images that the discriminator cannot classify as fabricated.
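The adversarial losses described above can be written as binary cross-entropy terms. Below is a minimal NumPy sketch of them. It assumes the discriminator outputs probabilities in (0, 1), and the function names are our own; in TensorFlow, the built-in cross-entropy losses would normally compute these from logits instead. For the generator, the sketch uses the common “non-saturating” form, which maximizes log D(G(z)) rather than minimizing log(1 - D(G(z))). The Pix2Pix paper also uses this form.

```python
import numpy as np

EPS = 1e-7  # keeps log() finite when a probability is exactly 0 or 1


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D: real images should score 1, generated images 0.

    d_real, d_fake: arrays of discriminator outputs, treated as probabilities.
    """
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    real_term = -np.mean(np.log(d_real))        # large when D calls real images fake
    fake_term = -np.mean(np.log(1.0 - d_fake))  # large when D calls generated images real
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss for G: small when D scores generated images near 1."""
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return -np.mean(np.log(d_fake))


print(generator_loss(np.array([0.9])), generator_loss(np.array([0.1])))
```

For example, a generator whose images fool the discriminator (scores near 0.9) gets a loss of about 0.11. A generator whose images are caught (scores near 0.1) gets a loss of about 2.30.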
Public Implementations
We are aware that others have implemented this paper in TensorFlow and Keras, but we are intentionally avoiding those implementations.
Data
We will be training our model on the Singapore Whole sky Imaging SEGmentation Database (SWIMSEG), which is licensed under Creative Commons for non-commercial use with attribution. The dataset consists of 1,013 cloud images, each 600x600 pixels, and their corresponding binary masks, which distinguish areas containing clouds from areas of empty sky. The images were captured in Singapore and labeled with help from experts at Singapore Meteorological Services.
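Before training, each photo and its mask must be converted to the value range the networks expect, and the two must stay spatially aligned. Below is one possible NumPy preprocessing step. It rests on several assumptions not stated above:

- Images are loaded as uint8 arrays.
- Bright mask pixels mark cloud.
- Pixels are scaled to [-1, 1].
- Training uses 256x256 random crops of the 600x600 images (the paper trains at 256x256).

The helper names are hypothetical.

```python
import numpy as np

CROP = 256  # assumption: train on 256x256 patches, the resolution used in the paper


def to_model_range(image_uint8):
    """Scale uint8 pixels from [0, 255] to float32 values in [-1, 1]."""
    return image_uint8.astype(np.float32) / 127.5 - 1.0


def binarize_mask(mask_uint8):
    """Assumption: bright mask pixels mark cloud. Cloud -> 1.0, sky -> -1.0."""
    return np.where(mask_uint8 > 127, 1.0, -1.0).astype(np.float32)


def random_paired_crop(photo, mask, rng, size=CROP):
    """Cut the same window out of a photo (H, W, 3) and its mask (H, W), then maybe mirror both."""
    h, w = mask.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    photo = photo[top:top + size, left:left + size]
    mask = mask[top:top + size, left:left + size]
    if rng.random() < 0.5:  # flip both together so they stay pixel-aligned
        photo, mask = photo[:, ::-1], mask[:, ::-1]
    return photo, mask


# Usage with random stand-in arrays in place of a real SWIMSEG pair:
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(600, 600, 3), dtype=np.uint8)
mask = rng.integers(0, 256, size=(600, 600), dtype=np.uint8)
target, condition = random_paired_crop(to_model_range(photo), binarize_mask(mask), rng)
```

The important part is applying the same crop window and flip to both arrays. Cropping them independently would break the pixel-to-pixel correspondence the model learns from. An alternative is to resize the full 600x600 image down to 256x256, which keeps the whole sky in view.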
Methodology
The Pix2Pix model uses a conditional generative adversarial network (cGAN), which is made up of a generator and a discriminator. The generator is a convolutional encoder-decoder network. To take advantage of the structure shared between the input and the desired output, the model adds skip connections between each layer in the encoder and its mirrored counterpart in the decoder (a “U-Net” architecture). The discriminator is a convolutional classifier. Rather than judging the whole image at once, the paper’s “PatchGAN” discriminator classifies each NxN patch as real or fake and averages the results.
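To make the skip connections and the discriminator’s patch size concrete, the pure-Python sketch below traces tensor shapes rather than building the networks. It rests on these assumptions:

- The generator uses the layer widths from the paper’s appendix (encoder C64-C128-C256-C512-C512-C512-C512-C512, with 4x4 convolutions of stride 2).
- The discriminator uses the 70x70 PatchGAN layout we believe the authors’ reference code uses: three stride-2 and two stride-1 4x4 convolutions, each with padding 1.

The function names are our own.

```python
# Shape walkthrough for the Pix2Pix generator (U-Net) and discriminator (PatchGAN).
# Assumed layer widths follow the paper's appendix; nothing here builds a real network.

ENCODER = [64, 128, 256, 512, 512, 512, 512, 512]  # output channels per encoder block
DECODER = [512, 512, 512, 512, 256, 128, 64]       # output channels per decoder block


def unet_shapes(size=256, out_channels=3):
    """Return (resolution, channels) after each decoder block, including skip concatenation."""
    skips = []
    res = size
    for ch in ENCODER:  # each stride-2 encoder block halves the resolution
        res //= 2
        skips.append((res, ch))
    res, _ = skips[-1]  # the bottleneck (1x1 for a 256x256 input) has no skip partner
    trace = []
    for dec_ch, (skip_res, skip_ch) in zip(DECODER, reversed(skips[:-1])):
        res *= 2  # each decoder block doubles the resolution
        assert res == skip_res, "skip partner must have the same spatial size"
        trace.append((res, dec_ch + skip_ch))  # concatenate along the channel axis
    trace.append((res * 2, out_channels))  # final upsampling layer produces the RGB image
    return trace


# Assumed 70x70 PatchGAN layout: (kernel, stride) per convolution, padding 1 each.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def receptive_field(layers):
    """Input pixels seen by one output unit, walking the (kernel, stride) stack backwards."""
    r = 1
    for k, s in reversed(layers):
        r = r * s + (k - s)
    return r


def conv_output_size(n, layers, pad=1):
    """Spatial size after each convolution: floor((n + 2*pad - k) / s) + 1."""
    for k, s in layers:
        n = (n + 2 * pad - k) // s + 1
    return n


print(unet_shapes())
print(receptive_field(PATCHGAN_70), conv_output_size(256, PATCHGAN_70))
```

Under these assumptions, the concatenated decoder widths for a 256x256 input (1024, 1024, 1024, 1024, 512, 256, 128) line up with the U-Net decoder listed in the paper’s appendix. The discriminator ends up producing a 30x30 grid of real/fake scores, each of which looks at a 70x70 patch of the input.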
We will train our model on a paired dataset of cloud photos and manually created masks marking cloud versus empty-sky areas. Following the paper’s strategy, we will alternate between one gradient step on the discriminator and one on the generator. We plan to train on a Google Cloud Platform (GCP) GPU if possible.
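As we understand the paper, the alternating steps optimize the objective below. Here x is the input mask, y is the real photo, and z is the noise, which Pix2Pix supplies through dropout rather than an explicit noise input. The discriminator step increases the cGAN term. The generator step decreases the cGAN term plus an L1 reconstruction term weighted by λ, which the paper sets to 100 in its experiments.

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{aligned}
```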
We expect one of the hardest parts of implementing this model to be managing the interplay between the generator and the discriminator. Unlike our previous projects, this model consists of two neural networks that are trained against each other.
Metrics
After training our model, we hope to be able to generate realistic-looking clouds of any shape. We plan to create several input masks of interesting shapes and feed them into the generator.
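One simple way to make such masks is to build them programmatically from geometric primitives. The NumPy sketch below rasterizes a union of circles into a 256x256 mask. It assumes cloud pixels are encoded as 1 and sky as -1, which must match whatever encoding the generator was trained on. The helper name and example shape are hypothetical.

```python
import numpy as np


def circles_mask(size, circles):
    """Rasterize a union of circles into a (size, size) mask.

    circles: list of (center_row, center_col, radius) in pixels.
    Assumed encoding: 1.0 inside any circle (cloud), -1.0 elsewhere (sky).
    """
    rows, cols = np.mgrid[0:size, 0:size]
    inside = np.zeros((size, size), dtype=bool)
    for r0, c0, radius in circles:
        inside |= (rows - r0) ** 2 + (cols - c0) ** 2 <= radius ** 2
    return np.where(inside, 1.0, -1.0).astype(np.float32)


# Hypothetical "cartoon cloud": three overlapping puffs along a roughly flat base.
cartoon = circles_mask(256, [(140, 80, 40), (120, 128, 55), (140, 180, 40)])
```

Before being fed to a Keras-style generator, a mask like this would typically need batch and channel axes added, for example `cartoon[None, :, :, None]`.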
Evaluating Results
When measuring the performance of our model, the notion of “accuracy” is not well-defined. As the authors state, “for graphics problems like… photo generation, plausibility to a human observer is often the ultimate goal.” Our results will therefore be judged more qualitatively than quantitatively. To add a quantifiable measure of success, Isola et al. ran studies on Amazon Mechanical Turk in which participants tried to distinguish real images from generated ones. The percentage of trials in which participants were fooled by generated images served as a quantitative measure of how believable the images were.
For our project, we will not be paying third-party participants to judge our cloud images. Instead, we will judge them with our own eyes, and we may recruit a few volunteers to help subjectively assess the aesthetic quality of our output images.
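If we run a small version of the paper’s real-vs-fake study with volunteers, the score reduces to a simple proportion. The sketch below assumes each trial is recorded as a pair: whether the image was generated, and whether the volunteer called it real. The record format and function name are our own.

```python
def fooling_rate(judgments):
    """Fraction of generated images that volunteers labeled "real".

    judgments: iterable of (is_generated, judged_real) boolean pairs, one per trial.
    Only trials that showed a generated image count toward the rate.
    """
    fake_trials = [judged_real for is_generated, judged_real in judgments if is_generated]
    if not fake_trials:
        raise ValueError("no generated images were shown")
    return sum(fake_trials) / len(fake_trials)


# Four generated images, one judged real; the real photo shown is ignored.
trials = [(True, True), (True, False), (True, False), (True, False), (False, True)]
print(fooling_rate(trials))
```

With only a handful of volunteers, this estimate will be noisy. We would treat it as a rough signal rather than a figure comparable to the paper’s.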
Project Goals
- Base: Generate images that look like clouds in the sky to a third-party viewer.
- Target: Generate images that look like clouds of a specific shape to a third-party viewer.
- Stretch: Generate images that not only look like clouds of a specific shape to a third-party viewer, but that the viewer also believes could be genuine photos.
Ethical Implications
In this course we’ve discussed that deep learning is a powerful tool with the potential to solve critical real-world problems, but that creating large models comes at a cost. We’re planning to use a GCP GPU to train our model, which will have a carbon footprint. As deep learning becomes increasingly accessible, individuals can apply it to problems indiscriminately, not always appropriately weighing the cost of building the model against the model’s benefit. Our project can certainly be seen as a misuse of deep learning. We’re choosing to spend our time and the earth’s resources to generate fun cloud photos. While we don’t expect to have a very significant environmental impact, the benefit of our model is also arguably negligible. That said, there are two ways in which we can justify our use of deep learning. Firstly, there is intrinsic value in art, and that includes fun-shaped clouds. Secondly, the primary benefit of this project will be our own education, which we hope will eventually have a more tangible positive impact on the world.
Our dataset comes from the National University of Singapore, and we have been granted permission to use it. It is worth noting, however, that the dataset is not freely available to the public, and its access limitations raise an ethical consideration. We requested and were granted access to the dataset through a form that asked for our “affiliation (your institute, university, company, etc.).” As university students, we had no trouble answering this question, but it could pose a barrier to entry for a budding deep learner without a university affiliation. Even if access is granted to everyone who fills out the form, the requirement to specify an affiliation could disenfranchise self-taught or otherwise unaffiliated computer scientists.
Division of Labor
We expect to work collaboratively throughout the project, but we will begin by dividing the model into its two neural networks. Ezra will begin implementing the generator while Aidan constructs the discriminator and then moves on to the train function. Once we’ve completed these steps, we will divide the remaining work so that we contribute equal time and effort.
Built With
- python
- tensorflow