Final Writeup

Take a look at our writeup!

Project Outline

By Aidan Cassel-Mace (acasselm) and Ezra Marks (emarks7)

Introduction

We are implementing the paper “Image-to-Image Translation with Conditional Adversarial Networks” by Isola et al., which investigates a general-purpose solution to generating output images from input images. For our purposes, the model described by the paper has shown promising results on reconstructing and synthesizing images from label maps and edge maps. Our goal is to construct images of clouds based on an input black-and-white image mask, which we hope will benefit from the model’s success with image synthesis from image segmentation data. If this model works as expected we will be able to synthesize realistic cloud images of any shape.

Project Type:

This is an image synthesis problem using supervised learning on paired image masks and cloud photos.

Related Work

Background Research

As we were exploring potential projects, we read the overview of the GAN architecture on Google Developers. We learned that GANs comprise two concurrently trained models: a discriminator and a generator. The discriminator solves an image classification problem, determining which inputs are genuine and which are fabricated. Meanwhile, the generator learns to create images to fool the discriminator, usually with noise as its input. The generator’s loss is determined by how well it is able to create images that the discriminator cannot classify as fabricated.
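To make the two losses concrete, here is a minimal NumPy sketch of the standard binary cross-entropy formulation. The function names are our own, and the generator loss shown is the common "non-saturating" variant, so treat it as one reasonable choice rather than the only one.

```python
import numpy as np


def binary_cross_entropy(predictions, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    p = np.clip(predictions, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))


def discriminator_loss(d_real, d_fake):
    """The discriminator should output 1 on genuine images and 0 on fabricated ones."""
    real_term = binary_cross_entropy(d_real, np.ones_like(d_real))
    fake_term = binary_cross_entropy(d_fake, np.zeros_like(d_fake))
    return real_term + fake_term


def generator_loss(d_fake):
    """The generator is rewarded when the discriminator outputs 1 on fabricated images."""
    return binary_cross_entropy(d_fake, np.ones_like(d_fake))
```

Here `d_real` and `d_fake` stand for the discriminator's predicted probabilities on genuine and generated images. The generator's loss falls as `d_fake` approaches 1, which is exactly the case where the discriminator has been fooled.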

Public Implementations

Please note that we are aware that people have implemented this paper in TensorFlow and with Keras, but we are intentionally avoiding those implementations.

Data

We will be training our model on the Singapore Whole sky IMaging SEGmentation Database (SWIMSEG), licensed under Creative Commons for non-commercial use with attribution. The dataset consists of 1013 images of clouds, each 600x600 pixels, and their corresponding image masks, which distinguish areas containing clouds from areas of empty sky. The images were originally captured in Singapore and labeled with help from experts at Singapore Meteorological Services.

Methodology

The Pix2Pix model uses a conditional generative adversarial network (cGAN). This network is made up of a generator and a discriminator. The generator architecture consists of a convolutional encoder neural network and a convolutional decoder neural network. Additionally, to take advantage of similarities between the input and the desired output, the model incorporates skip connections between each layer in the encoder and its counterpart in the decoder. The discriminator architecture is a convolutional image classification neural network.
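The skip connections are easiest to see by following tensor shapes through the network. The sketch below is only bookkeeping: average pooling and pixel repetition stand in for the learned strided and transposed convolutions, and `downsample`, `upsample`, and `unet_shapes` are hypothetical names of our own.

```python
import numpy as np


def downsample(x):
    """Halve height and width with 2x2 average pooling (stand-in for a strided convolution)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample(x):
    """Double height and width by repeating pixels (stand-in for a transposed convolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def unet_shapes(mask, depth=3):
    """Run the stand-in encoder-decoder and record the shape after every stage."""
    shapes, skips = [], []
    x = mask
    for _ in range(depth):  # encoder: shrink and remember each feature map
        x = downsample(x)
        skips.append(x)
        shapes.append(x.shape)
    skips.pop()  # the innermost map is the bottleneck and has no skip partner
    for _ in range(depth):  # decoder: grow and concatenate the mirrored encoder map
        x = upsample(x)
        if skips:
            x = np.concatenate([x, skips.pop()], axis=-1)
        shapes.append(x.shape)
    return x, shapes
```

Each decoder stage receives the encoder feature map of the same spatial size, concatenated along the channel axis, so the channel count grows at every stage that has a skip partner. Without the `np.concatenate` line, this reduces to the plain encoder-decoder.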

We will train our model using a parallel dataset of images and image masks, consisting of cloud photos and manually created masks of cloud vs. empty sky areas. We will adhere to the paper’s strategy by performing one training step on the discriminator followed by one on the generator. We plan to train our model using a GCP GPU if possible.
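For reference, the objective that these alternating steps optimize can be written as follows, as we understand it from the paper. Here x is the input mask, y is the real cloud photo, z is the noise, and λ weights the L1 reconstruction term.

```latex
\begin{align}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right] \\
G^{*} &= \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
\end{align}
```

The discriminator step pushes the first expression up, and the generator step pushes the combined objective down. Note that the discriminator sees the mask x alongside each real or generated photo, which is what makes the network conditional.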

We expect one of the hardest parts of implementing this model to be navigating the interplay between the generator and the discriminator. In contrast to previous work we’ve done, this model comprises multiple neural networks.

Metrics

After training our model, we hope to be able to generate realistic-looking clouds of any shape. We plan to create several input masks of interesting shapes and feed them into the generator.

Evaluating Results

For measuring the performance of our model, the notion of “accuracy” is not well-defined. As stated by the authors, “for graphics problems like… photo generation, plausibility to a human observer is often the ultimate goal.” Thus, our results will be measured more qualitatively than quantitatively. In order to add some quantifiable measure of success, Isola et al. asked participants on Amazon Mechanical Turk to try to discriminate between real and generated images. The percentage of the participants fooled by the fake images was taken as a quantitative measure of success, measuring their goal of generating believable images.
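The "percentage fooled" measure is simple to compute from a list of judgments. This is a small sketch with a hypothetical `fooling_rate` helper and an input format of our own choosing; it is not the paper's evaluation code.

```python
def fooling_rate(judgments):
    """Fraction of generated images that a viewer judged to be real.

    judgments: iterable of (is_generated, judged_real) boolean pairs,
    one pair per image shown to a viewer.
    """
    fake_verdicts = [judged_real for is_generated, judged_real in judgments if is_generated]
    if not fake_verdicts:
        raise ValueError("no generated images were judged")
    return sum(fake_verdicts) / len(fake_verdicts)
```

Judgments of genuine photos are ignored here, since the measure only asks how often a generated image passes as real.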

For our project, we won’t be paying third-party participants to judge our cloud images. Instead, we will judge them ourselves by eye. We may additionally recruit a few volunteers to help subjectively judge the aesthetic quality of our output images.

Project Goals

  • Base: Generate images that look like clouds in the sky to a third-party viewer.
  • Target: Generate images that look like clouds of a specific shape to a third-party viewer.
  • Stretch: Generate images that a third-party viewer not only recognizes as clouds of a specific shape, but also believes could be genuine photos.

Ethical Implications

In this course we’ve discussed that deep learning is a powerful tool with the potential to solve critical real-world problems, but that creating large models comes at a cost. We’re planning to use a GCP GPU to train our model, which will have a carbon footprint. As deep learning becomes increasingly accessible, individuals can apply it to problems indiscriminately, not always appropriately weighing the cost of building the model against the model’s benefit. Our project can certainly be seen as a misuse of deep learning. We’re choosing to spend our time and the earth’s resources to generate fun cloud photos. While we don’t expect to have a very significant environmental impact, the benefit of our model is also arguably negligible. That said, there are two ways in which we can justify our use of deep learning. Firstly, there is intrinsic value in art, and that includes fun-shaped clouds. Secondly, the primary benefit of this project will be our own education, which we hope will eventually have a more tangible positive impact on the world.

Our dataset comes from the National University of Singapore, and we have been granted permission for its use. It is worth noting, however, that the dataset is not simply publicly available, and its access limitations raise an ethical consideration. We requested and were granted access to the dataset through a form which asked for our “affiliation (your institute, university, company, etc.).” As university students, this did not pose any problems, but the question could pose a barrier to entry for a budding deep learner without a university affiliation. Even if access is granted to all people who fill out the form, the requirement to specify one’s affiliation could disenfranchise self-taught or otherwise unaffiliated computer scientists.

Division of Labor

We expect to work collaboratively throughout the project, but we will begin by dividing the model into its two neural networks. Ezra will begin implementing the generator while Aidan constructs the discriminator and then moves on to the train function. Once we’ve completed these steps, we will allocate the remaining work such that we’re contributing equal time and effort.

Updates

Mid-Project Update

Progress

We’re feeling good about our progress so far. We’ve completed our preprocessing code, which reads in image files and outputs resized, normalized NumPy arrays representing the clouds and masks. We’ve also begun work on the generator model, defining the Keras layers of the encoder-decoder and completing the forward pass function. Although the generator we are currently implementing uses a standard encoder-decoder architecture, the paper found improved results using a U-Net architecture, which adds skip connections between the encoder and the decoder. Since our input image masks have few high-level features, we expect that the standard encoder-decoder may work adequately, but we still plan to add skip connections in the near future, upgrading to the U-Net architecture.
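As an illustration of the preprocessing step, here is a minimal NumPy sketch rather than our actual code. It assumes the image files have already been decoded into `uint8` arrays, and the 256-pixel target size, the [-1, 1] photo range, and the names `resize_nearest` and `preprocess_pair` are all assumptions made for the example.

```python
import numpy as np


def resize_nearest(image, size):
    """Resize an (H, W) or (H, W, C) array to size x size by nearest-neighbor sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows][:, cols]


def preprocess_pair(photo, mask, size=256):
    """Return a photo scaled to [-1, 1] and a binary mask with an explicit channel axis."""
    photo = resize_nearest(photo, size).astype(np.float32) / 127.5 - 1.0
    mask = (resize_nearest(mask, size) > 127).astype(np.float32)
    return photo, mask[..., np.newaxis]
```

Nearest-neighbor sampling keeps the mask strictly black and white, whereas an interpolating resize would introduce gray values along cloud edges.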

This image-to-image translation project is different from our past assignments in that it consists of two entirely separate models, training in parallel. It was a challenge getting our heads around this new architecture, especially when it came to structuring the two models in Python. After attending the GAN lectures and looking at the GAN lab, we have a much better understanding of these concepts. It seems that, because the generator and discriminator are fully separate, it should be straightforward to define them as separate models, with their only overlap being in the train function.

Next Steps

Our next steps will be to code the discriminator model and to complete the generator. Then we’ll write code to batch our inputs and train our models. Finally, we’ll begin producing concrete results by testing our model on real and fabricated masks and writing code to visualize the results.
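The batching step could look roughly like the sketch below. `iterate_batches` is a hypothetical helper, and it assumes the masks and photos are stored as two NumPy arrays whose first axes line up pair by pair.

```python
import numpy as np


def iterate_batches(masks, photos, batch_size, rng):
    """Yield shuffled (mask_batch, photo_batch) pairs that stay aligned with each other."""
    order = rng.permutation(len(masks))  # one shared shuffle for both arrays
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield masks[idx], photos[idx]
```

Calling it once per epoch with a generator such as `np.random.default_rng(0)` gives a fresh ordering each time. The final batch may be smaller than `batch_size` when the dataset size is not a multiple of it.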
