Introduction:
Simple CNN models perform well on supervised classification tasks, but they were not known for unsupervised learning until the GAN came out. A GAN can generate new, realistic-looking samples of the same category as its training data, which enables data augmentation and many other tasks. This is a giant leap, as these models can create something new, like an artist. Today there are hundreds of different GAN variants available, some of which can even write poetry. For this project, I have implemented two GAN models: a Deep Convolutional GAN and a Conditional Self-Attention GAN with Wasserstein loss. Both models can generate new images similar to the ones in the dataset.
Dataset: CIFAR-10
GAN:
In the GAN architecture we use two CNN models that compete against each other through adversarial training. These two networks are called the Discriminator and the Generator. We can think of them as police and counterfeiter: the counterfeiter tries to produce fake money, and the police try to detect it. In every loop, the police see a bunch of real money, then the counterfeiter produces fake money and claims it is real, and the police try to detect the fake. So, in every loop, the counterfeiter gets better and better, until the police cannot discriminate between real and fake.
DCGAN:
In this model, I have two CNNs named the Discriminator (D) and the Generator (G). D and G compete against each other. I build the combined model by connecting G to D using the Keras framework. G receives a random noise vector of 100 elements drawn from a Gaussian distribution and, through the adversarial training described above, learns to generate samples that look the same as real data. To elaborate, G takes a random noise vector z as input and transforms it into a sample G(z) with the same dimensions as the real data. D distinguishes between G(z) and real data samples from CIFAR-10. G then gets updated according to the errors produced by D, and this process repeats until D cannot discriminate between real and fake.
Generator:
The generator takes in a random noise vector of 100 elements and transforms it into an image using an upsampling convolutional network. The input is passed through a fully connected layer of 2048 units and then upsampled using transpose convolutions. There are four transpose convolutional layers with 256, 128, 64 and 3 kernels respectively, each with a kernel size of (5, 5) and a stride of 2. After each of the first three, I apply batch normalization and a Leaky ReLU activation. The final layer has a tanh activation function.
Discriminator:
The discriminator takes an input image of shape (32, 32, 3) and transforms it into a real number between 0 and 1 indicating whether the image is fake or real. It has four convolutional layers with 64, 128, 256 and 512 filters, which downsample the image like a normal CNN. After every convolution operation, I apply batch normalization and a Leaky ReLU activation. The feature map from the final convolutional layer is flattened and passed to a single-unit output layer with a sigmoid activation. The Adam optimizer is used with a learning rate of 0.0001 and a beta_1 of 0.5. The loss used is binary cross-entropy.
The two networks are then combined: the output of the generator is connected to the input of the discriminator, and during backpropagation through this combined model the discriminator's weights are frozen, so only the generator is updated.
Training:
I train for up to 165,000 iterations. In each iteration, I sample a batch of 64 random images from the training dataset and feed it to the discriminator labelled as real. Then I generate 64 fake images from noise using the generator and feed them to the discriminator labelled as fake. Finally, the combined network is trained on generated images labelled as real, which updates the generator. On every 1000th iteration, the FID score is calculated using 64 generated images versus 1000 random test images. These FID scores are stored, and whenever the current FID improves on the previous one, the model is saved.
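One iteration of this loop can be sketched as follows. For clarity (and to stay version-independent) the update is written with an explicit `GradientTape` step rather than the Keras `train_on_batch` calls used in the project, and tiny stand-in models and a small batch are used; the FID bookkeeping is omitted.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

NOISE_DIM, BATCH = 100, 8  # the project uses a batch size of 64

# Tiny stand-in models; the real architectures are described above.
generator = keras.Sequential([
    keras.Input(shape=(NOISE_DIM,)),
    layers.Dense(32 * 32 * 3, activation="tanh"),
    layers.Reshape((32, 32, 3)),
])
discriminator = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])

bce = keras.losses.BinaryCrossentropy()
d_opt = keras.optimizers.Adam(1e-4, beta_1=0.5)
g_opt = keras.optimizers.Adam(1e-4, beta_1=0.5)

def train_step(real_images):
    noise = tf.random.normal((BATCH, NOISE_DIM))
    # Steps 1-2: train D on a real batch labelled 1 and a fake batch labelled 0.
    with tf.GradientTape() as tape:
        real_pred = discriminator(real_images, training=True)
        fake_pred = discriminator(generator(noise, training=True), training=True)
        d_loss = (bce(tf.ones_like(real_pred), real_pred)
                  + bce(tf.zeros_like(fake_pred), fake_pred))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
    # Step 3: train G through D, with the fakes labelled as real.
    with tf.GradientTape() as tape:
        fake_pred = discriminator(generator(noise, training=True), training=True)
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return float(d_loss), float(g_loss)
```

In the project this step runs for up to 165,000 iterations, with FID computed every 1,000 iterations and the model checkpointed on improvement.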
Built With
- google-cloud
- keras
- python
- tensorflow