Studying Generative Adversarial Networks (GANs)

Inception score metric
Poster

Deliverables

Check In 2:https://docs.google.com/document/d/1CsvyJMxei3jHVfasm_8su6iLOBaB0KXzb6hw6G_x8jY/edit?usp=sharing

Check In 3: https://docs.google.com/document/d/1GZdMhJHFFAI5AUwfAI-NFpb5wycDCtM-tX9qil619bM/edit?usp=sharing

Final Reflection: https://docs.google.com/document/d/1l20lFmJc82DKqrajH7GFv8_UOxNUGvlmXEIPB11bm88/edit?usp=sharing

Github Link to code: https://github.com/sxfiavn/conditional-GANs.git

Initial Writeup

Title: Studying the generative and detective abilities of deep neural networks

Who: Names and logins of all your group members. Sofia Vaca Narvaja Córdoba - svacanar Ben Bonas - bbonas Daniel Hadar - dhadar Keya Kilachand - kkilacha

Introduction: What problem are you trying to solve and why? Entering a world where machine generated images are quickly permeating public media, we wanted to understand how this content is created. This would hopefully give us insight into how to be more sensitive to 'fake' content, and how to tell the difference between it and 'real' information.

As we explored in lab 8, Generative adversarial networks (GANs) are composed of two two deep neural networks—the generator network, responsible for generating fake images, and the discriminator network, tasked with distinguishing between real and fake (generated) images. Conditional GANs (cGANs) take this concept further by integrating images and their respective labels into the network during training, allowing the generator to learn the nuanced characteristics associated with each label.

We have not based this idea on a specific paper, more on the concept of Generative Adversarial Networks (GANs) introduced to us when we spoke about image generation during checkoff 1 followed by the examples in lecture.

Our architecture will include two deep neural networks—the generator network, to generate images of buildings, and the discriminator network, used to determine if an input image is real or fake. In terms of the ‘kind’ of problem, this is first a generation task and then a classification problem.

We recognize the ability to generate images of buildings with specific architectural styles is crucial in various fields ranging from urban planning to historic preservation. For this reason we chose to work with images of buildings paired with labels representing their corresponding architectural styles. Our work is built upon code from the Keras library and leverages insights gained from previous classwork.

Related Work:

Linked here is an article we found on GANs, to learn more about the rationale behind how they were designed, and how they can be implemented. The beginning of the paper outlines a roadmap for how GANs work, and the phases in which they can be coded. It then dives into some pseudo code, which would be very helpful to guide the project, before displaying results and touching on real life challenges that have been faced with these networks in the past. These mostly circle around the inherently adversarial nature of the two models, where good performance is mutually exclusive. The prevailing conclusion is that the best performance that can be achieved is determined by something similar to the solution to the Nash equilibrium in game theory which states that a player can achieve the desired outcome by not deviating from their initial strategy.

Useful Papers: Architectural Style Classification Paper https://ieeexplore.ieee.org/document/9753782 https://github.com/pratikpv/mri_gan_deepfake Medium: Deepfake face detection https://www.scientificbulletin.upb.ro/rev_docs_arhiva/full45d_519336.pdf How to make a Generator Model

Data: What data are you using (if any)?

We are using the “Architectural styles”' dataset by dumitrux, which contains 10113 images from 25 architectural styles. The data comprises a combination of Google scraping and the dataset used for the paper "Architectural Style Classification using Multinomial Latent Logistic Regression" (ECCV2014), made by Zhe Xu. The data was further augmented from 4979 to 9588 images by ‘transforming’ the images by means of rotation, flips, zoom, light etc.

The data itself seems to be regular images, separated into a training set and a test set. We definitely expect to do some preprocessing to convert these into image descriptors or even raw pixel matrices for ease of comparison, but this is all subject to how we generate our images and what form the output of our generator algorithm takes. We would also probably crop or trim the images to have uniform size, and scale their pixel values for ease of computation.

Methodology: What is the architecture of your model? The generator doesn’t require any training, so we won’t need any training data, only Gaussian random vectors that can be converted into images. For the discriminator model used to classify images as ‘real’ or ‘fake’ we intend to use the training dataset from the “Architectural Styles” dataset mentioned above. Initially, we will use the preprocessed Architectural Styles dataset images as the ‘real’ images, and generate same sized matrices of random pixel values as the ‘fake’ images. This method is drawn from the one outlined in this article. Eventually we would create a dataset of ‘real’ images from the Architectural Styles dataset and ‘fake’ images from the generator.

So far we think the most difficult part of implementing the model will definitely be the generator, since we haven’t done anything like it before. Eventually, based on the challenges we have read in some of the papers referenced above, balancing achieving a high performance for both models will be extremely challenging as well.

Metrics: What constitutes “success?”

We plan on saving a part of the “Architecture Styles” dataset as a ‘test’ or ‘validation’ dataset. We will use this to ensure that the discriminator model can recognise a ‘real’ image. The ideal goal for the generator is for images it generates to be recognized as ‘fake’ by the discriminator with 50% accuracy, after performing well on ‘fake’ images generated using random Guassian vectors. This would indicate that the discriminator’s predictions are no better than guesses based on random chance for fake images created by the generator, even though it performs well on other ‘fake’ images. As per the literature, this is as good of a performance as the generator can achieve. The goal for the discriminator would then be to try and cross that 50% threshold for images generated by the generator. This is extremely interdependent and admittedly very challenging to achieve, but we will attempt to reach as close to this as possible.

Convert the output into an image and we can use qualitative feedback from architecture students, or students in general to test our model.

Base goal = Generate realistic looking images using the generator, have the discriminator be able to distinguish between ‘real’ images from the “Architectural Styles” dataset and random noise images generated by the generator

Target goal = Generate realistic looking images using the generator, have the discriminator be able to distinguish between ‘real’ images from the “Architectural Styles” dataset and realistic images generated by the generator with over 50% accuracy

Stretch goal = Generate realistic looking images using the generator, have the discriminator be able to distinguish between ‘real’ images from the “Architectural Styles” dataset and: realistic images generated by the generator with exactly 50% accuracy random noise images generated by the generator with over 90% accuracy

Ethics:

The broader societal issues that are relevant to this space surround impersonation, blackmail, misinformation, and a wide variety of implications that AI generated images are beginning to have on the world. We aim to understand them better and make more information about them publicly available, so that we as a society can adapt to this growing problem.

The major ‘stakeholders’ in this problem are people using fake image generators maliciously, and people who are deceived by these fake images. This includes social media users, consumers of other sources of media, and people personally victimized by this software, among others. The consequences of mistakes made by the algorithm are potentially quite dire, if it leads to someone believing a fake image is real or vice versa. As proof by way of images is becoming more and more difficult to verify as fact, accurately discriminating between what is ‘real’ and what is not accurately becomes crucial, especially when we are becoming more reliant on software to do it for us.

Division of labor: Briefly outline who will be responsible for which part(s) of the project.

Preprocessing: Keya
Making the GAN architecture (Loss functions, Training, etc): All together
Evaluating GAN:
Qualitative: Form sent out to users to determine whether an image is real or not. Question about what may look fake
Quantitative: Inception score
Poster: The collective