Creative Adversarial Networks: Generating Art in New Styles

Paper Link: https://arxiv.org/abs/1706.07068

Who: Marc Mapeke (mmapeke), Alana White (awhite35), Maceo Thompson (mthomp13), Carlos Perez-Ruiz (cperezru)

Introduction: GANs have previously been used to mimic existing styles of art by generating images that match the qualities of prior artwork. Elgammal et al.'s Creative Adversarial Networks paper attempts to go beyond this, using deep learning to create artwork unlike any previous artwork. The objective of the paper was to create images that would be classified as "art" but do not match the style of any of the existing categories of art given to the network. We chose this paper because all of us are interested in visual computing, and this is a really interesting application of generative models that we have not had the chance to work with in other classes. This is an unsupervised learning problem.

Related Work: GANs have been appearing in more and more applications as research explores how they can address the shortcomings of existing deep learning systems. The first fully realized GAN is attributed to Ian Goodfellow in 2014. Early research by Goodfellow and others was motivated in part by the observation that small amounts of noise in an image can sometimes cause a model to misclassify it. Applications have since extended to generating fake faces (Link), upscaling game quality in real time (Link), and audio generation (Link). A leader in this space has been NVIDIA, which is using GAN models for a number of research topics and has already introduced a consumer-ready model: Deep Learning Super Sampling (DLSS). DLSS is trained on existing game screenshots, textures, and everything in between to upscale the quality of games as you are playing them. This is a powerful tool that can quietly make computers use less graphical compute (and therefore less power), with minimal visual quality loss.

Data: We plan to use the standard WikiArt dataset used in ArtGAN (2017). As of 2015, this dataset contained roughly 80,000 images from about 1,000 artists. No significant preprocessing should be needed. The dataset is quite large (~25 GB), so we may use only a subset of it for our model; for example, we may train our model on only one style of art represented in the dataset. Link: https://github.com/cs-chan/ArtGAN/tree/f5d6f6b58a6d8a4bd05aaaedd9688d08c02df8f2/WikiArt%20Dataset
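Since we may train on only one style, a small helper for pulling out a single style's images would be useful. Below is a minimal sketch assuming the WikiArt images are organized into per-style subdirectories (as in the ArtGAN release); the function name and directory layout are our own assumptions, not part of the dataset's documentation:

```python
from pathlib import Path

def list_style_images(root, style, exts=(".jpg", ".png")):
    """Collect image paths for a single style subdirectory,
    e.g. root/Impressionism/*.jpg (layout is an assumption).
    Non-image files are skipped; results are sorted for reproducibility."""
    style_dir = Path(root) / style
    return sorted(p for p in style_dir.rglob("*")
                  if p.suffix.lower() in exts)
```

This keeps the subset selection explicit, so scaling from one style to several is just a loop over style names.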

Methodology: The architecture of our model will closely follow the standard GAN architecture, with a generator and a discriminator network. The main difference is that our generator will receive two signals from the discriminator instead of one: 1) the discriminator's classification of the generated image as "art" or "not art", and 2) the discriminator's ability to classify the style of the generated image. The goal of the Creative Adversarial Network (CAN) is to generate output that is classified as "art" but is difficult to assign to any one specific style of art. This architecture will also rely on convolutional layers. Because our model, like a standard GAN, consists of two sub-models, we will iteratively train the generator and discriminator to minimize their respective objective functions. The hardest part of implementing this model will be avoiding common GAN failure modes such as mode collapse and training instability. This problem will be amplified in our model because the objective to generate art and the objective to generate style-ambiguous art pull against each other, adding another min-max optimization on top of the one inherently present in GANs between the generator and discriminator.
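The style-ambiguity signal described above can be expressed as a cross-entropy between the discriminator's per-style probabilities and a uniform target over the K style classes (the CAN paper uses a multi-label, per-class binary cross-entropy form). A minimal numpy sketch, where the function name and exact formulation are our own illustrative assumptions rather than the paper's reference code:

```python
import numpy as np

def style_ambiguity_loss(style_probs):
    """Multi-label cross-entropy between the discriminator's per-style
    probabilities and a uniform target (each class target is 1/K).
    Lower loss means the discriminator is more confused about the style."""
    p = np.asarray(style_probs, dtype=float)
    k = p.shape[-1]
    eps = 1e-12  # numerical stability for log(0)
    return float(-np.sum((1.0 / k) * np.log(p + eps)
                         + (1.0 - 1.0 / k) * np.log(1.0 - p + eps)))
```

The generator would minimize this term alongside the usual "is it art" adversarial term; the loss is smallest when every per-style probability sits at 1/K, i.e. when the sample is maximally style-ambiguous.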

Metrics: What experiments do you plan to run? An important experiment the paper highlights is training CAN with and without the style-ambiguity loss. This allows us to demonstrate the difference between emulating the art distribution and generating "creative" samples that deviate from it.

For most of our assignments, we have looked at the accuracy of the model. Does the notion of "accuracy" apply for your project, or is some other metric more appropriate? Qualitative results are more relevant for our project, because we are generating art that deviates from the distribution of art represented in our dataset. The paper relies on qualitative results, and even its quantitative results rest on qualitative inspection by humans, who evaluated features such as creativity and preference.

If you are implementing an existing project, detail what the authors of that paper were hoping to find and how they quantified the results of their model. The authors' main objective was to generate art with creative characteristics. They quantified their results by conducting human experiments to evaluate different aspects of creativity in the generated art.

What are your base, target, and stretch goals?

Base: Generate low-resolution images that are style ambiguous between two classes

Target: Generate low-resolution images that are style ambiguous between greater than two classes (paper has about 26 classes, but the amount of data per class is nowhere near equal, so we can slowly scale the number of classes).

Stretch: Generate higher-resolution images. Condition the generated art to be “creative” but still represent a style/class of artwork.
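Beyond human evaluation, one cheap quantitative proxy for the base and target goals above is the Shannon entropy of the discriminator's style posterior on a generated sample: the closer it is to log K, the more style-ambiguous the image. A short sketch (this metric is our addition for sanity-checking, not something the paper reports):

```python
import numpy as np

def style_entropy(style_probs):
    """Shannon entropy (in nats) of a style probability distribution
    that sums to 1. Maximum value is log(K) at the uniform distribution,
    so higher entropy means a more style-ambiguous sample."""
    p = np.asarray(style_probs, dtype=float)
    eps = 1e-12  # avoid log(0)
    return float(-np.sum(p * np.log(p + eps)))
```

Tracking this entropy across training runs (with vs. without the style-ambiguity loss) would give us a simple curve to compare alongside the qualitative inspection.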

Ethics: What broader societal issues are relevant to your chosen problem space? There are two main ethical implications of this type of generative art. The first is that "success" is defined by "originality" and "creativity," two inherently subjective qualities. As the original authors note, the way the algorithm ends up defining uniqueness can differ from the way humans do; it will be interesting to see how that contrast affects how the actual output of the network differs from our expectations. The second ethical implication is less specifically relevant to the paper we are implementing, but is a larger consideration in the realm of computer-generated images: realism, consent, and creative control. With new systems like OpenAI's DALL-E 2 that combine various deep learning techniques, work that passes for human-made can be readily produced with minimal human input. As we continue to develop these types of systems, it is incredibly important to be cognizant of the bias that is often inherent in the datasets used to train our models. Even in our project, the dataset is centered mainly on Western European art.

Division of labor:

Marc, Alana - Acquire data, architecture, experiments

Maceo, Carlos - Preprocessing, experiments, visualizations
