Who
aduchnow / Alex Duchnowski
hli129 / Hongyi Li
qyu10 / Qinan Yu
sanand13 / Sidharth Anand
Final Writeup
https://docs.google.com/document/d/1MzasHjM8NgmRWKzT1p_xJwbul66HcwniZLMnu8_8rTk/edit?usp=sharing
Reflection
https://docs.google.com/document/d/18NsgZuh1XyrBDy2cNTf1yrLLi8jFUeZ5gfcRybn5vgQ/edit?usp=sharing
Introduction
A clear and aesthetically pleasing logo is an important asset to any company, but crafting one often involves extensive collaboration with a designer. We sought to ease this process by creating a DCGAN model that generates logos of a specific type desired by the user. Logo data is inherently multi-modal, as text is often embedded within the image, but we pre-generate synthetic labels that enable our model to cope with this fact.
This is an unsupervised learning problem, and we approach it with a generative model.
Related Work
Logo Synthesis: https://arxiv.org/pdf/1712.04407.pdf
This paper uses both a DCGAN and a WGAN assisted by synthetic labels to generate and manipulate logos.
Generative Adversarial Text to Image Synthesis: https://arxiv.org/pdf/1605.05396.pdf
This paper develops a novel deep architecture for GANs that takes advantage of RNNs to learn discriminative text feature representations. The authors test their model by generating plausible images of birds and flowers from detailed text descriptions.
Implementation of DALLE2: https://github.com/lucidrains/DALLE2-pytorch
Conditional WGAN-GP: https://cameronfabbri.github.io/papers/conditionalWGAN.pdf
Data
We used the Large Logo Dataset (LLD) introduced in "Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks" (Sage et al., 2017). Its LLD-icon subset contains 486,377 favicons crawled from the top one million websites, all at a uniform size of 32x32 pixels.
Methodology
We pass the images through a ResNet-50 classifier, reduce the dimensionality of its outputs with PCA, and apply k-means clustering to partition the dataset into 64 clusters. We then train the DCGAN on the images from a single cluster, which can be freely chosen. The generator uses transposed convolutions to produce an image from random noise, while the discriminator is a CNN-based image classifier that distinguishes real logos from generated ones.
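The clustering step above can be sketched as follows. This is a minimal illustration under stated assumptions, not our exact pipeline: it assumes the ResNet-50 outputs have already been collected into an array (called `features` here, a hypothetical name), and it swaps in plain NumPy versions of PCA (via SVD) and Lloyd's k-means for whatever library routines are used in practice. The function names `pca_reduce`, `assign_labels`, and `kmeans` are illustrative only.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project features onto their top principal components (PCA via SVD)."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign_labels(points, centroids):
    """Return the index of the nearest centroid for every point."""
    # Squared Euclidean distances, shape (n_points, n_centroids).
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign_labels(points, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the final centroids.
    return assign_labels(points, centroids), centroids
```

With `labels, _ = kmeans(pca_reduce(features, n_components), k=64)`, the images of one chosen cluster could then be selected with a boolean mask such as `images[labels == chosen_cluster]` before being passed to the DCGAN. Here `images`, `n_components`, and `chosen_cluster` are placeholders; the number of PCA components is not specified in this writeup.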
Metrics
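We plan to evaluate the model by inspecting the output images and by computing two quantitative metrics: the Inception Score, which measures whether the generated images are varied and whether each image is distinctly recognizable as something, and the Fréchet Inception Distance (FID), which compares the feature statistics of generated images against those of real images.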
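To make the Inception Score concrete, here is a minimal sketch of the standard formula, the exponential of the mean KL divergence between each image's predicted class distribution p(y|x) and the marginal distribution p(y). It assumes the class probabilities have already been produced by a pretrained classifier; the function name `inception_score` and the `probs` argument are illustrative, not part of any library.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of rows, one per generated image, each row a class
    probability distribution p(y|x) that sums to 1.
    """
    n_images = len(probs)
    n_classes = len(probs[0])
    # Marginal class distribution p(y): the average of all p(y|x).
    marginal = [
        sum(row[j] for row in probs) / n_images for j in range(n_classes)
    ]
    # Mean KL(p(y|x) || p(y)); terms with p = 0 contribute nothing.
    mean_kl = 0.0
    for row in probs:
        mean_kl += sum(
            p * (math.log(p) - math.log(m))
            for p, m in zip(row, marginal)
            if p > 0
        )
    mean_kl /= n_images
    return math.exp(mean_kl)
```

Two images that are each confidently assigned to a different class, `[[1.0, 0.0], [0.0, 1.0]]`, score 2, the maximum for two classes, while two images with identical uniform predictions score 1, the minimum. A higher score therefore rewards both confident individual predictions and variety across the set.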
Base Goal: The model can generate reasonable-looking logos.
Target Goal: The model can generate logos that reflect the cluster that the DCGAN was trained on.
Stretch Goal: The model can generate logos that include the names of the respective companies for which they were generated, or reflect their associated description.
Ethics
Why is Deep Learning a good approach to this problem? Deep learning suits this problem because deep models can ingest multiple types of data and learn complex, compositional functions over them. GANs in particular are well suited to image generation, which is the core of our project.
Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? The major stakeholders are companies and users who want suggestions for logo designs. It would be a problem if our model were biased toward the logos of certain companies (e.g., those in the Fortune 500), because this would impose homogeneity on the generated logos.
Division of Labor
Alex - do web scraping if necessary to generate partial dataset; research model architectures and tune hyperparameters
Hongyi - implement clustering scheme; integrate with DCGAN model to make it conditional
Qinan - preprocess company descriptions; train embeddings for all the words; integrate with DCGAN model to make it conditional
Sid - acquire dataset; build generator and discriminator models; implement inception score and FID metrics
Built With
- python
- tensorflow