Who

aduchnow / Alex Duchnowski

hli129 / Hongyi Li

qyu10 / Qinan Yu

sanand13 / Sidharth Anand

Final Writeup

https://docs.google.com/document/d/1MzasHjM8NgmRWKzT1p_xJwbul66HcwniZLMnu8_8rTk/edit?usp=sharing

Reflection

https://docs.google.com/document/d/18NsgZuh1XyrBDy2cNTf1yrLLi8jFUeZ5gfcRybn5vgQ/edit?usp=sharing

Introduction

A clear and aesthetically pleasing logo is an important asset to any company, but crafting one often involves extensive collaboration with a designer. We sought to ease this process by creating a DCGAN model that generates logos of a specific type desired by the user. Logo data is inherently multi-modal, as text is often embedded within the image, but we pre-generate synthetic labels that enable our model to cope with this fact.

This problem is an unsupervised learning task, specifically one that uses a generative model.

Related Work

Logo Synthesis: https://arxiv.org/pdf/1712.04407.pdf

This paper uses both a DCGAN and a WGAN assisted by synthetic labels to generate and manipulate logos.

Generative Adversarial Text to Image Synthesis: https://arxiv.org/pdf/1605.05396.pdf

This paper develops a novel deep architecture for GANs that takes advantage of RNNs to learn discriminative text feature representations. The authors test their model by generating plausible images of birds and flowers from detailed text descriptions.

Implementation of DALLE2: https://github.com/lucidrains/DALLE2-pytorch

Conditional WGAN-GP: https://cameronfabbri.github.io/papers/conditionalWGAN.pdf

Data

We used the Large Logo Dataset (LLD), introduced in "Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks" (Sage et al., 2017). Its LLD-icon subset contains 486,377 favicons crawled from the top 1 million websites, all at a uniform size of 32x32 pixels.

Methodology

We extract features from each image with a ResNet-50 classifier, reduce their dimensionality with PCA, and apply k-means clustering to partition the dataset into 64 clusters. We then pass the images from a single cluster, which can be freely chosen, to the DCGAN model. The generator uses transpose convolutions to produce an image from random noise, while the discriminator is a CNN-based image classifier.
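To make the generator's upsampling concrete, here is a minimal sketch of how a stack of transpose convolutions can grow a 1x1 noise input into a 32x32 image, matching the LLD-icon resolution. The layer settings (kernel 4, stride 2, padding 1, four layers) are assumptions borrowed from common DCGAN setups, not the project's actual configuration, and the names `conv_transpose_out`, `LAYERS`, and `generator_sizes` are hypothetical. The size formula is the standard one for a transpose convolution with dilation 1.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transpose convolution along one spatial axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical generator layout as (kernel, stride, padding) per layer.
# These values are assumed for illustration, not taken from the project.
LAYERS = [
    (4, 1, 0),  # project the 1x1 noise input up to 4x4
    (4, 2, 1),  # 4x4   -> 8x8
    (4, 2, 1),  # 8x8   -> 16x16
    (4, 2, 1),  # 16x16 -> 32x32
]


def generator_sizes(layers, start=1):
    """Return the spatial size after each layer, beginning with the input size."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


print(generator_sizes(LAYERS))  # [1, 4, 8, 16, 32]
```

With these assumed settings, each stride-2 layer doubles the spatial resolution, so three of them after the initial projection are enough to reach 32x32.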

Metrics

We plan to evaluate the model by inspecting the output images and by calculating two metrics: the inception score, which measures whether the generated images are varied and whether each image distinctly looks like something, and the Fréchet inception distance (FID).
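As a rough illustration of the inception score, the sketch below computes exp(mean over images of KL(p(y|x) || p(y))) from a list of class-probability vectors. In the real metric these probabilities come from a pretrained Inception network; here they are passed in directly, which is an assumption made to keep the example self-contained. The function name `inception_score` is hypothetical.

```python
import math


def inception_score(probs):
    """Inception score from per-image class-probability vectors p(y|x).

    probs: list of equal-length lists, each summing to 1.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y): average of p(y|x) over all images.
    marginal = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    total_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); terms where p(y|x) is 0 contribute nothing.
        total_kl += sum(
            pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0
        )
    return math.exp(total_kl / n)


# Confident predictions spread evenly over 4 classes: score equals 4.
one_hot = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Completely uncertain predictions: score equals 1.
uniform = [[0.25, 0.25, 0.25, 0.25]] * 3

print(inception_score(one_hot), inception_score(uniform))
```

The two toy inputs show the extremes: the score reaches the number of classes when every image is classified confidently and the classes are evenly covered, and drops to 1 when the predictions carry no information.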

Base Goal: The model can generate reasonable-looking logos.

Target Goal: The model can generate logos that reflect the cluster that the DCGAN was trained on.

Stretch Goal: The model can generate logos that include the names of the respective companies for which they were generated, or reflect their associated description.

Ethics

Why is Deep Learning a good approach to this problem? Deep learning models can take multiple types of data as input and learn complex, composite functions from them. GANs in particular are well suited to image generation, which is the core of our project.

Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm? The major “stakeholders” are companies or users who want suggestions for logo designs. It would be an issue if our model were biased toward logos from certain companies (e.g. those in the Fortune 500), because this would impose homogeneity on the generated logos.

Division of Labor

Alex - do web scraping if necessary to generate partial dataset; research model architectures and tune hyperparameters

Hongyi - implement clustering scheme; integrate with DCGAN model to make it conditional

Qinan - preprocess company descriptions; train embeddings for all the words; integrate with DCGAN model to make it conditional

Sid - acquire dataset; build generator and discriminator models; implement inception score and FID metrics
