Introduction: The paper Album Cover Generator from Genre Tags generates unique images that reflect the characteristics of a song based on its genre labels. The paper was inspired by a desire to recreate the multi-sensory experience of pairing visuals with music, and that same desire motivated us to choose it to reimplement. Music and art are both human creative outlets, and we want to remove the barriers that keep musicians from having an album cover by building a network that can generate one easily.
In this project, the generative side, which uses a Generative Adversarial Network (GAN), is an unsupervised learning problem: we are generating images rather than predicting labels. The paper also includes a discriminator, which poses a classification problem, since its job is to classify images by genre tag.
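To make the discriminator's dual role concrete, here is a minimal PyTorch sketch of an ACGAN-style discriminator: a shared convolutional trunk feeding two heads, one for real-vs-fake and one for genre classification. The layer sizes, image resolution, and genre count here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ACDiscriminator(nn.Module):
    """Sketch of an ACGAN-style discriminator with two output heads.

    Assumes 64x64 RGB covers; all layer widths are illustrative choices.
    """
    def __init__(self, n_genres=5, img_channels=3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(img_channels, 32, 4, stride=2, padding=1),  # 64 -> 32
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),            # 32 -> 16
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),           # 16 -> 8
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
        )
        self.flatten = nn.Flatten()
        self.adv_head = nn.Linear(128 * 8 * 8, 1)         # real-vs-fake logit
        self.cls_head = nn.Linear(128 * 8 * 8, n_genres)  # genre logits

    def forward(self, x):
        h = self.flatten(self.trunk(x))
        return self.adv_head(h), self.cls_head(h)

# Forward pass on a dummy batch of four 64x64 covers
disc = ACDiscriminator(n_genres=5)
adv, cls = disc(torch.randn(4, 3, 64, 64))
```

In training, the adversarial head would be driven by a GAN loss and the genre head by a cross-entropy loss over the tags, which is what makes this part of the model a classification problem.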
Challenges: One challenge we faced was understanding the concepts presented in the paper and used to build its model architecture. This was our first introduction to GANs, deep convolutional GANs (DCGANs), auxiliary classifier GANs (ACGANs), and multi-scale structural similarity (MS-SSIM), so we had to research each of these topics to understand the paper. Another challenge was becoming familiar with PyTorch, which none of us had used before.
Insights: We have already preprocessed the data from Napster's API, which involved connecting to the API to fetch image URLs, then downloading, preprocessing, and normalizing the images. We have also built much of the model's discriminator in PyTorch, but we are still debugging it due to our unfamiliarity with the library.
Plan: So far we are on track, but we will need to dedicate more time to thoroughly understanding the paper we are basing our project on. Though we have a high-level understanding of the concepts and models used, we will need a deeper understanding to implement the model correctly. This, along with learning PyTorch itself, has become a priority, and it will also help us debug further. We may change the categories the model conditions the album covers on, since the paper uses tags other than genres that are unique to its Spotify data set. In addition to genre, we may leave open the option of training on the year or decade an album was produced, both to increase the variety of album outputs and to try something the paper did not.