I was inspired to work on this project by a video NVIDIA released around June 2020 on "Synthesizing High-Resolution Images with StyleGAN2" (link). I found it incredibly fascinating that a computer could learn to generate an image from nothing, simply by learning patterns from a dataset of images. As someone who has always been interested in the latest technological advancements, it was the first time in several years that I was genuinely surprised by what a computer could do.
What it does
My project takes an elementary approach to image generation using the readily available CelebA dataset of celebrity faces. The GAN consists of a basic discriminator, which learns to tell the difference between real and fake images, and a generator, which keeps improving so that it can produce increasingly realistic photos to fool the discriminator.
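The original model's exact layer sizes are not given in this writeup, but a minimal DCGAN-style generator/discriminator pair in Keras might look like the following sketch (the latent dimension, filter counts, and 64x64 output size are all assumptions for illustration):

```python
# A minimal DCGAN-style sketch in Keras. Layer sizes are hypothetical,
# not the author's actual architecture.
import numpy as np
from tensorflow.keras import layers, models

LATENT_DIM = 100  # assumed size of the generator's input noise vector


def build_generator(latent_dim=LATENT_DIM):
    """Upsample a noise vector into a 64x64 RGB image in [-1, 1]."""
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        # Three strided transposed convolutions: 8 -> 16 -> 32 -> 64
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])


def build_discriminator(image_size=64):
    """Score an image: 1 for real, 0 for generated."""
    return models.Sequential([
        layers.Input(shape=(image_size, image_size, 3)),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
```

During training, the two networks are optimized in alternation: the discriminator on a mix of real and generated batches, and the generator on the discriminator's feedback.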
How I built it
The entire project was written in Python, using industry-standard machine learning frameworks such as Keras and TensorFlow. The dataset was downloaded from Kaggle, and preprocessing of the images was also done in Python using libraries such as PIL, Pandas, and NumPy.
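A typical preprocessing step with PIL and NumPy resizes each photo and rescales pixel values to [-1, 1] to match a tanh-activated generator. This is a sketch under assumed parameters (the target size and scaling convention are not stated in the writeup):

```python
# Hypothetical preprocessing sketch: resize an image and scale pixels
# to [-1, 1]. The 64x64 target size is an assumption for illustration.
import numpy as np
from PIL import Image


def preprocess(path, size=(64, 64)):
    """Load an image file, resize it, and return a float32 array in [-1, 1]."""
    img = Image.open(path).convert("RGB").resize(size)
    arr = np.asarray(img, dtype=np.float32)
    return arr / 127.5 - 1.0  # maps [0, 255] -> [-1, 1]
```

Running this over the whole dataset up front, and saving the result as a single NumPy array, avoids re-decoding 200,000 image files on every epoch.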
Challenges I ran into
The overall project, even though it is only at an elementary level, turned out to be much more complicated than I had expected. It was difficult to figure out how to use machine learning frameworks such as Keras to build an effective GAN architecture. In addition, the dataset had to be greatly reduced from my original plans, because processing over 200,000 160x160 RGB images was too slow for my computer.
Lastly, I found that training the GAN stably was the hardest part. Often, the model would seem to train normally, then quickly go off track, generating completely unrecognizable noise after many epochs. The root of the problem was mainly the model architecture, along with untuned hyperparameters.
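Two commonly cited stabilization tricks for DCGAN-style training are a tuned Adam optimizer (a low learning rate with beta_1 = 0.5, as recommended in the DCGAN paper) and one-sided label smoothing for the discriminator. This is a generic sketch of those tricks, not necessarily the fix used in this project:

```python
# Common GAN stabilization settings (a generic sketch, not necessarily
# the author's exact hyperparameters).
import numpy as np
from tensorflow.keras.optimizers import Adam

# DCGAN-paper-style optimizer settings: small learning rate, beta_1=0.5.
d_optimizer = Adam(learning_rate=2e-4, beta_1=0.5)
g_optimizer = Adam(learning_rate=2e-4, beta_1=0.5)


def smoothed_labels(batch_size, real=True):
    """One-sided label smoothing: real images get 0.9 instead of 1.0,
    which keeps the discriminator from becoming overconfident."""
    value = 0.9 if real else 0.0
    return np.full((batch_size, 1), value, dtype=np.float32)
```

Softening only the "real" labels weakens the discriminator's gradient just enough that the generator keeps receiving a useful learning signal instead of saturating.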
Accomplishments that I'm proud of
I am very proud to have been able to complete the project in time. About two-thirds of the way into the semester, I was completely knocked out by the flu for about two weeks. By pulling a lot of all-nighters to compensate, I was thankfully able to finish the project on schedule.
As for the project itself, the generated photos turned out better than I expected! I'm proud of what my model can generate, and even more surprised that an individual with a single computer could achieve such results.
What I learned
Throughout the project, I learned a lot about both the details of machine learning and the broader process of machine learning and software development.
At a detailed level, I developed an understanding of Keras and TensorFlow, and of how GAN architectures should be constructed for good results.
At a broader level, I walked through the process of creating a project from beginning to end. There were a lot of unexpected errors and problems, from syntax to model design, that I would have had no idea about had I not taken this course.
Overall, I am very thankful to the McGill Artificial Intelligence Society for providing me with the opportunity to work on such a project!
What's next for FaceGen - An Elementary Project in Image Generation
My original intentions for this project were grander, as I did not expect to be ill for two weeks. I had intended to create a program that would give the user a set of sliders, each tied to a facial feature (eye distance, nose size, etc.), and generate a portrait based on those characteristics.
In addition, I would like to first improve the model so that it can generate hyper-realistic images, and then apply it to various other datasets, such as photos of myself. Since 2020 was an American election year, one fun idea I had was to assemble a dataset of previous US presidents and try to predict what the next president's face might look like!