Implementation of CycleGAN on Images and Videos
Yiqin Yuan, Jiaxin Liu
Introduction: The paper we are going to implement is "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" (CycleGAN). It addresses an unsupervised learning problem: translating images between two domains without paired training examples.
Related Work: A related paper that goes beyond the one we are implementing is "U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation". It introduces an attention technique not covered in csci1470, the Class Activation Map (CAM), and proposes a new normalization function, AdaLIN (adaptive layer-instance normalization).
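To make the U-GAT-IT comparison concrete, here is a minimal NumPy sketch of AdaLIN as we understand it from that paper: it normalizes a feature map with both instance statistics (per channel, over spatial positions) and layer statistics (over all channels and positions), then interpolates between the two with a learnable weight rho before applying the usual scale and shift. The function name and the (H, W, C) layout are our own choices for illustration, not code from either paper.

```python
import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """Sketch of Adaptive Layer-Instance Normalization (AdaLIN) from U-GAT-IT.

    x: feature map of shape (H, W, C); gamma, beta: per-channel scale/shift;
    rho: mixing weight in [0, 1] between instance and layer statistics
    (learnable in the real model, a plain float here).
    """
    # Instance normalization: statistics per channel over spatial positions
    mu_i = x.mean(axis=(0, 1), keepdims=True)
    var_i = x.var(axis=(0, 1), keepdims=True)
    x_i = (x - mu_i) / np.sqrt(var_i + eps)
    # Layer normalization: statistics over all channels and positions
    mu_l = x.mean(keepdims=True)
    var_l = x.var(keepdims=True)
    x_l = (x - mu_l) / np.sqrt(var_l + eps)
    # Interpolate between the two normalizations, then scale and shift
    return gamma * (rho * x_i + (1 - rho) * x_l) + beta
```

With rho = 1 this reduces to plain instance normalization, and with rho = 0 to layer normalization, which is what lets the network choose per feature how much content structure to preserve.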
Data: We will use the "horse2zebra" dataset. The CycleGAN paper provides many similar datasets, but training on all of them would likely take over a week given our team's limited computing resources, so we will test our model on more datasets only if time permits.
Methodology: Due to limited computing resources, we will train the model on images of size 28*28 instead of 256*256, with a batch size of 1. Since we are implementing an existing paper, the hardest part is implementing the cycle structure of the model, which translates image styles between the two domains. In addition, because GANs are unstable, the long training time and training instability are also challenges for us.
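The cycle structure mentioned above comes down to the cycle-consistency loss from the CycleGAN paper: translating an image to the other domain and back should recover the original. A small NumPy sketch, with G and F standing in for the two generators (any callables here, real networks in practice) and lam = 10 as in the paper:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """Cycle-consistency term of the CycleGAN objective.

    G: X -> Y generator, F: Y -> X generator. Returns
    lam * (||F(G(x)) - x||_1 + ||G(F(y)) - y||_1); the adversarial
    terms of the full objective are computed separately.
    """
    forward = np.abs(F(G(x)) - x).mean()   # x -> G(x) -> F(G(x)) should recover x
    backward = np.abs(G(F(y)) - y).mean()  # y -> F(y) -> G(F(y)) should recover y
    return lam * (forward + backward)
```

This term is what couples the two generators together and makes the unpaired setting trainable: without it, G could map every horse to the same plausible zebra.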
Metrics: We will run experiments on the "horse2zebra" dataset and, time permitting, on additional datasets such as "apple2orange" and "monet2photo". Since our task is image style transfer, the most straightforward evaluation is visual inspection: it is usually obvious whether a horse has been translated into a zebra, for instance. Our base goal is to build the model and make the translation work well; our target goal is to improve the local accuracy and overall plausibility of the translation within the limits of our computing resources. As stretch goals, we will try applying the model to videos, improving it with the methods from "U-GAT-IT" mentioned above, or even adding the sparse attention technique from "Sparse Transformer" to the model and measuring how much the accuracy improves.
Ethics: Who are the major "stakeholders" in this problem, and what are the consequences of mistakes made by your algorithm? The original motivation for this kind of model is image style transfer, such as rendering photos in an artistic style or altering the style or features of a human face. Face-changing techniques have been researched for many years, and generating realistic human faces is not automatically harmless: if the model coincidentally generates the face of a real person, the result could be used by people with bad intentions and cause trouble in the real world.
Why is Deep Learning a good approach to this problem? Translating, say, a million horse images into zebra images with hand-crafted techniques such as Photoshop would be prohibitively slow; a trained CycleGAN model can perform the same operation within just a few days.
Division of Labor: Yiqin Yuan is responsible for preprocessing and for building and training the networks. Jiaxin Liu is responsible for building the generators, which perform the translation, and the discriminators, which judge whether the images are generated or real.
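The generator/discriminator split above is trained against the least-squares adversarial losses that CycleGAN uses in place of the standard GAN log-loss. A NumPy sketch of those two losses, where d_real and d_fake are the discriminator's outputs on real and generated images (function names are ours):

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares GAN loss for the discriminator, as used in CycleGAN.

    The discriminator is pushed toward outputting 1 on real images
    and 0 on generated ones.
    """
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

def lsgan_generator_loss(d_fake):
    """The generator tries to make the discriminator output 1 on fakes."""
    return ((d_fake - 1.0) ** 2).mean()
```

The CycleGAN authors report that this least-squares variant trains more stably than the log-loss, which matters for us given the instability concerns noted in the Methodology section.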