Implementation of CycleGAN on Images and Videos
Yiqin Yuan, Jiaxin Liu
Introduction: The paper we are going to implement is "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" (CycleGAN). It addresses an unsupervised learning problem: translating images between two domains without paired training examples.
Related Work: A related paper that goes beyond the one we are implementing is "U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation". It introduces an attention technique not covered in csci1470, the Class Activation Map (CAM), and proposes a new normalization function, AdaLIN (adaptive layer-instance normalization).
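To make the U-GAT-IT comparison concrete, here is a minimal NumPy sketch of AdaLIN as we understand it from that paper: it normalizes a feature map with both instance statistics (per channel, over spatial positions) and layer statistics (over all channels and positions), then interpolates between the two with a learnable weight rho before applying the usual scale and shift. The function name and the (H, W, C) layout are our own choices for illustration, not code from either paper.

```python
import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """Sketch of Adaptive Layer-Instance Normalization (AdaLIN) from U-GAT-IT.

    x: feature map of shape (H, W, C); gamma, beta: per-channel scale/shift;
    rho: mixing weight in [0, 1] between instance and layer statistics
    (learnable in the real model, a plain float here).
    """
    # Instance normalization: statistics per channel over spatial positions
    mu_i = x.mean(axis=(0, 1), keepdims=True)
    var_i = x.var(axis=(0, 1), keepdims=True)
    x_i = (x - mu_i) / np.sqrt(var_i + eps)
    # Layer normalization: statistics over all channels and positions
    mu_l = x.mean(keepdims=True)
    var_l = x.var(keepdims=True)
    x_l = (x - mu_l) / np.sqrt(var_l + eps)
    # Interpolate between the two normalizations, then scale and shift
    return gamma * (rho * x_i + (1 - rho) * x_l) + beta
```

With rho = 1 this reduces to plain instance normalization, and with rho = 0 to layer normalization, which is what lets the network choose per feature how much content structure to preserve.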
Data: We will use the "horse2zebra" dataset. The CycleGAN paper provides many similar datasets, but training on all of them would likely take over a week given our team's limited computing resources, so we will test our model on more datasets only if time permits.
Methodology: Due to limited computing resources, we will train the model on images of size 28*28 instead of 256*256, with a batch size of 1. Since we are implementing an existing paper, the hardest part is implementing the cycle structure of the model, which translates image styles between the two domains. In addition, because GANs are unstable, the long training time and training instability are also challenges for us.
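The cycle structure mentioned above comes down to the cycle-consistency loss from the CycleGAN paper: translating an image to the other domain and back should recover the original. A small NumPy sketch, with G and F standing in for the two generators (any callables here, real networks in practice) and lam = 10 as in the paper:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """Cycle-consistency term of the CycleGAN objective.

    G: X -> Y generator, F: Y -> X generator. Returns
    lam * (||F(G(x)) - x||_1 + ||G(F(y)) - y||_1); the adversarial
    terms of the full objective are computed separately.
    """
    forward = np.abs(F(G(x)) - x).mean()   # x -> G(x) -> F(G(x)) should recover x
    backward = np.abs(G(F(y)) - y).mean()  # y -> F(y) -> G(F(y)) should recover y
    return lam * (forward + backward)
```

This term is what couples the two generators together and makes the unpaired setting trainable: without it, G could map every horse to the same plausible zebra.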
Metrics: We will run experiments on the "horse2zebra" dataset and, time permitting, on additional datasets such as "apple2orange" and "monet2photo". Since our task is image style transfer, the most straightforward evaluation is visual inspection: it is usually obvious whether a horse has been translated into a zebra, for instance. Our base goal is to build the model and make the translation work well; our target goal is to improve the local accuracy and overall plausibility of the translation within the limits of our computing resources. As stretch goals, we will try applying the model to videos, improving it with the methods from "U-GAT-IT" mentioned above, or even adding the sparse attention technique from "Sparse Transformer" to the model and measuring how much the accuracy improves.
Ethics: Who are the major "stakeholders" in this problem, and what are the consequences of mistakes made by your algorithm? The original motivation for this kind of model is image style transfer, such as rendering photos in an artistic style or altering the style or features of a human face. Face-changing techniques have been researched for many years, and generating realistic human faces is not automatically harmless: if the model coincidentally generates the face of a real person, the result could be used by people with bad intentions and cause trouble in the real world.
Why is Deep Learning a good approach to this problem? Translating, say, a million horse images into zebra images with hand-crafted techniques such as Photoshop would be prohibitively slow; a trained CycleGAN model can perform the same operation within just a few days.
Division of Labor: Yiqin Yuan is responsible for preprocessing and for building and training the networks. Jiaxin Liu is responsible for building the generators, which perform the translation, and the discriminators, which judge whether the images are generated or real.
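The generator/discriminator split above is trained against the least-squares adversarial losses that CycleGAN uses in place of the standard GAN log-loss. A NumPy sketch of those two losses, where d_real and d_fake are the discriminator's outputs on real and generated images (function names are ours):

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares GAN loss for the discriminator, as used in CycleGAN.

    The discriminator is pushed toward outputting 1 on real images
    and 0 on generated ones.
    """
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

def lsgan_generator_loss(d_fake):
    """The generator tries to make the discriminator output 1 on fakes."""
    return ((d_fake - 1.0) ** 2).mean()
```

The CycleGAN authors report that this least-squares variant trains more stably than the log-loss, which matters for us given the instability concerns noted in the Methodology section.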