Who are we
Kaki Su, Kota Soda, March Boonyapaluk, Miku Suga
Inspiration
See link for where we got our inspiration.
Poster
See link for poster.
Presentation
See link for Presentation.
Final Write Up
See link for writeup.
Check in 1
https://docs.google.com/document/d/1u2YLN_17fEUXl26gTHrPXPtTsDEl2xRnMCR97Nbkf6I/edit?ts=5fad464c
Check in 2
https://docs.google.com/document/d/19IL0Z4sRWOH6gD1SET2AE1U7PXX79BcKIUEPoby7f9g/edit?usp=sharing
What it does
This project implements a method proposed by Lu et al. for coloring black-and-white images using a color reference image. The model encodes the semantic structure of the original image, as well as the color distribution of the reference image, and uses both to colorize the black-and-white image. We were motivated to implement this model after seeing a neural-network-powered colorization of old monochrome images from the early 1900s.
How I built it
Dataset
We used the Places365 scene dataset, which we imported from the TensorFlow Datasets library. To make the images usable by the model, we separated the dataset into a target image set (the black-and-white images we color) and a reference image set (the color images we refer to when coloring the target images).
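The split described above can be sketched in plain Python. This is only an illustration under stated assumptions, not our actual preprocessing code: the function names, the half-and-half split, and the Rec. 601 luma weights are all choices made for this example, and images are represented as nested lists of (r, g, b) tuples instead of tensors.

```python
import random

# Rec. 601 luma weights, a common choice for RGB-to-grayscale conversion.
# Assumption for this sketch; the project may have converted differently.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(image):
    """Convert an image (rows of (r, g, b) tuples) to rows of luma values."""
    wr, wg, wb = LUMA_WEIGHTS
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in image]


def split_targets_and_references(images, seed=0):
    """Shuffle the images, then split them into two equal-sized sets.

    The first half becomes grayscale targets; the second half is kept in
    color as references. The 50/50 split is a hypothetical choice.
    """
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    targets = [to_grayscale(img) for img in shuffled[:half]]
    references = shuffled[half:half * 2]
    return targets, references
```

In practice the color version of each target would also be kept as ground truth for training; it is omitted here to keep the sketch short.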
Architecture
The model architecture is split into two main sub-networks (Figure 2). The GCFN (gated color fusion sub-network) fuses the semantic and color distribution information in the reference image, and the MCN uses transposed convolution to color the monochrome image using the information gathered by the GCFN. We also run a discriminator model to calculate the losses of the model.
Challenges I ran into
Since the paper that we referenced was relatively new (published August 2020), it took us some time to learn the structures and models it used and adapt them for our own purposes. The patch-based discriminator portion was especially challenging; the paper referenced other sources for more information, but we were never able to locate them. Much of the information about the loss function was also ambiguous in the paper, so our loss calculations required a lot of trial and error. The loss function is a large source of error, since it is a summation of five different loss functions, most of which we had to guess or implement ourselves.
In addition, using the Places365 dataset from TensorFlow required us to download everything before choosing the training data sizes, so we had to download 100 GB of data locally, which some of our computers and setups could not accommodate. We also tried GCP, where we ultimately had to create three extra VM instances to support the 100+ GB dataset. We also had some technical difficulties setting up our local environments, which cost us some time.
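The summed loss mentioned above can be sketched as a weighted sum of named terms. This is a minimal illustration, not the paper's formulation: the term names (`loss_1` to `loss_5`), the weights, and the numeric values are placeholders, since the individual losses were exactly the part we had to guess at.

```python
def total_loss(loss_terms, weights):
    """Return the weighted sum of named loss terms.

    Both arguments are dicts keyed by loss name. Raising on a missing
    weight makes it harder to silently drop a term from the total.
    """
    missing = set(loss_terms) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in loss_terms.items())


# Placeholder values only; these are not results from our model.
example_terms = {"loss_1": 0.8, "loss_2": 0.1, "loss_3": 1.5,
                 "loss_4": 0.3, "loss_5": 0.05}
example_weights = {name: 1.0 for name in example_terms}
combined = total_loss(example_terms, example_weights)
```

Keeping the terms separate until the final sum makes it easier to log each one during training and see which term dominates.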
Accomplishments that I'm proud of
As mentioned above, the paper was ambiguous on many details, such as the losses and the GAN architecture, but we were able to replicate the model closely enough that it ran and produced some output. This was particularly rewarding given that we had to write most of our loss, discriminator, and preprocessing functions from scratch.
Ethics
Why is Deep Learning a good approach to this problem?
• Manual colorization of black-and-white images can require a lot of work and effort to make images “look right.” In addition, most professional colorization requires the help of a graphic designer or artist. With this deep learning model, anybody can color a black-and-white image easily, just by passing in the black-and-white image along with a similar color image that has a similar color palette.
What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?
• We used the Places365 dataset from the TensorFlow library. Because Places365 is such a general and non-specific dataset, some input images will not resemble anything the model has seen before. When this happens, the model could color the image incorrectly (in a semantic sense). In addition, the data contains 1.8 million images, which means we cannot look through every single image in the dataset. Therefore, we may not spot inherent trends in the dataset (e.g., most images originating in or depicting Western countries/cultures). Such a trend would cause the model to tend to color things (semantically) in a Western way. The good thing about this model is that, since it selects the color palette based on the reference image, it is unlikely to be biased in which colors it chooses to apply.
What's next for Image COLORing
A further extension of this project would be video colorization, which is where we originally drew our inspiration for this project. We would create a pipeline that allows a user to input any video and reference image, and outputs a colorized video based on the reference image that was passed in. This extension would require many preprocessing functions, as well as optimized computation to minimize the time it takes to process each frame.
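One way such a pipeline might be organized is sketched below. Nothing here exists in our project yet: `colorize_video`, `encode_reference`, and `colorize_frame` are hypothetical names, and the two callables stand in for the model's reference encoder and colorization step. The sketch assumes the reference image can be encoded once and reused for every frame, which is one of the optimizations a video pipeline would likely need.

```python
def colorize_video(frames, reference, encode_reference, colorize_frame):
    """Lazily yield colorized frames for a sequence of grayscale frames.

    encode_reference(reference) is called once, before the first frame;
    colorize_frame(frame, features) is then called once per frame.
    """
    reference_features = encode_reference(reference)  # computed only once
    for frame in frames:
        yield colorize_frame(frame, reference_features)
```

Because the function is a generator, frames are processed one at a time and the whole video never has to sit in memory at once.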