Who are we

Kaki Su, Kota Soda, March Boonyapaluk, Miku Suga

Inspiration

See link for where we got our inspiration.

Poster

See link for poster.

Presentation

See link for Presentation.

Final Write Up

See link for writeup.

Check in 1

https://docs.google.com/document/d/1u2YLN_17fEUXl26gTHrPXPtTsDEl2xRnMCR97Nbkf6I/edit?ts=5fad464c

Check in 2

https://docs.google.com/document/d/19IL0Z4sRWOH6gD1SET2AE1U7PXX79BcKIUEPoby7f9g/edit?usp=sharing

What it does

This project implements a method proposed by Lu et al. for coloring black-and-white images using a color reference image. The model encodes the semantic structure of the original image together with the color distribution of the reference image, then applies that coloring to the black-and-white image. We were motivated to implement this model after seeing a neural-network-powered colorization of old monochrome images from the early 1900s.

How I built it

Dataset

We used the Places365 scene dataset, which we loaded through the TensorFlow Datasets library. To make the images usable by the model, we separated the dataset into a target image set (the black-and-white images we color) and a reference image set (the color images we consult when coloring the target images).
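The split described above can be sketched roughly as follows. This is a minimal illustration, not our actual preprocessing code: the function names, the half-and-half split, and the BT.601 luma weights for the grayscale conversion are all assumptions, and images are represented as plain nested lists so the example runs without TensorFlow.

```python
import random


def to_grayscale(image):
    """Convert an RGB image (rows of (r, g, b) tuples) to one luma value per pixel.

    Uses the ITU-R BT.601 weights; this choice is an assumption for illustration.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def split_target_reference(images, seed=0):
    """Shuffle the images, then use one half as targets and the other as references.

    Targets are converted to grayscale; references keep their colors.
    The 50/50 split is a hypothetical choice, not taken from the paper.
    """
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    targets = [to_grayscale(img) for img in shuffled[:half]]
    references = shuffled[half:2 * half]
    return targets, references
```

Each grayscale target can then be paired with a color reference at the same index when building training batches.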

Architecture

The model architecture is split into two main sub-networks (Figure 2). The GCFN (gated color fusion sub-network) fuses the semantic and color distribution information from the reference image, and the MCN uses transposed convolutions to color the monochrome image using the information gathered by the GCFN. We also run a discriminator model, which is used to calculate the losses of the model.
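To show what a transposed convolution does when it upsamples a feature map, here is a minimal one-dimensional sketch in plain Python. It is a toy illustration under our own assumptions (single channel, no padding, hypothetical function name), not the layer configuration used in the MCN.

```python
def conv_transpose_1d(x, kernel, stride=2):
    """Transposed 1D convolution: each input value scatters a scaled copy of the kernel.

    Output length is (len(x) - 1) * stride + len(kernel), so stride > 1 upsamples.
    """
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            # Overlapping contributions from neighboring inputs are summed.
            out[i * stride + k] += value * weight
    return out


upsampled = conv_transpose_1d([1.0, 2.0, 3.0], [1.0, 2.0, 1.0], stride=2)
```

With a stride of 2, the three input values become seven output values, which is how the decoder grows low-resolution features back toward the image size.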

Challenges I ran into

Since the paper we referenced was relatively new (published August 2020), it took us some time to learn the structures and models it used and adapt them for our own purposes. The patch-based discriminator portion was especially challenging; the paper pointed to other sources for more information, but we were never able to locate them. Much of the information about the loss function was also ambiguous in the paper, so our calculations there required a lot of trial and error. The loss function is likely a large source of error, since it is a sum of five different loss terms, most of which we had to guess at or implement ourselves.

In addition, using the Places365 dataset from TensorFlow required us to download everything before choosing the training data size, so we had to download 100 GB of data locally, which some of our computers and setups could not accommodate. We also tried GCP, where we ultimately had to create three extra VM instances to support the 100+ GB dataset. Finally, we had some technical difficulties setting up our local environments, which cost us some time.
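The way several loss terms combine into one training objective can be sketched as a weighted sum. The term names and weights below are placeholders for illustration only; they are not the five losses or the coefficients from the paper.

```python
def total_loss(terms, weights=None):
    """Combine named loss terms into one scalar as a weighted sum.

    Any term without an explicit weight is counted with weight 1.0.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


# Hypothetical term names and values, chosen only to show the call shape.
example = total_loss(
    {"adversarial": 0.5, "reconstruction": 2.0},
    weights={"reconstruction": 10.0},
)
```

Keeping the terms in a dictionary made it easier for us to reason about which term dominates the total when one of them is guessed incorrectly.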

Accomplishments that I'm proud of

As mentioned above, although the paper was ambiguous on many details, such as the losses and the GAN architecture, we were able to replicate the model closely enough that it ran and produced some output. This experience was particularly rewarding, given that we had to write most of our loss, discriminator, and preprocessing functions from scratch.

Ethics

Why is Deep Learning a good approach to this problem?

• Manually coloring black-and-white images can require a lot of work and effort to make the images “look right.” In addition, most professional colorization requires the help of a graphic designer or artist. With this deep learning model, anybody can color a black-and-white image easily, just by passing in the image together with a similar color image that has a suitable color palette.

What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain?

• We used the Places365 dataset from the TensorFlow library. Because Places365 is such a general, non-specific dataset, one obvious consequence is that there will be images that don’t resemble anything the model has seen before. If this happens, the model could color the image incorrectly (in a semantic sense). In addition, the data contains 1.8 million images, which means we cannot look through every single image in the dataset. Therefore, we may not spot inherent trends in the dataset (e.g., most images originating in or depicting Western countries and cultures). This would cause the model to tend to color things (semantically) in a Western way. The good thing about this model is that, since it selects the color palette based on the reference image, it is unlikely to be biased in how it chooses which colors to apply.

What's next for Image COLORing

A further extension of this project would be to apply it to video colorization, which is where we originally drew our inspiration. We would create a pipeline that lets a user input any video and a reference image and outputs a video colorized according to that reference image. This extension would require many additional preprocessing functions, as well as optimized computation to minimize the time it takes to process each frame.
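A rough outline of such a pipeline might look like the following. It is only a sketch of the idea: `colorize_frame` stands in for the trained model and is a hypothetical callable, and frame decoding and encoding are left out entirely.

```python
from typing import Any, Callable, Iterable, Iterator


def colorize_video(
    frames: Iterable[Any],
    reference: Any,
    colorize_frame: Callable[[Any, Any], Any],
) -> Iterator[Any]:
    """Lazily colorize each frame against the same reference image.

    Yielding frames one at a time avoids holding the whole video in memory.
    """
    for frame in frames:
        yield colorize_frame(frame, reference)
```

Reusing one reference image for every frame keeps the palette consistent across the video, though temporal flicker between frames would still need to be handled separately.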

Built With
