Deep Learning Final Project Proposal

Title: Image Colorization

Who: Mazine Suliman (msulima2), Sabrina Chwalek (schwalek), Nathan Mugerwa (jmugerwa)

Introduction

We plan to implement Richard Zhang’s research paper on the colorization of grayscale images. The paper aims to colorize black-and-white photographs realistically: Zhang approaches the problem by using Convolutional Neural Networks to “hallucinate” what an input grayscale image would look like in color. We thought this paper would be a fun topic since it can be applied to restoring photographs taken before color photography was invented. (see here!) This project is an exercise in supervised learning, specifically classification: our labels are the color versions of the photographs.

Related Work

There has been a decent amount of research into automatic colorization. A newer technique employs a GAN to colorize images: a generator proposes colorizations while a discriminator learns to distinguish them from real color images.

There’s also a breakdown of Zhang’s paper using another framework (OpenCV).

We also found a very simple ‘alpha’ colorization bot that’s only 40 lines of code!

We haven’t found a full re-implementation of the paper, but this website contains a database of different image-colorization frameworks, several of which are likely similar to Zhang’s.

Data

We’ll be using ImageNet, a widely used dataset of millions of images created to aid AI research. The project began in 2006, and the images were annotated through crowd-sourcing by both volunteers and paid participants.

There will be a relatively simple pre-processing step. First, we’ll apply some of the normalization steps from our CNN assignment, e.g. scaling pixel values to [0, 1]. Next, we’ll need to build train and test sets. We can do this by treating the color images from ImageNet as labels and de-colored (greyscale) versions of those images as inputs, as sketched below.
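A minimal sketch of this split, assuming scikit-image for color conversion and following the paper’s Lab-space setup (lightness channel as input, ab channels as labels); the quantization of ab values into class bins is omitted here, and `preprocess` is a hypothetical helper name:

```python
import numpy as np
from skimage import color  # pip install scikit-image

def preprocess(rgb_batch):
    """Split a batch of RGB images into greyscale inputs and color labels.

    rgb_batch: uint8 array of shape (N, H, W, 3) with values in [0, 255].
    Returns (inputs, labels): the L (lightness) channel scaled to [0, 1]
    and the ab color channels, following the Lab split used by Zhang et al.
    """
    lab = color.rgb2lab(rgb_batch)    # L in [0, 100], ab roughly in [-110, 110]
    inputs = lab[..., :1] / 100.0     # normalize the L channel to [0, 1]
    labels = lab[..., 1:]             # ab channels serve as the targets
    return inputs.astype(np.float32), labels.astype(np.float32)
```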

The sheer quantity of available data may become a bottleneck for training (e.g. it may take too long to train on even ¼ of it).

Methodology

We’ll be re-implementing this paper (Zhang et al., 2016).

We’ll be using a feed-forward CNN architecture with ~8 blocks of downsampling and upsampling convolutional layers. For each pixel, the output layer gives a probability distribution over quantized color values; the max of that distribution is our prediction for that pixel’s color. The final output is a color mask that we concatenate with our input (greyscale) image, creating a colorized version of the original image. A rough sketch of such an architecture follows.
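A minimal Keras sketch of this down/upsampling design, assuming TensorFlow; filter counts, depths, and the 313 quantized bins (the value used in the paper) are illustrative, not final choices:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_BINS = 313  # quantized ab color classes, as in Zhang et al.

def build_colorizer(h=256, w=256):
    """Feed-forward CNN: downsample with strided convs, upsample back,
    and emit a per-pixel softmax over quantized color bins."""
    inp = layers.Input(shape=(h, w, 1))  # greyscale L channel
    x = inp
    # Downsampling path: halve spatial resolution three times.
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    # Upsampling path: recover the original resolution.
    for filters in (256, 128, 64):
        x = layers.UpSampling2D(2)(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    # Per-pixel probability distribution over quantized color values.
    out = layers.Conv2D(NUM_BINS, 1, activation="softmax")(x)
    return tf.keras.Model(inp, out)
```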

The most difficult part will most likely be the construction of the output layer and the choice of loss function, since performance is greatly affected by how the loss is posed. The best choice seems to be multinomial cross-entropy loss, with a weighting term that rebalances the loss according to the rarity of each color class (sketched below).
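A sketch of that rebalanced loss, again assuming TensorFlow; `bin_weights` is a hypothetical per-class weight vector that we would compute from color-bin frequencies in the training set (rare, saturated colors get larger weights):

```python
import tensorflow as tf

def rebalanced_cross_entropy(y_true, y_pred, bin_weights):
    """Multinomial cross-entropy with per-class rebalancing.

    y_true: one-hot color-bin labels, shape (N, H, W, NUM_BINS).
    y_pred: softmax outputs, same shape.
    bin_weights: shape (NUM_BINS,), e.g. proportional to inverse
    bin frequency in the training set.
    """
    # Weight each pixel by the rarity weight of its true color bin.
    pixel_weights = tf.reduce_sum(y_true * bin_weights, axis=-1)      # (N, H, W)
    ce = -tf.reduce_sum(y_true * tf.math.log(y_pred + 1e-8), axis=-1) # (N, H, W)
    return tf.reduce_mean(pixel_weights * ce)
```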

Metrics

Base Goal: At minimum, we should be able to implement the model and have it roughly colorize our images. This can be assessed visually/manually.

Target Goal: Our target goal is to successfully implement our model and fool people 10% of the time with our generated images. We’d also like to create nice visualizations of images before and after colorization.

Stretch Goal: Possibly extend this paper to handle historical photographs created before the invention of color photography (here’s one idea: a dataset of US national parks in 1941). Another possible extension is re-coloring paintings (dataset).

Ethics

How are you planning to quantify or measure error or success? What implications does your quantification have? Similar to Zhang et al. (2016), we plan on using a “Colorization Turing Test” to evaluate our model’s success. The test measures whether we’re able to fool people when they’re asked to choose between a generated and a ground-truth color image. Zhang et al. were able to fool people 32% of the time, and we hope to fool people at least 10% of the time. Our quantification therefore relies on subjective judgements: each participant decides whether the ground-truth color image or the artificially colored image produced by our model looks better.
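Concretely, the fooling rate is just the fraction of trials in which a participant picks the generated image over the ground truth; a trivial sketch, with a hypothetical `fool_rate` helper:

```python
def fool_rate(choices):
    """choices: list of booleans, True if the participant picked the
    generated image over the ground-truth color image in that trial."""
    return sum(choices) / len(choices)

# e.g. fool_rate([True, False, False, True, False]) == 0.4, i.e. 40%
```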

Why is Deep Learning a good approach to this problem? Colorizing images used to be possible only by hand, either by coloring a physical photograph or by using photo-editing software, which is labor-intensive. A Deep Learning approach lets us train a model once and then use it to color new images very quickly, saving enormous effort.

Division of Labor

Preprocessing (Nate)

Model Construction (everyone)

Data Analysis (Mazine)

Poster (Sabrina)
