Image Colorization

Final Writeup: https://docs.google.com/document/d/1YYBO_dxzi9H8zOZ97is-OcrT5Qyr0Ct5nPuJFRZ-Cqs/edit?usp=sharing

Introduction

The colorization of grayscale images is an ill-posed problem with multiple correct solutions. In this project, we propose an adversarial learning approach to colorization that is coupled with semantic information. A generative network infers the chromaticity of a given grayscale image, conditioned on semantic cues. This network is framed in an adversarial model that learns to colorize by incorporating a perceptual and semantic understanding of color and class distributions. The model is trained with a fully self-supervised strategy. Qualitative and quantitative results show that the proposed method can colorize images realistically, achieving state-of-the-art results.

Related Work

The article “Colorful Image Colorization” takes a grayscale photograph as input and attacks the problem of hallucinating a plausible color version of that photograph. The system is implemented as a feed-forward pass through a CNN at test time and is trained on over a million color images. The authors evaluate it with a “colorization Turing test.” Code: https://github.com/richzhang/colorization

Data

We will use the ILSVRC2012 dataset, obtained from Kaggle: https://www.kaggle.com/c/imagenet-object-localization-challenge/overview/description

Methodology

The network consists of 8 blocks, each made up of 3 convolution layers, 1 ReLU, and 1 batch normalization layer. From the paper, “The net has no pool layers. All changes in resolution are achieved through spatial downsampling or upsampling between conv blocks.” We will train the model on Google Cloud, which should be faster than training locally. We expect the hardest parts of implementing the paper to be porting the code from PyTorch to TensorFlow and dealing with the long training times that come with many convolution layers.
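The block structure described above can be sketched in TensorFlow as follows. This is a minimal illustration, not the paper's exact architecture: the filter counts, the 64x64 input size, the number of blocks, the use of a strided convolution for downsampling, the ReLU after every convolution, and the names `conv_block` and `build_sketch` are all assumptions made for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_block(x, filters, n_convs=3, downsample=False, name="block"):
    """Stack n_convs 3x3 convolutions, then one batch normalization layer.

    Assumption: every convolution is followed by a ReLU, and downsampling
    (when requested) is done with stride 2 on the last convolution of the
    block, so that no pooling layer is needed.
    """
    for i in range(n_convs):
        is_last = i == n_convs - 1
        x = layers.Conv2D(
            filters,
            kernel_size=3,
            strides=2 if (downsample and is_last) else 1,
            padding="same",
            activation="relu",
            name=f"{name}_conv{i + 1}",
        )(x)
    return layers.BatchNormalization(name=f"{name}_bn")(x)


def build_sketch(input_shape=(64, 64, 1)):
    """Tiny stand-in model: grayscale input, two chromaticity channels out."""
    inputs = layers.Input(shape=input_shape)
    x = conv_block(inputs, 16, downsample=True, name="down1")  # 64 -> 32
    x = conv_block(x, 32, downsample=True, name="down2")       # 32 -> 16
    x = layers.UpSampling2D(2, name="up1")(x)                  # 16 -> 32
    x = conv_block(x, 16, name="up_block")
    # 1x1 convolution mapping features to 2 color channels (hypothetical head)
    outputs = layers.Conv2D(2, kernel_size=1, padding="same", name="chroma")(x)
    return tf.keras.Model(inputs, outputs, name="colorizer_sketch")


model = build_sketch()
```

Calling `model.summary()` lists the layers and confirms that resolution changes only through the strided convolutions and the upsampling layer.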

Metrics

Since image colorization has multiple acceptable answers, accuracy is not a metric that we can use to evaluate our model. Instead, we will show human participants a generated image alongside the ground truth color image and ask them to choose between the two. Our base goal will be to fool humans on 5% of the trials, our target will be 10%, and our stretch goal will be 15%. Additionally, the PSNR of the generated images will be computed with respect to the ground truth and compared with the values obtained by other fully automatic methods.
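Both metrics are simple to compute. The sketch below uses NumPy and assumes 8-bit images (peak value 255) and one boolean per human trial that records whether the participant was fooled; the function names `psnr`, `fooling_rate`, and `goals_met` are hypothetical helpers, not part of any library.

```python
import numpy as np

# Goal thresholds taken from the text above.
GOALS = {"base": 0.05, "target": 0.10, "stretch": 0.15}


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    mse = np.mean((reference - generated) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def fooling_rate(fooled):
    """Fraction of trials in which the participant was fooled."""
    return float(np.asarray(fooled, dtype=bool).mean())


def goals_met(rate):
    """Names of the goals whose threshold the fooling rate reaches."""
    return [name for name, threshold in GOALS.items() if rate >= threshold]


# Example with made-up trial outcomes: 3 of 25 participants fooled.
trials = [True] * 3 + [False] * 22
rate = fooling_rate(trials)
print(rate, goals_met(rate))
```

Because each trial is a choice between two images, a fooling rate of 50% would mean participants cannot tell generated images from real ones at all.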

Ethics

Colorizing a grayscale image requires a deep understanding of how colors and textures are associated with different objects and features in the image. Deep learning models such as CNNs can learn the complex relationships by automatically extracting features from the input image at different scales and combining them to generate a color image.

If the dataset over-represents a certain racial group or demographic, or portrays certain scenarios with an underlying bias, the model will not perform accurately when presented with grayscale images of underrepresented groups. For example, if the dataset consists of faces of one racial group, the model may not perform well on faces of other racial groups.

We are also concerned that some images may have been collected illegally or without permission. Since the database contains millions of images, such images are hard to identify.

Division of labor

Our main division of labor will be the following:
Preprocessing: Ben, Yaoqi
Implementation: Ben, Seyit
Evaluation: Seyit, Zahra
Write-up and Poster: Yaoqi, Zahra
Additionally, we will all work together when necessary.

Built With

tensorflow

Updates

Ben Duong started this project — Apr 16, 2023 12:48 PM EDT
