Colourizing Greyscale Images

Inspiration

We really like the colour orange.

What it does

At the moment, the model turns greyscale images a nice, saturated orange. Ideally, the model would take a greyscale image as input and predict the original colours of the image.
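As a rough illustration of the intended input and output, here is a minimal sketch of how one training pair could be derived from a colour image. It assumes RGB values scaled to [0, 1] and uses the standard ITU-R BT.601 luma weights for the greyscale conversion. The helper name make_training_pair is hypothetical, and the write-up does not say how our greyscale inputs were actually produced.

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G and B channels (assumed conversion).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def make_training_pair(rgb):
    """Return (greyscale input, colour target) for an RGB array in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float32)   # shape (H, W, 3)
    grey = rgb @ LUMA_WEIGHTS                 # weighted channel sum, shape (H, W)
    # Keep a trailing channel axis so the input looks like a 1-channel image.
    return grey[..., np.newaxis], rgb
```

The model would then be trained to map the first array (one channel) back to the second (three channels).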

How we built it

We built the model using the Python library Keras! The script was entirely written in Google Colaboratory. The CNN architecture consisted of 8 convolutional layers with ReLU activation (encoder), followed by 3 upsampling layers and 5 deconvolution layers (decoder).
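The layer counts above can be sketched in Keras roughly as follows. This is not our exact script: the filter counts, kernel size, strides, input size, the placement of the upsampling layers, and the sigmoid RGB output are all assumptions chosen so the shapes line up, and build_colourizer is a hypothetical name. "Deconvolution" is taken to mean Keras's Conv2DTranspose.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_colourizer(size=64):
    """Encoder-decoder CNN: 8 conv layers, then 3 upsampling + 5 transposed convs."""
    inputs = keras.Input(shape=(size, size, 1))  # greyscale image, one channel

    # Encoder: 8 convolutional layers with ReLU.
    # Assumed (filters, stride) pairs; three stride-2 layers shrink the image 8x.
    x = inputs
    encoder_spec = [(32, 2), (32, 1), (64, 2), (64, 1),
                    (128, 2), (128, 1), (128, 1), (128, 1)]
    for filters, stride in encoder_spec:
        x = layers.Conv2D(filters, 3, strides=stride,
                          padding="same", activation="relu")(x)

    # Decoder: 5 transposed convolutions in total, with an upsampling layer
    # after each of the first three to restore the input resolution.
    for i, filters in enumerate([128, 64, 32, 16]):
        x = layers.Conv2DTranspose(filters, 3, padding="same",
                                   activation="relu")(x)
        if i < 3:
            x = layers.UpSampling2D(2)(x)

    # Final transposed convolution maps to 3 colour channels in [0, 1].
    outputs = layers.Conv2DTranspose(3, 3, padding="same",
                                     activation="sigmoid")(x)
    return keras.Model(inputs, outputs)
```

Calling build_colourizer(64).summary() prints the layer stack, which is a quick way to check that the output has the same height and width as the input. The input size must be divisible by 8 for the three downsampling and three upsampling steps to cancel out.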

Challenges we ran into

Originally, we decided to use a Vision Transformer (ViT) as our ML model. We planned to adapt the pre-trained ViTMAEModel from HuggingFace for our purposes. This model performs a similar function: it reconstructs the pixels of masked patches in an image. Unfortunately, we realized that this model was too niche to be trained on our dataset. Although CNNs present the issue of bottlenecking, they remain a popular alternative for computer vision.

Currently, our model is not performing as expected. Some issues can be traced to how the output data is processed before printing, but the main cause is that the model is not sufficiently trained. Google Colab's GPU and runtime limits forced us to train the model on only 200 images at a time.
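Training on 200 images at a time can be sketched as a simple chunking loop. This is a minimal sketch, not our actual training code: chunks and train_in_chunks are hypothetical names, and load_images stands in for whatever helper turns a list of file paths into input and target arrays.

```python
def chunks(items, size=200):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def train_in_chunks(model, paths, load_images, size=200, epochs=1):
    """Fit `model` on one chunk at a time so only `size` images are in memory.

    `load_images` is a hypothetical helper that maps a list of file paths to
    (greyscale_inputs, colour_targets) arrays.
    """
    for chunk in chunks(paths, size):
        x, y = load_images(chunk)
        # In Keras, repeated fit() calls continue from the current weights,
        # so each chunk keeps training the same model.
        model.fit(x, y, epochs=epochs)
```

One caveat with this approach is that the model sees each chunk in isolation, so it can drift toward the most recent chunk. Shuffling the paths and making several passes over all chunks is one way to reduce that.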

Before training from scratch, we attempted to load pre-trained models from the Caffe framework. However, we could not get them to load in Google Colab.

Accomplishments that we're proud of

In our first attempt, the model returned the original greyscale images. We're proud to have produced any colour at all :')

What we learned

We learned how to train a CNN model from scratch (albeit with poor performance). We also learned about the HuggingFace model repo and explored how ViTs work.

What's next for Colourizing Greyscale Images

Fine-tuning a pre-trained model on our dataset should give better results than training from scratch.

HuggingFace also provides the ViTModel, a more generalized Vision Transformer than ViTMAE. Once we have gained more familiarity with machine learning, we could return to this model.

Built With

cnns
colab
keras
python

Updates

Jennifer Tram Su started this project — Dec 01, 2022 12:56 AM EST
