Principal Component Analysis on image samples with PyTorch

'I speak Tamil in the video; however, non-Tamil speakers can follow along with the tutorial too!'

Inspiration

Principal component analysis (PCA) helps identify the most important features of data samples, especially when the samples are drawn from a homogeneous population. Here we use PCA to project the data into a lower-dimensional space and then project it back to its original dimension.

Prerequisites

Familiarity with linear algebra topics such as matrix multiplication is a must, and it is good to know concepts such as singular value decomposition. Familiarity with Python is also needed.

What it does

In this tutorial we use PyTorch's efficient PCA implementation on images to retain the essential features of an image. After PCA, we reconstruct the original image; there will be some data loss in the reconstruction. We then visualize the reconstructed image while varying the number of principal components as a parameter.
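The project-and-reconstruct step can be sketched roughly as follows. This is a minimal illustration, not the exact code from the video: the helper name `pca_reconstruct`, the synthetic rank-5 "image", and the choice of `q = k + 5` are assumptions made for the example. It relies on `torch.pca_lowrank`, which returns `(U, S, V)` with the principal directions in the columns of `V`.

```python
import torch

def pca_reconstruct(image: torch.Tensor, k: int) -> torch.Tensor:
    """Project the rows of a 2-D image onto k principal components and back."""
    rows, cols = image.shape
    q = min(k + 5, rows, cols)                  # slight overestimate of the rank
    mean = image.mean(dim=0, keepdim=True)      # per-column mean
    _, _, V = torch.pca_lowrank(image, q=q, center=True)
    V_k = V[:, :k]                              # (cols, k) principal directions
    projected = (image - mean) @ V_k            # (rows, k) compressed representation
    return projected @ V_k.T + mean             # back to (rows, cols)

torch.manual_seed(0)
# Hypothetical stand-in for a grayscale image: a 64 x 48 matrix of rank 5.
image = torch.randn(64, 5) @ torch.randn(5, 48)
approx = pca_reconstruct(image, k=5)
rel_error = (torch.norm(image - approx) / torch.norm(image)).item()
print(approx.shape, rel_error)
```

Because the stand-in image has rank 5, five components are enough to recover it almost exactly; a real photo would show a visible loss for small k.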

How I built it

Most of my ideas are from Andrew Ng's course on Machine Learning, available at link. I have also used the PyTorch documentation at link.

Challenges I ran into

Matrix multiplication takes a lot of time when the dimensions are large. Initially I tried computing the variance-covariance matrix, but the computation took too long because of the high dimension. PyTorch's low-rank PCA function does the task well. Normalization is not required for this task. I do not have a GPU, and even PyTorch's (version '1.5.0') matrix multiplication felt slow on my CPU when multiplying large matrices; however, I found it more efficient than NumPy (version '1.17.2') in this regard.
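To make the two routes concrete, here is a small sketch comparing them on synthetic data. The data, sizes, and variable names are assumptions for illustration, and `torch.linalg.eigvalsh` needs a newer PyTorch release than the 1.5.0 mentioned above. The covariance route builds a d x d matrix, which is what becomes expensive when d is the number of pixels; the low-rank route never forms it.

```python
import torch

torch.manual_seed(0)
n, d, k = 200, 20, 3
# Synthetic data with three dominant directions plus a little noise.
X = 5.0 * torch.randn(n, k) @ torch.randn(k, d) + 0.01 * torch.randn(n, d)

# Route 1: explicit d x d covariance matrix and its eigenvalues.
Xc = X - X.mean(dim=0, keepdim=True)
cov = Xc.T @ Xc / (n - 1)
eigvals = torch.linalg.eigvalsh(cov)            # returned in ascending order
top_cov = eigvals.flip(0)[:k]                   # k largest variances

# Route 2: low-rank PCA, which works on X directly.
_, S, _ = torch.pca_lowrank(X, q=k + 5, center=True)
top_lowrank = S[:k] ** 2 / (n - 1)              # singular values -> variances

print(top_cov)
print(top_lowrank)
```

On data like this the two sets of leading variances should agree closely, which is why the low-rank function is a reasonable substitute when the covariance matrix is too large to handle comfortably.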

Accomplishments that I'm proud of

The fact that I am able to reconstruct the original image from the lower-dimensional projection obtained from PCA.

What I learned

I saw an example where the pca_lowrank function indeed works on low-rank matrices. I got to see how the number of chosen principal components affects the quality of the reconstructed image. I also learned that PyTorch (version '1.5.0') was considerably more efficient than NumPy (version '1.17.2') for matrix operations on my CPU.
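One way to see the effect of the number of components numerically, rather than by eye, is to track the reconstruction error as k grows. The sketch below uses a synthetic matrix as a hypothetical stand-in for the photo, and the list of k values is arbitrary. The principal directions are computed once and then sliced, so each larger k reuses the same leading components.

```python
import torch

torch.manual_seed(0)
# Hypothetical stand-in for a grayscale image: mostly rank 8, plus noise.
image = torch.randn(60, 8) @ torch.randn(8, 40) + 0.1 * torch.randn(60, 40)
mean = image.mean(dim=0, keepdim=True)
centered = image - mean

# Compute 20 principal directions once; V has shape (40, 20).
_, _, V = torch.pca_lowrank(image, q=20, center=True)

errors = []
for k in (1, 2, 4, 8, 16):
    V_k = V[:, :k]                              # keep the first k directions
    approx = centered @ V_k @ V_k.T + mean      # project and reconstruct
    mse = torch.mean((image - approx) ** 2).item()
    errors.append(mse)
    print(f"k={k:2d}  mse={mse:.5f}")
```

The mean squared error should shrink as more components are kept, which matches what the reconstructed images show visually.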

What's next for Principal Component Analysis | Pytorch

I would like to create an image standardizer that takes in a passport-size photo and crops it to a specific shape, so that many people can use this code irrespective of their input size; a bit of computer vision would be required here. I would also love to integrate this code with a Streamlit module that implements a slider for the number of principal components, along with a function that caches the image reconstruction from the PCA output for the various slider values. I think that would be cool, and it would give an idea of the relation between image reconstruction quality and the number of principal components retained.
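The caching idea could look something like the sketch below. It is only an assumption about how this might be done: the standard library's `functools.lru_cache` stands in for whatever caching the Streamlit app would use, the slider itself is left out, and `cached_reconstruction` and the random placeholder image are hypothetical.

```python
from functools import lru_cache

import torch

torch.manual_seed(0)
image = torch.randn(60, 40)                     # placeholder for the cropped photo
mean = image.mean(dim=0, keepdim=True)
_, _, V = torch.pca_lowrank(image, q=30, center=True)

@lru_cache(maxsize=None)
def cached_reconstruction(k: int) -> torch.Tensor:
    """Reconstruct the image from k components; repeated k values hit the cache."""
    V_k = V[:, :k]
    return (image - mean) @ V_k @ V_k.T + mean

first = cached_reconstruction(10)               # computed
second = cached_reconstruction(10)              # served from the cache
print(cached_reconstruction.cache_info())
```

Moving a slider back to a value it has already visited would then return the stored reconstruction instead of repeating the matrix products.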

Updates

Vivek Veerabahu Subramanian started this project — Oct 25, 2020 07:51 PM EDT
