What it does

Starting to work in a new machine learning framework or package can be confusing. Instead of wading into the documentation from the beginning, it can be helpful to find a familiar use case. The documentation for sklearn, caret, numpy, and scipy were used to train a doc2vec model implemented in pytorch, giving each module of the documentation a dense embedding vector representation. Pairwise similarities between vectors were computed and ranked, giving the closest match to each section of the documentation. Below is a t-SNE low dimensional representation of the embedding vectors and a lookup table for the most similar documentation sections for the one selected.

How I built it

The doc2vec model was built in PyTorch using torchtext. Additionally worked with a model using the transformers library, using a pretrained BERT model to generate sentence embeddings. The web app is built using dash and hosted on pythonanywhere.

Challenges I ran into

The project isn't quite finished - time was the biggest constraint on completion. I entered this hackathon with some basic PyTorch skills and the effort I put in has made me much more comfortable with the framework. I look forward to continuing to play with it.

What I learned

I became a comfortable PyTorch user, learned to play with pre-trained transformers in the framework, and became better at turning papers into code.

Built With

Share this project: