Pytorch Docs RAG

Inspiration

I use github copilot all the time, but often it's output uses deprecated code.

What it does

My app is a simple RAG that retrieves information from the pytorch docs from the version of pytorch you want to use. This way your suggested code will not be deprecated.

How we built it

I built it with Beautiful Soup for web scraping, Milvus for the Vector DB, and Pytorch and Hugging Face to locally host my embedding and autoregressive models for inference. I used the maidalun1020/bce-embedding-base_v1 model for my embeddings, and Llama-3 instruct as my generative LLM. I made a simple web UI with streamlit and host the application on an AWS instance.

Challenges we ran into

AWS never provisioned me a GPU, so the app runs entirely from a CPU node. This is not ideal as the forward pass on the generative LLM is very slow, but I made it work.

Accomplishments that we're proud of

I built a RAG app from start to finish by myself in only a few hours, all open source and locally hosted. The code is also highly modular, replacing the models would be trivial, and the app could be scaled to a medium scale easily.

What we learned

The actual quality control for generated text is difficult. In some cases the model is too smart and will not use retrieved text properly. Other times the model will use the retrieved text even when it is not helpful for answering the question. There needs to be another instance of the generative LLM that you can ask to see if the answer is better with or without the retrieved text. Making a RAG app is easy, making a good RAG app is hard.

What's next for Pytorch Docs RAG

Making it good, embedding more versions of the docs (separately), embedding blog posts that correpsond to the correct torch version.

Built With

huggingface
milvus
pytorch
streamlit

Updates

Tomas Matteson started this project — May 12, 2024 02:33 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.