LabRAG

LabRAG UI

Inspiration

Understanding and adapting lab protocols to the available equipment is a highly challenging and time-consuming task that medical researchers face. Many common lab protocols found on the web are difficult to understand, incorrect, or offer no room for interpretation. Modern large language models can help with explaining and adjusting lab protocols, but a major pitfall of them is that they are prone to mistakes and hallucination. They lack easy access to relevant, reputable sources.. We came up with LabRAG to remedy this problem, which uses a Retrieval-Augmented Generation (RAG) pipeline to generate more reliable outputs and cite sources. Since RAG can select reputable reference information relevant to a user’s query and pass it into a large language model alongside the query, we speculated that implementing such an architecture would greatly reduce hallucination and simply result in a more organized regurgitation of the reference information. Additionally, the information passed into the large language model can be cited. Both these things, reliability and traceability to sources, are crucial for viability of large language models in medical research.

What it does

LabRAG is a reliable, retrieval-augmented chatbot designed to function as a virtual lab assistant. When a user inputs a keyword and a query, LabRAG identifies relevant reference articles based on the provided keyword. These articles are then segmented into chunks. Users can specify how many of these text chunks they want the system to consider. LabRAG then utilizes a Retrieval-Augmented Generation (RAG) pipeline to combine only the most relevant article chunks with the user’s query, providing contextually-informed input to the OpenAI ChatGPT API. This context makes the ultimate output much more accurate and reliable for lab-related inquiries. LabRAG is designed to accurately answer virtually any question a user asks about a particular lab protocol, including both general questions and specific context-based questions. The knowledge base behind the RAG wholly consists of university and accredited laboratory protocols. Additionally, our system provides detailed citations of the text chunks from the knowledge base used as reference, ensuring transparency, credibility, and responsible use of AI.

How we built it

We built LabRAG through various API’s and libraries to construct our libraries. Firstly, we constructed a semi-automated script that will, for a given list of protocols, go through a secure database (protocols.io) of university and accredited laboratories and fetch them into a knowledge base for our RAG. Then, we developed a script that takes all these protocols and turns them into a persistent vector store that can be used by the LLM, which is a database of text chunks that have been converted into a long list of vectors. Then, we use the same embedding function for the query and conduct a cosine similarity search to let the LLM know that, based on the query and keyword, it will grab the appropriate source from the knowledge base to feed to the LLM. Finally, we have the LLM prompting phase, which will provide an answer that the user sees. Additionally, there are other API’s used that helped with the citation of sources.

Challenges we ran into

While building LabRAG, we ran into numerous challenges, including the decision of what project to do and how it fits into the Healthcare/Biotech track, and maintaining some sort of way for it to fit within the Responsible AI theme of this hackathon. Another important challenge that we ran into was figuring out what model we wanted to select, for both embedding and the LLM. We were initially thinking about using Gemini, but decided to switch to OpenAI’s tools for simplicity. Additionally, matching articles to keywords reliably was quite challenging. Lots of work went into parsing the text of articles to find their keywords.

Accomplishments that we're proud of

It was our first hackathon and first time developing an AI project outside of coursework. For most of us, there was a steep learning curve in learning to work with APIs and developing a project from scratch. We’re proud of executing a thoughtful idea which can have a real impact in improving the efficiency and quality of medical research.

What we learned

Anshul Aravind - Learned how to build a basic RAG architecture and understand the fundamentals of the benefits of using a RAG + an LLM. Also increased my full-stack coding ability and learned how to spearhead a project / complete a hackathon project from scratch for the first time. Howard Shu - Learned how to implement frontend features such as a dropdown menu, efficiently search for keywords in files, and the fundamentals of RAG. Yichen Zou - Learned how to implement the frontend and how to use React and Node.js for the first time. Atul Thirumalai - Learned how to use an API and how using a RAG + an LLM is really useful. Really improved my ability to code front end websites/UIs and implement all this into my first ever hackathon project!

What's next for LabRAG

Our project’s next steps are the following: Have a user authenticate into the system and keep a database that stores metrics such as their most used queries, most used protocols, other recommended protocols based on past searches. Have a text-to-SQL database that can allow people to read information from datatables, and allow for image embedding so that people can look up a specific protocol by processing an image Develop better automation technology