MediGraph: Biomedical Research Assistant

MediGraph Cover Image

Inspiration

We jumped into the challenge because its description emphasized the word "GraphRAG", and as software engineers we were curious to learn more about this new approach to improving LLM responses.

Along the way we discovered a wonderful knowledge graph called Hetionet, which brings together findings from millions of pieces of medical literature in one place! Even though we do not come from a medical background, it didn't take us long to realize the drug discovery use cases this graph could support, given a tool that makes it easier to mine this knowledge.

We talked about this potential helper tool with friends of ours who are medical doctors in different domains and do hardcore research on a daily basis. They validated our assumption that it could help accelerate their research efforts. This insight, combined with their sheer excitement about the tool, gave us all the motivation and inspiration we needed to do this project!

What it does

This software acts as a biomedical research assistant agent that helps researchers and other relevant users uncover relationships between genes, drugs, diseases, and other entities in the graph. This helps answer questions like "Can we repurpose the drug used for treating Disease A for Disease B?".

The agent first tries to find the relationship between the given drug and Disease A, and then searches the knowledge graph to determine whether the drug could potentially be repurposed for Disease B. The agent might reason as follows: "Let me find something that is common between Disease A and Disease B. Oh, I found a particular Gene C that is linked with both of these diseases. Hmm.. maybe drugs associated with Gene C can be potential candidates for treating Disease B..."
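The shared-gene reasoning above can be sketched in a few lines of Python. This is only an illustration on a toy in-memory edge list, not the agent's actual code: the entity names, the relation labels ("associates", "binds", "treats") and the helper functions are all made up for the example, and the real agent queries the graph stored in the database instead.

```python
from collections import defaultdict

# Toy edges as (source, relation, target). All names are hypothetical.
EDGES = [
    ("Disease A", "associates", "Gene C"),
    ("Disease B", "associates", "Gene C"),
    ("Disease B", "associates", "Gene D"),
    ("Drug X", "treats", "Disease A"),
    ("Drug X", "binds", "Gene C"),
    ("Drug Y", "binds", "Gene C"),
    ("Drug Z", "binds", "Gene D"),
]


def neighbors(edges, relation):
    """Map each source node to the set of targets reached via `relation`."""
    index = defaultdict(set)
    for source, rel, target in edges:
        if rel == relation:
            index[source].add(target)
    return index


def repurposing_candidates(edges, disease_a, disease_b):
    """Return drugs that bind a gene associated with both diseases."""
    genes_of = neighbors(edges, "associates")
    shared_genes = genes_of[disease_a] & genes_of[disease_b]
    candidates = {}
    for drug, genes in neighbors(edges, "binds").items():
        hits = genes & shared_genes
        if hits:
            # Keep the shared genes as the evidence for each candidate.
            candidates[drug] = sorted(hits)
    return candidates


print(repurposing_candidates(EDGES, "Disease A", "Disease B"))
```

In this toy graph, Drug X and Drug Y come back as candidates because both bind Gene C, the only gene linked to both diseases. A result like this is a hypothesis for a researcher to examine, not a treatment recommendation.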

Given the breadth and depth of the Hetionet dataset, the AI agent is not restricted to the drug repurposing use case. It can be used to explore other kinds of insights as well. We tested a few with the knowledge we had, but a fuller picture will only emerge once we run beta testing with a few hundred users. We didn't have the bandwidth to do this testing during the hackathon.

How we built it

We used Jupyter Notebook to build both the backend and frontend.

For the backend -

We used ArangoDB as the database for storing the Hetionet graph.
We used the Gemini API for LLM inference on the graph via GraphRAG. We built on top of the starter template provided by the Arango team, as described in the video.
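To give a rough idea of the GraphRAG flow described above (retrieve relevant graph facts, then pass them to the LLM as context), here is a minimal sketch. It is not the starter template's code: the in-memory triple list stands in for ArangoDB, the `llm` callable stands in for a Gemini API call, and the function names and the keyword-matching retrieval are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def retrieve_facts(question: str, triples: List[Triple]) -> List[Triple]:
    """Keep triples whose source or target is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]


def build_prompt(question: str, facts: List[Triple]) -> str:
    """Put the retrieved facts in front of the question as grounding context."""
    lines = [f"- {s} {r} {t}" for s, r, t in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        "Answer using only the graph facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


def answer(question: str, triples: List[Triple], llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and hand it to the model."""
    return llm(build_prompt(question, retrieve_facts(question, triples)))


# Hypothetical data and a stub model that simply echoes the prompt.
TRIPLES = [("Drug X", "binds", "Gene C"), ("Drug Z", "binds", "Gene D")]
print(answer("Which drugs bind Gene C?", TRIPLES, llm=lambda prompt: prompt))
```

Keeping the model behind a plain callable makes it easy to test the retrieval and prompt steps without network access, and to swap in the real API call later.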

For the frontend, we used Streamlit via the tunnel Python package in the Colab notebook environment. More details on the implementation are provided in the video.

Challenges we ran into

The Hetionet dataset is huge, and we had trouble uploading the entire graph to ArangoDB. The default upload operations only uploaded part of the graph, and none of the other methods we tried got the whole graph in. We are not sure whether this is a trial period limitation, a bug, or a capacity constraint, but in the interest of time we stopped looking for a solution and focused on the partial dataset that was successfully uploaded.
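One general way to narrow down a partial upload like this is to send documents in fixed-size batches and count what was sent, so the total can be compared with what the database reports. The sketch below is illustrative only and is not what the notebook does: `insert_batch` is a hypothetical stand-in for whatever bulk-insert call the database client provides, and the batch size is arbitrary.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, str]


def chunked(items: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[Doc] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch


def upload_in_batches(
    docs: Iterable[Doc],
    insert_batch: Callable[[List[Doc]], None],
    batch_size: int = 1000,
) -> int:
    """Send docs batch by batch and return how many were sent."""
    uploaded = 0
    for batch in chunked(docs, batch_size):
        insert_batch(batch)  # hypothetical bulk-insert call
        uploaded += len(batch)
    return uploaded


# Usage with a fake in-memory "database" that just records each batch.
received: List[List[Doc]] = []
sent = upload_in_batches(({"_key": str(i)} for i in range(2500)), received.append)
print(sent, len(received))
```

If the count returned here differs from the document count in the collection afterwards, the gap shows roughly where the upload stopped, which helps distinguish a quota limit from a failure on specific records.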

Accomplishments that we're proud of

We created a hardcore biomedical research assistant! How often do you get the chance to hear something like this in a pool of AI apps that mostly just generate a cool image or some text? That's what we are proud of!

What we learned

Streamlit can be used inside a Colab notebook via a tunnel! Never knew this!
We learned how to create LangChain agents via function chaining.
We learned how to use GraphRAG to integrate knowledge graphs into an LLM's response.

What's next for Biomedical Research Assistant

We want to -

Build a better, full-fledged web and mobile UI.
Add a web search option for questions the Hetionet dataset cannot answer.
Get more validation from medical professionals such as doctors.