GraphRAG for Drug Repurposing

Inspiration

This project was inspired by an urgent challenge in drug discovery: repurposing known drugs effectively. Conventional drug development is costly and time-consuming, whereas repurposing existing drugs offers a lower-cost alternative. Using GraphRAG (Graph-based Retrieval-Augmented Generation), I set out to improve the quality of AI-generated insights for drug repurposing.

What I Learned

In the course of this project, I learned more about:

  • How to apply TransE to learn knowledge graph embeddings for representation learning.
  • Creating and managing graph structures in ArangoDB.
  • Using LangChain agents to coordinate complex retrieval and reasoning steps within AI workflows.
  • Adapting large language models to domain-specific tasks through prompt engineering and optimized retrieval.
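TransE, mentioned above, treats each relation as a translation in embedding space, so a true triple (head, relation, tail) should satisfy head + relation ≈ tail. The following is a minimal pure-Python sketch of that idea, not the project's actual training code. It assumes plain string triples, an L2 distance, random tail corruption for negative samples, and a margin ranking loss trained with SGD; the function names, hyperparameters, and toy entities (`drugA`, `diseaseX`, and so on) are hypothetical.

```python
import math
import random


def l2(vec):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def normalize(vec):
    """Scale a vector to unit length (TransE keeps entity vectors on the unit sphere)."""
    n = l2(vec) or 1.0
    return [x / n for x in vec]


def distance(h, r, t):
    """TransE dissimilarity ||h + r - t||; lower means a more plausible triple."""
    return l2([hi + ri - ti for hi, ri, ti in zip(h, r, t)])


def train_transe(triples, dim=16, margin=1.0, lr=0.05, epochs=300, seed=0):
    """Hypothetical helper: SGD on a margin ranking loss with tail corruption."""
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})

    def rand_vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    ent = {e: normalize(rand_vec()) for e in entities}
    rel = {r: normalize(rand_vec()) for r in relations}

    for _ in range(epochs):
        for h, r, t in triples:
            # Negative sample: replace the true tail with a random other entity.
            t_neg = rng.choice([e for e in entities if e != t])
            d_pos = [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t])]
            d_neg = [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t_neg])]
            n_pos, n_neg = l2(d_pos) or 1e-9, l2(d_neg) or 1e-9
            if n_pos + margin - n_neg <= 0:
                continue  # hinge loss is already zero for this pair
            # Gradients of ||d_pos|| - ||d_neg|| with respect to each embedding.
            u_pos = [x / n_pos for x in d_pos]
            u_neg = [x / n_neg for x in d_neg]
            g_hr = [p - q for p, q in zip(u_pos, u_neg)]
            updates = [(ent, h, g_hr), (rel, r, g_hr),
                       (ent, t, [-p for p in u_pos]), (ent, t_neg, u_neg)]
            for table, key, grad in updates:
                table[key] = [x - lr * g for x, g in zip(table[key], grad)]
        # Re-project entity vectors onto the unit sphere after each epoch.
        for e in entities:
            ent[e] = normalize(ent[e])
    return ent, rel


if __name__ == "__main__":
    toy = [("drugA", "treats", "diseaseX"), ("drugB", "treats", "diseaseY"),
           ("drugA", "targets", "geneG"), ("geneG", "associated_with", "diseaseX")]
    ent, rel = train_transe(toy)
    for disease in ("diseaseX", "diseaseY"):
        print(disease, round(distance(ent["drugA"], rel["treats"], ent[disease]), 3))
```

For a graph the size of DRKG, a library implementation with mini-batching and GPU support would be the practical choice; this loop only shows the mechanics. Ranking candidate compounds by `distance(compound, treats, disease)` is one way such embeddings can surface repurposing candidates.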

How I Built It

Data Collection & Preprocessing

  • Collected biomedical datasets, including DRKG, DrugBank, and Hetionet, covering drug interactions, diseases, and molecular associations.
  • Preprocessed the data into a graph format that captures entities and the relationships between them.
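The preprocessing step can be sketched as follows, assuming the input is tab-separated head/relation/tail rows whose entities are written as `Type::ID` (the naming style used in DRKG's triple file). The helpers `split_entity` and `load_triples`, and the toy rows in the usage example, are hypothetical; relation strings are kept verbatim so they can be mapped to a shared vocabulary later.

```python
import csv
import io
from collections import defaultdict


def split_entity(name):
    """Split a 'Type::ID' entity name into a (type, id) pair."""
    etype, sep, eid = name.partition("::")
    if not sep:
        raise ValueError(f"expected 'Type::ID', got {name!r}")
    return etype, eid


def load_triples(tsv_text):
    """Parse head<TAB>relation<TAB>tail rows into typed nodes and edges."""
    nodes = defaultdict(set)  # entity type -> set of IDs of that type
    edges = []                # ((type, id), relation, (type, id))
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        head, relation, tail = (field.strip() for field in row)
        h, t = split_entity(head), split_entity(tail)
        nodes[h[0]].add(h[1])
        nodes[t[0]].add(t[1])
        edges.append((h, relation, t))
    return dict(nodes), edges


if __name__ == "__main__":
    # Hypothetical toy rows in the assumed layout.
    sample = ("Compound::C1\tEXAMPLE::treats::Compound:Disease\tDisease::D1\n"
              "Compound::C1\tEXAMPLE::binds::Compound:Gene\tGene::G1\n")
    nodes, edges = load_triples(sample)
    print(nodes)
    print(edges)
```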

Graph Construction

  • Built a biomedical knowledge graph using ontologies and relation mappings.
  • Integrated existing public-domain knowledge graphs with filtered, domain-specific databases.
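One way to picture the integration step: once relation names from each source have been mapped to a shared vocabulary, edges can be merged and shaped into ArangoDB documents, where vertices carry a `_key` and edges point at vertices through `_from`/`_to` handles of the form `collection/key`. This sketch assumes one vertex collection per entity type and records which sources asserted each edge. The function names, the `sources` and `source_id` attributes, the conservative key character set, and the hash-based edge keys are illustrative choices, not the project's schema.

```python
import hashlib
import re

# Conservative subset of characters ArangoDB accepts in a _key.
UNSAFE_KEY_CHARS = re.compile(r"[^A-Za-z0-9_\-.:]")


def arango_key(raw):
    """Map a source ID to a safe _key by replacing other characters with '_'."""
    return UNSAFE_KEY_CHARS.sub("_", raw)


def to_arango_docs(edges_by_source):
    """Merge ((type, id), relation, (type, id)) edges from several sources
    into vertex and edge documents; each entity type is a vertex collection."""
    vertices = {}  # (collection, key) -> vertex document
    merged = {}    # (from_handle, relation, to_handle) -> edge document
    for source, edges in edges_by_source.items():
        for (h_type, h_id), relation, (t_type, t_id) in edges:
            for etype, eid in ((h_type, h_id), (t_type, t_id)):
                key = arango_key(eid)
                vertices.setdefault((etype, key), {"_key": key, "source_id": eid})
            h_handle = f"{h_type}/{arango_key(h_id)}"
            t_handle = f"{t_type}/{arango_key(t_id)}"
            ident = (h_handle, relation, t_handle)
            if ident not in merged:
                # Deterministic key: the same fact always gets the same _key.
                digest = hashlib.sha1("|".join(ident).encode()).hexdigest()
                merged[ident] = {"_key": digest, "_from": h_handle,
                                 "_to": t_handle, "relation": relation,
                                 "sources": []}
            if source not in merged[ident]["sources"]:
                merged[ident]["sources"].append(source)
    return vertices, list(merged.values())


if __name__ == "__main__":
    # Hypothetical toy input: two sources asserting an overlapping fact.
    docs = to_arango_docs({
        "sourceA": [(("Compound", "C1"), "treats", ("Disease", "D1"))],
        "sourceB": [(("Compound", "C1"), "treats", ("Disease", "D1"))],
    })
    print(docs)
```

The resulting lists could then be loaded with `arangoimport` or a driver's bulk-import method. Deriving the edge `_key` from the triple itself means a fact asserted by several sources collapses into one edge with a provenance list rather than several duplicates.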

Retrieval-Augmented Generation (RAG) Deployment

  • Built a RAG pipeline that queries the knowledge graph for supporting facts.
  • Layered GraphRAG on top of the pipeline to further improve the contextual insights it produces for drug repurposing.
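The retrieval step of a graph-backed RAG pipeline can be sketched as a bounded breadth-first walk from a query entity that collects nearby triples and formats them as grounding facts for the language model. In ArangoDB the walk itself would more naturally be an AQL traversal (`FOR v, e IN 1..2 ANY ...`); the pure-Python version below only illustrates the logic. The function names, prompt wording, and toy graph are hypothetical, and the LLM call itself is omitted.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Index each (head, relation, tail) triple under both of its endpoints."""
    adj = defaultdict(list)
    for h, r, t in edges:
        adj[h].append((h, r, t))
        adj[t].append((h, r, t))
    return adj


def retrieve_facts(edges, start, max_hops=2, limit=20):
    """Breadth-first walk (ignoring edge direction) that collects distinct
    triples within max_hops of start, in discovery order."""
    adj = build_adjacency(edges)
    seen_nodes, seen_facts, facts = {start}, set(), []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_hops:
            continue  # do not expand beyond the hop budget
        for triple in adj[node]:
            if triple not in seen_facts:
                seen_facts.add(triple)
                facts.append(triple)
            h, _, t = triple
            neighbor = t if h == node else h
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts[:limit]


def build_prompt(question, facts):
    """Format retrieved triples as grounding context for an LLM prompt."""
    context = "\n".join(f"- {h} --{r}--> {t}" for h, r, t in facts)
    return ("Answer using only the facts below; say so if they are insufficient.\n"
            f"Facts:\n{context}\n\nQuestion: {question}\n")


if __name__ == "__main__":
    toy_graph = [("drugA", "treats", "diseaseX"), ("drugA", "targets", "geneG"),
                 ("geneG", "associated_with", "diseaseY")]
    facts = retrieve_facts(toy_graph, "diseaseY")
    print(build_prompt("Which drugs might affect diseaseY?", facts))
```

A two-hop path such as drug → gene → disease is the kind of indirect evidence that makes graph retrieval useful for repurposing: the drug never appears next to the disease directly, but the retrieved facts connect them.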

Challenges Encountered

  • Visualization: Turning complex biomedical relationships into meaningful, easy-to-use visualizations was a distinctive challenge.
  • Licensing: Many of the most useful biomedical data sources carry restrictive licenses that limit access and make the data harder to combine.

Conclusion

This project demonstrated the potential of GraphRAG to strengthen AI-based drug repurposing. By combining knowledge graphs with retrieval-augmented generation, we can produce insights that support researchers and clinicians. Going forward, I plan to further optimize the retrieval pipeline and explore new approaches to improving biomedical AI applications.

Built With

  • arangodb
  • langchain
  • transe