GraphRAG for Drug Repurposing
Inspiration
The inspiration for this project came from an urgent challenge in drug discovery: repurposing known drugs effectively. Conventional drug development is costly and time-consuming, whereas repurposing offers a lower-cost alternative. Using GraphRAG (Graph-based Retrieval-Augmented Generation), I set out to improve AI-based insights for drug repurposing.
What I Learned
In the course of this project, I learned more about:
- How to apply TransE to knowledge graph embeddings and representation learning.
- Creating and managing graph structures in ArangoDB.
- Using LangChain agents to coordinate complex retrieval and reasoning operations within AI workflows.
- Tailoring large language models to domain-specific tasks through prompt engineering and optimized retrieval.
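To make the TransE idea from the list above concrete, here is a minimal sketch of its scoring rule: a triple (head, relation, tail) is considered plausible when the head embedding plus the relation embedding lands close to the tail embedding. The vectors and entity names below are hand-picked placeholders for illustration, not embeddings from this project, and the sketch uses plain Python lists rather than the training setup actually used.

```python
def transe_score(head, relation, tail):
    """L1 distance ||h + r - t||; a lower score means a more plausible triple."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: a true triple should score at least `margin`
    lower than a corrupted (negative) triple."""
    return max(0.0, margin + pos_score - neg_score)


# Hypothetical 3-dimensional embeddings, chosen by hand for illustration only.
drug = [0.1, 0.2, 0.0]
treats = [0.4, -0.1, 0.3]
disease = [0.5, 0.1, 0.3]       # approximately drug + treats
unrelated = [-0.6, 0.9, -0.2]   # far from drug + treats

pos = transe_score(drug, treats, disease)
neg = transe_score(drug, treats, unrelated)
print("true triple score:", pos)
print("corrupted triple score:", neg)
print("loss:", margin_loss(pos, neg))
```

For repurposing, the same score can be used to rank candidate diseases for a drug: the tails with the lowest distance under a "treats"-style relation are the ones worth a closer look.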
How I Built It
Data Collection & Preprocessing
- Collected biomedical datasets, namely DRKG, DrugBank, and Hetionet, with information on drug interactions, diseases, and molecular associations.
- Preprocessed the data into a graph-structured format that captures entities and their relationships.
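The preprocessing step above can be sketched as follows. This assumes the input is a tab-separated list of (head, relation, tail) triples whose entities are written as `Type::identifier`, which is the layout DRKG uses; the function names and the sample rows are placeholders made up for this example, not data from the project.

```python
def parse_entity(raw):
    """Split 'Type::identifier' into (type, identifier)."""
    entity_type, _, identifier = raw.partition("::")
    return entity_type, identifier


def build_graph(lines):
    """Turn tab-separated triples into a node table and an edge list."""
    nodes = {}   # raw entity string -> {"type": ..., "id": ...}
    edges = []   # (head, relation, tail) tuples
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        head, relation, tail = parts
        for raw in (head, tail):
            if raw not in nodes:
                entity_type, identifier = parse_entity(raw)
                nodes[raw] = {"type": entity_type, "id": identifier}
        edges.append((head, relation, tail))
    return nodes, edges


# Placeholder rows for illustration only (not real DRKG records).
sample = [
    "Compound::DRUG_A\ttreats\tDisease::DISEASE_X",
    "Compound::DRUG_A\ttargets\tGene::GENE_1",
    "",
    "a malformed row without tabs",
]

nodes, edges = build_graph(sample)
print(len(nodes), "nodes,", len(edges), "edges")
```

Keeping the entity type separate from the identifier makes it straightforward to filter by type later, for example to keep only compound-to-disease edges.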
Graph Construction
- Built a biomedical knowledge graph using ontologies and relation mappings.
- Integrated existing public knowledge graphs with filtered domain-specific databases.
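As a rough illustration of the graph construction step, the sketch below converts triples into the document shapes ArangoDB expects: vertex documents with a `_key`, and edge documents whose `_from` and `_to` fields have the form `collection/key`. The collection name `entities`, the extra fields, and the sample triples are assumptions for this example; the actual schema used in the project is not described here, and loading the documents into a running database (for instance through a driver such as python-arango) is left out.

```python
import re


def to_key(raw_id):
    """Map an entity string to a conservative ArangoDB-safe _key by
    replacing anything other than letters, digits, '_' and '-'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw_id)


def to_documents(triples, vertex_collection="entities"):
    """Build vertex and edge documents from (head, relation, tail) triples."""
    vertices = {}  # _key -> vertex document, so each entity appears once
    edge_docs = []
    for head, relation, tail in triples:
        for raw in (head, tail):
            key = to_key(raw)
            if key not in vertices:
                vertices[key] = {
                    "_key": key,
                    "type": raw.partition("::")[0],
                    "source_id": raw,
                }
        edge_docs.append({
            "_from": f"{vertex_collection}/{to_key(head)}",
            "_to": f"{vertex_collection}/{to_key(tail)}",
            "relation": relation,
        })
    return list(vertices.values()), edge_docs


# Placeholder triples for illustration only.
triples = [
    ("Compound::DRUG_A", "treats", "Disease::DISEASE_X"),
    ("Compound::DRUG_A", "targets", "Gene::GENE_1"),
]

vertex_docs, edge_docs = to_documents(triples)
print(vertex_docs[0])
print(edge_docs[0])
```

Deduplicating vertices by key before import matters when several source datasets mention the same entity, since each one should end up as a single vertex with many edges.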
Retrieval-Augmented Generation (RAG) Deployment
- Created a RAG pipeline that queries the knowledge graph for supporting facts.
- Integrated GraphRAG to further improve contextual insights for drug repurposing.
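The retrieval half of the pipeline above can be sketched in a few lines. This is a simplified stand-in, not the project's implementation: it uses an in-memory edge list instead of ArangoDB, does not involve LangChain, and stops at building the prompt rather than calling a language model. All entity names, relation names, and function names are hypothetical.

```python
def retrieve_facts(edges, entity, max_hops=2):
    """Collect triples within `max_hops` of `entity`, following edges
    in either direction."""
    frontier = {entity}
    seen_nodes = {entity}
    facts = []
    for _ in range(max_hops):
        next_frontier = set()
        for triple in edges:
            head, _, tail = triple
            if head in frontier or tail in frontier:
                if triple not in facts:
                    facts.append(triple)
                for node in (head, tail):
                    if node not in seen_nodes:
                        seen_nodes.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return facts


def build_prompt(question, facts):
    """Format the retrieved triples as grounding context for an LLM."""
    context = "\n".join(f"- {h} {r} {t}" for h, r, t in facts)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


# Placeholder graph for illustration only.
graph_edges = [
    ("DRUG_A", "targets", "GENE_1"),
    ("GENE_1", "associated_with", "DISEASE_X"),
    ("DRUG_B", "targets", "GENE_2"),
    ("DRUG_A", "treats", "DISEASE_Y"),
]

question = "Could DRUG_A be repurposed for DISEASE_X?"
facts = retrieve_facts(graph_edges, "DRUG_A", max_hops=2)
prompt = build_prompt(question, facts)
print(prompt)
```

The two-hop expansion is what lets the drug-to-gene-to-disease path reach the prompt; a single hop would only surface the drug's direct neighbors and miss the indirect link that makes a repurposing hypothesis possible.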
Challenges Encountered
- Visualization: Mapping biomedical relationships to meaningful, easy-to-use visualizations presented unique challenges.
- Licensing: Many of the most useful biomedical data sources carry restrictive licenses that limit their availability and make them harder to combine.
Conclusion
This project showcased the strength of GraphRAG in augmenting AI-based drug repurposing. By combining knowledge graphs with retrieval-augmented generation, we can produce insights that support researchers and clinicians. In the future, I plan to further optimize the retrieval pipeline and investigate new approaches to augmenting biomedical AI applications.
Built With
- arangodb
- langchain
- transe