Bio Link Knowledge Graph Agent

Dataset Background

Synthea is an open-source synthetic patient dataset that models health records for fictional individuals, covering demographics, clinical history, and social factors. Generated using real-world healthcare patterns, it supports research, development, and testing of health IT systems while ensuring patient privacy.

Source: Synthea Size: 145,514 nodes, 311,701 edges

Why Synthea?

I chose Synthea because of its:

Rich text descriptions that enhance semantic retrieval capabilities.
Diverse entity types that mimic real-world healthcare settings, making it an ideal testbed for graph-based analysis.
Complex relational structures, demonstrating how NetworkX can uncover patterns that might be difficult for humans to discern.

Agent Providers

OpenAI: Provides the LLM backend for intelligent natural language understanding and reasoning.
ArangoDB: Stores the graph database, enabling scalable and efficient traversal of healthcare data.
Pinecone: Supports vector-based semantic search for retrieving similar medical records.
LangGraph: Implements the agent’s execution flow and decision-making logic.
LangSmith: Facilitates debugging and monitoring of the agent's performance.

Prompting Strategy

System Prompt: Defines the agent's core behavior, available tools, and constraints.
Graph Schema: Provides details about node and edge collections in ArangoDB.
Example-based Few-Shot Prompting: Uses a selector that retrieves relevant NetworkX examples based on semantic similarity.
Dynamic Prompt Construction: Integrates the system prompt, schema, and relevant examples to generate the final query-specific prompt.

Agent Tools

The agent employs multiple tools to query and analyze the dataset:

Vector Search Tool: Uses Pinecone to perform semantic similarity searches.
Graph Traversal Tool: Executes AQL queries in ArangoDB for structured graph traversal.
NetworkX Analysis Tool: Runs graph algorithms (e.g., centrality, shortest path) on a subgraph extracted from ArangoDB.
Graph Visualization Tool: Uses PyVis to create interactive visualizations of the subgraph structure.

Agent Design

The agent follows a ReAct (Reasoning + Acting) design using LangGraph:

User Query Processing: The agent receives a natural language query.
Tool Selection: The agent determines which tools to invoke based on the query context.
Execution & Iteration: The selected tool runs, and the agent refines the results if needed.
Final Response: The agent returns a structured answer, potentially with visualizations or data insights.

The workflow includes a stateful memory system, allowing conversations to persist and enabling iterative refinements over multiple turns. Additionally, constraints are in place to ensure scalability when working with large datasets, such as sampling nodes from subgraphs for NetworkX analyses.

What's Next?

Consider alternative agentic designs like hierarchical agents or planner-executor models for better modularity and reasoning.
Leverage graph states to store intermediate results, reducing redundant LLM parsing to tool calls.
Optimize NetworkX for large graphs by implementing parallel processing and providing more robust examples.

Built With

arangodb
langchain
langgraph
networkx
openai
pinecone
python

Updates

Jeffry Stevany Chandra started this project — Mar 08, 2025 09:08 AM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.