Clinical Safety Auditor: A Neuro-Symbolic Agentic Framework

A neuro-symbolic AI safety net: multi-agent nodes safeguard the neural diagnostic process to prevent misdiagnosis.
Five-layer neuro-symbolic framework for Alzheimer's disease (AD) diagnosis: bridging MRI perception and agentic reasoning via the standardized MCP layer.
Orchestrator workflow: evaluates uncertainty (U) and confidence (C) to route ambiguous cases for counterfactual verification.
SHAP visualization for an example subject. The model wrongly predicted AD, but the reasoning agent detected no atrophy, preventing a false positive.

What inspired us
Deep learning models show great promise in medical image analysis, but their deployment is often limited by their "black-box" nature and lack of transparency. In data-scarce environments, traditional models tend to overfit and become overconfident when classifying ambiguous cases, such as mild cognitive impairment (MCI). While standard Explainable AI (XAI) provides valuable visualizations, it is fundamentally passive: it cannot actively validate whether the highlighted features align with clinical reasoning. We recognized the need for a shift from passive visualization to active auditing, one that mirrors a real-world clinical diagnostic workflow and protects patient safety.

How we built it
We designed a five-layer neuro-symbolic architecture that explicitly decouples statistical intuition (System 1) from logical reasoning (System 2).

Perception Layer: We built a hybrid Convolutional Neural Network and Random Forest (CNN-RF) trained from scratch on clinical data to extract domain-specific neurodegenerative features. This layer outputs a binary prediction, confidence C, and an uncertainty quantification score U calculated via ensemble variance.
Reasoning Layer: Operating as a "Clinical Safety Auditor," this layer uses a dual-LLM Agent-to-Agent (A2A) pattern. Agent A (Phi-4-mini) acts as the Orchestrator, dynamically routing cases and triggering an intervention when ambiguity is high, specifically when U > 0.6 or C < 0.7. Agent B (Llama-3.1-Aloe) serves as the Safety Auditor, validating the statistical outputs against anatomical constraints.
Integration: We utilized the Model Context Protocol (MCP) to standardize the interface between the reasoning agents and diagnostic tools, including a Neo4j Knowledge Graph and GraphRAG.
Application: A Streamlit-based Clinician Dashboard presents the final, verified "Clinical Safety Audit Report" to healthcare professionals.
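The hand-off between the Perception and Reasoning layers can be sketched as follows. This is a minimal sketch, not the project's code: it assumes the random-forest head exposes a per-tree probability of AD, that C is the probability of the predicted class, and that "ensemble variance" means the variance of those per-tree probabilities rescaled to [0, 1] by dividing by 0.25 (the maximum variance of values in [0, 1]). The function names `confidence_and_uncertainty` and `needs_audit` are hypothetical.

```python
from statistics import mean, pvariance


def confidence_and_uncertainty(tree_probs: list[float]) -> tuple[int, float, float]:
    """Return (binary prediction, confidence C, uncertainty U) from per-tree P(AD).

    Assumption: tree_probs are the per-tree probabilities of AD produced by the
    random-forest head; this is an illustrative stand-in for the real model.
    """
    p = mean(tree_probs)                 # ensemble-averaged probability of AD
    prediction = int(p >= 0.5)           # 1 = AD, 0 = non-AD (assumed labels)
    confidence = max(p, 1.0 - p)         # probability assigned to the predicted class
    # Values in [0, 1] have variance at most 0.25, so dividing by 0.25
    # rescales U to [0, 1] (an assumed normalization).
    uncertainty = pvariance(tree_probs) / 0.25
    return prediction, confidence, uncertainty


def needs_audit(confidence: float, uncertainty: float,
                u_max: float = 0.6, c_min: float = 0.7) -> bool:
    """Orchestrator routing rule: send the case to the Safety Auditor if U > 0.6 or C < 0.7."""
    return uncertainty > u_max or confidence < c_min


# Trees that agree yield high C and low U, so the case is not routed for review.
pred, c, u = confidence_and_uncertainty([0.9, 0.85, 0.95, 0.9])
print(pred, round(c, 3), round(u, 3), needs_audit(c, u))

# Trees that disagree yield C near 0.5, so the case is routed for review.
pred, c, u = confidence_and_uncertainty([0.1, 0.9, 0.2, 0.8])
print(round(c, 3), round(u, 3), needs_audit(c, u))
```

Keeping the threshold check as a plain function, separate from the LLM, means the routing decision stays deterministic and auditable even though Agent A decides what to do once a case is flagged.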

Challenges we faced
One major challenge in medical agent systems is "tool hallucination," where an agent calls tools that do not exist or passes malformed arguments. We mitigated this by adopting MCP as a strict interoperability layer, which ensures that the agent's decision to verify a diagnosis is translated into deterministic API calls, such as counterfactual simulations. Balancing deep reasoning with practical clinical workflows also required extensive engineering. By offloading preprocessing and using an efficient dual-agent setup, we achieved an online diagnostic latency of 33.62 seconds per subject, running entirely locally on an NVIDIA GeForce RTX 5090 Laptop GPU.
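One way to picture how a strict interface curbs tool hallucination is a client-side guard that only emits well-formed requests. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method with a `name` and an `arguments` object. Everything else here is an assumption for illustration: the tool names `simulate_counterfactual` and `query_knowledge_graph`, their argument names, and the `build_tool_call` helper are hypothetical, and the simple type table stands in for the JSON Schema that a real MCP server advertises through `tools/list`.

```python
import json

# Hypothetical tool registry: names and argument types are illustrative,
# not the project's actual MCP server schema.
TOOL_SCHEMAS = {
    "simulate_counterfactual": {"subject_id": str, "region": str},
    "query_knowledge_graph": {"cypher": str},
}


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request (JSON-RPC 2.0), rejecting unknown tools or bad arguments."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        # A hallucinated tool name never reaches the server.
        raise ValueError(f"unknown tool: {name}")
    if set(arguments) != set(schema):
        raise ValueError(f"arguments {sorted(arguments)} do not match {sorted(schema)}")
    for key, expected_type in schema.items():
        if not isinstance(arguments[key], expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


print(build_tool_call(1, "simulate_counterfactual",
                      {"subject_id": "sub-001", "region": "hippocampus"}))
```

The agent still chooses *whether* to verify a diagnosis, but only calls that match a registered tool and its argument types are serialized, so a free-form LLM decision becomes a deterministic, inspectable request.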

What we learned
By treating MCI as an out-of-distribution (OOD) instance, our strategic binary-baseline approach stress-tested the reasoning layer's ability to identify ambiguity. The system flagged 89.4% of ambiguous MCI cases for agentic review. We learned that an agentic reasoning layer can enforce rigorous, logic-driven constraints that intercept high-confidence errors and significantly improve diagnostic safety in data-scarce environments.
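The flag-rate metric behind the 89.4% figure can be stated as: the share of MCI subjects whose (C, U) pair triggers the Orchestrator's routing rule (U > 0.6 or C < 0.7). The sketch below is a hypothetical helper, `mci_flag_rate`, written under the assumption that each MCI case is summarized by its confidence and uncertainty; the inputs in the example are toy values, not the project's data.

```python
from typing import Iterable, Tuple


def mci_flag_rate(cases: Iterable[Tuple[float, float]],
                  u_max: float = 0.6, c_min: float = 0.7) -> float:
    """Fraction of MCI cases, given as (confidence C, uncertainty U), routed for agentic review."""
    cases = list(cases)
    if not cases:
        raise ValueError("no MCI cases supplied")
    # A case is flagged by the same rule the Orchestrator applies: U > u_max or C < c_min.
    flagged = sum(1 for c, u in cases if u > u_max or c < c_min)
    return flagged / len(cases)


# Toy inputs: three of the four cases trip either the U or the C threshold.
print(mci_flag_rate([(0.65, 0.2), (0.9, 0.7), (0.95, 0.1), (0.55, 0.8)]))
```

Because MCI is absent from the binary training labels, a high flag rate on these subjects indicates that the routing rule recognizes them as ambiguous rather than forcing a confident binary answer.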

Built With

a2a
antspy
cnn
dipy
graphrag
llm
mcp
neo4j
python
random-forest
shap
streamlit

Updates

Morris Jhen-Nong Chen started this project — Oct 22, 2025 05:25 PM EDT
