Evidence-Aware Longitudinal Clinical AI

Architecture
Prompt Opinion Agent Screenshot
MCP Tool Integration

Inspiration

Large language models are increasingly being used in healthcare applications, but they often struggle with longitudinal reasoning, especially when interpreting lab values across multiple timepoints. During experimentation with clinical-style summaries, we observed that language models could incorrectly infer trends, overstate conclusions, or hallucinate clinical interpretations when analyzing sequential patient reports.

This project was inspired by the idea that clinical AI systems should not rely entirely on probabilistic language generation for temporal reasoning. Instead, deterministic trend analysis and structured reasoning should guide the explanation layer.

We chose thyroid lab analysis as a focused use case because thyroid markers such as TSH, T3, and T4 naturally evolve over time and require careful interpretation of trends rather than isolated values.

What it does

The system analyzes longitudinal thyroid lab reports and generates evidence-aware clinical summaries using a hybrid reasoning approach. The application:

extracts and organizes thyroid lab values chronologically
detects improving, worsening, stable, and fluctuating patterns
identifies meaningful longitudinal changes
assigns risk/concern levels
generates cautious, structured clinical explanations
reduces hallucinated reasoning by separating deterministic analysis from language generation The system is integrated into Prompt Opinion through MCP (Model Context Protocol).

How we built it

Patient Reports ↓ Deterministic Pattern Engine ↓ Evidence Layer ↓ Controlled LLM Explanation ↓ Structured Clinical Output

Core Components

Pattern Detection Engine We implemented deterministic logic in Python to analyze:
directional trends
threshold crossings
fluctuating behavior
improving/worsening tendencies This prevents the language model from independently inferring numerical trends.
Evidence Layer An evidence-aware layer was added to provide cautious interpretation guidance and reduce unsupported conclusions.
Controlled Prompting The LLM was constrained through structured prompting rules: -avoid hallucinated diagnoses
avoid speculative causes
avoid treatment recommendations
focus only on observable data patterns
explicitly acknowledge uncertainty
MCP Integration The system was deployed on Render and exposed as an MCP server integrated with Prompt Opinion.

Challenges we ran into

One of the biggest challenges was realizing that LLMs alone were unreliable for longitudinal numerical reasoning. Early versions of the system produced inconsistent interpretations, especially for fluctuating lab patterns.

Another challenge was distinguishing between:

true directional progression
fluctuating instability
isolated abnormal values We also encountered integration and deployment challenges while exposing the analysis engine through MCP and testing model behavior within Prompt Opinion.

Additionally, we had to carefully control prompt behavior to reduce:

overconfident wording
hallucinated symptoms
unsupported diagnoses
speculative clinical recommendations

Accomplishments that we're proud of

Built a working MCP-integrated clinical AI tool deployed on Render and connected to Prompt Opinion.
Designed a hybrid reasoning system that separates deterministic numerical analysis from language-model explanation.
Successfully reduced hallucinated trend interpretation by preventing the LLM from independently inferring longitudinal patterns.
Implemented structured detection for improving, worsening, stable, and fluctuating thyroid trends across multiple timepoints.
Added an evidence-aware interpretation layer focused on cautious clinical language and uncertainty handling.
Created a system that avoids unsupported diagnoses, speculative causes, and treatment recommendations.
Successfully tested the system on multiple synthetic patient scenarios with distinct longitudinal patterns. Learned how to integrate MCP tooling, deployment workflows, and controlled prompt engineering into a healthcare-oriented AI application.

What we learned

This project reinforced that reliable clinical AI systems require hybrid architectures rather than relying entirely on generative models. We learned:

deterministic reasoning improves reliability
longitudinal analysis is significantly harder than single-report summarization
prompt constraints are critical in healthcare-oriented AI systems
separating reasoning from explanation reduces hallucination risk We also gained hands-on experience integrating MCP-based tooling with Prompt Opinion.

What's next for Evidence-Aware Thyroid Trend Analyzer

Future directions include:

support for additional laboratory markers
broader longitudinal patient analysis
retrieval-backed evidence grounding
structured confidence scoring
integration with richer clinical datasets The long-term goal is to explore safer approaches for clinical AI systems that combine deterministic reasoning with language-model explainability.

Built With

3.3
70b
api
context
fastapi
fastmcp
groq
json
llama
mcp
opinion
prompt
protocol)
python
render
uvicorn

Updates

Pallavi Chandanshive started this project — May 08, 2026 03:41 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.