Gallery figures: Clinical Evaluation of Our Agent; Complete Loss and Reward Convergence of the Agent; Dynamic Stress Test of the Agent; Gemma 4 Reward Curve over 150 Episodes; Heatmap of Model Evaluation; Final Reward Scores of the Qwen Fine-Tuned Agent; Performance on Beta Wave and Tremor Control

Parkinson's Motor Assist

Live MCP Server: https://app.promptopinion.ai/marketplace/mcp/019e087b-b63e-7da3-8d4a-5300cfb1cbf2

Demo Video: https://www.youtube.com/watch?v=ocF6SzPHexE

The Problem

Parkinson's disease affects over 10 million people worldwide. More than 50,000 of them have Deep Brain Stimulation (DBS) implants — electrodes surgically placed in the brain that send electrical pulses to reduce tremors and improve motor control.

The hardware works. The problem is the settings.

A neurologist programs the device manually, and those settings stay fixed until the next clinic visit — typically 3 to 6 months later. Meanwhile, the patient's brain state changes constantly: medication wears off, stress levels fluctuate, symptoms spike unpredictably.

The gap between clinic visits is where patients lose function.

What We Built

1. A Reinforcement Learning Environment

MotorAssistEnv is a training environment built on top of a peer-reviewed biophysical simulation (Fleming et al. 2023, Journal of Neural Engineering). The simulation models 600 neurons in the basal ganglia circuit using Hodgkin-Huxley dynamics — the same equations that describe how real neurons fire.

We created a 10-task curriculum:

Easy tasks: Stable patient, baseline symptoms
Medium tasks: Medication wears off mid-episode, agent must adapt
Hard tasks: Four simultaneous crises (medication failure + beta spike + tremor amplification + stimulation resistance)

The agent observes brain state every 20 milliseconds (beta oscillations, tremor magnitude, motor force output, side-effect accumulation) and adjusts stimulation settings (amplitude, pulse width, frequency).
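The observation and action described above can be pictured as two small records. This is only a sketch: the class names, field names, and units are assumptions made for illustration and are not taken from the MotorAssistEnv source. Only the 20 ms interval and the list of observed and controlled quantities come from the text.

```python
from dataclasses import dataclass

STEP_MS = 20  # observation interval stated in the text


@dataclass(frozen=True)
class Observation:
    """What the agent sees each step (field names are hypothetical)."""
    beta_power: float         # beta oscillations
    tremor_magnitude: float   # tremor magnitude
    motor_force: float        # motor force output
    side_effect_load: float   # accumulated side effects


@dataclass(frozen=True)
class Action:
    """Stimulation settings the agent may adjust (field names are hypothetical)."""
    amplitude_ma: float
    pulse_width_ms: float
    frequency_hz: float


def steps_per_second(step_ms: int = STEP_MS) -> int:
    """Number of observe/act cycles per simulated second."""
    return 1000 // step_ms


if __name__ == "__main__":
    obs = Observation(beta_power=0.7, tremor_magnitude=0.5,
                      motor_force=0.4, side_effect_load=0.1)
    act = Action(amplitude_ma=2.0, pulse_width_ms=0.09, frequency_hz=140.0)
    print(obs, act, steps_per_second())
```

At a 20 ms step the agent makes 50 decisions per simulated second, which is why the action space is kept to three numbers.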

2. A Trained Agent

We trained a Gemma 4 (4B parameter) language model using GRPO (Group Relative Policy Optimization) on a free Kaggle T4 GPU. Training took under 2 hours.

The model learns to:

Suppress pathological beta oscillations without over-stimulating
Reduce tremor while preserving voluntary movement strength
Stay within a safety budget for side effects
Respond to sudden symptom changes
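For readers unfamiliar with GRPO, its core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, so no separate value network is needed. The sketch below shows only that normalization step, under the assumption that the population standard deviation is used (implementations differ on this detail). It is not the project's training code, and the function name is made up for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Returns (r - mean) / (std + eps) for each reward. `eps` avoids division
    by zero when every completion in the group got the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some libraries use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical episode rewards for completions of one prompt.
    print(group_relative_advantages([0.2, 0.4, 0.6, 0.8]))
```

Completions that beat their group average get a positive advantage and are reinforced; a group where every completion scores the same contributes no gradient signal.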

3. An MCP Server on Prompt Opinion

The trained controller is deployed as an MCP server on the Prompt Opinion Marketplace. Any healthcare agent can call four tools:

Tool	Function
`get_dbs_recommendation`	Returns optimized DBS settings (amplitude/pulse width/frequency) from current patient state, with clinical interpretation and risk level
`run_dbs_simulation`	Runs a full therapy episode against the biophysical simulator and returns outcome metrics
`assess_motor_state`	Stratifies clinical risk from current symptoms and returns a FHIR-compatible observation note
`get_model_performance`	Returns benchmark scores across easy/medium/hard task tiers

MCP endpoint: https://virustechhacks-motorassist-mcp.hf.space/sse
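An MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds that message body for one of the four tools; it leaves out the SSE transport and the initialize handshake. The argument names (`beta_power`, `tremor_magnitude`) are assumptions, since the tools' input schemas are not given here.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def tool_call_request(name, arguments):
    """Build the JSON-RPC 2.0 body for an MCP `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical arguments; the real schema comes from the server's tools/list.
    request = tool_call_request(
        "get_dbs_recommendation",
        {"beta_power": 0.7, "tremor_magnitude": 0.5},
    )
    print(json.dumps(request, indent=2))
```

In practice a client would first call `tools/list` to discover each tool's actual input schema rather than hard-coding argument names.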

Why This Needs AI

Adaptive DBS control has been studied for years using classical control methods (PID controllers, Kalman filters, threshold-based rules). None of them generalize well.

The reason is structural: Parkinson's symptoms are non-stationary (change over time), partially observable (you can't measure everything), and multi-objective (suppress symptoms vs. avoid side effects vs. preserve motor function). Classical controllers work within their tuning range but fail when conditions drift.
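To make the "tuning range" point concrete, here is a toy threshold-based rule of the kind the paragraph refers to. It is an illustrative sketch, not any published controller: the threshold, step size, and normalized beta scale are all assumed values.

```python
def threshold_controller(beta_power, amplitude_ma,
                         threshold=0.5, step_ma=0.1, max_ma=5.0):
    """Raise amplitude when beta power exceeds a fixed threshold, else lower it.

    `threshold` and `step_ma` are hand-tuned constants (hypothetical values).
    The rule sees only beta power, so it cannot trade off tremor, motor force,
    or side effects, and its threshold does not move when the patient's
    baseline drifts.
    """
    if beta_power > threshold:
        amplitude_ma += step_ma
    else:
        amplitude_ma -= step_ma
    return min(max(amplitude_ma, 0.0), max_ma)


if __name__ == "__main__":
    amplitude = 1.0
    for beta in (0.8, 0.7, 0.6, 0.4):  # made-up beta readings
        amplitude = threshold_controller(beta, amplitude)
        print(round(amplitude, 2))
```

A rule like this reacts to one signal with one fixed setpoint, which is the limitation the multi-objective framing above is pointing at.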

A language model trained with reinforcement learning can:

Reason over heterogeneous signals — beta power, tremor envelope, force output, side-effect budget, medication phase — and produce a coherent action that balances all of them
Generate clinical explanations — every recommendation includes natural language interpretation a neurologist can audit
Compose with other agents — because it uses MCP, agents can chain assess_motor_state → get_dbs_recommendation → run_dbs_simulation to validate therapy plans

Results

We tested the trained model against larger models (7B and 72B parameters) that received no task-specific training — just the task description.

Task	Trained Gemma 4	Zero-shot 72B	Zero-shot 7B	Pass Threshold
Easy	0.830	0.810	0.740	0.55
Medium	0.610	0.615	0.255	0.52
Hard	0.480	0.390	0.019	0.42

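The trained 4B model beats the zero-shot 72B model on easy (0.830 vs. 0.810) and hard (0.480 vs. 0.390) tasks and is within 0.005 of it on medium (0.610 vs. 0.615). It is also the only one of the three that clears the pass threshold on every tier. The zero-shot 7B model scores 0.019 on hard tasks, meaning it fails almost immediately.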
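The comparison can be checked directly from the table. The snippet below copies the reported scores and thresholds verbatim; only the dictionary keys and helper names are made up for illustration.

```python
# Scores and pass thresholds copied from the results table.
SCORES = {
    "easy":   {"trained": 0.830, "zero_shot_72b": 0.810, "zero_shot_7b": 0.740, "threshold": 0.55},
    "medium": {"trained": 0.610, "zero_shot_72b": 0.615, "zero_shot_7b": 0.255, "threshold": 0.52},
    "hard":   {"trained": 0.480, "zero_shot_72b": 0.390, "zero_shot_7b": 0.019, "threshold": 0.42},
}
MODELS = ("trained", "zero_shot_72b", "zero_shot_7b")


def passes(tier, model):
    """True if the model's score meets the tier's pass threshold."""
    return SCORES[tier][model] >= SCORES[tier]["threshold"]


def margin(tier, a, b):
    """Score of model `a` minus score of model `b`, rounded to 3 decimals."""
    return round(SCORES[tier][a] - SCORES[tier][b], 3)


if __name__ == "__main__":
    for tier in SCORES:
        row = {m: passes(tier, m) for m in MODELS}
        print(tier, row, "trained - 72B =", margin(tier, "trained", "zero_shot_72b"))
```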

What the hard task tests: Four simultaneous events (medication failure, beta spike, tremor amplification, stimulation resistance). The agent has to stabilize the patient without exhausting the side-effect budget. Zero-shot models either over-stimulate and max out side effects, or under-stimulate and let symptoms spiral.

FHIR + MCP Integration

The MCP server implements the SHARP Extension Specification for healthcare context:

FHIR Context Propagation

Advertises ai.promptopinion/fhir-context extension in MCP initialize response
Declares scopes: patient/Patient.rs, patient/Condition.rs, patient/Observation.rs
Prompt Opinion forwards FHIR context automatically when it detects this extension

Patient identity flows through every recommendation. The same MCP server can handle multiple patients without ambiguity.

Safety Constraints

Data Privacy

All observations come from the biophysical simulator or FHIR resources the calling EHR already has authorization for
No PHI stored on the MCP server

Hard Safety Bounds

All outputs clamped to clinical ranges (amplitude 0–5 mA, pulse width 0.06–0.20 ms, frequency 130–180 Hz)
Model cannot return unsafe values even if it attempts to
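A minimal sketch of such a clamp is shown below, using the ranges listed above. The key names and the decision to reject non-finite values outright are assumptions; the server's actual implementation is not shown in this write-up.

```python
import math

# Clinical ranges as stated above: (lower bound, upper bound).
LIMITS = {
    "amplitude_ma": (0.0, 5.0),
    "pulse_width_ms": (0.06, 0.20),
    "frequency_hz": (130.0, 180.0),
}


def clamp_action(action):
    """Return a copy of `action` with every setting forced into its range.

    Raises ValueError on NaN or infinity, because min/max comparisons would
    otherwise let a NaN pass through unclamped.
    """
    safe = {}
    for key, (low, high) in LIMITS.items():
        value = float(action[key])
        if not math.isfinite(value):
            raise ValueError(f"{key} is not a finite number: {value!r}")
        safe[key] = min(max(value, low), high)
    return safe


if __name__ == "__main__":
    # A hypothetical out-of-range model output.
    print(clamp_action({"amplitude_ma": 9.0, "pulse_width_ms": 0.01, "frequency_hz": 150}))
```

Applying the clamp after the model, rather than relying on the model to stay in range, is what makes the bound hold regardless of what the policy emits.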

Regulatory Pathway

Medtronic's Percept PC and Abbott's Liberta DBS systems have FDA clearance for closed-loop stimulation
A policy trained in this environment maps to their published interfaces
This would be a 510(k) software upgrade, not a new device class

Error Handling

All tools wrap logic in try/except blocks
FHIR lookup failures fall back without crashing
If the HuggingFace Space is asleep, the tool returns a clear "retry in 30s" message
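The pattern in the first two bullets might look like the sketch below. The decorator name, the shape of the returned dictionary, and the `fetch_fhir` callable are all hypothetical; they illustrate the try/except and fallback behaviour described above rather than reproduce the server code.

```python
import functools


def safe_tool(func):
    """Wrap a tool so exceptions become a structured error instead of a crash."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "result": func(*args, **kwargs)}
        except Exception as exc:  # broad on purpose: a tool must never crash the server
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


def resolve_patient_state(fetch_fhir, provided):
    """Use FHIR data when the lookup works, else fall back to caller-provided values."""
    try:
        return fetch_fhir()
    except Exception:
        return provided


@safe_tool
def assess(state):
    # Hypothetical stand-in for real tool logic.
    return {"risk": "high" if state["tremor_magnitude"] > 0.6 else "low"}


if __name__ == "__main__":
    def failing_lookup():
        raise ConnectionError("FHIR server unreachable")

    state = resolve_patient_state(failing_lookup, {"tremor_magnitude": 0.8})
    print(assess(state))
    print(assess({}))  # missing key is reported as an error, not raised
```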

How to Use It

Connect via Prompt Opinion

Go to https://app.promptopinion.ai/marketplace/mcp/019e087b-b63e-7da3-8d4a-5300cfb1cbf2
Click "Connect"
The four tools appear automatically in any healthcare agent workspace

Connect via Claude Desktop

Add to your MCP config:

{
  "mcpServers": {
    "parkinsons-dbs": {
      "url": "https://virustechhacks-motorassist-mcp.hf.space/sse"
    }
  }
}

Example Query

"Assess motor state for patient ID 12345 and recommend DBS settings"

If FHIR context is available, the tools will use it. If not, they operate on the provided parameters.

What We Learned

Reward Design is Adversarial

The agent optimizes multiple conflicting objectives: suppress beta oscillations vs. avoid over-stimulation, reduce tremor vs. preserve motor strength, stay within side-effect budget vs. respond aggressively to crises.

If any weight is wrong, the agent finds shortcuts. Early versions learned to set stimulation to 0.0 mA (zero side effects, patient collapses). We tested 15 adversarial strategies and blocked them explicitly. The only way to score well now is to actually treat the patient.
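The zero-stimulation shortcut is easy to reproduce with a toy weighted reward. Everything in the sketch below is hypothetical: the functional form, the weights, and the two patient states are invented to illustrate the failure mode and are not the project's reward function.

```python
def reward(beta, tremor, force, side_effects,
           w_beta=0.3, w_tremor=0.3, w_force=0.2, w_side=0.2):
    """Toy multi-objective reward; all inputs are assumed normalized to [0, 1].

    Lower beta, tremor, and side effects are better; higher force is better.
    """
    return (w_beta * (1 - beta) + w_tremor * (1 - tremor)
            + w_force * force - w_side * side_effects)


# Made-up outcomes: (beta, tremor, force, side_effects).
UNTREATED = (0.9, 0.9, 0.3, 0.0)  # stimulation at 0.0 mA: no side effects, bad symptoms
TREATED = (0.3, 0.3, 0.7, 0.3)    # moderate stimulation: some side effects, good control

# A mis-weighted reward that over-penalizes side effects.
BAD_WEIGHTS = dict(w_beta=0.05, w_tremor=0.05, w_force=0.05, w_side=0.85)

if __name__ == "__main__":
    print("balanced:    ", reward(*UNTREATED), reward(*TREATED))
    print("mis-weighted:", reward(*UNTREATED, **BAD_WEIGHTS), reward(*TREATED, **BAD_WEIGHTS))
```

With the balanced weights the treated outcome scores higher; with the mis-weighted version, doing nothing wins, which is the kind of shortcut an RL agent will find and exploit.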

Supervised Learning First

Starting GRPO from scratch on a medical control task produces gibberish, because the model has no idea what a valid DBS action looks like. Adding a supervised learning stage first (teaching the valid output format before RL) fixed this completely: format compliance was 100% across all training steps.
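Format compliance can be measured with a strict parser over the model's raw outputs. The sketch below assumes the action is emitted as a JSON object with three numeric fields; the actual output format and key names used in training are not specified here, so treat both as placeholders.

```python
import json

REQUIRED_KEYS = ("amplitude_ma", "pulse_width_ms", "frequency_hz")  # hypothetical names


def parse_action(text):
    """Return the action as a dict of floats, or None if the text is malformed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    try:
        return {key: float(data[key]) for key in REQUIRED_KEYS}
    except (KeyError, TypeError, ValueError):
        return None


def format_compliance(outputs):
    """Fraction of outputs that parse into a valid action (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(parse_action(o) is not None for o in outputs) / len(outputs)


if __name__ == "__main__":
    samples = [
        '{"amplitude_ma": 2.5, "pulse_width_ms": 0.09, "frequency_hz": 140}',
        "increase stimulation a bit",  # free text, as an untrained model might emit
    ]
    print(format_compliance(samples))
```

A check like this can double as a reward gate during RL, so that malformed outputs never reach the simulator.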

Scientific Validation Matters

The biophysics model is validated against real patient data. The agent isn't learning to optimize an arbitrary reward function; it's learning real physiological relationships. That's the difference between a hackathon demo and something that might eventually work.

What's Next

Short Term

Expand patient profiles (age, disease stage, comorbidities)
Add more stochastic events (stress, sleep deprivation, medication interactions)
Benchmark against classical adaptive controllers (PID, Kalman filters) to quantify where LLM policies outperform

Hardware Integration

Medtronic, Abbott, and Boston Scientific make DBS devices that read brain signals off the same electrode they stimulate with. A policy trained in this environment maps almost directly to their interface. The gap between this benchmark and real firmware is narrower than it's ever been.

Resources

MCP Server: https://app.promptopinion.ai/marketplace/mcp/019e087b-b63e-7da3-8d4a-5300cfb1cbf2
MCP Endpoint: https://virustechhacks-motorassist-mcp.hf.space/sse
Environment Server: https://virustechhacks-parkinsons-motor.hf.space
Trained Model: virustechhacks/dbs-grpo-qwen3-4b on HuggingFace
Demo Video: https://www.youtube.com/watch?v=ocF6SzPHexE

Built With

fleming-model
gemma4
huggingface
myosuite
openenv
qwen-coder
reinforcement-learning

Updates

Virus Hacks started this project — May 02, 2026 03:06 PM EDT
