Inspiration

Parkinson's disease affects over 10 million people worldwide, yet motor state monitoring still largely happens during a patient's 15-minute clinic visit. The neurologist asks "how bad was the tremor this week?" and the patient shrugs — there's no data. Consumer wearables are one plausible solution, but they're expensive and inaccessible to many patients. Meanwhile, everyone already has a smartphone with a high-frequency accelerometer in their pocket. So the question became: what if the sensor you already carry could track your motor state, explain its reasoning out loud, and adapt to you over time — entirely on-device?


What It Does

AuraPD Voice is a fully on-device iOS app that:

  1. Listens for the wake phrase "Check my condition" (or a tap)
  2. Captures 10 seconds of accelerometer data via CoreMotion at 50 Hz (see the capture sketch below)
  3. Classifies motor state as ON / OFF / Tremor using an adaptive agent
  4. Explains the result in plain English and speaks it aloud via TTS
  5. Learns from thumbs-up / thumbs-down feedback in real time
  6. Matches the user to a de-identified patient library using GNN-style similarity

Everything runs on-device. No raw sensor data ever leaves the phone.
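
A minimal sketch of the capture in step 2, assuming samples are collected as a flat array of acceleration magnitudes; `MotionCapture`, the main-queue delivery, and the 500-sample cutoff are illustrative choices rather than the app's exact code:

```swift
import CoreMotion

final class MotionCapture {
    private let manager = CMMotionManager()
    private var samples: [Double] = []

    func start(completion: @escaping ([Double]) -> Void) {
        guard manager.isAccelerometerAvailable else { return }
        manager.accelerometerUpdateInterval = 1.0 / 50.0     // 50 Hz
        samples.removeAll()

        manager.startAccelerometerUpdates(to: .main) { [weak self] data, _ in
            guard let self = self, let a = data?.acceleration else { return }
            // Magnitude of the 3-axis acceleration vector, in g.
            self.samples.append((a.x * a.x + a.y * a.y + a.z * a.z).squareRoot())
            if self.samples.count >= 500 {                    // 10 s x 50 Hz
                self.manager.stopAccelerometerUpdates()
                completion(self.samples)
            }
        }
    }
}
```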


How I Built It

Signal Processing

Three features are extracted from each 10-second accelerometer window:

$$\mu = \frac{1}{N}\sum x_i, \quad \sigma = \sqrt{\frac{1}{N}\sum (x_i - \mu)^2}, \quad E = \sum x_i^2$$

The agent combines these into a weighted composite $\sigma_w$:

$$\sigma_w = w_0 \sigma + w_1 \frac{E}{E_{max}} \sigma \cdot 0.15 + w_2 |\mu| \cdot 0.05$$
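
A minimal sketch of the feature extraction and the composite above, assuming the window is the `[Double]` of magnitudes produced during capture; `WindowFeatures`, `weights`, and `maxEnergy` are illustrative names for the weight vector $w$ and the normaliser $E_{max}$:

```swift
struct WindowFeatures {
    let mean: Double      // μ
    let std: Double       // σ
    let energy: Double    // E

    init(window: [Double]) {
        let n = Double(window.count)
        let m = window.reduce(0, +) / n
        let variance = window.reduce(0) { $0 + ($1 - m) * ($1 - m) } / n
        mean = m
        std = variance.squareRoot()
        energy = window.reduce(0) { $0 + $1 * $1 }
    }

    /// σ_w = w0·σ + w1·(E / E_max)·σ·0.15 + w2·|μ|·0.05
    func weightedSigma(weights w: [Double], maxEnergy: Double) -> Double {
        w[0] * std +
            w[1] * (energy / maxEnergy) * std * 0.15 +
            w[2] * abs(mean) * 0.05
    }
}
```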

Adaptive Threshold

Rather than a fixed cutoff, the agent blends the user's manual slider with a learned base:

$$\tau_{adaptive} = 0.65\tau_{user} + 0.35\tau_{base}$$

Classification:

$$\text{state}(\sigma_w) = \begin{cases} \text{OFF}, & \sigma_w > 2\tau \\ \text{ON}, & \tau < \sigma_w \leq 2\tau \end{cases}$$

Confidence is the distance to the nearest decision boundary pushed through a sigmoid:

$$\text{conf} = \frac{1}{1 + e^{-\lambda \cdot \min(|\sigma_w - \tau|,|\sigma_w - 2\tau|)}}$$
where $\lambda$ is a hyperparameter adjusted manually.
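
Putting the threshold blend, the two-boundary rule, and the confidence together, a sketch (with an assumed default $\lambda$ and a placeholder for the $\sigma_w \leq \tau$ band, which the formula above leaves implicit):

```swift
import Foundation

enum MotorState { case on, off, tremor }

func classify(sigmaW: Double,
              userThreshold: Double,
              baseThreshold: Double,
              lambda: Double = 25.0) -> (state: MotorState, confidence: Double) {
    // τ_adaptive = 0.65·τ_user + 0.35·τ_base
    let tau = 0.65 * userThreshold + 0.35 * baseThreshold

    let state: MotorState
    if sigmaW > 2 * tau {
        state = .off                 // σ_w > 2τ
    } else if sigmaW > tau {
        state = .on                  // τ < σ_w ≤ 2τ
    } else {
        state = .tremor              // σ_w ≤ τ: not spelled out above, placeholder only
    }

    // Confidence: distance to the nearest boundary pushed through a sigmoid.
    let distance = min(abs(sigmaW - tau), abs(sigmaW - 2 * tau))
    let confidence = 1.0 / (1.0 + exp(-lambda * distance))
    return (state, confidence)
}
```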

Online Learning from Feedback

Each thumbs-up / thumbs-down triggers a three-step online update ($\delta = +1$ confirm, $-1$ reject):

Step 1 — threshold gradient descent:

$$\tau_{base} \leftarrow \tau_{base} + \alpha \delta (\sigma_w - \tau_{base})$$

Step 2 — feature weight update (perceptron rule, then re-normalise):

$$w \leftarrow w + \alpha \delta \frac{\phi}{|\phi|}$$

Step 3 — accuracy EMA ($\beta = 0.6$):

$$\text{acc} \leftarrow \text{acc}(1 - \alpha\beta) + y(\alpha\beta)$$
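
A sketch of the full feedback update, assuming $\phi$ is the three-element feature vector and $y = 1$ for a confirm, $0$ for a reject; the default $\alpha$ and the sum-to-one re-normalisation are illustrative choices:

```swift
struct AdaptiveAgent {
    var baseThreshold: Double
    var weights: [Double]          // w, one entry per feature
    var accuracy: Double = 0.5

    mutating func applyFeedback(confirmed: Bool,
                                sigmaW: Double,
                                features phi: [Double],
                                alpha: Double = 0.1,
                                beta: Double = 0.6) {
        let delta = confirmed ? 1.0 : -1.0

        // Step 1: threshold gradient step toward (or away from) σ_w.
        baseThreshold += alpha * delta * (sigmaW - baseThreshold)

        // Step 2: perceptron-style update on the normalised feature vector φ.
        let norm = phi.map { $0 * $0 }.reduce(0, +).squareRoot()
        if norm > 0 {
            weights = zip(weights, phi).map { $0 + alpha * delta * $1 / norm }
            // Re-normalise (shown here as sum-to-one; the exact scheme is assumed).
            let sum = weights.reduce(0, +)
            if sum > 0 { weights = weights.map { $0 / sum } }
        }

        // Step 3: exponential moving average of accuracy.
        let y = confirmed ? 1.0 : 0.0
        accuracy = accuracy * (1 - alpha * beta) + y * (alpha * beta)
    }
}
```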

GNN Patient Matching

Each patient is a node in a feature-vector graph. Three similarity dimensions are computed:

$$S_{cos} = \frac{A \cdot B}{|A||B|}, \quad S_{rbf} = \exp\left(-\frac{|A-B|^2}{2\sigma^2}\right), \quad S_{eucl} = \exp\left(-\sqrt{\sum w_i (a_i - b_i)^2}\right)$$

These fuse into a Treatment Inspiration Value:

$$\text{TIV} = 0.30S_{cos} + 0.45S_{rbf} + 0.25S_{eucl}$$

The 0.45 weight on treatment response reflects the clinical insight that how a patient responds to levodopa is the most actionable reference signal.
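
A sketch of the three kernels and the fusion, treating each patient node as a plain feature vector; the RBF bandwidth `sigma` and the optional per-dimension `euclWeights` are assumed parameters:

```swift
import Foundation

func treatmentInspirationValue(_ a: [Double], _ b: [Double],
                               sigma: Double = 1.0,
                               euclWeights: [Double]? = nil) -> Double {
    // Cosine similarity.
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    let sCos = dot / (normA * normB)

    // RBF kernel on squared distance.
    let sqDist = zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +)
    let sRBF = exp(-sqDist / (2 * sigma * sigma))

    // Weighted Euclidean similarity.
    let w = euclWeights ?? Array(repeating: 1.0, count: a.count)
    let weightedDist = zip(zip(a, b), w)
        .map { $1 * ($0.0 - $0.1) * ($0.0 - $0.1) }
        .reduce(0, +)
        .squareRoot()
    let sEucl = exp(-weightedDist)

    // TIV = 0.30·S_cos + 0.45·S_rbf + 0.25·S_eucl
    return 0.30 * sCos + 0.45 * sRBF + 0.25 * sEucl
}
```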

Stack

Swift + SwiftUI, CoreMotion, AVFoundation (TTS), Speech framework — all iOS-native. No backend, no model files,
just math running live on the A-series chip.


Challenges

Microphone ownership conflicts. The Speech recogniser for wake-word detection and AVFoundation for TTS both fight over the audio session. I had to build a careful state machine that pauses the wake-word listener before TTS fires, then restarts it after — with a guard against re-triggering mid-speech.
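
A minimal sketch of that hand-off, using `AVSpeechSynthesizer`'s delegate callback to hand the microphone back only after speech finishes; the class name and the simplified listener restart are illustrative:

```swift
import AVFoundation
import Speech

final class VoiceCoordinator: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private let audioEngine = AVAudioEngine()
    private var isSpeaking = false

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        guard !isSpeaking else { return }      // guard against re-triggering mid-speech
        isSpeaking = true
        stopWakeWordListener()                 // release the mic before TTS fires
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        isSpeaking = false
        startWakeWordListener()                // hand the mic back to the Speech recogniser
    }

    private func stopWakeWordListener() {
        audioEngine.inputNode.removeTap(onBus: 0)
        audioEngine.stop()
    }

    private func startWakeWordListener() {
        // Re-install the tap feeding the SFSpeechAudioBufferRecognitionRequest
        // and restart the engine (omitted here for brevity).
    }
}
```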

Variance compression in live data. At rest, $\sigma$ sits around 0.01–0.03 g. A moderate tremor is 0.08–0.15 g. The naive fixed threshold either misses mild tremors or fires constantly during walking. The blended $\tau_{adaptive}$ and per-session calibration fixed this, but tuning $\beta$ so accuracy converged within ~10 feedback taps — not
100 — took a lot of trial and error.

Building something clinically humble. Every string in the UI reminds the user this is a reference tool, not a
diagnosis. Writing hypothesis text that is informative but explicitly non-prescriptive — especially for medication-timing inferences — was harder than the signal processing.


What I Learned

I came in knowing SwiftUI. I left understanding why on-device ML matters beyond the privacy pitch: latency. A cloud round-trip for real-time tremor visualisation would be unusable. Doing everything locally means the avatar
responds to the accelerometer within one frame.
