PulseGuard: An Offline AI Triage Assistant for Emergencies

What Inspired Me

My inspiration for PulseGuard stemmed from a personal experience during a hiking trip in a remote area where a friend suffered a sudden allergic reaction. With no cell signal, we were left relying on basic first-aid knowledge, and it took hours to get professional help. This highlighted a critical gap in emergency medical tools: most AI assistants depend on the cloud, which fails in offline scenarios like rural areas, disaster zones, or even subways. I was also influenced by growing concerns over data privacy in healthcare—regulations like HIPAA make cloud-based data sharing risky or illegal. I envisioned an app that empowers first responders and individuals with instant, private medical guidance, right on their device. The RunAnywhere SDK seemed perfect for enabling on-device AI, pushing me to build something that could truly save lives without compromising privacy.

What I Learned

Throughout this project, I deepened my understanding of on-device AI deployment. I learned how quantization techniques, like INT4 for models such as DeepSeek-R1-Distill, can shrink the memory footprint to roughly 2–3 GB, making inference feasible on mid-range mobile devices. I also grasped the nuances of offline speech-to-text (STT) and text-to-speech (TTS) using Whisper and local TTS engines. On the medical side, I explored triage protocols and how to encode them into prompts with rule-based logic to ensure safe, step-by-step reasoning. Privacy laws reinforced the case for edge computing, and I discovered tools like local vector caches for storing medical guidelines without cloud access. Mathematically, I experimented with context window management to optimize speed, balancing token limits against accuracy with a rough latency model t = (c · n) / p, where c is the compute cost per token, n is the number of tokens in the context, and p is the device's processing throughput.
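
To make that trade-off concrete, here's a minimal Kotlin sketch of the latency model; the per-token cost and device throughput constants are illustrative assumptions, not measurements from my test devices:

```kotlin
// Rough latency model: t = (c * n) / p.
// c = compute cost per token, n = tokens in context, p = device throughput.
// The default constants below are illustrative placeholders, not benchmarks.
fun estimateLatencySeconds(
    tokensInContext: Int,
    flopsPerToken: Double = 8.0e9,      // assumed cost per token for a ~4B INT4 model
    deviceFlopsPerSec: Double = 1.0e12  // assumed sustained mobile accelerator throughput
): Double = (flopsPerToken * tokensInContext) / deviceFlopsPerSec

// Invert the model to find the largest context that fits a latency budget: n = (t * p) / c.
fun maxContextForBudget(
    budgetSeconds: Double,
    flopsPerToken: Double = 8.0e9,
    deviceFlopsPerSec: Double = 1.0e12
): Int = ((budgetSeconds * deviceFlopsPerSec) / flopsPerToken).toInt()

fun main() {
    println("512-token prompt ≈ %.2f s".format(estimateLatencySeconds(512)))
    println("1.0 s budget → cap context at ${maxContextForBudget(1.0)} tokens")
}
```

Inverting the formula this way is what let me pick a context cap per device tier instead of a single hard-coded limit.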

How I Built My Project

I started with the core architecture: a single offline flow from user input to spoken response. Using the RunAnywhere SDK for orchestration, I integrated a quantized Whisper model for STT to convert spoken symptoms into text. The brain is DeepSeek-R1-Distill (4–7B parameters, INT4 quantized), chosen over LLaMA for its stronger medical reasoning and smaller distilled variants. I built the medical triage engine by crafting prompts that incorporate rule-based decision trees, such as urgency scoring via simple rules (e.g., if symptoms include chest pain and shortness of breath, escalate to high priority); a sketch of this scoring logic appears below. A local vector cache stores offline medical guidelines for quick retrieval, and for output, a local TTS engine provides spoken responses. I tested on Android devices, capping the context window to keep inference sub-second. The app accepts voice or text input, reasons entirely on-device (e.g., "Likely anaphylaxis; administer epinephrine if available"), and responds instantly; no data ever leaves the phone.
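
Here's a simplified Kotlin sketch of that rule-based scoring logic; the symptom names, weights, and thresholds are illustrative placeholders, not the actual protocol values encoded in the app, and none of this is clinical guidance:

```kotlin
// Simplified rule-based urgency scoring: a hard red-flag rule plus weighted symptoms.
// Weights and thresholds are illustrative, not real triage protocol values.
enum class Priority { LOW, MODERATE, HIGH }

data class TriageResult(val priority: Priority, val rationale: String)

private val symptomWeights = mapOf(
    "chest pain" to 3,
    "shortness of breath" to 3,
    "throat swelling" to 3,
    "hives" to 1,
    "dizziness" to 1,
)

fun scoreSymptoms(rawSymptoms: Set<String>): TriageResult {
    val symptoms = rawSymptoms.map { it.trim().lowercase() }.toSet()

    // Hard rule: this red-flag combination escalates immediately, bypassing the score.
    if ("chest pain" in symptoms && "shortness of breath" in symptoms) {
        return TriageResult(Priority.HIGH, "Red flag: chest pain with shortness of breath")
    }

    // Otherwise, sum the weights of recognized symptoms and bucket the total.
    val score = symptoms.sumOf { symptomWeights[it] ?: 0 }
    val priority = when {
        score >= 4 -> Priority.HIGH
        score >= 2 -> Priority.MODERATE
        else -> Priority.LOW
    }
    return TriageResult(priority, "Weighted score $score across ${symptoms.size} symptoms")
}
```

The resulting priority and rationale are folded into the prompt, so the LLM's step-by-step reasoning starts from the rule-based assessment rather than from raw symptoms alone.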

Challenges I Faced

One major challenge was optimizing model size and speed for mobile hardware: initial runs of DeepSeek exceeded RAM limits, so I implemented partial layer loading via RunAnywhere, which required tuning the quantization settings without losing diagnostic accuracy. Offline STT with Whisper was noisy in real-world tests; ambient sound interfered, so I added audio preprocessing filters (sketched below). Ensuring medical reliability was tricky: I had to validate prompts against standard triage protocols to avoid hallucinations, iterating through dozens of test scenarios. Privacy constraints ruled out cloud fine-tuning, so I relied on distilled models, which sometimes lacked nuance on rare presentations. Finally, testing in simulated offline environments revealed latency spikes under low battery, forcing power-aware optimizations. Overcoming these hurdles made PulseGuard a robust, life-saving tool.
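
For illustration, here's a minimal Kotlin sketch of the kind of preprocessing involved, a first-order pre-emphasis filter plus a noise gate applied to raw PCM before it reaches Whisper; the coefficient and threshold below are illustrative assumptions, not the tuned values from the app:

```kotlin
import kotlin.math.abs

// Clean up mic input before STT: pre-emphasis (first-order high-pass) to attenuate
// low-frequency rumble/wind, then a noise gate to silence low-amplitude background.
// preEmphasis and gateThreshold are illustrative, not tuned production values.
fun preprocessPcm(
    samples: FloatArray,
    preEmphasis: Float = 0.97f,
    gateThreshold: Float = 0.01f
): FloatArray {
    val out = FloatArray(samples.size)
    var prev = 0f
    for (i in samples.indices) {
        // y[i] = x[i] - a * x[i-1] boosts high frequencies relative to low ones.
        val emphasized = samples[i] - preEmphasis * prev
        prev = samples[i]
        // Noise gate: zero out samples below the amplitude threshold.
        out[i] = if (abs(emphasized) < gateThreshold) 0f else emphasized
    }
    return out
}
```

Even this simple pass noticeably reduced spurious transcriptions from wind and background chatter in my outdoor tests.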

Built With

  • android
  • deepseek-r1-distill
  • int4
  • kotlin
  • runanywhere