Inspiration
Sadhvi, one of our teammates, watched a family member go through multiple surgeries. What she remembers most isn't the operations themselves; it's the uncertainty afterward. Sitting at home, watching for symptoms, not knowing what was serious and what wasn't. Googling everything. Pasting symptoms into ChatGPT at 2am hoping for reassurance. Every subtle change in breathing or skin color felt like a potential emergency with no one qualified to ask.
What made it worse was that complications rarely stay contained. A cardiac patient develops an infection. A respiratory issue triggers a pain response. The doctors treating each system weren't automatically talking to each other, and a family member certainly wasn't equipped to bridge that gap.
That experience is not unique. One in five surgical patients is readmitted within 30 days of discharge, and many of those readmissions are preventable. The problem isn't that medicine lacks the expertise; it's that after discharge, there's no continuous, intelligent layer watching the patient and connecting the right people when something starts to go wrong.
We built Sova to be that layer.
What it does
Sova is a continuous post-surgical monitoring system that watches patients at home through a WHOOP wearable and responds intelligently when something looks wrong.
Vitals such as heart rate, HRV, SpO2, respiratory rate, and skin temperature are ingested in real time and fed into our anomaly detection engine, which scores each reading on a 0–4 severity scale. A score of 0 means everything looks normal. A score of 4 means the system automatically initiates a 911 call without waiting for human confirmation.
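The severity tiers above imply a simple routing policy. Here is a minimal sketch of that policy; the tier names and handler strings are our own illustration, not Sova's actual code.

```python
# Hypothetical sketch of Sova's 0-4 severity routing.
# Tier names and action strings are illustrative.
from enum import IntEnum

class Severity(IntEnum):
    NORMAL = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4

def route_reading(score: int) -> str:
    """Map an anomaly score to the system's response tier."""
    if score == Severity.NORMAL:
        return "log_only"        # nothing looks wrong; keep monitoring
    if score == Severity.CRITICAL:
        return "call_911"        # automatic emergency escalation, no human gate
    return "convene_caucus"      # scores 1-3 go to the AI medical caucus
```

The key design choice this encodes is that only the two extremes bypass deliberation: a 0 needs no action, a 4 cannot afford one.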
For scores between 1 and 3, Sova convenes an AI medical caucus: a structured panel of specialist agents (a general practitioner, a cardiologist, and other relevant specialists), each with their own clinical domain, reasoning framework, and access to the patient's full vitals history. They don't just agree with each other. They deliberate. When the cardiologist and the GP interpret the same signal differently, that disagreement is surfaced. The consensus recommendation, along with any dissenting view, is sent as a structured alert to the patient's physician, who can review, acknowledge, and act from a single screen.
Patients see their own recovery score in real time. Doctors see a timestamped audit trail of every agent recommendation. The full system runs across Android and iOS via a React and Kotlin app, with ElevenLabs powering voice-based interaction for patients who prefer not to read alerts.
How we built it
The system is built across four interconnected layers, deployed on Google Cloud Platform.
The anomaly detection layer ingests WHOOP vitals and runs a multi-signal scoring model built with scikit-learn and PyTorch. Rather than flagging individual readings, the model evaluates convergence across vitals over time — a sustained pattern of abnormality across two or more signals is weighted significantly higher than an isolated spike.
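The convergence idea can be sketched as follows. This is an illustrative toy, not the team's actual scikit-learn/PyTorch model: the baseline ranges and the scoring rule are assumptions we chose to show why a sustained pattern across two or more vitals outweighs an isolated spike.

```python
# Illustrative multi-signal convergence scoring (not Sova's real model).
# Baseline ranges below are assumed values for a resting post-surgical patient.
BASELINES = {
    "hr": (50, 100),        # heart rate, bpm
    "spo2": (94, 100),      # blood oxygen saturation, %
    "resp_rate": (10, 20),  # respiratory rate, breaths/min
}

def abnormal_signals(reading: dict) -> set:
    """Return the set of vitals outside their baseline range."""
    return {k for k, (lo, hi) in BASELINES.items()
            if not lo <= reading[k] <= hi}

def convergence_score(window: list, min_signals: int = 2) -> int:
    """Score 0-4 based on how many signals stay abnormal across the window."""
    # A signal is "sustained" only if it is abnormal in EVERY reading.
    sustained = set(BASELINES)
    for reading in window:
        sustained &= abnormal_signals(reading)
    if len(sustained) >= min_signals:
        return min(4, 1 + len(sustained))  # converging abnormality: escalate
    # an isolated spike in a single reading caps at severity 1
    return 1 if any(abnormal_signals(r) for r in window) else 0
```

Under this rule, one noisy heart-rate reading scores 1, while heart rate and SpO2 both staying out of range for the whole window scores 3, which is the asymmetry the paragraph describes.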
The agent caucus layer is orchestrated with LangChain. Each specialist agent is driven by a distinct system prompt encoding their clinical specialty, known post-surgical risk patterns, and a structured output schema. We used a combination of leading LLMs for general reasoning and IFM's K2 Think V2 for the high-stakes inference tasks requiring deeper medical deliberation. A moderator agent synthesizes responses, flags conflicts, and formats the final recommendation for physician delivery.
The communication layer uses ElevenLabs to deliver voice alerts to patients and — in rank-4 scenarios — coordinates the 911 call flow automatically.
The frontend is built in React (web/doctor-facing) and Kotlin (Android), with a shared API layer. The entire pipeline — from vitals ingestion to physician alert — runs end to end in production.
Challenges we ran into
The hardest single problem was getting the full pipeline to work as a unified system rather than a collection of isolated demos. Each subcomponent — WHOOP ingestion, anomaly scoring, LangChain orchestration, K2 Think V2, ElevenLabs, and the mobile app — worked independently. Getting them to pass state to each other reliably, at the latency a real monitoring system requires, took most of our integration hours.
The agent layer required more careful prompt engineering than we anticipated. Early versions of the caucus would converge too quickly — agents would echo each other rather than reason from their distinct clinical perspectives. We had to invest significantly in how each agent's mandate was framed, and in how the moderator was instructed to handle and preserve genuine disagreements rather than smooth them over.
Calibrating the anomaly scoring thresholds without real clinical data was also a genuine challenge. We used publicly available post-surgical vital sign datasets and synthetic scenarios to tune the model, but we're clear-eyed that real-world validation would look different.
Accomplishments that we're proud of
We built a fully working end-to-end system in 36 hours. Vitals come in, anomalies get scored, agents deliberate, doctors get alerted, and in a rank-4 scenario, emergency services are contacted — automatically. Every piece of that pipeline works in production.
The agent caucus genuinely produces distinct, specialist-informed recommendations. Watching the cardiologist and GP reach different conclusions about the same vitals event — and seeing both perspectives surface to the physician rather than being silently resolved — is the thing we're most proud of technically. It's not just an LLM giving medical advice. It's a structured reasoning system that models how actual clinical consultation works.
We're also proud of the cross-platform app experience. A family caregiver, a patient, and a surgeon can all interact with the same underlying event through interfaces designed for their specific role and context.
What we learned
Multi-agent systems are only as good as the distinctions you build between them. The temptation when designing an agent caucus is to give every agent the same context and let the LLM differentiate them. It doesn't work. Each agent needs a tightly scoped mandate, constrained reasoning surface, and explicit instructions on when to defer and when to hold its position.
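One way to make that mandate explicit is to treat it as data rather than free-form prompt text. The schema below is our illustration of the idea, not Sova's actual structure; every field name is an assumption.

```python
# Hypothetical mandate schema: scope, defer conditions, and hold conditions
# are spelled out per agent instead of being left for the LLM to infer.
from dataclasses import dataclass

@dataclass
class AgentMandate:
    specialty: str      # the clinical domain this agent owns
    in_scope: list      # signals it is allowed to reason about
    defer_when: str     # explicit condition for deferring to another agent
    hold_when: str      # explicit condition for holding its position

    def to_system_prompt(self) -> str:
        return (
            f"You are a {self.specialty}. Reason only about: "
            f"{', '.join(self.in_scope)}. Defer when {self.defer_when}. "
            f"Hold your position when {self.hold_when}."
        )

cardiologist = AgentMandate(
    specialty="cardiologist",
    in_scope=["heart rate", "HRV"],
    defer_when="the anomaly is clearly respiratory or dermal",
    hold_when="a cardiac pattern persists, even if other agents disagree",
)
```

Writing the defer/hold conditions as explicit fields forces the design conversation the paragraph describes: you can't ship an agent without deciding when it yields.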
We also learned that the hardest part of building health tech isn't the AI — it's the integration surface. Getting a wearable API, an ML inference layer, an LLM orchestration framework, a voice synthesis service, and a mobile app to behave as one coherent system is a different kind of engineering problem than any of them are individually. Latency compounds. State management becomes load-bearing. Every handoff is a potential failure point.
And from Sadhvi's experience building something close to her own life: the best products aren't the ones that replace human judgment — they're the ones that make sure human judgment gets called at the right moment, with the right information, before it's too late.
What's next for Sova
The immediate next step is clinical validation. We want to run an observational study with a surgical center partner, tracking real post-surgical patients and measuring how Sova's anomaly scores and agent recommendations compare against clinical outcomes. Without that data, everything is directionally promising but not yet deployable.
On the product side, we want to expand beyond WHOOP to support additional wearables and eventually dedicated post-surgical monitoring patches that capture blood pressure and wound site data — the two biggest gaps in our current signal set.
The agent architecture is designed to be surgery-type aware. A cardiac surgery patient and a knee replacement patient have very different risk profiles. We want the caucus to be configurable by surgery type so the relevant specialists are weighted appropriately from the start.
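A surgery-type-aware caucus could be configured as a weighting table. The specialties and weights below are invented for illustration; they show the mechanism, not Sova's planned values.

```python
# Hypothetical surgery-type-aware caucus weighting. Specialist lists and
# weights are illustrative assumptions.
CAUCUS_CONFIG = {
    "cardiac": {"cardiologist": 2.0, "gp": 1.0, "pulmonologist": 1.0},
    "knee_replacement": {"orthopedist": 2.0, "gp": 1.0, "hematologist": 1.5},
}

def weighted_vote(surgery_type: str, votes: dict) -> str:
    """Pick the recommendation with the highest specialty-weighted support."""
    weights = CAUCUS_CONFIG[surgery_type]
    tally = {}
    for specialist, recommendation in votes.items():
        tally[recommendation] = (
            tally.get(recommendation, 0.0) + weights.get(specialist, 1.0)
        )
    return max(tally, key=tally.get)
```

The point of the table is that a cardiac patient's caucus leans on the cardiologist from the start, rather than treating every specialist's voice equally for every surgery type.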
Longer term, Sova is a platform for any high-risk continuous care scenario — chronic disease management, elderly monitoring, oncology patients in outpatient treatment. The discharge-to-checkup gap exists everywhere medicine sends people home and hopes for the best. We intend to close it.