# Sodium
Sodium is a voice-forward care assistant robot that seniors can talk to like a person, while a dashboard gives family or caregivers a window into reminders, live conversation, browser tasks, and important context the system has collected.
## Intro
Sodium is aimed at older adults who want independence without losing a safety net. The experience is built around natural speech: ask for help, get a spoken answer, and when the task needs the real web such as ordering food, looking something up, or filling a form, a browser agent can do the clicking and typing. Loved ones stay in the loop through a caregiver console instead of guessing what happened off-camera.
## Inspiration
Many seniors are comfortable talking but not comfortable with small screens, passwords, and multi-step flows. At the same time, families often live far away and only hear about problems after something goes wrong. We wanted something that reduces daily friction (reminders, errands, lookups) and raises the right alerts when language suggests distress: not replacing emergency services, but bridging to a trusted contact when that makes sense.
## What it does
- Autonomous human following — The robot follows people by feeding YOLOX person detections into a PID controller.
- Voice loop — Speech in, agent reasoning, text-to-speech out (ElevenLabs), with optional wake-word support for a more “always there” feel.
- Smart agent — A Cerebras-backed model routes conversation and can invoke tools instead of only chatting.
- Browser automation — For tasks that need the real web, Browser Use runs a managed session; the dashboard can surface a live view of what the agent is doing.
- Reminders — Medication and routine prompts can be managed and reflected in what the agent knows.
- Transcript timeline — One place to see chat, tool use, and agent activity for transparency and debugging.
- Memory / requested info — Structured place to review intake-style information the assistant has asked for or stored.
- Crisis-aware escalation — When messages look like a distress or “call my family” situation, the system can trigger a Bland phone pathway to reach a saved contact (with cooldowns and guardrails in code and tests).
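As a rough illustration of the follow behavior described above, a PID loop can steer the robot toward the center of a detected person's bounding box. The gains, image size, and sample detection below are illustrative, not the robot's actual tuning:

```typescript
// Sketch of the follow loop: steer toward the center of a detected person's
// bounding box. Gains, image size, and the sample detection are illustrative.
class PID {
  private integral = 0;
  private prevError = 0;
  constructor(private kp: number, private ki: number, private kd: number) {}
  update(error: number, dt: number): number {
    this.integral += error * dt;
    const derivative = (error - this.prevError) / dt;
    this.prevError = error;
    return this.kp * error + this.ki * this.integral + this.kd * derivative;
  }
}

const steering = new PID(0.8, 0.01, 0.1);
const imageWidth = 640;
const boxCenterX = 400; // would come from a YOLOX person detection
// Normalized horizontal offset of the person from the image center, in [-1, 1].
const error = (boxCenterX - imageWidth / 2) / (imageWidth / 2);
const angularVelocity = steering.update(error, 0.05); // turn command per control tick
```

In the real system the detection would update every frame and the output would be published as a velocity command; the sketch just shows the control math.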
## How we built it
| Layer | Choice |
|---|---|
| Dashboard | Svelte 5 + Vite — hash routes for Reminders, Transcript, Browser, Requested Info; light/dark theme. |
| API | Bun + TypeScript — HTTP API, SQLite for durable state, CORS for the Vite dev origin. |
| Agent & models | Cerebras (Llama-class chat) plus tool calling for browser jobs, TTS, and related flows. |
| Speech | AssemblyAI (STT), ElevenLabs (TTS); optional Google Generative Language APIs where configured. |
| Automation | Browser Use cloud API, optional profile / auth state for sites that need login. |
| Emergency bridge | Bland for outbound calls when crisis logic fires and a contact is on file. |
| Robot Controller | ROS 2 follower work for a physical-robot story. |
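To show how the tool-calling layer fits around the chat model, here is a minimal dispatch sketch. The tool names (`start_browser_task`, `set_reminder`) are hypothetical stand-ins, not the project's actual tool schema:

```typescript
// Illustrative tool dispatch for an agent loop: the model emits a tool call,
// the backend routes it to a handler instead of replying with plain chat.
type ToolCall = { name: string; args: Record<string, unknown> };

const tools: Record<string, (args: Record<string, unknown>) => string> = {
  start_browser_task: (args) => `browser job queued: ${args.task}`,
  set_reminder: (args) => `reminder saved: ${args.text} at ${args.time}`,
};

function dispatch(call: ToolCall): string {
  const tool = tools[call.name];
  if (!tool) return `unknown tool: ${call.name}`; // fall back to plain chat
  return tool(call.args);
}

dispatch({ name: "set_reminder", args: { text: "blood pressure pills", time: "08:00" } });
```

The real flow adds model-side schemas and async results, but the routing shape is the same.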
## Challenges we ran into
- Many moving parts — Voice, LLM, browser sandbox, and telephony each have their own failure modes; getting clear errors and fallbacks took iteration.
- Crisis detection is sensitive — We need to catch real distress phrases without firing on harmless uses of words like “call” (movies, casual chat); tests and tuning around `detectCrisis` helped.
- Authenticated browsing — Real sites expect cookies and login flows; profiles and auth state are powerful but easy to misconfigure under time pressure.
- Latency and perceived “aliveness” — Streaming, chunk delays, and TTS round-trips must feel responsive; small timing tweaks matter more than they look on paper.
- Demo vs. production — Hackathon scope means some paths are best-effort until keys, pathways, and policies are fully production-hardened.
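The crisis-detection challenge above can be sketched as phrase matching plus a cooldown guardrail. The patterns, cooldown length, and function signature here are assumptions for illustration, not the project's actual `detectCrisis` implementation:

```typescript
// Hedged sketch of detectCrisis-style logic: match distress phrases,
// ignore harmless uses of "call", and rate-limit escalation.
const CRISIS_PATTERNS = [
  /\bcall (my|the) (family|daughter|son|emergency contact)\b/i,
  /\b(i('m| am) (hurt|in pain|scared)|help me|i fell)\b/i,
];
const CALL_COOLDOWN_MS = 10 * 60 * 1000; // assumed 10-minute cooldown
let lastEscalation = 0;

function detectCrisis(message: string, now: number = Date.now()): boolean {
  const matched = CRISIS_PATTERNS.some((p) => p.test(message));
  if (!matched) return false;
  if (now - lastEscalation < CALL_COOLDOWN_MS) return false; // guardrail
  lastEscalation = now;
  return true;
}

detectCrisis("please call my daughter, I fell");       // true: distress phrase
detectCrisis("we watched a movie about a phone call"); // false: harmless "call"
```

A positive result would then hand off to the Bland calling pathway rather than acting on its own.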
## Accomplishments that we're proud of
- A cohesive caregiver dashboard that matches how families actually want to monitor: reminders, live transcript, and browser session visibility.
- An agent that is more than a chatbot—it can delegate to real browser automation and spoken output.
- Crisis escalation wired to actual phone outreach (Bland) with automated tests that document expected behavior.
- A repo structure that separates frontend, backend, and optional bot/robot pieces so the story can grow from software demo toward physical companion.
## What we learned
- Transparency beats blind trust for caregiver tools: showing tool calls and browser actions is as important as showing chat.
- Guardrails are product features, not afterthoughts—especially for anything involving outbound calls or distress inference.
- Bun is a strong fit for a hackathon API: fast iteration, simple `Bun.serve`, and straightforward TypeScript.
- Integrating best-in-class APIs (speech, browser, phone) is faster than building each piece — but orchestration and state still dominate engineering time.
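The `Bun.serve` point can be sketched with a plain fetch-style handler; the route and payload below are illustrative, not the project's actual API surface:

```typescript
// Sketch of a Bun.serve-style request handler. The /api/reminders route
// and its payload are hypothetical examples.
function handle(req: Request): Response {
  const url = new URL(req.url);
  if (url.pathname === "/api/reminders") {
    return Response.json([{ id: 1, text: "Take medication", time: "09:00" }]);
  }
  return new Response("Not found", { status: 404 });
}

// Under Bun this is wired up as: Bun.serve({ port: 3000, fetch: handle })
```

Keeping the handler a plain function makes it testable without spinning up the server.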
## What's next for Sodium
- Deeper real-time voice — Smoother turn-taking, barge-in, and lower end-to-end latency.
- Richer caregiver tools — Permissions, audit log export, and clearer human-in-the-loop approvals before risky browser actions.
- Hardware path — Tighter coupling with wake word + robot / ROS demos for tracks that care about multi-device or embodied AI.
- Safety and compliance — Explicit disclosure flows, stronger crisis disclaimers, regional rules for automated calling, and clinical boundaries (assistant, not a medical device).
- Reliability — Queueing for browser jobs, retries, and observability so a bad third-party hour doesn’t erase user trust.
## Built With
- css
- langchain
- python
- svelte
- typescript