🧠 Inspiration

Our grandparents live alone. We're at hackathons. That tension is real — and we know we're not the only ones living it.

The numbers back it up: 1 in 4 adults over 65 falls each year. Reaction times slow with age. A smoke alarm that snaps a 25-year-old awake instantly can leave an 83-year-old with mild hearing loss disoriented before they even register what's happening. A dangerous temperature spike that a younger person notices immediately can go undetected for hours.

Existing solutions are either too simple (dumb threshold alerts that cry wolf) or too invasive (cameras streaming 24/7 to a family group chat). Neither actually solves the problem — one drowns caregivers in false alarms until they tune it out, the other strips away the independence and dignity that matters so much to the people we're trying to protect.

We wanted something smarter. Something that understands context, learns what normal looks like for a specific person in a specific home, and only speaks up when it actually matters. That's HomePulse.


⚡ What It Does

HomePulse is a real-time AI-powered home safety system for older adults living independently. It doesn't just detect anomalies — it understands them.

Arduino sensors (temperature, sound, motion, magnetic door/window contacts, pressure mats) feed a continuous stream of readings into a multi-agent reasoning pipeline. A loud noise at 3 PM might be a grandchild visiting. The same noise at 3 AM is a completely different story. HomePulse knows the difference.

Every suspicious event flows through:

  1. Sensor Agent — scores readings against learned behavioral baselines
  2. Triage Agent (Claude) — decides: is this worth investigating?
  3. History Agent — pulls MongoDB context on what's normal for this person
  4. Monitor Agent (Claude + multimodal vision) — reasons over a Cloudinary-hosted webcam frame: what is actually happening right now?
  5. Escalation Agent — determines severity
  6. Notification Agent (Claude) — writes a calm, human-readable caregiver alert and sends it via Gmail

The caregiver doesn't get SOUND_DB: 94, MOTION: TRUE. They get:

"An unusually loud noise was detected in the kitchen at 2:14 AM. The camera shows Margaret on the floor near the counter. This may require immediate attention."


🔨 How We Built It

Architecture

Arduino / Simulated Sensor Input
              │
              ▼
        sensor_agent ───────────────────────────────┐
     (baseline scoring)                             │
              │                                     │
              ▼                                     ▼
        triage_agent                          vision_agent
  (Claude: investigate?)                (OpenCV + Cloudinary)
              │                                     │
              ▼                                     │
  history_agent (MongoDB) ◄─────────────────────────┘
              │
              ▼
  monitor_agent (Claude multimodal)
              │
              ▼
  escalation_agent → notification_agent (Claude + Gmail)

Voice:     voice_agent (ElevenLabs TTS) · voice_input_agent (Scribe STT)
Dashboard: dashboard_agent (Fetch.ai Agentverse / ASI:One)

Layer                 Technology
Hardware              Arduino (temp, sound, motion, magnetic, pressure sensors)
Computer Vision       OpenCV, Cloudinary
AI Reasoning          Anthropic Claude (triage, multimodal, alerts, digests)
Agent Orchestration   Fetch.ai uAgents + Agentverse Bureau
Backend               Python, FastAPI, Motor (async)
Database              MongoDB Atlas
Voice                 ElevenLabs TTS/STT
Alerts                Gmail SMTP
Frontend              React, Cloudinary SDK

🍃 MongoDB — The Memory of the System

MongoDB isn't just our database — it's the reason HomePulse gets smarter over time. We used features most teams never touch in a hackathon.

Behavioral baselines per user, per zone, per time-of-day. sensor_baselines stores per-hour weekday/weekend statistics for z-score anomaly detection. The same sensor reading means something completely different at 3 AM vs 3 PM, or in the kitchen vs the bedroom. That context lives in MongoDB and gets continuously updated by a dedicated learning_agent running on its own autonomous timer.
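
A minimal sketch of that scoring step, assuming an illustrative sensor_baselines document shape (hour, day_type, mean, and std are guesses at the field names, not the exact schema):

```python
# Hedged sketch: per-hour, per-zone z-score against a learned baseline.
from datetime import datetime

async def baseline_zscore(db, user_id: str, zone: str, value: float) -> float:
    now = datetime.now()
    day_type = "weekend" if now.weekday() >= 5 else "weekday"
    doc = await db.sensor_baselines.find_one(
        {"user_id": user_id, "zone": zone, "hour": now.hour, "day_type": day_type}
    )
    if not doc or not doc.get("std"):
        return 0.0  # no learned baseline yet, nothing to compare against
    return abs(value - doc["mean"]) / doc["std"]
```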

Atlas Vector Search for multivariate anomaly detection. sensor_readings stores 7-dimensional normalized embeddings. When a new event comes in, $vectorSearch finds nearest neighbors filtered by user_id and is_anomaly: false — giving us a learned similarity-based anomaly score, not just a threshold check. The same index powers similar incident lookup in incident_reports.
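
A sketch of what that query can look like with Motor; the index name, candidate counts, and final scoring are assumptions rather than our exact code:

```python
# Hedged sketch of the $vectorSearch neighbor lookup.
async def similarity_anomaly_score(db, user_id: str, embedding: list[float]) -> float:
    pipeline = [
        {"$vectorSearch": {
            "index": "sensor_embedding_index",   # assumed index name
            "path": "embedding",
            "queryVector": embedding,            # the 7-dim normalized reading
            "numCandidates": 100,
            "limit": 10,
            "filter": {"user_id": user_id, "is_anomaly": False},
        }},
        {"$project": {"score": {"$meta": "vectorSearchScore"}}},
    ]
    neighbors = [doc async for doc in db.sensor_readings.aggregate(pipeline)]
    if not neighbors:
        return 1.0  # nothing similar to known-normal history: maximally suspicious
    # Low similarity to normal readings reads as a high anomaly score.
    return 1.0 - max(doc["score"] for doc in neighbors)
```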

Atlas Search with graceful fallback. Full-text $search powers /search/events and /search/incidents when Atlas indexes are available. On M0 or local dev, we degrade to regex — the system keeps running honestly rather than pretending the limitation doesn't exist.
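
The fallback itself is a small pattern; a sketch with an assumed index name and searched field:

```python
# Hedged sketch: Atlas $search first, regex scan when the index is missing.
from pymongo.errors import OperationFailure

async def search_events(db, query: str, limit: int = 20) -> list[dict]:
    try:
        pipeline = [
            {"$search": {
                "index": "events_text_index",   # assumed Atlas Search index
                "text": {"query": query, "path": {"wildcard": "*"}},
            }},
            {"$limit": limit},
        ]
        return [doc async for doc in db.events.aggregate(pipeline)]
    except OperationFailure:
        # M0 / local dev: no Atlas Search index, so degrade to a regex scan.
        cursor = db.events.find({"description": {"$regex": query, "$options": "i"}})
        return await cursor.to_list(length=limit)
```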

Twelve collections doing real work:

  • events — full anomaly lifecycle from triage insert through monitor reasoning, Cloudinary image URLs, escalation, notification, and learning outcomes
  • behavioral_schema — per-user false-positive rates and patterns; read by triage_agent before Claude is called
  • sensor_simulation_queue — bridges FastAPI and the uAgents bureau across separate processes; how hardware becomes optional
  • snapshot_cloudinary_gate — atomic find_one_and_update upload slot so Cloudinary uploads don't stack across retries (see the sketch just after this list)
  • incident_reports — structured records with risk scores and embeddings, linked to events via session + transaction
  • Plus agent_heartbeats, room_zones, user_thresholds, camera_snapshots, alert_log and more
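
The snapshot_cloudinary_gate slot is roughly this shape; the uploaded flag and the unique index on event_id are assumptions:

```python
# Hedged sketch of the atomic upload gate.
from pymongo.errors import DuplicateKeyError

async def claim_upload_slot(db, event_id: str) -> bool:
    """Return True exactly once per event, so retries can't re-upload."""
    try:
        before = await db.snapshot_cloudinary_gate.find_one_and_update(
            {"event_id": event_id, "uploaded": {"$ne": True}},
            {"$set": {"uploaded": True}},
            upsert=True,  # the first caller creates the slot and claims it
        )
    except DuplicateKeyError:
        return False  # a concurrent retry raced us through the upsert and won
    # find_one_and_update returns the pre-update document: None means we just
    # created the slot; an unclaimed document means we flipped the flag first.
    return before is None or not before.get("uploaded", False)
```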

Motor async driver throughout. Every database call is non-blocking — critical when agents are firing in parallel and latency compounds across the pipeline.


☁️ Cloudinary — Eyes in the Reasoning Pipeline

Cloudinary is an active participant in our AI reasoning chain, not just a media host. Every alert frame goes through a full encoding pipeline we built from scratch (a condensed sketch follows the list):

  1. Raw JPEG bytes uploaded with structured tags (homepulse, alert, evt_{event_id}, uid_{user}, type_{event_type}) and namespaced public_id paths (homepulse/raw/{event_id})
  2. Zone crop applied — crop=crop with flags=relative for Claude's 0–1 fractional coordinates, or absolute pixel coordinates from MongoDB calibration data
  3. Full transformation chain: e_sharpen:80/e_improve/q_auto,f_auto,dpr_auto — or a named Cloudinary transformation via CLOUDINARY_NAMED_TRANSFORM_POSTCROP
  4. Separate thumbnail chain — same crop, then c_limit at 480px for email tiles and dashboard cards
  5. Both delivery URLs from one upload via cloudinary_url — no second stored object, just URL-encoded transformation chains off the same public_id
  6. The transformed URL goes directly into Claude's multimodal context — Claude reasons over a sharpened, cropped, optimized frame of exactly the relevant zone
  7. On false-positive confirmation, uploader.destroy with invalidate=True cleans the raw asset from the Media Library
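
Condensed into code, steps 1–5 might look like the sketch below; the crop box helper and its values are illustrative, while the public_id and tag scheme come from the pipeline above:

```python
# Hedged sketch: one upload, two delivery URLs, no second stored object.
# box is Claude's fractional bounding box, e.g. {"x": 0.2, "y": 0.4, "w": 0.5, "h": 0.3}.
import io
import cloudinary.uploader
from cloudinary.utils import cloudinary_url

def upload_alert_frame(jpeg: bytes, event_id: str, user: str, event_type: str, box: dict):
    public_id = f"homepulse/raw/{event_id}"
    cloudinary.uploader.upload(
        io.BytesIO(jpeg),
        public_id=public_id,
        tags=["homepulse", "alert", f"evt_{event_id}", f"uid_{user}", f"type_{event_type}"],
    )
    # fl_relative lets Claude's 0-1 coordinates drive the crop directly.
    crop = {"crop": "crop", "flags": "relative",
            "x": box["x"], "y": box["y"], "width": box["w"], "height": box["h"]}
    full_url, _ = cloudinary_url(public_id, transformation=[
        crop, {"effect": "sharpen:80"}, {"effect": "improve"},
        {"quality": "auto"}, {"fetch_format": "auto"}, {"dpr": "auto"},
    ])
    thumb_url, _ = cloudinary_url(public_id, transformation=[
        crop, {"crop": "limit", "width": 480},
    ])
    return full_url, thumb_url  # full frame for Claude, thumbnail for email/dashboard
```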

The fl_relative flag is the linchpin: it maps Claude's 0–1 bounding box coordinates directly to Cloudinary crop parameters — the same coordinate space used for ElevenLabs spatial voice guidance. One coordinate system, end to end, from Claude's vision output to "to your left."

The React frontend includes a Cloudinary playground for live snapshot previews and event image browsing, demonstrating Cloudinary's media capabilities directly in the UI.

🎨 Figma Make — How We Used It

We didn't start HomePulse by writing code. We started it in Figma Make. At 2 AM on the first night, before a single API call was made, we were in Figma Make sketching what the caregiver dashboard should actually feel like. Not what was technically possible — what would make a worried family member trust the system at a glance. That distinction matters, and Figma Make let us have that conversation visually before we were locked into any implementation. Here's specifically how it fit our workflow:

Wireframing before building. We mapped out the full frontend — sensor simulation panel, event feed, alert cards with Cloudinary image thumbnails, the Cloudinary playground, voice WebSocket interface — in Figma Make before touching React. That meant when we sat down to code, we weren't making design decisions at the same time as engineering decisions. Those were already resolved.

A/B testing ideas we would have wasted hours building. We prototyped two completely different approaches to how alert severity should be communicated to caregivers — one color-coded dashboard-style, one minimal feed-style. Figma Make let us put both in front of each other, argue about them, and kill the worse one in twenty minutes instead of three hours of React we'd have to throw away.

Pitching internally at 2 AM. When one of us was deep in the agent pipeline and the other needed to explain a new UI idea, Figma Make was the shared language. Faster than words, faster than code, faster than a whiteboard photo texted across a table.

The full process is in our slideshow — check it out to see how the design evolved from initial wireframes to the final frontend across the weekend.

We're not designers. Figma Make didn't require us to be. It just required us to think before we built — and that saved us more time than any other tool we used this weekend.

🤖 Fetch.ai — The Architecture, Not the Gimmick

13 agents, one Bureau, one command. python run_agents.py brings up the full multi-agent graph. We didn't use Fetch.ai as a wrapper — it's the load-bearing structure of the entire safety pipeline.
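
The bootstrap really is that small; a trimmed sketch (placeholder seeds, most of the thirteen agents omitted):

```python
# Hedged sketch of run_agents.py's shape: every agent joins one Bureau.
from uagents import Agent, Bureau

sensor_agent = Agent(name="sensor_agent", seed="sensor-seed")   # placeholder seed
triage_agent = Agent(name="triage_agent", seed="triage-seed")
# ...eleven more agents in the real graph...

bureau = Bureau()
for agent in (sensor_agent, triage_agent):
    bureau.add(agent)

if __name__ == "__main__":
    bureau.run()  # one command, full graph
```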

Fan-out and fan-in without an orchestrator. After Claude approves investigation, triage_agent fans out TriageResult to history_agent and vision_agent in parallel. Both do slow I/O independently. monitor_agent implements fan-in: two on_message handlers stash payloads in _pending[event_id], and _try_reason fires when both arrive — Claude gets called once with full context. No central orchestrator. The graph is the orchestration.
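
A minimal sketch of the fan-in half, with the message models trimmed to a couple of fields:

```python
# Hedged sketch of monitor_agent's fan-in; field lists are abbreviated.
from uagents import Agent, Context, Model

class HistoryResult(Model):
    event_id: str
    baseline_summary: str

class VisionResult(Model):
    event_id: str
    cropped_thumb_url: str
    spatial_crop_trusted: bool

monitor_agent = Agent(name="monitor_agent", seed="monitor-seed")  # placeholder seed
_pending: dict[str, dict] = {}

@monitor_agent.on_message(model=HistoryResult)
async def on_history(ctx: Context, sender: str, msg: HistoryResult):
    _pending.setdefault(msg.event_id, {})["history"] = msg
    await _try_reason(ctx, msg.event_id)

@monitor_agent.on_message(model=VisionResult)
async def on_vision(ctx: Context, sender: str, msg: VisionResult):
    _pending.setdefault(msg.event_id, {})["vision"] = msg
    await _try_reason(ctx, msg.event_id)

async def _try_reason(ctx: Context, event_id: str):
    slot = _pending.get(event_id, {})
    if "history" in slot and "vision" in slot:   # both legs have landed
        parts = _pending.pop(event_id)
        ctx.logger.info(f"fan-in complete for {event_id}")
        # ...one Claude call with parts["history"] and parts["vision"]...
```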

Typed message contracts. Every cross-agent payload subclasses uagents.Model. VisionResult carries cropped_thumb_url, spatial_crop_trusted, zone data. The sends are explicit. The graph is documented.

Autonomous timers. learning_agent, report_agent, and heartbeat_agent run on on_interval — acting on their own clock, not triggered by requests.

Agentverse when you need it. dashboard_agent runs bundled in the bureau locally, or with mailbox=True connected to Agentverse for caregiver queries via ASI:One. Same code, different topology, one config flag.


🗣️ ElevenLabs — Voice That Doesn't Panic People

For older adults, how an alert sounds matters as much as what it says. A robotic beep increases anxiety and confusion. We built a voice layer that's calm, clear, and genuinely useful.

Two models, two jobs. eleven_turbo_v2_5 for first alerts after a VoiceAlert fires — warmer delivery when a user is startled. eleven_flash_v2_5 for the real-time spatial correction loop in voice_agent — ~3s latency so directional guidance ("to your left") feels continuous, not lagging.
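
Per utterance, the switch is a single branch; a sketch with a placeholder voice_id:

```python
# Hedged sketch of the two-model TTS split.
from elevenlabs.client import ElevenLabs

client = ElevenLabs()  # reads ELEVENLABS_API_KEY from the environment

def speak(text: str, first_alert: bool) -> bytes:
    model_id = "eleven_turbo_v2_5" if first_alert else "eleven_flash_v2_5"
    audio = client.text_to_speech.convert(
        voice_id="YOUR_VOICE_ID",  # placeholder
        model_id=model_id,
        text=text,
    )
    return b"".join(audio)  # convert() streams audio chunks
```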

Scribe STT for voice input. voice_input_agent captures 16kHz mono WAV via sounddevice, sends to speech_to_text.convert with scribe_v1, matches wake phrases ("hey homepulse"), and routes the transcript to dashboard_agent via Fetch.ai — enabling natural language caregiver queries by voice with answers surfaced on screen.
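
A condensed sketch of that capture-to-transcript path; the capture window and buffer handling are simplifications of the real agent:

```python
# Hedged sketch: 16 kHz mono capture -> WAV -> Scribe -> wake-phrase match.
import io
import wave

import sounddevice as sd
from elevenlabs.client import ElevenLabs

client = ElevenLabs()
WAKE_PHRASES = ("hey homepulse",)

def listen_for_wake(seconds: float = 3.0) -> str | None:
    audio = sd.rec(int(seconds * 16_000), samplerate=16_000, channels=1, dtype="int16")
    sd.wait()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(16_000)
        wav.writeframes(audio.tobytes())
    buf.seek(0)
    result = client.speech_to_text.convert(file=buf, model_id="scribe_v1")
    text = result.text.lower()
    return text if any(phrase in text for phrase in WAKE_PHRASES) else None
```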

Speaker bleed suppression. should_suppress_voice_capture and TTS_STT_TAIL_COOLDOWN_SEC prevent Scribe from transcribing the assistant's own voice echoing in the room during playback.

Non-blocking queue + daemon worker. Agents enqueue speech and return immediately — the uAgents event loop never stalls on MP3 generation or speaker drain. One vendor for both voice directions, one API key, consistent latency debugging.
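
The pattern is an ordinary queue plus a daemon thread; a sketch reusing speak() from the TTS snippet above, with an assumed cooldown value:

```python
# Hedged sketch of the non-blocking speech path and tail-cooldown suppression.
import queue
import threading
import time

from elevenlabs import play  # local speaker playback helper

TTS_STT_TAIL_COOLDOWN_SEC = 1.5  # assumed value
_speech_queue: queue.Queue = queue.Queue()
_last_playback_end = 0.0

def should_suppress_voice_capture() -> bool:
    # Skip STT capture in the tail window after playback so Scribe never
    # transcribes the assistant's own voice echoing in the room.
    return time.monotonic() - _last_playback_end < TTS_STT_TAIL_COOLDOWN_SEC

def _drain_speech() -> None:
    global _last_playback_end
    while True:
        text, first_alert = _speech_queue.get()
        play(speak(text, first_alert))       # blocking: MP3 generation + speaker drain
        _last_playback_end = time.monotonic()
        _speech_queue.task_done()

threading.Thread(target=_drain_speech, daemon=True).start()

def say(text: str, first_alert: bool = False) -> None:
    _speech_queue.put((text, first_alert))   # agents return immediately
```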


🧱 Challenges We Ran Into

Multi-process coordination. FastAPI and the uAgents bureau are separate processes. Keeping the simulation queue, agent addresses, and message schemas in sync required careful plumbing — a single config source of truth in app/config.py was survival, not preference.

Calibrating Claude without alarm fatigue. Too sensitive and caregivers start ignoring alerts — arguably worse than no system. Getting triage judgment to feel trustworthy rather than paranoid took real prompt iteration at every stage.

Fan-in without race conditions. monitor_agent merges history and vision results that arrive independently. The uAgents loop is single-threaded so the in-memory _pending dict is safe without locks — but reasoning that out took time.

Hardware at hackathons is a gamble. We built a full simulation pipeline so the entire agent chain fires live from the frontend without an Arduino on the table.


🏆 Accomplishments We're Proud Of

We came in having never touched Fetch.ai, never used ElevenLabs, and barely knowing what Atlas Vector Search was. 36 hours later, all of them are talking to each other in a working pipeline.

  • Genuinely end-to-end — physical Arduino sensor to Claude-written caregiver email, with vision, multi-agent reasoning, and behavioral learning at every step
  • Claude at four meaningful points — triage, multimodal monitor, alert copy, weekly digest. Not a chatbot wrapper
  • MongoDB doing production-grade work — Vector Search, Atlas Search, behavioral baselines, async Motor, atomic upload gates, cross-process simulation queue, and transactions. In a weekend
  • Cloudinary as a reasoning input — the fl_relative coordinate bridge from Claude vision output to multimodal context is something we're genuinely proud of
  • ElevenLabs end-to-end — turbo vs flash per utterance, Scribe STT, speaker bleed suppression, non-blocking worker. One vendor, both directions
  • Fetch.ai architecturally — real fan-out/fan-in, typed contracts, autonomous interval agents, Agentverse mailbox mode

The part that gets us most isn't any single feature. It's that all of it works together, built by two people in 36 hours on APIs we barely knew existed Friday morning.


📚 What We Learned

  • Multi-agent coordination complexity compounds fast — typed schemas and a single config source of truth are non-negotiable
  • Claude's value is in the prompt design, not the API call — triage and monitor reasoning took more iteration than any feature
  • Baseline learning beats threshold logic — per-person, per-zone, per-time baselines are what make safety systems actually reliable
  • MongoDB's document model is genuinely powerful for fast iteration — not blowing up a schema every hour kept us moving
  • Building something emotionally real is its own engineering challenge — easy to build a sensor dashboard, hard to build something you'd trust with your grandmother

🔮 What's Next for HomePulse AI

  • Mobile push notifications with Claude-generated severity summaries
  • Multi-resident support with per-person profiles and cross-zone awareness
  • Fall detection via pressure mat + vision correlation
  • Caregiver feedback loop to improve Claude's triage calibration over time
  • HIPAA-aware data handling for real healthcare deployment

HomePulse won't replace being there. But it might mean the difference between forty minutes on a kitchen floor — and five.
