SkipAlerts AI
Inspiration
Most smart home systems alert us when something happens — motion detected, a door opened, a package arrived. But in everyday life, many of our biggest problems come from the opposite: when something didn’t happen.
I forget to put the trash bins out on time. I don’t realize the mailbox hasn’t been checked for over a week. Sometimes I close the garage door, walk inside, and later discover it never actually closed because something blocked it — even though I assumed it was shut.
As parents, we rely on apps that notify us after our kids arrive home, but there’s no system that reminds us when they haven’t arrived — even hours later — when we’re busy and distracted.
As caregivers, we assume everything is fine because routines usually are. But what if an elderly parent who lives alone doesn’t leave the bedroom by 9 a.m.? What if they go into the bathroom and don’t come out for over an hour? These are not events — they are missing routines — and today there is no calm, reliable way to notify us without constant checking.
We already own security cameras, sensors, and smart devices. We already have calendars and reminders. Yet we still miss these moments because all of these systems require us to:
- set countless alarms
- open multiple apps
- constantly check live feeds
- track people’s locations
- or accept invasive surveillance
What’s missing isn’t another device or another notification — it’s a brain.
I imagined a system that quietly watches for routine failure, not activity. A system that stays silent when life is normal and speaks only when something important didn’t happen, all without recording video, tracking identities, or invading privacy.
When I learned about Gemini 3 Flash’s native spatio-temporal grounding, I realized I could finally build a system that understands where things are in relation to time, not just what they are. Pairing this with Gemini 3 Pro’s complex deductive reasoning provided the perfect framework for an autonomous brain that doesn't just detect motion, but intelligently infers routine failures and contextual anomalies. This specific combination of high-speed spatial vision and deep reasoning convinced me that now is the definitive moment to build a truly proactive, state-aware home guardian.
That idea became SkipAlerts AI. While SkipAlerts is the guardian you interact with, I built the autonomous reasoning layer as the MissedIt Engine: a stateful, long-running agent that synthesizes vision data, sensor readings, and household schedules to detect absence, understand context, and gently alert us when routine health breaks, so we don’t have to remember everything in our already busy lives.
Note: “SkipAlerts is not a gadget or a notification system — it is a reasoning primitive that treats non-events as first-class signals.”
End-to-End Flow (High-Level)
- Camera frames and sensor events are ingested as structured input signals.
- Gemini 3 Flash (gemini-3-flash-preview) runs as the perception layer to extract scene state: objects, presence/absence, location, and condition.
- A visual-change gate checks whether the state meaningfully changed; if not, reasoning is skipped to reduce unnecessary calls.
- If the state changed (or a time-based check is required), Gemini 3 Pro (gemini-3-pro-preview) runs as the reasoning layer with routines, household profile, context (time/day), and memory.
- Gemini 3 Pro can call tools (schedule and weather) when additional context is needed.
- The engine returns a strict JSON decision: ALERT, WAIT, or NO_ACTION, plus urgency and routine updates.
- Deduplication/cooldown suppresses repeated alerts for the same unresolved anomaly.
- Notifications are dispatched only when rules are violated; otherwise the app stays calm and reports normal routine state.
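The flow above can be sketched as a gated loop. This is an illustrative TypeScript sketch with stubbed perception and reasoning, not the actual codebase API; all type and function names here are hypothetical.

```typescript
// Illustrative sketch of the gated perception -> reasoning flow.
// All names are hypothetical; perception (Flash) and reasoning (Pro) are stubbed.

type SceneState = { object: string; present: boolean; zone: string };
type Decision = { action: "ALERT" | "WAIT" | "NO_ACTION"; urgency: "low" | "high" };

// Stub for the perception layer (Gemini 3 Flash in the real system).
function runPerception(frameId: number): SceneState {
  return frameId < 2
    ? { object: "trash_bin", present: true, zone: "garage" }
    : { object: "trash_bin", present: false, zone: "curb" };
}

// Visual-change gate: reasoning is skipped when scene state did not change.
function stateChanged(prev: SceneState | null, next: SceneState): boolean {
  return !prev || prev.present !== next.present || prev.zone !== next.zone;
}

// Stub for the reasoning layer (Gemini 3 Pro in the real system).
function runReasoning(state: SceneState): Decision {
  const violated = state.object === "trash_bin" && state.zone !== "curb";
  return violated
    ? { action: "ALERT", urgency: "high" }
    : { action: "NO_ACTION", urgency: "low" };
}

function processFrames(frameCount: number): { decisions: Decision[]; reasoningCalls: number } {
  let prev: SceneState | null = null;
  let reasoningCalls = 0;
  const decisions: Decision[] = [];
  for (let f = 0; f < frameCount; f++) {
    const next = runPerception(f);
    if (stateChanged(prev, next)) {
      reasoningCalls++; // Pro is only "woken up" on a real state change
      decisions.push(runReasoning(next));
    }
    prev = next;
  }
  return { decisions, reasoningCalls };
}
```

The key property is that consecutive identical frames never reach the reasoning layer, which is what keeps the system calm and cheap during normal routines.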
Results
- SkipAlerts AI shifts smart-home intelligence from event detection to absence-of-routine detection.
- Instead of notifying only when something happens, it stays calm when routines are healthy and alerts only when something important did not happen (for example: bins not out, kids not arrived, garage not secured, expected elderly routine not observed).
- The MissedIt Engine (the Brain, as I call it) combines Gemini 3 Flash for spatial state understanding and Gemini 3 Pro for temporal/contextual reasoning to produce traceable alert decisions from vision, sensor, and schedule context.
Accomplishments We’re Proud Of
- Built a working “missing routine brain” instead of another motion-alert app.
- Designed a calm-first behavior model: normal homes see routine-normal state, not constant noise.
- Implemented a stateful reasoning layer (MissedIt Engine) that synthesizes camera/sensor/context into actionable alerts.
- Applied privacy-conscious design goals by focusing on routine state and safety context rather than identity tracking.
- Delivered end-to-end agentic flow from perception to reasoning to alert delivery for real household routine failures.
How I Built It
- Built a multi-agent pipeline with clear role separation:
- Perception agent (vision), sensor interpretation agent, and orchestration/reasoning agent.
- Added stateful memory and execution state to support long-running routine monitoring.
- Implemented context caching to reduce token cost and latency for repeated reasoning calls.
- Enforced structured outputs (responseMimeType: application/json) for predictable UI and alert handling.
- Added policy enforcement post-processing so hard safety rules cannot be downgraded.
- Added duplicate-alert suppression and cooldowns to avoid alert fatigue.
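The duplicate-alert suppression can be illustrated with a small cooldown keyed by anomaly. This is a hypothetical sketch; the real engine’s API and key scheme may differ.

```typescript
// Hypothetical sketch of duplicate-alert suppression with a per-anomaly cooldown.

class AlertSuppressor {
  private lastSent = new Map<string, number>(); // anomaly key -> last-sent timestamp (ms)

  constructor(private cooldownMs: number) {}

  // Returns true if the alert should be dispatched now.
  shouldSend(anomalyKey: string, nowMs: number): boolean {
    const prev = this.lastSent.get(anomalyKey);
    if (prev !== undefined && nowMs - prev < this.cooldownMs) {
      return false; // same unresolved anomaly, still cooling down
    }
    this.lastSent.set(anomalyKey, nowMs);
    return true;
  }

  // Clearing the key once the anomaly resolves re-arms the alert.
  resolve(anomalyKey: string): void {
    this.lastSent.delete(anomalyKey);
  }
}
```

Re-arming on resolution matters: a bin brought in and then forgotten again the next week is a new anomaly, not a duplicate.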
Challenges We Ran Into
- Minor frame variance was hard to distinguish from true state change, causing repeated reasoning triggers.
- Getting deterministic urgency behavior from LLM output required explicit policy signals plus post-LLM enforcement.
- Modeling “absence over time” is harder than object detection; temporal interpretation needed stronger context shaping.
- No physical sensor hardware was available during development.
- Sensor pathways were still validated successfully through controlled simulated sensor inputs.
Simulation Testing Challenges
- Compressing real-world durations (hours/days) into short test runs can distort timing behavior if simulated duration hints are not configured consistently.
- Scene realism varies across clips; small visual changes between frames can trigger extra reasoning cycles that do not reflect true routine changes.
- Some routines depend on context not visible in a single frame (for example, mailbox checks over a week), so simulation must rely on inferred interaction signals.
- Running mixed-category tests in one profile can cause cross-triggered alerts from unrelated active routines, reducing test isolation.
- Simulated sensors do not perfectly mirror hardware noise and event cadence, so edge-case timing and correlation behavior can differ from production.
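The duration-compression issue above comes down to evaluating routine deadlines in simulated time rather than wall-clock time. A minimal sketch of such a clock, with hypothetical names:

```typescript
// Illustrative time-compression helper for simulation runs: maps simulated
// household time onto a short wall-clock test window.

class CompressedClock {
  // e.g. factor = 3600 compresses one simulated hour into one real second
  constructor(private factor: number, private simStartMs: number) {}

  // Convert elapsed real test time into the current simulated timestamp.
  simNow(realElapsedMs: number): number {
    return this.simStartMs + realElapsedMs * this.factor;
  }

  // Routine deadlines must be checked in simulated time, not wall time,
  // or compressed runs will mis-time "absence over time" checks.
  deadlinePassed(deadlineSimMs: number, realElapsedMs: number): boolean {
    return this.simNow(realElapsedMs) >= deadlineSimMs;
  }
}
```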
What We Learned
- Gemini 3 Flash’s spatial grounding is highly effective for converting camera input into structured scene state (what is present/absent, where, and in what condition).
- Vision understanding alone is not enough for routine health; Gemini 3 Pro is needed for temporal and contextual reasoning across schedules, safety rules, and household state.
- Long-running “marathon agent” behavior needs durable memory and execution state; without it, routines degrade into disconnected single-frame decisions.
- Prompting is important, but reliable outcomes require explicit policy guardrails and post-decision enforcement for critical urgency rules.
- In persistent monitoring, state-change gating and duplicate-alert cooldowns are mandatory to avoid repeated alerts from minor visual jitter.
- Tool-grounded context (school schedule and weather) significantly improves correctness when visual evidence is ambiguous.
Why Gemini 3? The "MissedIt" Engine Architecture
SkipAlerts AI wasn't possible with previous generation models. Detecting a "non-event" (something not happening, like a kid not getting off a bus, or a bin not being moved) requires more than just object detection — it requires stateful reasoning and spatio-temporal awareness.
Here are the specific Gemini 3 features leveraged in this codebase to build the brain:
1. Native Multimodality & Spatio-Temporal Grounding (Gemini 3 Flash)
- The Challenge: Traditional computer vision (YOLO/ResNet) can identify a "trash bin." It cannot identify "a trash bin sitting at the curb vs. sitting in the garage."
- The Gemini Feature: We utilized Gemini 3 Flash’s native multimodal capabilities in the Vision Agent. Instead of bounding boxes, we feed full image frames and prompt for spatial relationships (Zone Logic).
- In Code: The runVisionAgent function relies on Flash to distinguish between "Presence" (is the object there?) and "Location State" (is it in the Driveway or Porch?). This spatial understanding is the "Spatio" half of the engine.
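The presence-versus-location distinction can be made concrete with a structured scene-state type. This sketch is illustrative; the types and zone names are assumptions, not the real `runVisionAgent` output shape.

```typescript
// Sketch of the structured scene state the perception layer is prompted to
// return: presence plus location ("Zone Logic"), not just object labels.

type Zone = "garage" | "driveway" | "curb" | "porch" | "unknown";

interface SceneObservation {
  object: string;     // e.g. "trash_bin"
  present: boolean;   // Presence: is the object in frame at all?
  zone: Zone;         // Location state: where is it relative to the home?
  condition?: string; // e.g. "lid_open"
}

// A routine check cares about presence AND zone, which plain object
// detection alone cannot answer.
function binAtCurb(obs: SceneObservation): boolean {
  return obs.object === "trash_bin" && obs.present && obs.zone === "curb";
}
```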
2. Complex Deductive Reasoning (Gemini 3 Pro)
- The Challenge: A missing trash bin isn't always an alert. It might be a holiday, or maybe the user is on vacation. A simple if/else script fails here.
- The Gemini Feature: We leveraged Gemini 3 Pro’s advanced reasoning capabilities to act as the MissedIt Engine (Orchestrator). It doesn't just process inputs; it synthesizes conflicting data points:
- Input A (Vision): "Bin is not at curb."
- Input B (Time): "It is Friday 8:00 AM."
- Input C (Profile): "User is home."
- Reasoning: "Normally this is an alert, BUT let me check external factors first."
- In Code: This is implemented in the runMissedItEngine loop, where Pro acts as the final decision maker.
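The synthesis of inputs A, B, and C can be sketched as a decision function. In the real system this judgment is made by Gemini 3 Pro with tool calls; the hard-coded rule below is only a hypothetical illustration of why a plain if/else is insufficient without the exception check.

```typescript
// Hypothetical sketch of the synthesis step: a bare if/else on the vision
// signal would alert immediately; the reasoning layer first checks for
// schedule exceptions (the real engine delegates this to Gemini 3 Pro).

interface RoutineContext {
  binAtCurb: boolean;
  day: string;        // e.g. "Friday"
  hour: number;       // 24h local time
  userHome: boolean;
  isHoliday: boolean; // would come from a check_schedule_exception tool call
}

function decideTrashRoutine(ctx: RoutineContext): "ALERT" | "WAIT" | "NO_ACTION" {
  const collectionWindow = ctx.day === "Friday" && ctx.hour >= 8;
  if (!collectionWindow || ctx.binAtCurb) return "NO_ACTION";
  // "Normally this is an alert, BUT let me check external factors first."
  if (ctx.isHoliday) return "NO_ACTION";  // collection skipped; stay calm
  return ctx.userHome ? "ALERT" : "WAIT"; // nobody home: wait and re-check
}
```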
3. Multi-Turn Function Calling (Tool Use)
- The Challenge: AI models hallucinate or assume routine regularity. The system needed a way to verify real-world facts before sending a panic alert.
- The Gemini Feature: We implemented Native Function Calling (Tools) within the reasoning loop.
- In Code:
- check_schedule_exception: The model autonomously decides to call this tool when it detects a missed routine. If the trash isn't out, it asks: "Is today a holiday?"
- check_weather: If a dog is left outside, the model calls this to check the temperature before deciding if it is "animal cruelty" or just "playtime."
- Result: This drastically reduces false positives, solving the "Boy Who Cried Wolf" problem in home monitoring.
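The two tools can be sketched as declarations plus an app-side dispatcher. The declaration shape follows the general function-calling pattern of the Gemini API, but the exact field names and the mock responses here are illustrative, not the codebase’s real implementations.

```typescript
// Illustrative shape of the tool declarations handed to the reasoning model,
// plus the app-side dispatcher that fulfils a tool call the model requests.
// Lookups are mocked; real implementations would query a calendar/weather source.

const toolDeclarations = [
  {
    name: "check_schedule_exception",
    description: "Check whether a date is a holiday or has a schedule exception.",
    parameters: {
      type: "object",
      properties: { date: { type: "string", description: "ISO date to check" } },
      required: ["date"],
    },
  },
  {
    name: "check_weather",
    description: "Get current outdoor temperature and conditions for a location.",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } },
      required: ["location"],
    },
  },
];

function dispatchTool(name: string, args: Record<string, string>): Record<string, unknown> {
  switch (name) {
    case "check_schedule_exception":
      return { isHoliday: args.date === "2025-12-25" }; // mock lookup
    case "check_weather":
      return { tempC: 21, condition: "clear" };         // mock lookup
    default:
      throw new Error(`unknown tool: ${name}`);
  }
}
```

The model decides when to call a tool; the app runs the dispatcher and feeds the result back as the next conversation turn before the final decision is produced.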
4. Search Grounding (googleSearch)
- The Challenge: The model needs up-to-date context that isn't hardcoded (e.g., "Is there a snowstorm causing school delays today?").
- The Gemini Feature: We enabled the Google Search Tool in the orchestrator config.
- In Code: In missedItEngine.ts, the config includes tools: [{ googleSearch: {} }]. If the specialized tools fail, the model falls back to live Google Search to understand context (e.g., "Why is mail not delivered today?"), providing users with verifiable links in the dashboard. For now, this path is simulated with mock data to simplify testing.
5. Strict JSON Schema Enforcement
- The Challenge: To build a reliable UI (React Dashboard), the AI's "thoughts" must be converted into rigorous data structures (timestamps, urgency levels, enums).
- The Gemini Feature: We utilized responseSchema with responseMimeType: "application/json".
- In Code: Both the Vision Agent and the Orchestrator use strict Type definitions (Type.OBJECT, Type.ARRAY) to ensure that even complex reasoning outputs plug directly into the TypeScript frontend without parsing errors.
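On the app side, the schema guarantee still pays to double-check before the data reaches the UI. This sketch shows only that validation step; the decision shape is simplified and illustrative, not the codebase’s exact types.

```typescript
// Sketch of app-side validation of the model's structured output. The real
// code also passes a responseSchema to the API; here we only show how a
// strict decision type keeps the React UI free of parsing surprises.

type Action = "ALERT" | "WAIT" | "NO_ACTION";
type Urgency = "LOW" | "MEDIUM" | "HIGH";

interface EngineDecision {
  action: Action;
  urgency: Urgency;
  reason: string;
}

function parseDecision(raw: string): EngineDecision {
  const data = JSON.parse(raw);
  const actions: Action[] = ["ALERT", "WAIT", "NO_ACTION"];
  const urgencies: Urgency[] = ["LOW", "MEDIUM", "HIGH"];
  if (!actions.includes(data.action) || !urgencies.includes(data.urgency)
      || typeof data.reason !== "string") {
    throw new Error("model output violated the decision schema");
  }
  return { action: data.action, urgency: data.urgency, reason: data.reason };
}
```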
6. Cost-Effective "Marathon" Architecture (Flash vs. Pro)
- The Challenge: Continuous reasoning is expensive.
- The Architecture: We used Gemini's tiered model structure to build a "Marathon Agent."
- Gemini 3 Flash runs frequently (Vision/Sensor polling) because it is fast and low-cost.
- Gemini 3 Pro is only woken up when the Gatekeeper detects a significant state change, allowing the system to run 24/7 without burning budget.
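The budget argument for the tiered split can be made with back-of-envelope arithmetic. The per-call costs below are made-up placeholders, not real Gemini pricing; only the ratio matters.

```typescript
// Back-of-envelope sketch of why the Flash/Pro split matters.
// Costs are hypothetical placeholders, not real pricing.

function dailyCost(
  flashCallsPerDay: number,
  proCallsPerDay: number,
  flashCostPerCall = 0.001, // hypothetical $/call
  proCostPerCall = 0.02,    // hypothetical $/call
): number {
  return flashCallsPerDay * flashCostPerCall + proCallsPerDay * proCostPerCall;
}

// Flash polls every minute (1440 calls/day); the gatekeeper wakes Pro only
// on significant state changes (say, 20/day) instead of on every poll.
const tiered = dailyCost(1440, 20);
// Naive alternative: run the expensive reasoning model on every poll.
const proEverywhere = dailyCost(0, 1440);
```

Under these placeholder numbers the tiered design is over an order of magnitude cheaper, which is what makes 24/7 operation plausible.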
What’s Next for SkipAlerts AI
- Multi-camera and multi-zone event stream architecture for whole-home coverage.
- Richer sensor integrations (door, power, occupancy, thermostat, leak/smoke).
- Personalized routine learning with user-adjustable escalation policies, plus additional use cases.
- Better explainability (“why this alert now”) with compact evidence traces.
- Production hardening: observability, policy audit logs, and robust fallback modes.
- Hybrid intelligence and cost efficiency: a multi-model pipeline that uses Gemini Nano as a local "Spatio-Temporal Gatekeeper" for 24/7 on-device state monitoring. Performing local inference on the NPU keeps routine data private on-device; the system escalates to Gemini 3 Flash/Pro only when Nano detects a high-entropy event, reducing cloud inference costs by over 90% while preserving deep reasoning for critical alerts.
Built With
- Frontend: React + TypeScript + Vite + utility-class UI
- AI Models: Gemini 3 Flash (gemini-3-flash-preview), Gemini 3 Pro (gemini-3-pro-preview)
- AI Features: Multimodal perception, tool calling, context caching, structured JSON responses
- App Components: Vision agent, sensor agent, reasoning/orchestration engine, notification pipeline
Tags
- agentic-ai
- computer-vision
- gemini-3-flash
- gemini-3-pro
- google-gemini-api
- google-search-grounding
- lucide-react
- react
- recharts
- tailwind-css
- typescript