Recall.ai: Your Cognitive Safety Net 🧠

"We don't remember days, we remember moments." — Cesare Pavese But for millions living with early-stage Alzheimer's or Mild Cognitive Impairment (MCI), even moments are slipping away. Recall.ai is here to catch them.

💡 Inspiration

The inspiration for Recall.ai came from watching a loved one struggle with the daily frustration of misplaced items. Simple questions like "Where did I leave my reading glasses?" or "Did I take my morning medicine?" became sources of daily anxiety and dependency.

We realized that while Large Language Models (LLMs) are incredibly smart, they lack context about the physical world around us. Existing solutions often require expensive hardware or 24/7 cloud recording, which raises massive privacy concerns.

We asked ourselves: Can we build a privacy-first "external working memory" that lives entirely in the browser?

🚀 What it does

Recall.ai is a Progressive Web App (PWA) that acts as a secure, temporary memory buffer for your life.

  1. Observes: It uses your device's camera to record a "rolling loop" of your surroundings.
  2. Protects: Video frames are stored locally in RAM (Random Access Memory). Nothing is uploaded to the cloud during recording.
  3. Recalls: When you ask a question (e.g., "Where are my keys?"), the app retrieves relevant frames from the local buffer and sends them to Google Gemini 2.0 Flash for analysis.
  4. Forgets: Footage older than 15 minutes is automatically wiped, ensuring your entire day isn't being archived—only the moments you need.

⚙️ How we built it

We built Recall.ai with a "Privacy-by-Design" architecture, leveraging the speed of the latest Gemini models.

The Stack

  • Framework: Next.js 14 (App Router & Server Actions)
  • AI Model: Google Gemini 2.0 Flash Experimental (chosen for its sub-second multimodal reasoning).
  • Styling: Tailwind CSS with a glassmorphic design system for accessibility and calmness.
  • Deployment: Vercel Edge Network.

The "Rolling Buffer" Engine (The Math)

The core technical challenge was memory management. Browsers crash if you store too much video data. We engineered a custom React Hook, useVideoMemory, that implements a Circular Queue (FIFO) data structure.

We calculate our memory footprint using the following constraint to ensure we never exceed browser limits:

Memory ≈ N × S, where N = T / Δt = (15 × 60 s) / (3 s) = 300 frames

Where we defined our constants as:

  • Δt = 3 s: we capture 1 frame every 3 seconds (Smart Sampling).
  • T = 15 min: the rolling retention window, and S is the size of one downscaled frame.
  • When frame N+1 arrives, frame 1 is instantly deleted from memory (FIFO eviction).
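The eviction logic at the heart of useVideoMemory can be sketched as a plain class, independent of React (the names `FrameBuffer` and `Frame` are illustrative, not our actual identifiers):

```typescript
// Illustrative sketch of the FIFO circular buffer inside useVideoMemory.
// A frame is a timestamped Base64 string; maxFrames = (15 * 60) / 3 = 300.
interface Frame {
  capturedAt: number; // Unix ms timestamp
  data: string;       // Base64-encoded JPEG
}

class FrameBuffer {
  private frames: Frame[] = [];

  constructor(private readonly maxFrames: number = 300) {}

  push(frame: Frame): void {
    this.frames.push(frame);
    // FIFO: when frame N+1 arrives, frame 1 is evicted.
    if (this.frames.length > this.maxFrames) {
      this.frames.shift();
    }
  }

  size(): number {
    return this.frames.length;
  }

  // Return the most recent frames to attach to a recall request.
  latest(count: number): Frame[] {
    return this.frames.slice(-count);
  }
}
```

Because the cap is enforced on every push, memory usage is bounded by N × S regardless of how long the camera runs.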

The AI Integration

We used Next.js Server Actions to securely communicate with the Gemini API. We prompt-engineered a specific persona for the model:

  • Role: "Warm, patient caretaker."
  • Constraint: "No technical jargon or JSON output."
  • Output: Spatial, natural language (e.g., "I saw them on the red table to your left").
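A sketch of how the Server Action could assemble the request: buffered frames become inlineData parts followed by the user's question, in the shape accepted by the Gemini SDK's generateContent(). The persona string and all identifiers here are illustrative, not our exact prompt:

```typescript
// Illustrative prompt assembly for the recall Server Action.
const SYSTEM_INSTRUCTION =
  "You are a warm, patient caretaker. Answer in plain, spatial language " +
  '(e.g. "I saw them on the red table to your left"). No technical jargon ' +
  "or JSON output. If you did not see the item, say so instead of guessing.";

interface BufferedFrame {
  capturedAt: number; // Unix ms
  base64Jpeg: string; // downscaled frame from the rolling buffer
}

function buildRecallParts(frames: BufferedFrame[], question: string) {
  return [
    ...frames.map((f) => ({
      inlineData: { mimeType: "image/jpeg", data: f.base64Jpeg },
    })),
    { text: question },
  ];
}
```

On the server, the parts array would be passed to the model's generateContent() call, with SYSTEM_INSTRUCTION supplied as the model's systemInstruction so the persona and anti-hallucination constraints apply to every answer.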

🛑 Challenges we ran into

  1. Browser Memory Leaks: Initially, storing high-resolution Base64 strings crashed mobile browsers within minutes. We solved this by downscaling images to 640x480 and strictly enforcing the circular buffer limit defined in our equations above.
  2. Secure Contexts: Accessing the navigator.mediaDevices API requires a secure context (https://). This made local testing on mobile difficult until we set up local SSL tunnels and eventually deployed to Vercel.
  3. Model Hallucinations: Early versions would sometimes guess a location when the model wasn't sure. We refined the systemInstruction so the AI admits when it hasn't seen an item, rather than making up a location.
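The downscaling fix from challenge 1 amounts to fitting each captured frame inside a 640x480 box before encoding it. A minimal sketch of that calculation (the function name is ours for illustration; in the app the result sizes the canvas used for JPEG encoding):

```typescript
// Fit a source resolution inside a 640x480 box, preserving aspect ratio.
// Used to size the capture <canvas> before encoding to a Base64 JPEG,
// which keeps per-frame memory small enough for mobile browsers.
function fitWithin(
  srcW: number,
  srcH: number,
  maxW = 640,
  maxH = 480
): { width: number; height: number } {
  const scale = Math.min(maxW / srcW, maxH / srcH, 1); // never upscale
  return {
    width: Math.round(srcW * scale),
    height: Math.round(srcH * scale),
  };
}
```

For a typical 1920x1080 phone camera this yields 640x360, roughly a 9x reduction in pixel count per frame compared to storing the raw capture.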

🏅 Accomplishments that we're proud of

  • Zero-Persistence Privacy: We successfully built a system where closing the tab instantly wipes all video memory. There is no database, no long-term storage, and no "Big Brother" risk.
  • Gemini 2.0 Integration: We are among the first to leverage the Gemini 2.0 Flash Experimental model for a real-time video recall application, achieving response times under 2 seconds.
  • Accessibility: The UI is designed with high contrast, large touch targets, and a "Flip Camera" feature, making it usable for elderly individuals on any device.

🧠 What we learned

  • Multimodal is the Future: Text-only context is insufficient for assistive tech. The ability of Gemini to "see" and understand spatial relationships (left, right, behind) is a game-changer for accessibility.
  • Privacy is a Feature, Not a Constraint: Limiting our data storage to RAM actually made the app faster and simpler, proving that you don't need massive databases to build useful AI tools.

🔮 What's next for Recall.ai

  • Voice Mode: Implementing the Web Speech API so users can ask questions verbally instead of typing.
  • Offline Support: Using local small-scale models (like Gemini Nano) to perform object detection even without an internet connection.
  • Wearable Integration: Porting the logic to smart glasses for a truly hands-free experience.

[Built with ❤️ for the Gemini API Developer Competition]

Built With

  • artificial-intelligence
  • computer-vision
  • gemini-2.0-flash
  • glassmorphism
  • google-ai-studio
  • google-gemini
  • next.js
  • pwa
  • react
  • tailwind-css
  • typescript
  • vercel