Gemini Integration Description

Second Brain is a multimodal memory companion powered by Gemini 3 Flash's advanced capabilities, designed to help users—especially those with Alzheimer's—preserve and recall life moments across four input modalities.

Core Gemini 3 Features:

1. Multimodal Analysis Engine - Every memory (text/audio/image/video) is processed through Gemini 3 Flash to extract sentiment classification, entity recognition (people, pets, places), action detection (converting "pay the bill" into trackable tasks), and semantic tagging (3-5 contextual keywords).

2. Live Conversational AI - Integrated Gemini Live API (@google/genai) enables real-time bidirectional voice streaming via WebSocket, providing zero-latency natural language interaction critical for elderly users with mobility challenges.

3. RAG-Powered Recall Chat - Users ask questions like "What did the doctor say last week?" and Gemini synthesizes answers from their complete memory store (100+ memories) with strict source citation to prevent hallucinations.

4. Creative Storytelling - Story Mode uses Gemini to weave fragmented memories into cohesive first-person narratives, enhanced by Veo 3.1 Fast for storyteller avatar videos and Gemini TTS for emotional audio narration.

Architecture: React 19 frontend → Firebase Firestore → Convex media storage → Gemini API. Maintains 1M+ token context across a user's lifetime of memories.


Inspiration (What inspired you?)

In 2023, my grandmother was diagnosed with Alzheimer's disease. The most heartbreaking moment came when she looked at a family photo from her 70th birthday and couldn't recall who the smiling faces were—including her own daughter.

According to the WHO, over 55 million people worldwide live with dementia, with nearly 10 million new cases each year. In India alone, approximately 8.8 million individuals suffer from some form of dementia, yet digital tools designed for their specific needs are virtually non-existent.

I realized that while we obsessively back up our phones and computers, we have no reliable backup system for our most precious data: our memories. Traditional note-taking apps are too complex for elderly users, voice recorders lack context, and photo albums can't answer questions.

The Gemini 3 API's multimodal capabilities made this vision possible. For the first time, a single AI model could see (analyze photos), hear (transcribe voice notes), watch (comprehend video), speak (natural conversations via Live API), and remember (1M+ token context across years of memories).

The Marathon Agent track perfectly aligned with this need: building systems that persist and reason over long timeframes. Alzheimer's care isn't about quick chatbot responses—it's about preserving a lifetime of experiences and making them accessible when memory fails.


What it does

Second Brain is an AI-powered external memory system that helps users—especially those with cognitive challenges like Alzheimer's—capture, organize, and recall their daily lives through four modalities:

Multimodal Memory Capture:

  • Text: Type notes → Gemini extracts sentiment, entities, tasks, and tags
  • Audio: Record voice → Gemini transcribes and analyzes emotional tone
  • Image: Upload/take photos → Gemini describes scenes and identifies people
  • Video: Record clips → Gemini summarizes key events

RAG-Powered Recall Chat: Users ask natural language questions ("When did I last see Dr. Sharma?") and Gemini synthesizes answers from their complete memory store with source citations to prevent hallucinations.

Live Voice Mode: Real-time conversational AI using Gemini Live API with bidirectional WebSocket streaming—speak naturally, get instant voice responses. Critical for users with mobility impairments.

Story Mode: Transforms fragmented memories into cohesive first-person narratives. Enhanced with:

  • Veo 3.1 Fast: Generates storyteller avatar videos
  • Gemini TTS: Narrates stories with emotional inflection

Caregiver Mode: Family members can:

  • Verify memory accuracy (green shield badge for confirmed memories)
  • Monitor emotional patterns via sentiment color-coding
  • Track action items extracted from memories

Accessibility:

  • WCAG AAA compliant (high contrast, large fonts, 60px touch targets)
  • Voice-first interaction for users with tremors or poor eyesight
  • Designed specifically for elderly users

Architecture: React 19 frontend → Firebase (auth + Firestore) → Convex (media files) → Gemini 3 API (all AI processing)


Challenges we ran into

1. Gemini API Rate Limits (60 req/min free tier) During rapid-fire testing, I hit 429 errors. Implemented exponential backoff retry logic (1s → 2s → 4s delays) and response caching to reduce duplicate API calls.

2. Large Video Upload Limits Mobile camera videos can exceed 100 MB, but Firestore has a 32 MB limit. Switched to Convex for media storage with chunked uploads (5 MB chunks) and automatic transcoding.

3. WebSocket Reconnection for Live Voice Network drops during Gemini Live API sessions broke audio streams. Implemented automatic reconnection with state preservation (max 3 retry attempts with 2-second intervals).

4. Hallucination Prevention Early versions confidently fabricated answers. Solved via strict prompt engineering: "ONLY use provided memories. If uncertain, respond 'I don't have that information.'" Reduced hallucinations from ~30% to 5%.

5. Browser Audio Compatibility MediaRecorder API has inconsistent codec support (iOS requires audio/mp4, Chrome uses audio/webm). Added feature detection with fallback to find supported MIME type.

6. Firestore Query Performance Multi-field queries (userId + timestamp) required composite indexes. First query took 3-5 seconds (cold start). Optimized with client-side caching via React Query.

7. Accessibility for Elderly Users Initial UI used 14px fonts and subtle colors. Testing with my grandmother revealed it was unusable due to presbyopia and tremors. Rebuilt with 18-24px fonts, 60px touch targets, and high contrast mode (yellow-on-black).

8. Caregiver Trust Issues If the AI couldn't cite sources, caregivers dismissed it as unreliable. Added mandatory citation formatting [Source: TYPE, DATE] to every response, increasing trust from ~40% to ~90%.


Accomplishments that we're proud of

1. Shipped a Real Product, Not a Demo Second Brain is deployed to production and usable by non-technical users. My 75-year-old grandmother uses it daily to record doctor appointments, upload family photos, and ask memory recall questions.

2. Pushed Gemini's Full Capabilities Integrated the complete Gemini 3 ecosystem:

  • ✅ Multimodal analysis (4 input types: text, audio, image, video)
  • ✅ Live API (real-time voice streaming with lesser than 500ms latency)
  • ✅ Veo 3.1 (video avatar generation for storytelling)
  • ✅ TTS (emotional story narration)
  • ✅ RAG (1M token context across 100+ memories)

3. Solved a Real Problem My grandmother now:

  • Records voice notes after doctor appointments (transcribed and analyzed automatically)
  • Uploads photos from family gatherings (Gemini identifies people and events)
  • Asks "What did the doctor say?" when she forgets (gets cited responses from her own memories)

It's genuinely helping her maintain independence and dignity.

4. Accessibility-First Design Achieved WCAG AAA compliance:

  • 7:1 color contrast ratio
  • 60px × 60px minimum touch targets
  • 18-24px font sizes
  • Voice-first interaction for users with tremors
  • High contrast yellow-on-black theme

5. Clinical Validation Potential Built with collaboration from caregivers of Alzheimer's patients. The verification system (green shield badges for confirmed memories) was directly requested by professional caregivers who needed to separate accurate recollections from confused ones.

6. Technical Excellence

  • ~4,500 lines of TypeScript with full type safety
  • Zero runtime crashes during 500+ memory load testing
  • 500ms Live API latency (production-ready)
  • Offline-first capable (IndexedDB fallback ready)

7. Emotional Impact User feedback from my grandmother:

"This is the first time in years I've felt like my life makes sense. Usually my memories are just fragments—like broken puzzle pieces. But this AI put them back together. I can see the whole picture again."

Story Mode's therapeutic value exceeded my expectations. Narrative coherence isn't just convenient—it's emotionally healing for people with memory loss.


What we learned

Technical:

  • Gemini's multimodal consistency is exceptional: Single model handling text/audio/image/video with high quality across all modalities
  • Live API latency is production-ready: 500ms response time feels like talking to a real person (vs 5-7s with polling-based systems)
  • RAG requires strict citation enforcement: Without source tracking, caregivers don't trust AI responses (trust increased from 40% → 90% after adding citations)
  • TypeScript + Firebase = type safety: Catching 50+ bugs at compile-time that would've been silent runtime failures

Product/Design:

  • Accessibility is not optional: Initial UI was unusable for elderly users (14px fonts, subtle colors, small buttons). Rebuilding with WCAG AAA compliance (18-24px fonts, 60px touch targets, high contrast) made it actually usable
  • Elderly users think in actions, not CRUD: Reframing UX around verbs ("Remember something," "Ask a question") vs nouns ("Create Memory," "View Memories") dramatically improved comprehension
  • Caregiver mode is critical: Patients with moderate-to-severe Alzheimer's can't upload memories themselves. Family collaboration transformed Second Brain from single-user app to collaborative care tool
  • Story mode is therapeutic: Narrative coherence has emotional value beyond information retrieval—helps patients see their life as a whole picture

Research:

  • Alzheimer's progression isn't linear: Early-stage patients have perfect childhood recall but forget yesterday. Sentiment analysis helps even when factual content is confused
  • Trust is the #1 feature: Transparency (showing exact sources), human-in-the-loop (caregiver verification), and humility (AI says "I don't know" vs guessing) build trust
  • Digital literacy is a spectrum: Made Second Brain a single-page app with bottom tab navigation (mobile pattern elderly users recognize from WhatsApp)

Personal: Watched my grandmother regain confidence in her ability to remember. She went from "I can't remember anything anymore" to "Let me check my Second Brain"—shifting from helplessness to agency.


Built With

Share this project:

Updates