Inspiration

The inspiration for SentiLens came from a simple but profound observation: while AI has made massive leaps in understanding static images, the "last mile" of accessibility requires real-time, low-latency intuition. I wanted to build a tool that doesn't just describe a photo when asked, but acts as a proactive companion - one that helps a visually impaired user spot a sale at the grocery store, read a medical bill before it's due, or notice a traffic hazard before it becomes a danger.

What it does

SentiLens is a voice-first assistive application that uses live video streaming to provide environmental awareness and task-specific help.

  • Finder (Grocery) Mode: Identifies products, prices, and nutritional info in real time.
  • Document Mode: Performs high-precision OCR to read, summarize, and answer questions about physical mail or documents.
  • Medication Mode: Validates medication labels against official database info for safety and dosage clarity.
  • Environment Mode: Proactively monitors for safety-critical objects like traffic lights, obstacles, and vehicles.
  • Continuous Awareness: Unlike one-shot photo describers, SentiLens maintains a "World Memory," allowing it to remember what it saw earlier and relate it to the user's current goals.

How I built it

  • Core AI: Powered by Google Gemini 3.0 Flash for rapid multimodal reasoning.
  • Real-Time Pipeline: Built on the Gemini Multimodal Live API via WebSockets, enabling bidirectional audio and video streaming with sub-second latency.
  • Frontend: A high-performance Next.js application designed for mobile-first, voice-first interaction.
  • Infrastructure: Uses Google Cloud Firestore for session-based world memory and Firebase Hosting for rapid deployment.
  • Safety Heuristics: Custom business logic (Orchestrator) that prioritizes safety hazards over general observations.
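The Orchestrator's safety-first ranking can be sketched roughly as below. The category names, priority values, and confidence threshold are illustrative assumptions, not the actual SentiLens implementation:

```typescript
// Hypothetical sketch of the safety-first Orchestrator logic.
// Categories, priorities, and the 0.6 threshold are assumptions.

type Category = "hazard" | "task" | "ambient";

interface Observation {
  label: string;
  category: Category;
  confidence: number; // 0..1, reported by the vision model
}

// Lower number = spoken first. Hazards always outrank everything else.
const PRIORITY: Record<Category, number> = {
  hazard: 0,  // e.g. vehicles, obstacles, traffic lights
  task: 1,    // e.g. the cereal the user asked to find
  ambient: 2, // general scene descriptions
};

// Pick what (if anything) to announce from a batch of observations.
function selectAnnouncement(obs: Observation[]): Observation | null {
  const confident = obs.filter((o) => o.confidence >= 0.6);
  if (confident.length === 0) return null;
  confident.sort(
    (a, b) =>
      PRIORITY[a.category] - PRIORITY[b.category] ||
      b.confidence - a.confidence
  );
  return confident[0];
}
```

With this shape, a lower-confidence hazard still preempts a high-confidence ambient observation, which matches the "safety over general observations" rule above.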

Challenges I ran into

  • Latency vs. Precision: Balancing the need for "instant" voice feedback from the Live API with the need for high-resolution "Deep Dives" via standard Gemini Flash endpoints.
  • Information Overload: Avoiding "chattiness." I had to implement a cooldown system and priority-based filtering so the AI only speaks when it truly sees something relevant or dangerous.
  • Environmental Noise: Managing audio echo cancellation and noise suppression in busy grocery store environments to ensure the AI can always hear the user.
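The cooldown system mentioned above can be approximated with a small gate that mutes repeats of the same label for a fixed window while letting urgent items through. The 10-second window and the "priority 0 bypasses cooldown" rule are assumptions for illustration:

```typescript
// Rough sketch of the anti-chattiness cooldown described above.
// The 10-second window and the bypass rule are assumed parameters.

interface Announcement {
  label: string;
  priority: number; // lower = more urgent; 0 = safety hazard
}

class CooldownGate {
  private lastSpokenAt = new Map<string, number>();

  constructor(private cooldownMs = 10_000) {}

  // Returns true if the announcement should be spoken now.
  shouldSpeak(a: Announcement, now: number): boolean {
    const last = this.lastSpokenAt.get(a.label);
    // Non-urgent repeats inside the window stay silent.
    if (a.priority > 0 && last !== undefined && now - last < this.cooldownMs) {
      return false;
    }
    this.lastSpokenAt.set(a.label, now);
    return true;
  }
}
```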

Accomplishments that I'm proud of

  • Proactive Intuition: I successfully moved from "Question-Answer" to "Proactive Observation." The AI often tells the user what they need to know before they even ask.
  • Sub-Second Response: Achieving near-human latency in voice interactions using the latest multimodal streaming capabilities.
  • Robust Grounding: Implementing a verification layer that checks AI observations against known "facts" to reduce hallucinations in critical scenarios like medication dosage.
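The verification layer for Medication Mode amounts to checking a model's reading against a trusted record before speaking it. A minimal sketch, where the field names and the "dosage must appear in the known list" rule are assumptions rather than the actual implementation:

```typescript
// Illustrative grounding check for Medication Mode. Field names and
// tolerance rules are assumptions about the approach, not the real code.

interface LabelReading {
  drugName: string;
  dosageMg: number; // what the vision model read off the label
}

interface DatabaseRecord {
  drugName: string;
  validDosagesMg: number[]; // dosages the drug is actually sold in
}

type Verdict = { ok: true } | { ok: false; reason: string };

function verifyReading(
  reading: LabelReading,
  record: DatabaseRecord
): Verdict {
  if (reading.drugName.toLowerCase() !== record.drugName.toLowerCase()) {
    return { ok: false, reason: "drug name mismatch" };
  }
  // A dosage the drug is never sold in is a likely misread or hallucination.
  if (!record.validDosagesMg.includes(reading.dosageMg)) {
    return { ok: false, reason: `unexpected dosage ${reading.dosageMg} mg` };
  }
  return { ok: true };
}
```

On a failed verdict the app can re-scan or ask the user to reposition the label instead of announcing an unverified dosage.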

What I learned

  • Multimodal Context is King: The user's goal (e.g., "Find healthy cereal") completely changes how the AI should "see" the world.
  • Voice UI is Subtle: Tiny "earcons" (chimes and sounds) are often more effective than words for signaling that the AI has spotted something new.
  • Memory Matters: An AI that "forgets" it saw an obstacle 5 seconds ago is a safety risk; persistent session memory is non-negotiable for assistive tech.

What's next for SentiLens

  • Full Vertex AI Migration: Leveraging enterprise-grade endpoints for improved reliability and scale.
  • Indoor Navigation: Integrating indoor mapping APIs to guide users through complex store layouts.
  • Personal Knowledge Graph: Allowing the AI to remember a user's specific pantry items or home layout across multiple sessions.
  • Offline Fallback: Implementing local ML models for basic hazard detection when internet connectivity is spotty.
