💡 Inspiration: Solving the "Last Meter" Problem

As a Lead Systems Engineer, I don't believe in building tech for trophies; I believe in solving root causes.

While researching for this project, I analyzed the MSF 2024 Disability Trends Report and found a startling statistic: 33.3% of persons with disabilities avoid community participation due to difficulties in travelling.

GPS can guide a visually impaired person to a building, but it fails at the "Last Meter"—finding the door handle, avoiding a wet floor sign, or locating a specific chair in a crowded cafe. Existing solutions often rely on simple object detection (YOLO), which shouts generic labels like "Person" or "Cup" without context.

We asked: What if an AI could not just "see," but "understand" and "navigate" like a human guide?

This inspired Gemini-Cortex, a multimodal wearable that gives the visually impaired Agentic Vision—the ability to actively query, reason, and remember their environment.

🏗️ What it does

Gemini-Cortex is a wearable AI agent that acts as a "Visual Pre-frontal Cortex" for the blind.

Real-Time Navigation: It describes obstacles with clock-face directions (e.g., "Wet floor sign at 2 o'clock, 1 meter away").
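The clock-face phrasing above can be derived from an object's horizontal position in the camera frame. This is a minimal sketch of that mapping; the function name and the 60° field-of-view default are our assumptions, not the project's actual code:

```python
def clock_direction(x_center: float, frame_width: int, fov_deg: float = 60.0) -> str:
    """Map an object's horizontal pixel position to a clock-face bearing.

    Assumes the chest camera points where the wearer faces, so the frame
    centre is 12 o'clock; fov_deg is an illustrative horizontal field of view.
    """
    # Offset from centre as a fraction of half the frame width (-1.0 .. 1.0)
    offset = (x_center - frame_width / 2) / (frame_width / 2)
    angle = offset * (fov_deg / 2)   # degrees to the right (+) or left (-)
    hour = round(angle / 30) % 12    # 30 degrees of bearing per clock hour
    return "12 o'clock" if hour == 0 else f"{hour} o'clock"
```

An object dead ahead reads as "12 o'clock"; one at the right edge of a 60° frame reads as "1 o'clock".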

Agentic Reading: It can zoom in to read fine print on medicine bottles or menus when asked.

Visual Memory: It remembers where users leave their belongings. You can ask, "Where did I leave my brown wallet?" and it will recall the last seen location.

Low-Latency Conversation: Using Silero VAD + OpenAI Whisper for listening and Cartesia Sonic 3 (the "mouth") for speaking, it converses naturally with the user in about 5 seconds.

⚙️ How we built it

We architected a distributed system to balance Edge Speed and Cloud Intelligence.

The Architecture: "The Cortex Stack"

We split the processing into four distinct layers:

Layer 1: The Body (Edge Reflexes)

Hardware: Raspberry Pi 5 with a custom 3D-printed chest mount (honeycomb structure for passive cooling).

Layer 2: The Brain (Gemini 3 Agent)

This is the core. We utilize the Gemini 3 Flash API for its multimodal reasoning.

Agentic Vision: We enabled tools='code_execution', allowing Gemini to write and run Python code to analyze complex visual data on the fly.

Prompt Engineering: We inject a "Navigator Persona" system prompt that prioritizes safety-critical warnings over general scene description.
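A sketch of what such a persona prompt might look like. The exact wording below is ours, not the project's real prompt; the safety-first priority, clock-face phrasing, and the 0.1 temperature are details taken from the write-up:

```python
# Illustrative "Navigator Persona" system prompt (our wording, not the
# project's actual prompt). Clock-face bearings and the safety-critical
# priority come from the write-up itself.
NAVIGATOR_PERSONA = (
    "You are a calm walking guide for a blind user. "
    "Report hazards first (wet floors, stairs, moving traffic), then everything else. "
    "Give directions as clock-face bearings with distances, "
    "e.g. 'Wet floor sign at 2 o'clock, 1 meter away'. "
    "Never invent objects: if you are unsure, say so. "
    "Keep every reply under two short sentences."
)

# Low temperature to curb hallucinations, per the tuning described in the write-up.
GENERATION_CONFIG = {"temperature": 0.1}
```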

Layer 3: The Interface (Voice Loop)

Input: OpenAI Whisper (running locally) for real-time speech-to-text, gated by Silero Voice Activity Detection (VAD).

Output: Cartesia Sonic, achieving ultra-low latency TTS so the user doesn't wait in silence.
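The VAD gate means Whisper only transcribes frames that actually contain speech. As a dependency-free illustration of that gating idea, here is a crude energy-threshold stand-in; Silero VAD makes the same per-frame keep/drop decision with a learned model, not this heuristic:

```python
def is_speech(samples: list[int], threshold: float = 500.0) -> bool:
    """Crude voice-activity gate: treat a 16-bit PCM frame as speech when
    its RMS energy exceeds a threshold. A hand-rolled stand-in for Silero
    VAD, used here only to show where the gate sits in the pipeline."""
    if not samples:
        return False
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return rms >= threshold

def frames_to_transcribe(frames: list[list[int]]) -> list[list[int]]:
    """Forward only voiced frames to the local Whisper transcriber."""
    return [f for f in frames if is_speech(f)]
```

Dropping silent frames before transcription is what keeps the round-trip near the ~5-second conversational budget.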

Layer 4: Memory (Hippocampus)

We implemented a local SQLite database (memory.db) to log detected objects with timestamps. This allows for "Time-Travel Queries" regarding lost items.
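The memory layer can be sketched with the standard-library sqlite3 module. The table and column names below are our guesses, not the project's actual schema in memory.db:

```python
import sqlite3
import time

# An in-memory DB for illustration; the project persists to memory.db on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sightings (object TEXT, location TEXT, seen_at REAL)"
)

def log_object(obj: str, location: str) -> None:
    """Record every detected object with a timestamp (the Hippocampus log)."""
    conn.execute("INSERT INTO sightings VALUES (?, ?, ?)", (obj, location, time.time()))

def last_seen(obj: str):
    """Time-travel query: where was this item most recently seen?"""
    return conn.execute(
        "SELECT location, seen_at FROM sightings"
        " WHERE object LIKE ? ORDER BY seen_at DESC LIMIT 1",
        (f"%{obj}%",),
    ).fetchone()  # (location, timestamp), or None if never seen

log_object("brown wallet", "on the kitchen counter")
log_object("keys", "by the front door")
print(last_seen("wallet")[0])  # → on the kitchen counter
```

The `ORDER BY seen_at DESC LIMIT 1` is what makes "Where did I leave my brown wallet?" return the last known location rather than every sighting.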

Challenges we ran into

Hallucinations: Early versions of the model would "invent" objects. We fixed this by adjusting the temperature to 0.1 and refining the System Prompt to prioritize "Safety Critical" accuracy.

Power Management: The Pi 5 is power-hungry. We designed a "split-weight" wearable system where the heavy battery sits in the pocket, connected via a hidden USB-C cable, keeping the chest unit light (~80g).

🏆 Accomplishments that we're proud of

"Glass-to-Ear" Latency: We achieved a response time (from seeing an object to speaking its name) of under 6 seconds using Gemini Flash.

The "Find My Wallet" Demo: Successfully integrating the Layer 4 memory (the SQLite system, a lightweight alternative to full RAG) to recall specific personal items (like "Haziq's brown wallet") rather than generic object labels.

Hardware Design: Designing and printing a comfortable, ventilated mount that can be worn for hours without fatigue.

What we learned

Agentic AI is the future of Assistive Tech: Static models (like YOLO) are dead. Dynamic models that can "look closer" or "remember" are the only way to solve real human problems.

Systems Engineering > Coding: The code was easy; the challenge was architecting the flow of data between Pi, Laptop, and API to ensure stability.

Pain First, Rest Later: Building a hardware prototype in 8 days required sleepless nights, but the result, a device that can help the 45,000+ persons with disabilities in Singapore, was worth it.

What's next for Gemini-Cortex

On-Device Migration: Moving the "Brain" from the laptop to a wearable AI accelerator (like Hailo-8 or Coral) for offline safety.

Tactile Feedback: Adding haptic motors to the chest strap to "buzz" when an obstacle is imminent (Layer 1 Reflexes).

Competition: We are preparing to showcase this at the Young Innovators Awards (YIA) 2026 and the MOE Innovation Awards in Singapore.

Built With

  • asyncio
  • Bluetooth earphones
  • Cartesia Sonic 3
  • FastAPI
  • Kokoro TTS
  • NumPy
  • OpenAI Whisper
  • OpenCV
  • Picamera2
  • Pillow
  • psutil
  • pyaml
  • Python
  • python-dotenv
  • Raspberry Pi
  • Rich
  • Silero VAD
  • SQLite