💡 Inspiration: Solving the "Last Meter" Problem
As a Lead Systems Engineer, I don't believe in building tech for trophies; I believe in solving root causes.
While researching for this project, I analyzed the MSF 2024 Disability Trends Report and found a startling statistic: 33.3% of persons with disabilities avoid community participation due to difficulties in travelling.
GPS can guide a visually impaired person to a building, but it fails at the "Last Meter"—finding the door handle, avoiding a wet floor sign, or locating a specific chair in a crowded cafe. Existing solutions often rely on simple object detection (YOLO), which shouts generic labels like "Person" or "Cup" without context.
We asked: What if an AI could not just "see," but "understand" and "navigate" like a human guide?
This inspired Gemini-Cortex, a multimodal wearable that gives the visually impaired Agentic Vision—the ability to actively query, reason, and remember their environment.
🏗️ What it does
Gemini-Cortex is a wearable AI agent that acts as a "Visual Pre-frontal Cortex" for the blind.
Real-Time Navigation: It describes obstacles with clock-face directions (e.g., "Wet floor sign at 2 o'clock, 1 meter away") — see the small bearing-to-clock sketch after this list.
Agentic Reading: It can zoom in to read fine print on medicine bottles or menus when asked.
Visual Memory: It remembers where users leave their belongings. You can ask, "Where did I leave my brown wallet?" and it will recall the last seen location.
Low-Latency Conversation: Using Silero VAD + OpenAI Whisper for listening and Cartesia Sonic 3 ("the Mouth") for speech, it converses naturally with the user in about 5 seconds end-to-end.
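For illustration, here is a minimal sketch of how a horizontal bearing from the camera's field of view could be mapped to the clock-face phrasing above; the function names and angle convention are our own assumptions, not the shipped code.

```python
def bearing_to_clock(bearing_deg: float) -> str:
    """Map a horizontal bearing (0 deg = straight ahead, positive = right)
    to a clock-face direction, e.g. 60 -> "2 o'clock"."""
    # 12 o'clock is straight ahead; each hour spans 30 degrees.
    hour = round(bearing_deg / 30) % 12
    return f"{12 if hour == 0 else hour} o'clock"

def describe(label: str, bearing_deg: float, distance_m: float) -> str:
    return f"{label} at {bearing_to_clock(bearing_deg)}, {distance_m:g} meter{'s' if distance_m != 1 else ''} away"

print(describe("Wet floor sign", 60, 1))   # Wet floor sign at 2 o'clock, 1 meter away
```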
⚙️ How we built it
We architected a distributed system to balance Edge Speed and Cloud Intelligence.
The Architecture: "The Cortex Stack"
We split the processing into four distinct layers:
Layer 1: The Body (Edge Reflexes)
Hardware: Raspberry Pi 5 with a custom 3D-printed chest mount (honeycomb structure for passive cooling).
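As a rough sketch of the Layer 1 capture loop, the Pi's only job is to grab frames and hand them upstream. The camera module, resolution, frame rate, and upload endpoint below are illustrative assumptions, not the exact implementation.

```python
# Minimal edge capture loop on the Raspberry Pi 5 (assumed to use picamera2).
import time
import requests
from picamera2 import Picamera2

BRAIN_URL = "http://laptop.local:8000/frame"   # hypothetical Layer 2 endpoint

picam2 = Picamera2()
picam2.configure(picam2.create_still_configuration(main={"size": (1280, 720)}))
picam2.start()

while True:
    picam2.capture_file("/tmp/frame.jpg")                    # grab the latest view
    with open("/tmp/frame.jpg", "rb") as f:
        requests.post(BRAIN_URL, files={"frame": f}, timeout=5)
    time.sleep(1)                                             # ~1 fps keeps the Pi cool
```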
Layer 2: The Brain (Gemini 3 Agent)
This is the core. We utilize the Gemini 3 Flash API for its multimodal reasoning.
Agentic Vision: We enabled tools='code_execution', allowing Gemini to write and run Python code to analyze complex visual data on the fly.
Prompt Engineering: We inject a "Navigator Persona" system prompt that keeps descriptions concise and prioritizes safety-critical warnings (sketched below).
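Here is a hedged sketch of the Layer 2 call using the google-generativeai SDK's tools='code_execution' mode mentioned above; the model id, persona wording, and user prompt are illustrative assumptions rather than the production values.

```python
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")

# Assumed persona text; the real system prompt is longer and safety-tuned.
NAVIGATOR_PERSONA = (
    "You are a calm sighted guide for a blind user. Describe obstacles with "
    "clock-face directions and approximate distances. Prioritize safety-critical "
    "hazards and never invent objects you cannot see."
)

model = genai.GenerativeModel(
    "gemini-1.5-flash",                 # placeholder id; the write-up uses Gemini 3 Flash
    system_instruction=NAVIGATOR_PERSONA,
    tools="code_execution",             # lets the model write and run Python on the visual data
    generation_config={"temperature": 0.1},
)

frame = PIL.Image.open("frame.jpg")     # latest frame from the Layer 1 loop
reply = model.generate_content([frame, "What is directly ahead of me?"])
print(reply.text)
```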
Layer 3: The Interface (Voice Loop)
Input: OpenAI Whisper (running locally) for real-time speech-to-text, gated by Silero Voice Activity Detection (VAD).
Output: Cartesia Sonic, achieving ultra-low latency TTS so the user doesn't wait in silence.
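A minimal sketch of the input half of this loop, assuming the stock Silero VAD torch.hub entry point and the open-source whisper package (the audio file name is illustrative):

```python
import torch
import whisper

# Silero VAD gates the microphone so Whisper only runs on real speech.
vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, *_ = utils

stt = whisper.load_model("base")                      # local speech-to-text, no cloud hop

audio = read_audio("utterance.wav", sampling_rate=16000)
if get_speech_timestamps(audio, vad_model, sampling_rate=16000):
    text = stt.transcribe("utterance.wav")["text"]
    print("User said:", text)                         # handed to the Layer 2 agent
```

The transcript goes to the Gemini agent, and the reply is spoken back out through Cartesia Sonic.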
Layer 4: Memory (Hippocampus)
We implemented a local SQLite database (memory.db) to log detected objects with timestamps. This allows for "Time-Travel Queries" regarding lost items.
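A simplified sketch of this store, assuming a single sightings table; the schema and helper names are ours for illustration, not the actual memory.db layout.

```python
import sqlite3
import time

conn = sqlite3.connect("memory.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS sightings (
           object      TEXT,   -- e.g. "brown wallet"
           description TEXT,   -- e.g. "on the cafe table, 11 o'clock"
           seen_at     REAL    -- unix timestamp
       )"""
)

def remember(obj: str, description: str) -> None:
    conn.execute("INSERT INTO sightings VALUES (?, ?, ?)", (obj, description, time.time()))
    conn.commit()

def last_seen(obj: str):
    """Time-travel query: most recent sighting matching the object name."""
    return conn.execute(
        "SELECT description, seen_at FROM sightings "
        "WHERE object LIKE ? ORDER BY seen_at DESC LIMIT 1",
        (f"%{obj}%",),
    ).fetchone()

remember("brown wallet", "on the cafe table, 11 o'clock, about 1 meter away")
print(last_seen("wallet"))
```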
Challenges we ran into
Hallucinations: Early versions of the model would "invent" objects. We fixed this by adjusting the temperature to 0.1 and refining the System Prompt to prioritize "Safety Critical" accuracy.
Power Management: The Pi 5 is power-hungry. We designed a "split-weight" wearable system where the heavy battery sits in the pocket, connected via a hidden USB-C cable, keeping the chest unit light (~80g).
🏆 Accomplishments that we're proud of
"Glass-to-Ear" Latency: We achieved a response time (from seeing an object to speaking its name) of under 6 seconds using Gemini Flash.
The "Find My Wallet" Demo: Successfully integrating the memory layer to recall specific personal items (like "Haziq's Brown Wallet") instead of generic objects from layer 4 (Sqlite System, like RAG but not)
Hardware Design: Designing and printing a comfortable, ventilated mount that can be worn for hours without fatigue.
What we learned
Agentic AI is the future of Assistive Tech: Static models (like YOLO) are dead. Dynamic models that can "look closer" or "remember" are the only way to solve real human problems.
Systems Engineering > Coding: The code was easy; the challenge was architecting the flow of data between Pi, Laptop, and API to ensure stability.
Pain First, Rest Later: Building a hardware prototype in 8 days required sleepless nights, but the result—a device that can help the 45,000+ persons with disabilities in Singapore—was worth every hour.
What's next for Gemini-Cortex
On-Device Migration: Moving the "Brain" from the laptop to a wearable AI accelerator (like Hailo-8 or Coral) for offline safety.
Tactile Feedback: Adding haptic motors to the chest strap to "buzz" when an obstacle is imminent (Layer 1 Reflexes).
Competition: We are preparing to showcase this at the Young Innovators Awards (YIA) 2026 and the MOE Innovation Awards in Singapore.