Smart agentic assistive device

INSPIRATION

My neighbor's grandfather stopped leaving his house last year. No accident. No drama. He just quietly stopped going outside. He memorized every step from his bed to his front door — fourteen steps — and counted them every morning just to feel safe.

He had a smartphone in his pocket the whole time. Nobody built the missing piece.

285 million people are visually impaired. The tools that work cost ₹3,00,000 ($3,600). For most blind families in India, that is an entire year's income. I spent ₹540 ($6.50) and built the same independence.

WHAT IT DOES

VisionAssist AI is a voice-controlled wearable worn around the neck. The phone camera faces forward. The user says "My Eye" followed by any command. The AI responds instantly. No screen. No hands. No helper needed.

Track: AI That Actually Helps People (Track 03)

Core Feature 1 — Agentic Multi-Step Reasoning

This is the heart of the project and what separates it from a simple API call.

When a user says "find my medicine," the agent does not immediately open the camera. It first checks local session memory — if the object was found before, it speaks the answer instantly and stops. No camera opened. No redundant API call made. This saves time and reduces unnecessary requests significantly.

If the object is not in memory, the agent opens the camera and scans the current frame. If it is still not found, the agent triggers a full 360° room scan, capturing frames as the user slowly turns. Every completed step is stored so the agent never repeats work already done.

A 3-second answer when data exists in memory. A complete room search only when truly needed. This is not a chatbot. This is a reasoning system that decides what to do, does the minimum required, and remembers what it learned — exactly like a real assistant would.
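
The memory-first order described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual agent loop: the function name `find_object`, the plain dict used as memory, and the two scanner callbacks are all hypothetical stand-ins for the real camera and vision calls.

```python
from typing import Callable, Dict, Optional

# Hypothetical scanner type: takes an object name, returns a spoken
# location such as "on the table to your left", or None if not seen.
Scanner = Callable[[str], Optional[str]]


def find_object(
    name: str,
    memory: Dict[str, str],
    scan_frame: Scanner,
    scan_room: Scanner,
) -> str:
    """Return a spoken answer, doing the least work that can produce one."""
    # Step 1: answer from session memory without opening the camera.
    if name in memory:
        return f"Your {name} is {memory[name]}."

    # Step 2: look at the current camera frame.
    location = scan_frame(name)

    # Step 3: fall back to the full room scan only if the frame had nothing.
    if location is None:
        location = scan_room(name)

    if location is None:
        return f"I could not find your {name}."

    memory[name] = location  # remember the result for the next request
    return f"Your {name} is {location}."
```

Calling `find_object` a second time with the same memory dict returns the stored answer and never touches either scanner, which is the behavior the paragraph above describes.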

(The planned Android app will give each user their own free Groq API key — eliminating the shared server entirely, making memory-first reasoning even faster with zero shared rate limits.)

Core Feature 2 — Real-Time Obstacle Detection (Fully Offline)

Two HC-SR04 ultrasonic sensors are mounted on the neck pouch. The top sensor detects chest-level obstacles such as walls, poles, and people. The bottom sensor is angled 30° downward to detect ground-level hazards such as curbs, steps, and small objects on the floor.

An Arduino buzzer fires instantly: 3 short beeps for obstacles, 1 long beep for stairs.

No internet. No cloud. No latency. This is the only feature that works completely offline — in a basement, in a rural village with no signal, during a power cut. Safety cannot depend on a server being up. That is why obstacle detection runs entirely on the Arduino with zero cloud dependency.
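
The decision the Arduino makes can be sketched as follows. The real firmware would be Arduino C++; the same logic is shown in Python here for readability. The echo-to-distance formula is the standard HC-SR04 calculation, but the 100 cm alert range, the order in which the sensors are checked, and the mapping of the bottom sensor to the long beep are assumptions made for illustration, not values taken from the project.

```python
from typing import Optional

SPEED_OF_SOUND_CM_PER_US = 0.0343  # roughly 343 m/s in room-temperature air
ALERT_DISTANCE_CM = 100.0          # assumed alert range, not a project value


def echo_to_cm(echo_us: float) -> float:
    """Convert an HC-SR04 echo pulse width (microseconds) to distance in cm."""
    # The pulse covers the trip to the obstacle and back, so halve it.
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def beep_pattern(top_echo_us: float, bottom_echo_us: float) -> Optional[str]:
    """Pick a buzzer pattern from the two sensor readings, or None for silence."""
    # Top sensor: chest-level obstacle ahead (assumed to be checked first).
    if echo_to_cm(top_echo_us) < ALERT_DISTANCE_CM:
        return "3 short"
    # Bottom sensor: ground-level hazard (assumed to map to the long beep).
    if echo_to_cm(bottom_echo_us) < ALERT_DISTANCE_CM:
        return "1 long"
    return None
```

Because the whole decision is two comparisons on local sensor readings, nothing in it needs a network connection.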

Total hardware cost: ₹540. Powered entirely by the phone via OTG cable — no separate battery needed.

Core Feature 3 — Traffic Light Detection with Dual Verification

This feature is built around one honest observation: blind pedestrians do not look at the traffic light directly — they judge safety by whether cars are moving or stopped. When cars stop at a red light, it is safe to cross.

VisionAssist AI watches the traffic light that the cars are obeying. Two methods run in parallel: Groq AI vision identifies the light color, while HSV color analysis via OpenCV independently confirms it. Both must agree before the system speaks, with one exception: if either method detects red, the system announces red, because safety overrides confidence.

Red light → cars are stopped → "Safe to cross now." Green light → cars are moving → "Stop, do not cross."

Live Mode runs continuously in the background and only announces when the light changes — so the answer is always ready before the user asks. Response under 2 seconds.

Dual verification exists because single-model AI fails in shadows, glare, and low light. A second independent method catches what the first misses. For a decision involving road safety, one model is never enough.
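
A minimal sketch of how the two readings could be combined, following the rule stated in this write-up: red from either method wins, otherwise both must agree. The function name `fuse_light_readings` and the use of plain color strings are hypothetical; the real system's data types are not described here.

```python
from typing import Optional


def fuse_light_readings(
    ai_color: Optional[str],
    hsv_color: Optional[str],
) -> Optional[str]:
    """Combine the AI vision reading with the HSV reading.

    Each input is a color name such as "red" or "green", or None when
    that method could not decide. Returns the color to announce, or
    None when the system should stay silent.
    """
    # Red from either method overrides everything else.
    if "red" in (ai_color, hsv_color):
        return "red"
    # Otherwise speak only when both methods gave the same answer.
    if ai_color is not None and ai_color == hsv_color:
        return ai_color
    return None
```

Returning `None` on disagreement keeps the system quiet rather than guessing, which matches the "both must agree before the system speaks" rule.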

Core Feature 4 — Indoor Navigation — Dual-Index System

Every existing indoor navigation solution — NavCog, BLE beacons, RFID — requires someone to physically install infrastructure in the building first. Hundreds of beacons. Thousands of rupees per building. Only works where someone already paid to set it up.

VisionAssist AI uses a dual-index system: voice commands fused with real-time visual verification from the phone camera. Pre-recorded routes are stored once. The system guides step by step through corridors even where GPS signal is completely absent.

Say "Go to room 204" — the system guides the user turn by turn using the stored route combined with live visual confirmation from the camera. Zero physical infrastructure. Zero setup cost per building.
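
One way to picture a stored route combined with live visual confirmation is sketched below. The `RouteStep` structure, the `guide` function, and the `sees` callback are hypothetical and only illustrate the idea; the project's actual route format is not described in this write-up.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class RouteStep:
    instruction: str  # spoken to the user
    landmark: str     # what the camera should see once the step is done


def guide(route: List[RouteStep], sees: Callable[[str], bool]) -> Iterator[str]:
    """Yield spoken lines for a stored route, checking each step by camera."""
    for step in route:
        yield step.instruction
        # Move on only after the camera confirms the expected landmark.
        if not sees(step.landmark):
            yield f"I cannot confirm {step.landmark}. Please stop and scan again."
            return
    yield "You have arrived."
```

In use, `sees` would wrap a vision call on the current frame, and each yielded string would be passed to text-to-speech.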

HOW WE BUILT IT

Built entirely from scratch. Python and Flask for the backend. HTML5, CSS3, and Vanilla JavaScript for the frontend. Runs in any smartphone browser — no app installation required.

AI — Groq API (Llama 4 Scout 17B) as primary model, Google Gemini 1.5 Flash as automatic fallback for complex scenes. 3-key rotation system with exponential backoff — if one Groq key hits rate limits, the next takes over instantly. No request fails silently.
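
The key rotation with backoff might look roughly like this. It is a sketch under stated assumptions: the class name `KeyRotator`, the one-second base cooldown, the doubling rule, and resetting the failure count on the next success are all illustrative choices, not details confirmed by the project.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


class KeyRotator:
    """Rotate over API keys, skipping any key that is cooling down."""

    def __init__(
        self,
        keys: Iterable[str],
        base_cooldown: float = 1.0,                    # assumed value
        clock: Callable[[], float] = time.monotonic,   # injectable for tests
    ) -> None:
        self._keys: List[str] = list(keys)
        self._failures: Dict[str, int] = {k: 0 for k in self._keys}
        self._ready_at: Dict[str, float] = {k: 0.0 for k in self._keys}
        self._base = base_cooldown
        self._clock = clock

    def next_key(self) -> Optional[str]:
        """Return the first key not cooling down, or None if all are."""
        now = self._clock()
        for key in self._keys:
            if self._ready_at[key] <= now:
                return key
        return None

    def report_rate_limited(self, key: str) -> None:
        """Put a key on cooldown; the wait doubles with each failure."""
        self._failures[key] += 1
        cooldown = self._base * 2 ** (self._failures[key] - 1)
        self._ready_at[key] = self._clock() + cooldown

    def report_success(self, key: str) -> None:
        """Clear the failure count after a successful request."""
        self._failures[key] = 0
```

When `next_key()` returns `None`, the caller could switch to the fallback model instead of waiting.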

Computer Vision — OpenCV for HSV color analysis and frame processing, YOLOv8 for real-time object detection.

Agent Brain — Custom multi-step reasoning loop in Python. Each session holds an AgentSessionMemory object storing results from every completed step. Agent reads memory before every decision — never repeats completed work. Fully thread-safe with lock-protected reads and writes across concurrent requests.
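
A minimal sketch of lock-protected session memory. The class name `AgentSessionMemory` comes from the description above; its methods, the `SessionStore` wrapper, and the string session IDs are assumptions made for illustration.

```python
import threading
from typing import Any, Dict, Optional


class AgentSessionMemory:
    """Results of completed agent steps for one session (fields assumed)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._steps: Dict[str, Any] = {}

    def record(self, step: str, result: Any) -> None:
        """Store the result of a completed step."""
        with self._lock:
            self._steps[step] = result

    def get(self, step: str) -> Optional[Any]:
        """Return a stored result, or None if the step has not run."""
        with self._lock:
            return self._steps.get(step)


class SessionStore:
    """Hypothetical map from session ID to that session's memory."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, AgentSessionMemory] = {}

    def for_session(self, session_id: str) -> AgentSessionMemory:
        """Return the memory for a session, creating it on first use."""
        with self._lock:
            memory = self._sessions.get(session_id)
            if memory is None:
                memory = AgentSessionMemory()
                self._sessions[session_id] = memory
            return memory
```

A Flask route would look up `for_session(session_id)` at the start of each request, so state survives across otherwise stateless HTTP calls for as long as the server process lives.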

Speech — Web Speech API handles wake word detection, continuous voice recognition, and text-to-speech simultaneously in the browser. Microphone pauses exactly when the agent speaks and restarts automatically after speech ends — prevents audio feedback loops.

Hardware — Arduino ATmega328P clone + 2× HC-SR04 ultrasonic sensors + active buzzer. ₹540 total. 280 grams including smartphone. Powered by phone via OTG cable.

Deployment — Backend deployed on Render free tier. Frontend accessed from any smartphone browser. Zero installation required.

CHALLENGES WE RAN INTO

Traffic light latency — AI vision takes 1.5–2.5 seconds per frame. Too slow for a live road crossing decision. Live Mode solved this — runs continuously in the background, speaks only when the light changes. The answer is always pre-computed before the user asks.
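
The "speak only when the light changes" behavior can be sketched as a small state holder. The class name `LiveLightMonitor`, the spoken phrase, and the convention that `None` means an unverified frame are hypothetical.

```python
from typing import Callable, Optional


class LiveLightMonitor:
    """Cache the latest verified light color and announce only changes."""

    def __init__(self, speak: Callable[[str], None]) -> None:
        self._speak = speak
        self._last: Optional[str] = None

    def update(self, color: Optional[str]) -> None:
        """Feed one reading from the background loop."""
        # Ignore unverified frames and repeats of the current state.
        if color is None or color == self._last:
            return
        self._last = color
        self._speak(f"Light is {color}.")

    @property
    def current(self) -> Optional[str]:
        """Latest verified color, ready before the user asks."""
        return self._last
```

The background loop calls `update` once per processed frame, so the per-frame latency is paid continuously instead of at the moment the user asks.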

Traffic light accuracy in low light — Single AI model failed in shadows and glare. HSV color analysis added as a parallel independent verification layer. Two methods must agree. If either detects red, the system announces red. Safety always overrides confidence.

Agent memory across stateless HTTP — Flask does not persist state between requests by default. Built a full thread-safe session store so the agent remembers every completed step across multiple API calls without ever repeating them.

Voice collision — Microphone and speaker active simultaneously caused feedback loops. Fixed by pausing the microphone precisely when TTS speaks and auto-restarting after speech ends.

API rate limits under heavy use: Built exponential backoff with 3-key rotation. Each key's cooldown grows with its failure count, and the key becomes available again automatically once the cooldown expires.

WHAT WE LEARNED

Safety cannot depend on a single model being correct — always verify with a second independent method. Reasoning that checks memory before opening the camera is smarter than always doing more. Voice-first design is the only truly accessible design for users who cannot look at a screen. Free tiers are fully sufficient to build tools that reach real people. The best technology is the technology that actually reaches people.

WHAT'S NEXT

The planned Android app eliminates the shared server entirely — each user creates one free Groq account during setup, enters their key once, and all AI calls go directly from their phone to Groq. No shared server. No shared limits. Groq's free tier provides 1,000 requests per day — a blind user during 2–3 hours of outdoor activity needs roughly 50–100 requests. The free limit covers full daily use with margin.

On-device TinyML via TensorFlow Lite for offline object detection. Local encrypted document storage replacing cloud dependencies. Bluetooth module replacing OTG cable. More Indian regional languages. Crowdsourced hazard alerts from the blind community.

At ₹540 hardware cost per user — NGOs can deploy to 500 users for the cost of a single OrCam device.

"Technology should lift people up, not price them out."

BUILT WITH

Python · Flask · Groq API (Llama 4 Scout 17B) · Google Gemini 1.5 Flash · OpenCV · YOLOv8 · Web Speech API · Arduino · HTML5 · CSS3 · Vanilla JavaScript

TRY IT OUT

🌐 Live App: https://global-blind-device.onrender.com/
🎥 Real Blind User Demo: https://youtu.be/LY4OkuAPpD8?si=RMLXmWiQU4U1KrV7
💻 GitHub: https://github.com/clod9344-cloud/Global_blind_device

"I don't need someone with me anymore. That's not convenience. That's freedom."

Updates

Prince Jha started this project — Jun 22, 2026 12:15 PM EDT
