Smart Agentic Assistive Device

INSPIRATION

My neighbor's grandfather stopped leaving his house last year. No accident. No drama. He just quietly stopped going outside. He memorized every step from his bed to his front door — fourteen steps — and counted them every morning just to feel safe. He had a smartphone in his pocket the whole time. A device with a camera, processor, and internet connection. Useless to him because nobody built the missing piece.

285 million people worldwide are visually impaired. OrCam, Envision, and eSight are remarkable technology, but they cost at least ₹3,00,000. For the 70% of India's blind population living below the poverty line, that price is not just expensive; it is a closed door.

I went to the local electronics shop. Spent ₹540. Two sensors. One buzzer. One Arduino. The technology was already in his pocket. I built the bridge.

WHAT IT DOES

Say "My Eye" followed by any command. The AI responds instantly. The system transforms a standard smartphone and ₹540 worth of components into a complete hands-free wearable assistive ecosystem — worn around the neck, camera always facing forward, fully voice-controlled.

1. Agentic Multi-Step Reasoning (Core Innovation): The agent thinks before acting. Say "help me find my medicine." It checks saved local memory first. If the object's location is already stored, it speaks the answer immediately and stops: no camera opened, no API call wasted. If the object is not in memory, it opens the camera and scans the current scene. If it is still not found, it triggers a full 360° room scan, capturing frames as you slowly turn. Each step's result is stored in session memory so the agent never repeats completed work. The result is a 3-second answer when data exists and a complete room search only when truly needed.

2. Real-Time Obstacle Detection (Works Fully Offline): Two ultrasonic sensors cover the path ahead with no cloud round trip. The top sensor detects chest-level obstacles such as walls, poles, and people. The bottom sensor is angled 30° downward to detect ground-level hazards such as curbs, steps, and small objects. The Arduino sounds the buzzer immediately: 3 short beeps for obstacles, 1 long beep for stairs. No internet is required, so there is no network latency.

3. Traffic Light Detection with Dual Verification: Designed for Indian roads, where pedestrian signals are often absent. Two methods run in parallel: Groq AI vision identifies the light color while HSV color analysis (OpenCV) checks it independently. "Safe to cross now" is spoken only when both methods agree on green. If either method detects red, safety overrides and the app says "Stop, do not cross." A single check takes roughly 2 seconds. Live Mode runs continuously in the background and speaks only when the light changes.

4. Indoor Navigation (Dual-Index System): A custom algorithm that fuses voice commands with real-time visual verification. It works in corridors where GPS fails completely, even when camera frames are blurry. Beacon-based indoor navigation systems (NavCog, BLE beacons, RFID) require someone to physically install hundreds of beacons or tags in advance, at a cost of thousands of rupees. VisionAssist AI works with pre-recorded routes and needs no infrastructure setup. Say "Go to room 204" and the system guides you step by step.

5. Document Memory with n8n + Pinecone: Blind users upload Aadhaar cards, prescriptions, and ID proofs once. Say "My Eye... show my Aadhaar number" and the AI searches the vector database and reads the answer aloud. n8n extracts text from the document, splits it into chunks, converts the chunks to OpenAI embeddings, and stores them in Pinecone. Retrieval is fully voice-driven. No carrying physical documents. No asking someone else.

6. Object Finder 360°: Captures multiple frames during a slow rotation, sends all frames together for full-room spatial analysis, and returns the direction and estimated distance of the object. It always checks memory before scanning and skips the camera entirely if the location was already saved. Ask "Where are my keys?" and the AI scans, finds the keys on the table, and stores the location in memory. Ask again later and the answer comes straight from memory with zero API calls.

7. Food Identifier: Names every item on the plate, estimates quantity, counts plates and glasses, and states the meal type. Live Mode runs continuously and speaks only when something on the plate changes, which prevents voice fatigue.

8. Page and Medicine Reader: Auto-detects when a document appears in frame, with no button press needed. Reads books word for word. For medicine labels, it extracts the name, usage, and dosage, which is critical for blind users managing medications alone.

The app also includes face recognition, currency detection, scene description, stair safety, emergency SOS with live GPS via Twilio SMS + SMTP email, and outdoor routing via Leaflet.js with OpenStreetMap. Works in Hindi, English, and Hinglish.

HOW WE BUILT IT

Built entirely from scratch using Python and Flask for the backend, and HTML5, CSS3, and Vanilla JavaScript for the frontend. No frontend frameworks. No paid platforms.

AI Vision — Groq API (Llama 4 Scout 17B) as primary model, Google Gemini 1.5 Flash as automatic fallback for object finding and complex scenes. 3-key rotation system with exponential backoff — if one Groq key hits rate limits, the next takes over instantly. No request fails silently.

Computer Vision — OpenCV for frame processing and HSV color analysis, YOLOv8 for real-time object detection, DeepFace with Facenet for face recognition, Tesseract OCR for text extraction.
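To illustrate the HSV color check, here is a minimal sketch that classifies a cropped signal region as red, green, or unknown. The project uses OpenCV for this; the sketch uses Python's standard-library colorsys instead so that it runs with no dependencies. The function name, hue ranges, and thresholds are assumptions chosen for illustration, not values taken from the project.

```python
import colorsys
from typing import Iterable, Tuple

def classify_pixels(pixels: Iterable[Tuple[int, int, int]],
                    min_fraction: float = 0.05) -> str:
    """Classify RGB pixels from a signal region as 'red', 'green' or 'unknown'."""
    red = green = total = 0
    for r, g, b in pixels:
        total += 1
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < 0.5 or v < 0.5:
            continue  # ignore dull or dark pixels (assumed cut-offs)
        hue = h * 360  # colorsys returns hue in the range 0..1
        if hue <= 15 or hue >= 345:
            red += 1
        elif 90 <= hue <= 160:
            green += 1
    if total == 0:
        return "unknown"
    # Ties go to red, following the safety-first rule.
    if red / total >= min_fraction and red >= green:
        return "red"
    if green / total >= min_fraction:
        return "green"
    return "unknown"
```

Note that OpenCV stores hue on a 0 to 179 scale for 8-bit images, so the ranges above would need to be halved if ported to cv2.inRange.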

Agent Brain — Custom multi-step reasoning loop in Python. Each session holds an AgentSessionMemory object that stores results from every completed step. Agent reads this before each decision so it never re-runs completed work. Sessions are fully thread-safe with lock-protected reads and writes.
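The memory-first reasoning loop can be sketched roughly as follows. This is a simplified illustration, not the project's code: find_object, the plain dict standing in for saved memory, and the two scanner callbacks are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Scanner = Callable[[str], Optional[str]]  # returns a location or None

def find_object(name: str,
                saved_locations: Dict[str, str],
                scan_current_view: Scanner,
                scan_full_room: Scanner) -> Tuple[str, List[str]]:
    """Return the spoken answer and the list of steps that actually ran."""
    steps = ["memory"]
    location = saved_locations.get(name)       # step 1: no camera, no API call
    if location is None:
        steps.append("current_view")
        location = scan_current_view(name)     # step 2: one frame of the scene
    if location is None:
        steps.append("room_scan")
        location = scan_full_room(name)        # step 3: full 360 degree scan
    if location is None:
        return f"I could not find your {name}.", steps
    saved_locations[name] = location           # remember for the next request
    return f"Your {name} is {location}.", steps
```

Each later step runs only when the earlier one returned nothing, so a repeated question is answered from memory without touching the camera.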

Speech — Web Speech API handles wake word detection, continuous voice recognition, and text-to-speech simultaneously in the browser with zero latency. Microphone pauses exactly when the agent speaks and restarts automatically after speech ends — prevents feedback loops.

Document Intelligence — n8n cloud workflow + Pinecone vector database + OpenAI text-embedding-3-small. Documents uploaded once, retrieved forever by voice.
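The chunking step happens inside the n8n workflow, whose settings are not listed here. As a rough illustration of what splitting a document into overlapping chunks looks like before embedding, here is a small sketch; chunk_text, the word-based sizes, and the overlap value are assumptions.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word chunks; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap keeps a value such as an ID number from being cut in half at a chunk boundary, which matters when a single chunk must answer the spoken query.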

Hardware — Arduino clone (ATmega328P) + 2x HC-SR04 ultrasonic sensors + active buzzer. Connected to phone via OTG cable, powered by the phone itself — no separate battery. Total hardware cost: ₹540. Complete device with smartphone: 280 grams.
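The distance arithmetic behind the two HC-SR04 sensors can be sketched as below. The real firmware runs on the Arduino; Python is used here only to show the math. The mounting height, thresholds, and the rule that a longer-than-expected ground reading means a drop such as a descending step are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s at room temperature

def echo_to_cm(echo_us: float) -> float:
    """Convert an HC-SR04 echo pulse width (microseconds) to distance in cm."""
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2  # sound travels out and back

def expected_ground_cm(mount_height_cm: float, tilt_deg: float = 30.0) -> float:
    """Slant distance at which a beam tilted below horizontal meets flat ground."""
    return mount_height_cm / math.sin(math.radians(tilt_deg))

def beep_pattern(top_cm: float, bottom_cm: float,
                 mount_height_cm: float = 120.0,   # assumed neck-worn height
                 obstacle_cm: float = 100.0,       # assumed alert range
                 tolerance_cm: float = 30.0) -> str:
    ground = expected_ground_cm(mount_height_cm)
    if bottom_cm > ground + tolerance_cm:
        return "1 long"    # ground falls away: possible step down
    if top_cm < obstacle_cm or bottom_cm < ground - tolerance_cm:
        return "3 short"   # something ahead or rising from the ground
    return "none"
```

With a 30° tilt, the bottom sensor should see flat ground at about twice its mounting height, so any reading well short of or well beyond that distance is worth a beep.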

Maps — Leaflet.js + OpenStreetMap for outdoor routing. Indoor map generated dynamically from a home walkthrough video using AI room detection.

Alerts — Twilio for SMS, SMTP for email, both triggered simultaneously on SOS with live GPS coordinates.
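One way to fire both channels at once is a small thread pool, sketched below. The sender callables stand in for the Twilio and SMTP calls, which are not shown; send_sos, build_sos_message, and the message wording are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

def build_sos_message(lat: float, lon: float) -> str:
    """Compose the alert text with an OpenStreetMap marker link."""
    link = f"https://www.openstreetmap.org/?mlat={lat:.5f}&mlon={lon:.5f}"
    return f"SOS: I need help. My location: {link}"

def send_sos(lat: float, lon: float,
             channels: Dict[str, Callable[[str], None]]) -> Dict[str, bool]:
    """Send the same message on every channel in parallel; report per-channel success."""
    message = build_sos_message(lat, lon)

    def attempt(send: Callable[[str], None]) -> bool:
        try:
            send(message)
            return True
        except Exception:
            return False  # one failing channel must not block the other

    with ThreadPoolExecutor(max_workers=max(1, len(channels))) as pool:
        futures = {name: pool.submit(attempt, send) for name, send in channels.items()}
        return {name: future.result() for name, future in futures.items()}
```

Returning a per-channel result lets the app tell the user aloud whether at least one alert went out.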

Deployment — Backend deployed on Render. Frontend accessed from any smartphone browser. No app installation required.

CHALLENGES WE RAN INTO

Traffic light latency — AI takes 1.5–2.5 seconds. For crossing a road, that is too slow. Solved with Live Mode: runs continuously in the background, only speaks when light color changes. The answer is always ready before the user asks.
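A minimal sketch of the Live Mode idea: keep the latest reading available at all times, but speak only when the color changes. LiveSignalMonitor and its method names are hypothetical.

```python
from typing import Optional

class LiveSignalMonitor:
    """Tracks the latest light color and reports only changes worth speaking."""

    def __init__(self) -> None:
        self.latest: str = "unknown"            # always ready if the user asks
        self._last_spoken: Optional[str] = None

    def update(self, color: str) -> Optional[str]:
        """Feed one background reading; return a color to announce, or None."""
        self.latest = color
        if color == "unknown" or color == self._last_spoken:
            return None                         # nothing new to say
        self._last_spoken = color
        return color
```

Because the background loop calls update on every frame, a direct question can be answered from latest without waiting for a fresh AI call.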

Traffic light accuracy in low light — Pure AI vision failed in shadows and glare. Added HSV color analysis as a parallel verification layer. If either method detects red, it announces red. Safety always overrides confidence.
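The safety-first rule can be written as a small decision function. This sketch shows one reading of the rule: red from either method wins, while green is announced only when both methods agree. The function names are hypothetical; the spoken messages are the ones quoted in the feature description.

```python
from typing import Optional

MESSAGES = {"red": "Stop, do not cross.", "green": "Safe to cross now."}

def fuse_signal(ai_color: str, hsv_color: str) -> str:
    """Combine the AI vision result with the HSV result."""
    if "red" in (ai_color, hsv_color):
        return "red"                 # either method seeing red is enough
    if ai_color == hsv_color == "green":
        return "green"               # green needs agreement from both
    return "unknown"                 # disagreement or no detection: stay silent

def announcement(ai_color: str, hsv_color: str) -> Optional[str]:
    """Return the sentence to speak, or None when nothing is certain."""
    return MESSAGES.get(fuse_signal(ai_color, hsv_color))
```

The asymmetry is deliberate: a false "stop" costs the user a few seconds, while a false "safe to cross" could cost far more.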

Agent state across HTTP requests — Flask is stateless by default. Built a full session store with thread-safe locks so the agent remembers what it already checked across multiple API calls without ever repeating a completed step.
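A lock-protected session store along these lines might look like the sketch below. AgentSessionMemory is the class name used earlier in this write-up, but the methods shown here and the SessionStore wrapper are hypothetical, and the Flask route wiring is left out.

```python
import threading
from typing import Any, Dict, Optional

class AgentSessionMemory:
    """Results of completed agent steps for one user session."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, Any] = {}

    def record(self, step: str, result: Any) -> None:
        with self._lock:
            self._results[step] = result

    def get(self, step: str) -> Optional[Any]:
        with self._lock:
            return self._results.get(step)

    def is_done(self, step: str) -> bool:
        with self._lock:
            return step in self._results

class SessionStore:
    """Maps a session id to its memory so separate HTTP requests share state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, AgentSessionMemory] = {}

    def get_or_create(self, session_id: str) -> AgentSessionMemory:
        with self._lock:
            memory = self._sessions.get(session_id)
            if memory is None:
                memory = self._sessions[session_id] = AgentSessionMemory()
            return memory
```

A request handler would look up the memory by session id, check is_done before running a step, and record the result afterwards.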

Indoor navigation on unusual layouts — AI returned unstructured text for irregular homes. Added structured fallback extraction — a secondary parser extracts room names from keywords and builds a valid navigation graph even from imperfect AI output.
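A fallback parser of this kind could be sketched as follows. The keyword list, the expected JSON shape, and the rule that rooms mentioned one after another are connected are all assumptions made for illustration.

```python
import json
import re
from typing import Dict, List

ROOM_KEYWORDS = ("bedroom", "kitchen", "bathroom", "hall",
                 "living room", "balcony", "corridor")  # assumed vocabulary

def extract_rooms(ai_output: str) -> List[str]:
    """Prefer structured JSON; fall back to keyword spotting in free text."""
    try:
        data = json.loads(ai_output)
        if isinstance(data, dict) and isinstance(data.get("rooms"), list):
            return [str(room).lower() for room in data["rooms"]]
    except json.JSONDecodeError:
        pass  # not JSON: use the keyword fallback below
    text = ai_output.lower()
    hits = []
    for keyword in ROOM_KEYWORDS:
        match = re.search(r"\b" + re.escape(keyword) + r"\b", text)
        if match:
            hits.append((match.start(), keyword))
    return [keyword for _, keyword in sorted(hits)]  # order of first mention

def build_graph(rooms: List[str]) -> Dict[str, List[str]]:
    """Link each room to the next one mentioned, in both directions."""
    graph: Dict[str, List[str]] = {room: [] for room in rooms}
    for a, b in zip(rooms, rooms[1:]):
        graph[a].append(b)
        graph[b].append(a)
    return graph
```

Even when the model answers in loose prose, the result is still a connected graph that the navigation step can traverse.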

API rate limits under heavy use — Built exponential backoff with 3-key rotation. Each key gets cooldown time proportional to failure count and resets automatically after cooldown expires.
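Key rotation with backoff can be sketched like this. KeyRotator, its method names, and the 2-second base cooldown are hypothetical, and the sketch doubles the cooldown on each consecutive failure, which is one common form of exponential backoff rather than a confirmed detail of the project.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

class KeyRotator:
    """Hands out the first API key that is not cooling down."""

    def __init__(self, keys: Iterable[str], base_cooldown: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._keys: List[str] = list(keys)
        self._failures: Dict[str, int] = {k: 0 for k in self._keys}
        self._ready_at: Dict[str, float] = {k: float("-inf") for k in self._keys}
        self._base = base_cooldown
        self._clock = clock

    def next_key(self) -> Optional[str]:
        now = self._clock()
        for key in self._keys:
            if self._ready_at[key] <= now:
                return key
        return None  # every key is cooling down; caller should wait or fall back

    def report_rate_limited(self, key: str) -> None:
        self._failures[key] += 1
        cooldown = self._base * 2 ** (self._failures[key] - 1)  # 2s, 4s, 8s, ...
        self._ready_at[key] = self._clock() + cooldown

    def report_success(self, key: str) -> None:
        self._failures[key] = 0  # a good call clears the failure streak
```

When next_key returns None, the caller can switch to the fallback model instead of failing the request.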

Voice collision — TTS speaking while mic was listening caused feedback loops. Solved by pausing the microphone precisely when the agent speaks and auto-restarting it after speech ends.

ACCOMPLISHMENTS WE'RE PROUD OF

Smart agent reasoning that stops early — if the answer is in memory, no camera is ever opened. Saves time, saves API calls, feels genuinely intelligent.

Dual Mode across every single feature — Normal for accuracy, Live for continuous awareness. No other accessibility app has this combination.

Dual verification for traffic lights — AI vision plus HSV color analysis must agree. Safety-first architecture that works in real Indian road conditions.

8+ features in one voice app, built solo — most accessibility apps have 3–4 features with a full team.

Zero software cost — free tiers only. No credit card. Works globally on any smartphone browser.

Real user validation — a visually impaired tester said: "I don't need someone with me anymore. I can walk alone, read my prescription, know what I'm eating. That's freedom."

₹540 hardware cost — 0.18% of OrCam. Proven on a real user.

WHAT WE LEARNED

Safety must always override speed. Smart reasoning that stops early is better than always doing more. Voice-first design is deeply underrated — it changes everything for users who cannot look at a screen. Free tiers are sufficient to build production-grade tools. Never trust AI alone for safety-critical decisions — always verify with a second independent method.

WHAT'S NEXT

The planned Android app eliminates the shared server entirely — each user or NGO volunteer creates one free Groq account during setup, enters their key once, and all AI calls go directly from their phone to Groq. No server needed. No shared limits. Groq's free tier gives 1,000 requests per day — a blind user doing 2–3 hours of outdoor activity needs roughly 50–100 requests. The free limit is more than sufficient for real daily use.

Additional roadmap: on-device TinyML (TensorFlow Lite) for offline object detection and OCR, local encrypted document storage (SQLite + AES) replacing cloud Pinecone for privacy, Bluetooth module (HC-05) replacing the OTG cable, medication expiry alerts, more Indian languages, and crowdsourced hazard alerts from the blind community.

NGO Deployment Scale:

10 users: ₹5,400 hardware only vs ₹30 lakh for OrCam
100 users: ₹54,000 hardware only vs ₹3 crore for OrCam
500+ users: ADIP Scheme eligible (₹20,000/person/year government allocation)
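The comparison figures above follow from simple multiplication, which the short sketch below reproduces. It uses the ₹540 hardware cost and the ₹3,00,000 OrCam figure quoted in this write-up; the constant and function names are illustrative.

```python
HARDWARE_COST_INR = 540
ORCAM_COST_INR = 300_000      # Rs 3,00,000, as quoted earlier
LAKH = 100_000                # 1 lakh  = 1,00,000
CRORE = 10_000_000            # 1 crore = 1,00,00,000

def deployment_cost(users: int) -> tuple:
    """Return (cost with this device, cost with OrCam) in rupees."""
    return users * HARDWARE_COST_INR, users * ORCAM_COST_INR

def cost_ratio_percent() -> float:
    """Hardware cost as a percentage of the OrCam price."""
    return round(HARDWARE_COST_INR / ORCAM_COST_INR * 100, 2)

if __name__ == "__main__":
    for users in (10, 100):
        ours, orcam = deployment_cost(users)
        print(f"{users} users: Rs {ours:,} vs Rs {orcam / LAKH:g} lakh")
    print(f"Hardware is {cost_ratio_percent()}% of the OrCam price")
```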

The goal: VisionAssist AI pre-installed on every budget smartphone in India. Zero ongoing cost. Blind people lead independent lives.

"Technology should lift people up, not price them out."

BUILT WITH

Python · Flask · Groq API (Llama 4 Scout 17B) · Google Gemini 1.5 Flash · OpenCV · YOLOv8 · DeepFace · Facenet · Tesseract OCR · Web Speech API · Leaflet.js · OpenStreetMap · Twilio · SMTP · n8n · Pinecone · OpenAI Embeddings · Arduino · HTML5 · CSS3 · Vanilla JavaScript

TRY IT OUT

🌐 Live App: https://global-blind-device.onrender.com/
🎥 Demo video with a blind user: https://youtu.be/LY4OkuAPpD8?si=RMLXmWiQU4U1KrV7
💻 GitHub: https://github.com/clod9344-cloud/Global_blind_device

"I don't need someone with me anymore. That's not convenience. That's freedom."

Updates

Prince Jha started this project — May 22, 2026 09:01 AM EDT
