AI Smart Glasses

RAG document scanner

VisionAssist AI: Independence Through Intelligence

Inspiration

My grandmother once spent 45 minutes searching for her glasses. They were on her head the whole time. She laughed about it, but I saw the frustration beneath the smile.

Then last year, a visually impaired friend told me he avoids going to new restaurants. He can't read menus. He can't identify what's on his plate. He can't eat alone with confidence.

That hit me.

Globally, over 285 million people are visually impaired. Most assistive technology treats them as recipients of charity — complicated apps, constant dependence on sighted help, and an underlying message that they need "saving."

I rejected that model completely. VisionAssist AI doesn't "help" blind people. It empowers them. One voice-controlled system. Zero dependence. Complete independence.

What It Does

VisionAssist AI turns any smartphone into an intelligent personal guide. Say "My Eye" followed by what you need, and the system answers by voice within seconds.
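Every command starts with the wake phrase "My Eye". Below is a minimal sketch in Python, the backend's language, showing how a speech-recognition transcript might be split into the wake phrase and a request. The intent names and trigger phrases are hypothetical. The app's real command set and matching logic are not described here.

```python
import re

# Wake phrase from the text ("My Eye"); \b stops "my eyes ..." from matching.
WAKE_PATTERN = re.compile(r"^\s*my\s+eye\b[\s,.:!]*", re.IGNORECASE)

# Hypothetical intents and trigger phrases, checked in insertion order.
INTENTS = {
    "find_object": ("where are my", "find my"),
    "read_page": ("read",),
    "identify_food": ("what's on my plate", "what am i eating"),
    "emergency": ("sos", "emergency", "help"),
}


def parse_command(transcript: str):
    """Return (intent, argument), or None if the wake phrase is missing."""
    match = WAKE_PATTERN.match(transcript)
    if match is None:
        return None  # ignore speech that does not start with "My Eye"
    request = transcript[match.end():].strip().lower()
    for intent, phrases in INTENTS.items():
        for phrase in phrases:
            if request.startswith(phrase):
                return intent, request[len(phrase):].strip(" ?.!")
    return "unknown", request
```

For example, "My Eye, where are my keys?" yields the intent `find_object` with the argument `keys`. Speech that lacks the wake phrase is ignored.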

Find lost objects — Point your phone around the room and ask "where are my keys?" The AI scans and tells you exactly where. Left side. On the floor. Near the sofa.

Read any page — Hold a book or document in front of the camera. The system detects the page automatically and reads every word aloud, exactly as written.

Identify food — Point at your plate. The AI identifies every dish, estimates quantities, counts plates and glasses, and tells you the meal type — breakfast, lunch, dinner, or snack.

Detect stairs — Analyzes stairs ahead and tells you step count, direction (up or down), handrail position, surface condition, and any hazards.

Read traffic lights — Uses dual verification (Groq AI + color analysis) to identify red, green, or yellow with 94% accuracy.
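The dual check can be sketched as two independent votes that must agree. How the project's "color analysis" works is an assumption here. This sketch classifies pixels from an already-cropped traffic-light region by HSV hue, using Python's standard `colorsys` module. The hue and saturation thresholds are illustrative, not the project's tuned values. Detecting and cropping the light is out of scope.

```python
import colorsys


def classify_pixel(r, g, b):
    """Map one RGB pixel (0-255) to 'red', 'yellow', 'green', or None."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s < 0.4 or v < 0.4:
        return None  # too grey or too dark to be a lit signal
    hue = h * 360  # illustrative hue bands in degrees
    if hue < 20 or hue >= 340:
        return "red"
    if hue < 70:
        return "yellow"
    if 90 <= hue < 180:
        return "green"
    return None


def dominant_light_color(pixels):
    """Majority vote over the pixels that look like a lit signal."""
    counts = {"red": 0, "yellow": 0, "green": 0}
    for r, g, b in pixels:
        label = classify_pixel(r, g, b)
        if label:
            counts[label] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def verify(model_label, pixels):
    """Report a color only when the vision model and the color check agree."""
    color_label = dominant_light_color(pixels)
    return model_label if model_label == color_label else "uncertain"
```

When the two checks disagree, the sketch returns "uncertain" rather than guessing. For a user about to cross a road, that is the safer failure mode.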

Recognize faces — Learns to identify family members and friends. Stores faces locally. Privacy preserved.

Navigate indoors — Walk through your home once while recording video. The AI builds a complete map. Then say "navigate to kitchen" and get step-by-step voice directions.
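Once the walkthrough has been turned into a map, "navigate to kitchen" reduces to a shortest-path search. Below is a minimal sketch that assumes the map is stored as a graph of rooms, where each edge carries a spoken instruction. The room names, instructions, and data format are hypothetical. Breadth-first search is used because, when every edge counts equally, it finds the route with the fewest room transitions.

```python
from collections import deque

# Hypothetical map: room -> list of (neighbouring room, spoken instruction).
HOME_MAP = {
    "bedroom": [("hallway", "Exit the bedroom door and turn left")],
    "hallway": [("bedroom", "Turn right into the bedroom"),
                ("living room", "Walk straight about five steps"),
                ("bathroom", "Second door on your right")],
    "living room": [("hallway", "Turn around and walk to the hallway"),
                    ("kitchen", "Turn left into the kitchen")],
    "kitchen": [("living room", "Walk back into the living room")],
    "bathroom": [("hallway", "Exit and turn left")],
}


def directions(graph, start, goal):
    """Return the spoken steps from start to goal, or None if unreachable."""
    queue = deque([start])
    came_from = {start: None}  # room -> (previous room, instruction used)
    while queue:
        room = queue.popleft()
        if room == goal:
            break
        for neighbour, instruction in graph.get(room, []):
            if neighbour not in came_from:
                came_from[neighbour] = (room, instruction)
                queue.append(neighbour)
    if goal not in came_from:
        return None
    steps = []
    room = goal
    while came_from[room] is not None:  # walk back from goal to start
        previous, instruction = came_from[room]
        steps.append(instruction)
        room = previous
    return list(reversed(steps))
```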

Navigate outdoors — Full GPS routing with turn-by-turn voice guidance.
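Turn-by-turn guidance needs to know when the user is close enough to the next maneuver to announce it. Below is a minimal sketch using the haversine great-circle distance. The maneuver tuple format and the 30-metre announcement radius are assumptions. The route itself would come from whichever routing service the app uses. Nominatim handles geocoding, not routing.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def guidance(position, maneuver, announce_within_m=30):
    """Return a spoken cue when the next maneuver is near, else None.

    maneuver is a hypothetical (lat, lon, instruction) tuple.
    """
    m_lat, m_lon, instruction = maneuver
    distance = haversine_m(position[0], position[1], m_lat, m_lon)
    if distance <= announce_within_m:
        return f"In {round(distance)} metres, {instruction}"
    return None
```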

Sense obstacles — An Arduino-connected HC-SR04 ultrasonic sensor measures distances in real time and warns you when objects are within 60 cm.
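The HC-SR04 reports distance as the width of an echo pulse. That pulse covers the round trip to the obstacle and back, so the distance is half of what the pulse width implies. Below is a minimal sketch of the conversion and the 60 cm warning, written in Python for consistency with the other examples. The text does not say whether this conversion runs on the Arduino or in the backend, and the spoken wording is an assumption. The speed-of-sound constant assumes air at roughly 20 degrees C.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air at roughly 20 degrees C
WARNING_DISTANCE_CM = 60  # threshold from the feature description


def echo_to_cm(echo_us: float) -> float:
    """Convert an HC-SR04 echo pulse width (microseconds) to centimetres.

    The pulse spans the trip out and back, hence the division by 2.
    """
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2


def obstacle_warning(distance_cm):
    """Return a spoken warning for obstacles within the threshold, else None."""
    if distance_cm is None or distance_cm <= 0:
        return None  # no echo or invalid reading
    if distance_cm <= WARNING_DISTANCE_CM:
        return f"Obstacle ahead, {round(distance_cm)} centimetres"
    return None
```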

Emergency SOS — One voice command triggers SMS and email alerts with your exact GPS location to all emergency contacts.
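The alert itself is mostly string formatting around the GPS fix. Below is a minimal sketch of the email half using Python's standard `email` and `smtplib` modules. The message wording and the OpenStreetMap link format are assumptions. Gmail's SMTP host appears only because the tool list mentions SMTP (Gmail).

```python
import smtplib
from email.message import EmailMessage


def location_link(lat: float, lon: float) -> str:
    # OpenStreetMap marker link; the app's real link format is not specified.
    return f"https://www.openstreetmap.org/?mlat={lat:.6f}&mlon={lon:.6f}"


def build_sos_email(sender, contacts, lat, lon, user_name="VisionAssist user"):
    """Build (but do not send) an emergency email with the user's location."""
    msg = EmailMessage()
    msg["Subject"] = f"SOS from {user_name}"
    msg["From"] = sender
    msg["To"] = ", ".join(contacts)
    msg.set_content(
        f"{user_name} triggered an emergency alert.\n"
        f"Location: {lat:.6f}, {lon:.6f}\n"
        f"{location_link(lat, lon)}\n"
    )
    return msg


def send_sos_email(msg, username, app_password):
    """Send through Gmail's SMTP-over-SSL endpoint (needs an app password)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, app_password)
        server.send_message(msg)
```

The SMS path could send the same text as the `body` of Twilio's `client.messages.create(body=..., from_=..., to=...)`, once per contact.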

Key Innovations

Dual Mode Intelligence — Every feature has two modes. Normal mode captures multiple frames over 3 to 4 seconds and gives one detailed response, which is best when accuracy matters. Live mode runs continuously and speaks only when something changes. That matters for users on the move, who need constant awareness without constant talking.
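Both modes can be sketched over per-frame lists of labels, such as an object detector might produce. Normal mode keeps only the labels seen in most of the captured frames, which filters out one-frame false positives. Live mode compares each new frame against what was last announced and stays silent when nothing has changed. The 50% vote threshold and the phrasing are assumptions, not the project's values.

```python
from collections import Counter


def normal_mode_summary(frame_detections, min_fraction=0.5):
    """Keep labels detected in at least min_fraction of the captured frames."""
    counts = Counter(label for frame in frame_detections for label in set(frame))
    needed = min_fraction * len(frame_detections)
    return sorted(label for label, n in counts.items() if n >= needed)


class LiveAnnouncer:
    """Return something to speak only when the set of detected labels changes."""

    def __init__(self):
        self.last = frozenset()

    def update(self, labels):
        current = frozenset(labels)
        appeared = sorted(current - self.last)
        gone = sorted(self.last - current)
        self.last = current
        parts = []
        if appeared:
            parts.append("New: " + ", ".join(appeared))
        if gone:
            parts.append("Gone: " + ", ".join(gone))
        return ". ".join(parts) or None  # None means stay silent
```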

Memory System — Users can save an object's location by voice, along with a photo. "My Eye, remember this: my red keys are on the kitchen table." Later: "My Eye, where are my keys?" The AI recalls the location and visually confirms it using the saved photos.
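Below is a minimal sketch of the memory store. It assumes a local JSON file keyed by a normalised object name, with an optional path to the saved photo. The tool list does mention local file system storage. The class name, the file layout, and the word-subset lookup that lets "keys" find "red keys" are all hypothetical. The visual confirmation against the photo is not shown.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class ObjectMemory:
    """Hypothetical JSON-file store for "remember this" voice commands."""

    def __init__(self, path):
        self.path = Path(path)
        self.items = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(name):
        # Normalise case and spacing so "Red  Keys" and "red keys" match.
        return " ".join(name.lower().split())

    def remember(self, name, location, photo_path=None):
        self.items[self._key(name)] = {
            "location": location,
            "photo": photo_path,
            "saved_at": datetime.now(timezone.utc).isoformat(),
        }
        self.path.write_text(json.dumps(self.items, indent=2))

    def recall(self, name):
        """Return (key, entry) for an exact match, else for a saved name
        containing every word of the query, else None."""
        query = self._key(name)
        if query in self.items:
            return query, self.items[query]
        words = set(query.split())
        for key, entry in self.items.items():
            if words and words <= set(key.split()):
                return key, entry
        return None
```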

How We Built It

The backend runs on Python with Flask. AI orchestration uses Groq's Llama 4 Scout for vision tasks (roughly 1.5 to 2.5 seconds per response), with Google Gemini as a fallback for multi-frame analysis. YOLOv8 runs locally for object and document detection, DeepFace handles face recognition, and SIFT feature matching identifies currency notes.
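The Groq-first, Gemini-fallback arrangement can be sketched as an ordered list of providers tried in turn. The provider callables are hypothetical placeholders. Real ones would wrap each vendor's SDK, which is not shown because the text does not describe the exact client calls the project uses.

```python
import logging

logger = logging.getLogger("visionassist")


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider errors out or returns nothing."""


def describe_with_fallback(frames, prompt, providers):
    """Try each (name, callable) in order; return (name, text) from the first success."""
    errors = []
    for name, call in providers:
        try:
            text = call(frames, prompt)
            if text:  # treat an empty reply as a failure too
                return name, text
            errors.append(f"{name}: empty response")
        except Exception as exc:  # network errors, timeouts, rate limits, ...
            logger.warning("%s failed: %s", name, exc)
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A call such as `describe_with_fallback(frames, prompt, [("groq", call_groq), ("gemini", call_gemini)])` would return the first usable answer. Here `call_groq` and `call_gemini` are hypothetical wrappers.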

The frontend is plain HTML, CSS, and vanilla JavaScript, designed accessibility-first with large buttons, high contrast, and screen reader support.

Hardware integration uses an Arduino Uno with an HC-SR04 ultrasonic sensor, and PySerial handles communication with the board. Twilio and SMTP manage emergency alerts, and OpenStreetMap powers navigation.
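Because the backend reads the sensor over PySerial, the reading loop has to tolerate partial lines and unplugged cables. Below is a minimal sketch. It assumes the Arduino prints one reading per line in a hypothetical `DIST:<cm>` format at 9600 baud. The device path is also an assumption and differs by operating system. pySerial's `SerialException` subclasses `IOError`, so catching `OSError` covers a disconnect.

```python
def open_arduino(device="/dev/ttyACM0", baudrate=9600):
    """Open the Arduino's serial port (device path and baud rate are assumptions)."""
    import serial  # pyserial; imported here so read_distance works without it

    return serial.Serial(device, baudrate, timeout=1)


def read_distance(port):
    """Read one line from a serial-like object and return centimetres, or None."""
    try:
        raw = port.readline()
    except OSError:  # pyserial's SerialException derives from IOError/OSError
        return None  # e.g. the Arduino was unplugged; the caller can reconnect
    text = raw.decode("ascii", errors="ignore").strip()
    if text.startswith("DIST:"):
        text = text[len("DIST:"):]
    try:
        return float(text)
    except ValueError:
        return None  # empty line (read timeout) or garbled data
```

In use, `with open_arduino() as port:` followed by repeated `read_distance(port)` calls would feed each reading into the obstacle warning. Any `None` result would be treated as "no reading" rather than as "no obstacle".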

Challenges We Faced

Latency was the biggest hurdle. Groq takes 1.5 to 2.5 seconds per inference. For a blind user walking toward stairs, that's an eternity. We addressed this with dual mode: normal mode for precision, live mode for continuous awareness.

Indoor map generation broke repeatedly because Groq sometimes returned incomplete JSON. We wrote regex patterns to extract valid structures and built fallback functions for cases where nothing could be recovered.
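Below is a minimal sketch of salvaging JSON from a model reply. It swaps part of the regex approach for the standard library's `json.JSONDecoder.raw_decode`, which can start decoding at any `{` and ignores any text that follows. A regex is used only to locate fenced code blocks. When nothing decodes, for example because the output was truncated, the caller's fallback value is returned. The function name and the fallback shape are hypothetical.

```python
import json
import re

# Matches a Markdown code fence (three backticks, optional "json" tag).
FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(reply, fallback=None):
    """Return the first JSON object found in a model reply, else fallback."""
    decoder = json.JSONDecoder()
    # Prefer fenced blocks, then scan the whole reply.
    for candidate in FENCED.findall(reply) + [reply]:
        for brace in re.finditer(r"\{", candidate):
            try:
                obj, _end = decoder.raw_decode(candidate, brace.start())
            except json.JSONDecodeError:
                continue  # this "{" does not start valid JSON; try the next one
            if isinstance(obj, dict):
                return obj
    return fallback
```

With truncated output such as `{"rooms": ["kitchen", "ha`, no candidate decodes, so a fallback like `{"rooms": []}` lets the map builder carry on.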

Voice command collision drove us crazy. The accessibility-mode and standard-mode recognizers both fired on the same command. We added mode flags so that one recognizer is disabled while the other is active.
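The fix lives in the browser, where the Web Speech API recognizers run, but the gating logic does not depend on the language. Below is a Python sketch of the same idea, with hypothetical names. Each recognizer tags its results with the mode it belongs to, and only results from the active mode reach a handler.

```python
from enum import Enum


class Mode(Enum):
    STANDARD = "standard"
    ACCESSIBILITY = "accessibility"


class CommandRouter:
    """Only the recognizer for the active mode is allowed to trigger commands."""

    def __init__(self, handlers, mode=Mode.ACCESSIBILITY):
        self.handlers = handlers  # Mode -> callable(transcript)
        self.mode = mode

    def switch(self, mode):
        self.mode = mode

    def on_transcript(self, source, transcript):
        if source is not self.mode:
            return None  # result from the inactive recognizer: ignore it
        return self.handlers[self.mode](transcript)
```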

API rate limits hit us hard — Groq allows 30 requests per minute, Gemini 60. We made Gemini primary for expensive operations and built retry logic with exponential backoff.
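At the stated limits, 30 requests per minute works out to one Groq call every 2 seconds, and 60 per minute to one Gemini call per second. Bursts of frames exceed those rates quickly. Below is a minimal sketch of retry with exponential backoff and jitter. `RateLimitError` is a hypothetical stand-in for whatever HTTP 429 exception a provider's SDK raises, and the delays are illustrative. `sleep` and `jitter` are injectable so the logic can be tested without waiting.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 'too many requests' error."""


def call_with_backoff(fn, *, retries=4, base_delay=1.0, max_delay=16.0,
                      sleep=time.sleep, jitter=random.random):
    """Call fn(), retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1, 2, 4, 8, ...
            sleep(delay * (0.5 + jitter() / 2))  # wait 50% to 100% of the delay
```

The jitter spreads retries from several clients so that they do not all hit the API again at the same instant.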

What We Learned

Start simple, then scale. Our first prototype just said "I see a chair, a person, a book." Testers found that magical. Everything else came later.

Accessibility features benefit everyone. The large buttons, high contrast UI, and voice alternatives we built for visually impaired users turned out to be loved by sighted users too.

Hardware is unpredictable. The Arduino disconnects. Camera permissions get dismissed. GPS takes 30 seconds. Build fallbacks for everything and never assume a component is working.

What's Next

PMPML Bus Integration (Pune) — We are integrating Pune's PMPML bus route system into navigation. Users will say "My Eye, take me to Swargate" and receive complete guidance: which bus to take, which stop to board at, when to get off, and walking directions at both ends. This would turn VisionAssist from a home tool into a complete city mobility solution.

Crowdsourced Hazard Alerts — Users can report obstacles, broken sidewalks, missing handrails, or construction hazards by voice or button press. Reports sync to all nearby users in real time. The AI learns hazard patterns and proactively warns users approaching known danger zones. Together, these reports build a crowdsourced accessibility map.

Voice-Driven Transactions — We are building conversational commerce features. Users will order groceries, book cabs, pay bills, and transfer money using only voice with biometric authentication. No sighted help. No dependence. True economic independence.

The Real Impact

A tester who uses the app daily told me something I'll never forget. He said, "I don't feel like I need someone with me anymore. I can just walk into a restaurant, ask what's on the menu, and know what I'm eating. That's not convenience. That's freedom."

We don't want blind people to beg. We want them to lead.

Built With

Python 3.10, Flask, OpenCV, NumPy, Pillow, YOLOv8, DeepFace, SIFT, Tesseract OCR, Groq Llama 4 Scout API, Google Gemini 1.5 Flash API, Twilio API, SMTP (Gmail), n8n webhooks, OpenStreetMap Nominatim API, Leaflet.js, Web Speech API, Arduino Uno, HC-SR04 ultrasonic sensor, PySerial, local file system storage, HTML5, CSS3, JavaScript (ES6).

Updates

Private user posted an update — May 06, 2026 02:59 PM EDT

Using a smartphone is a deliberate exception, made to reduce hardware cost. 1) The phone's camera gives better images for AI vision recognition.

Smartphones are easily available to blind users, so to scale and reduce cost, it is better to use the phone's camera than dedicated hardware parts such as an expensive camera module.


Private user started this project — May 03, 2026 03:50 PM EDT
