VocalEyes

Image of the Website
Text Recognition using Tesseract and TTS Conversion
Gemini API Listening

Inspiration

Visually impaired individuals face daily challenges in accessing printed or screen-based information. We were inspired to build VocalEyes as a way to empower them through technology—enabling real-time text detection from their surroundings and converting it into speech. The idea stemmed from a desire to bridge the accessibility gap using computer vision and AI.

What it does

VocalEyes is a web-based application that:

Captures live video from the user's camera
Uses OCR (Tesseract.js) to detect and extract text from the video feed
Reads the recognized text aloud using text-to-speech
Detects objects using OpenCV and YOLOv5 to alert users to obstacles
Includes a Gemini assistant for asking further questions

It's designed to be intuitive, accessible, and assistive, especially for users with visual impairments.
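
The capture, recognize, and speak steps listed above can be sketched as one pass of a loop. This is a minimal sketch rather than the project's actual code: `processFrame`, `normalizeText`, `grabFrame`, `recognize`, and `speak` are hypothetical names, and the three dependencies are injected so the logic can be exercised outside the browser. In the page, `recognize` would typically wrap Tesseract.js's `Tesseract.recognize(image, 'eng')`, which resolves to an object whose `data.text` holds the extracted text.

```javascript
// Hypothetical helper: collapse whitespace so noisy OCR output compares cleanly.
function normalizeText(raw) {
  return String(raw || '').replace(/\s+/g, ' ').trim();
}

// One pass of the pipeline: grab a frame, run OCR, speak only new text.
// `state.lastSpoken` remembers the previous utterance so that an unchanged
// scene is not read aloud again on every frame.
async function processFrame({ grabFrame, recognize, speak }, state) {
  const image = await grabFrame();
  const result = await recognize(image);
  const text = normalizeText(result && result.data && result.data.text);
  if (text === '' || text === state.lastSpoken) {
    return null; // nothing new to say
  }
  state.lastSpoken = text;
  await speak(text);
  return text;
}
```

Calling `processFrame` on a timer (for example once per second) with a shared `state` object gives a simple read-aloud loop. Skipping repeated text is a design choice assumed here to keep the speech output from looping on a static scene.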

How we built it

We used:

HTML/CSS/JavaScript for the frontend UI
Tesseract.js for client-side OCR (Optical Character Recognition)
Google Translate API for text-to-speech functionality
Gemini API for Gemini assistant
All functionality is handled in the browser, keeping the app lightweight and easy to use
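
To illustrate the text-to-speech step, here is a hedged sketch that splits recognized text into short chunks and speaks them in order. It does not reproduce the Google Translate-based approach named above; it swaps in the browser's built-in Web Speech API (`speechSynthesis`) as a stand-in. `splitForSpeech` and `speakChunks` are hypothetical names, and the 200-character chunk size is an assumption, not a documented limit of any service.

```javascript
// Split text into chunks of at most maxLen characters, breaking on
// whitespace where possible. The default of 200 is an assumed value.
function splitForSpeech(text, maxLen = 200) {
  const words = String(text).trim().split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = '';
  for (const word of words) {
    if (word.length > maxLen) {
      // A single oversized token: flush what we have, then hard-split it.
      if (current) { chunks.push(current); current = ''; }
      for (let i = 0; i < word.length; i += maxLen) {
        chunks.push(word.slice(i, i + maxLen));
      }
      continue;
    }
    const candidate = current ? current + ' ' + word : word;
    if (candidate.length <= maxLen) {
      current = candidate;
    } else {
      chunks.push(current);
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Browser-only: queue each chunk with the Web Speech API.
// Returns false when speech synthesis is unavailable (e.g. outside a browser).
function speakChunks(text, lang = 'en-US') {
  if (typeof window === 'undefined' || !('speechSynthesis' in window)) {
    return false;
  }
  for (const chunk of splitForSpeech(text)) {
    const utterance = new SpeechSynthesisUtterance(chunk);
    utterance.lang = lang;
    window.speechSynthesis.speak(utterance); // utterances are queued in order
  }
  return true;
}
```

Chunking on word boundaries keeps each utterance short, which tends to make long OCR results easier to interrupt or replace when new text appears.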

Challenges we ran into

Making Tesseract.js work with a live camera feed was tricky: we initially tried processing raw image data, which caused errors, instead of using toDataURL().
Debugging the OCR step took time.
Handling inconsistencies in camera permissions across different browsers.
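
The first and last challenges above can be sketched together: capturing a frame through a canvas with `toDataURL()`, and turning camera permission failures into readable messages. This is an illustrative sketch under assumptions, not the project's code. `captureFrame`, `describeCameraError`, and `startCamera` are hypothetical names, and the message strings are invented for the example. The error names are the standard ones `getUserMedia` rejects with, though which one a given browser raises in a given situation can vary.

```javascript
// Draw the current video frame onto a canvas and return it as a PNG data URL,
// an image format that Tesseract.js accepts as input.
function captureFrame(video, canvas) {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL('image/png');
}

// Map getUserMedia rejection names to user-facing messages (wording assumed).
function describeCameraError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Camera permission was denied.';
    case 'NotFoundError':
      return 'No camera was found on this device.';
    case 'NotReadableError':
      return 'The camera is in use or could not be read.';
    default:
      return 'The camera could not be started.';
  }
}

// Browser-only: request the rear camera if available and attach it to a <video>.
async function startCamera(video) {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { facingMode: 'environment' },
    });
    video.srcObject = stream;
    await video.play();
    return { ok: true };
  } catch (err) {
    return { ok: false, message: describeCameraError(err) };
  }
}
```

For an app aimed at visually impaired users, the message returned by `startCamera` could be passed to the speech step so that a permission failure is announced rather than only shown on screen.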