Inspiration

I was inspired to create the Vision Warriors app because I noticed that visually impaired individuals often have few resources for learning about the world around them. Although there are tools to help them read (like Braille or text-to-speech devices), these tools have limitations. A common one is the inability to describe color. This app aims not only to convert text to speech, but also to give voice to the silent language of color. In this app, color has a sound and text has emotion.

What it does

This app currently has three main features. The first is a text-to-speech converter: it can extract and analyze text from PDFs, Word documents, .txt files, and PNG and JPG images. After analyzing the text, it detects the text's color, announces it, and then reads the content aloud (I describe the specifics in the next section). The second feature lets users click on any part of an image and hear that color spoken aloud (this specifically targets color-blind individuals). The third feature uses the webcam: the user places an object in the center of the frame and the app says its color out loud.

How we built it

Text-Reader: This application is a Tkinter-based desktop GUI designed to assist users with visual impairments by reading aloud both extracted text and its corresponding color from uploaded documents and images. It leverages OCR (Optical Character Recognition) through the OCR.Space API to parse text from images (e.g., PNG, JPG), while PDFs and DOCX files are processed with PyMuPDF (fitz) and python-docx, respectively, to extract embedded text and font color data. For color detection in images, it uses OpenCV and KMeans clustering to identify the dominant background color and estimate the text color by computing its complementary RGB value, then maps that value to a predefined set of generic color names using Euclidean distance in RGB space. Text-to-speech synthesis is performed with pyttsx3, letting users choose from available system voices and listen to the content and detected color via threaded speech playback.

Image Color Detector: This Python application is a desktop GUI tool built with Tkinter, designed to help users, especially those with visual impairments, identify and hear the name of the color at any point on a static image. It uses PIL (Python Imaging Library) and OpenCV to load and process the image, capturing a small region around the clicked pixel and computing the average RGB color. A custom color-matching system based on Euclidean distance compares this average against a dictionary of generic color names and returns the closest match. The identified color is then read aloud using pyttsx3, a text-to-speech library, providing instant auditory feedback on the visual content the user selected.

Webcam Color Detector: This Python program is a real-time webcam-based color detection application built with Tkinter for the GUI and OpenCV for capturing and processing live video frames. It continuously analyzes a central region of interest (ROI) in each frame to compute the average RGB color, smooths this data using a running average over recent frames, and matches the result to the nearest name in a predefined BASIC_COLORS dictionary using Euclidean distance. Detected color names are displayed visually and announced audibly with pyttsx3 text-to-speech, ensuring accessible feedback for users with visual impairments. The app also includes controls to start and stop the webcam, dynamically updates the canvas with the video feed, and limits speech output frequency to prevent repeated announcements.
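All three tools share the same nearest-color lookup: compare a measured RGB value against a dictionary of named colors and pick the entry with the smallest Euclidean distance. Here is a minimal sketch of that step; the dictionary entries shown are illustrative, not the app's actual BASIC_COLORS values (the real dictionaries hold 22-26 entries):

```python
import math

# Illustrative subset of a named-color dictionary; the app's
# real BASIC_COLORS dictionary contains many more entries.
BASIC_COLORS = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "gray": (128, 128, 128),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def closest_color_name(rgb):
    """Return the dictionary name whose RGB value is nearest to
    `rgb` by straight-line (Euclidean) distance in RGB space."""
    return min(
        BASIC_COLORS,
        key=lambda name: math.dist(rgb, BASIC_COLORS[name]),
    )

print(closest_color_name((240, 20, 30)))    # a reddish pixel -> "red"
print(closest_color_name((100, 110, 120)))  # desaturated pixel -> "gray"
```

In the app, `rgb` would come from averaging a small region of pixels (around a click, or the webcam ROI) rather than a single pixel.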

Challenges we ran into

I had some difficulty getting the image-based and webcam-based color detectors to say the right color. They kept saying "gray" or "color unknown" for many colors because the measured RGB values did not match the ones I had pre-defined in the BASIC_COLORS dictionary. To solve this, I added more colors with unique RGB values to the dictionary and enlarged the sampling region to average a group of pixels near the clicked area, rather than taking the color of a single pixel. This worked for the image-based color detector, but the webcam needed more work. I eventually fixed it by smoothing the color signal with a moving average over recent frames, reducing flicker and stabilizing the output.
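The moving-average fix above can be sketched roughly as follows. This is a simplified illustration, not the app's actual code; the class and parameter names are hypothetical:

```python
from collections import deque

class ColorSmoother:
    """Smooth a noisy per-frame RGB reading by averaging the
    last `window` frames, which damps flicker in the webcam feed."""

    def __init__(self, window=10):
        # deque with maxlen automatically drops the oldest frame
        self.frames = deque(maxlen=window)

    def update(self, rgb):
        """Add one frame's reading and return the smoothed RGB."""
        self.frames.append(rgb)
        n = len(self.frames)
        return tuple(sum(f[i] for f in self.frames) / n for i in range(3))

smoother = ColorSmoother(window=5)
# Flickering readings of a red object settle near the true value:
for noisy in [(250, 5, 5), (240, 15, 0), (255, 0, 10), (245, 10, 5), (250, 5, 0)]:
    smoothed = smoother.update(noisy)
print(tuple(round(c) for c in smoothed))  # (248, 7, 4)
```

Feeding the smoothed value (rather than each raw frame) into the nearest-color lookup is what keeps the announced name from jumping between adjacent colors.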

Accomplishments that we're proud of

I am proud of the accuracy of my models. The text reader achieves about 99% accuracy: it reads text clearly and correctly and identifies the color of the text. The image-based color detector performed well during test runs, correctly naming 22 of 25 test colors (its predefined dictionary contains 26 RGB combinations). The webcam detector showed fair accuracy, correctly identifying 21 of 25 test colors (its dictionary contains 22 colors), and it could detect an object's color from up to 25 centimeters away.

What we learned

I learned how to work with audio in Python. I also learned how to interpret data from both images and webcams in Python.

What's next for Vision_Warriors

I am currently fixing the remaining webcam issues, though they are mostly resolved. I am also working on the app's American Sign Language feature, which has so far been trained on a very limited dataset; I want to expand that dataset and ensure full functionality.
