VoiceVision: Object Recognition for the Blind

Inspiration:

An estimated 4.95 million people in India are blind, 35 million are visually impaired, and 0.24 million children are blind. Cataract and refractive error remain the leading causes of blindness and visual impairment in the country.

VoiceVision was motivated by the aspiration to use cutting-edge technology to give visually impaired people more freedom and better access to their surroundings.

What it does?

VoiceVision uses Computer Vision and Deep Learning to recognize objects in real time from a live image. It then converts what it detects into spoken descriptions, which let users hear and understand what is happening in their immediate environment.

Workflow:

Image Processing: Utilizes OpenCV for efficient image manipulation and feature extraction.
Object Detection: Employs the YOLO algorithm for accurate identification of objects in the image.
Text-to-Speech Integration: Converts object descriptions into natural-sounding audio using the gTTS library.
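
The last step of the workflow above, turning detected labels into speech, could look roughly like the sketch below. It is an illustration and not the project's actual code: the helper names `describe_objects` and `speak_to_file`, the sentence wording, and the output file name are all assumptions. Only the `gTTS(...).save(...)` call comes from the gTTS library itself, and it needs network access to run.

```python
from collections import Counter


def describe_objects(labels):
    """Build one short sentence from a list of detected labels (hypothetical helper)."""
    if not labels:
        return "No objects detected."
    counts = Counter(labels)  # keeps first-seen order of labels
    # Naive pluralisation: just append "s" when the count is above one.
    parts = [f"{n} {name}" + ("s" if n > 1 else "") for name, n in counts.items()]
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"I can see {listing}."


def speak_to_file(text, out_path="description.mp3", lang="en"):
    """Save the sentence as an MP3 with gTTS (hypothetical helper)."""
    from gtts import gTTS  # imported lazily; gTTS calls an online service
    gTTS(text=text, lang=lang).save(out_path)
    return out_path
```

For example, `describe_objects(["person", "dog", "person"])` produces a single sentence that can be passed to `speak_to_file` and then played back to the user.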

How we built it?

We built VoiceVision using OpenCV for image processing and YOLO for object detection, and integrated the gTTS library to convert text descriptions into audio. YOLO (You Only Look Once) is a fast object detection algorithm in computer vision, first described by Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi. The model architecture is called Darknet and was originally loosely based on the VGG-16 model. The entire code was developed in Python in Jupyter Notebook, and the application is served to users through Flask.
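
As a rough illustration of the detection step described above, here is one way a Darknet YOLO model can be run through OpenCV's `dnn` module. This is a minimal sketch under stated assumptions, not the code we shipped. It assumes a YOLOv3-style output layout (`[cx, cy, w, h, objectness, class scores...]` with normalised coordinates) and a 416x416 input size. The function names and the paths passed to `detect_objects` are hypothetical.

```python
def decode_yolo_rows(rows, class_names, img_w, img_h, conf_threshold=0.5):
    """Turn raw output rows into (label, confidence, (x, y, w, h)) tuples.

    Assumes each row is [cx, cy, w, h, objectness, score_0, score_1, ...]
    with coordinates normalised to the 0..1 range (YOLOv3-style layout).
    """
    detections = []
    for row in rows:
        scores = list(row[5:])
        class_id = max(range(len(scores)), key=scores.__getitem__)
        confidence = float(scores[class_id])
        if confidence < conf_threshold:
            continue  # drop weak predictions
        cx, cy, w, h = (float(v) for v in row[:4])
        box_w, box_h = int(w * img_w), int(h * img_h)
        x = int(cx * img_w - box_w / 2)  # top-left corner from box centre
        y = int(cy * img_h - box_h / 2)
        detections.append((class_names[class_id], confidence, (x, y, box_w, box_h)))
    return detections


def detect_objects(image_path, cfg_path, weights_path, class_names):
    """Run a Darknet YOLO model on one image (paths are placeholders)."""
    import cv2  # imported lazily so the decoder above works without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    img_h, img_w = image.shape[:2]

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    # Scale pixels to 0..1, resize to the assumed 416x416 input, BGR -> RGB.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    rows = [row for output in outputs for row in output]
    return decode_yolo_rows(rows, class_names, img_w, img_h)
```

The sketch skips non-maximum suppression, so overlapping boxes for the same object are not merged. In practice a step such as `cv2.dnn.NMSBoxes` would usually follow the decoding.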

What's next for VoiceVision?

In the future, we aim to expand VoiceVision's capabilities to recognize and describe a wider range of objects and scenes through a live camera feed. We also plan to make it easier to use by adding user customization options and improving its compatibility with different devices and environments.