VisionVoice (Image-To-Speech)

Bringing clarity to the visually impaired with VisionVoice.

Inspiration

Have you ever thought about how visually impaired people interact with the world around them? Picture this scenario: a friend sends a photo message, and the visually impaired recipient struggles to identify the objects and people in the image or to reply appropriately. This is where our Image-To-Speech solution comes in. We were inspired by the desire to make the world a more accessible and comfortable place for the visually impaired community. By providing a tool that lets visually impaired users experience images through sound, we hope to make a positive impact on the lives of those who use it.

What it does

VisionVoice helps visually impaired individuals understand images through audio descriptions. Users can either upload an image or capture one with a webcam, and they receive an audio description that includes details such as colors, designs, and other descriptive information. The audio is generated with the Google Text-to-Speech (gTTS) library, which converts written text into spoken words.
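As a rough illustration of the text-to-speech step, the sketch below turns a caption string into an MP3 file with gTTS. The helper names `normalize_caption` and `caption_to_speech`, the default output path, and the cleanup rules are assumptions made for this example, not the project's actual code. gTTS needs to be installed (`pip install gTTS`) and requires an internet connection when the audio is generated.

```python
from pathlib import Path


def normalize_caption(caption: str) -> str:
    """Tidy a raw caption so it reads naturally when spoken.

    Hypothetical helper: collapses repeated whitespace, capitalizes the
    first letter, and makes sure the sentence ends with punctuation.
    """
    text = " ".join(caption.split())
    if not text:
        raise ValueError("caption is empty")
    if text[-1] not in ".!?":
        text += "."
    return text[0].upper() + text[1:]


def caption_to_speech(caption: str, out_path: str = "description.mp3",
                      lang: str = "en") -> Path:
    """Convert a caption to spoken audio and save it as an MP3 file."""
    # Imported lazily so the text helper above works without gTTS installed.
    from gtts import gTTS

    text = normalize_caption(caption)
    gTTS(text=text, lang=lang).save(out_path)
    return Path(out_path)
```

For example, `caption_to_speech("a red car parked on a street")` would write `description.mp3` to the current directory, which can then be played back to the user.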

How we built it

We researched how to convert images to captions using AI/ML. Our approach involved collecting and analyzing data, training and testing a transformer architecture model, and using a neural network to further improve the model's performance. After completing the image-to-text component, we implemented text-to-speech using the gTTS library and integrated webcam capture using OpenCV. We developed and ran the final product in the PyCharm IDE. The result is a single pipeline that takes an image, generates a caption, and reads that caption aloud.
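The flow described above (get an image, caption it, speak the caption) can be sketched roughly as follows. This is a minimal outline under stated assumptions: `load_image`, `capture_frame`, and `describe_image` are hypothetical names, and `captioner` and `speaker` are placeholders for the trained captioning model and the gTTS step, whose real interfaces are not shown in this writeup. Only standard OpenCV calls (`cv2.imread`, `cv2.VideoCapture`) are used for image input.

```python
from typing import Any, Callable, Tuple


def load_image(path: str) -> Any:
    """Read an uploaded image from disk as an OpenCV (BGR) array."""
    import cv2  # requires `pip install opencv-python`

    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(f"could not read image: {path}")
    return image


def capture_frame(camera_index: int = 0) -> Any:
    """Grab a single frame from the webcam."""
    import cv2

    cap = cv2.VideoCapture(camera_index)
    try:
        if not cap.isOpened():
            raise RuntimeError("webcam is not available")
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("failed to read a frame from the webcam")
        return frame
    finally:
        cap.release()  # always free the camera, even on errors


def describe_image(image: Any,
                   captioner: Callable[[Any], str],
                   speaker: Callable[[str], Any]) -> Tuple[str, Any]:
    """Run image -> caption -> speech.

    `captioner` stands in for the trained image-captioning model and
    `speaker` for the text-to-speech step; both are assumptions here.
    """
    caption = captioner(image)
    audio = speaker(caption)
    return caption, audio
```

Passing the model and the speech step in as functions keeps the image input code independent of them, so either part can be swapped or tested on its own.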

Challenges we ran into

Although we lacked expert knowledge in machine learning and neural networks, we decided to proceed with our idea. The learning process was both challenging and rewarding as we worked through large amounts of data and gained a deeper understanding of the subject matter. We remained committed to finding solutions and conducting research, and our determination and hard work eventually led to the successful development of the image-to-speech algorithm.

Accomplishments that we're proud of

We are proud to have successfully participated in our first hackathon and overcome the challenges of working with machine learning. Our accomplishment serves as a testament to our dedication and determination to excel in the field of machine learning and artificial intelligence.

What we learned

During the project, we gained valuable knowledge and hands-on experience in data analysis. Working with the algorithms and techniques involved in image-to-speech conversion deepened our understanding of the subject matter.

What's next for VisionVoice (Image-To-Speech)

Our next goal for VisionVoice is to develop an interactive platform that enables communication for blind individuals. The platform will use our image-to-speech pipeline to convert images received from a sender into speech, so that visually impaired people can take part in conversations that include images. We also aim to create a wearable or handheld device, such as a cane or glasses equipped with a camera, that can detect and describe the surroundings to help blind individuals navigate and cross roads safely. Through this solution, we hope to make a positive impact on society and improve the lives of blind individuals.