BlindSight

Inspiration

The seed of this project was the premise of helping the visually impaired explore their surroundings. The same idea also suggested a use in assessing crime and security risks in one's environment. Empowering individuals is our main goal: BlindSight can work alongside people who are visually impaired to help them adapt to new settings with ease and, more importantly, to take part in experiencing the visuals that accompany beautiful surroundings. Describing the world around us is full of approximations, assertions, and a respect for specificity, but it is nevertheless a worthwhile pursuit for those who aren't able to see to their full potential.

BlindSight analyzes its surroundings through a camera and generates speech describing them.

We used Swift to access the camera and feed images every "n" seconds to our Vision API. We used Alamofire to call our REST API. Once a caption was generated, we used a Text-to-Speech API to read the description aloud in a user-defined language.
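The writeup does not name the Text-to-Speech provider. The voices link under future work points to Google Cloud Text-to-Speech, so the sketch below assumes that service's REST endpoint (`POST https://texttospeech.googleapis.com/v1/text:synthesize`) with API-key authentication. `SynthesizeRequest` and `makeSpeechRequest` are hypothetical names; the sketch only shows how a generated caption and the user's chosen language could be packaged into one request.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical request model mirroring the JSON body of Google Cloud
// Text-to-Speech's text:synthesize method (assumed provider).
struct SynthesizeRequest: Encodable {
    struct Input: Encodable { let text: String }
    struct Voice: Encodable { let languageCode: String }    // BCP-47, e.g. "en-US"
    struct AudioConfig: Encodable { let audioEncoding: String }

    let input: Input
    let voice: Voice
    let audioConfig: AudioConfig
}

/// Builds a POST request asking the TTS service to speak `caption`
/// in the user's chosen language. Hypothetical helper, not the project's code.
func makeSpeechRequest(caption: String, languageCode: String, apiKey: String) throws -> URLRequest {
    var components = URLComponents(string: "https://texttospeech.googleapis.com/v1/text:synthesize")!
    components.queryItems = [URLQueryItem(name: "key", value: apiKey)]

    var request = URLRequest(url: components.url!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body = SynthesizeRequest(
        input: .init(text: caption),
        voice: .init(languageCode: languageCode),
        audioConfig: .init(audioEncoding: "MP3")
    )
    request.httpBody = try JSONEncoder().encode(body)
    return request
}
```

The resulting `URLRequest` can be sent with `URLSession`, or handed to Alamofire's `AF.request(_:)`, which accepts a `URLRequest`. The service's JSON response carries the audio as a base64 `audioContent` string, which could be decoded with `Data(base64Encoded:)` and played back with `AVAudioPlayer`.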

Getting the camera to take images every "n" seconds was one of the more challenging parts, since Swift did not have a predefined function to do that. The backend was challenging too: the vision APIs we tried gave us labels, but many of them did not provide a description of the image.
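One way to get a capture every "n" seconds is to drive it from Foundation's repeating `Timer`. In this minimal sketch, `PeriodicCapturer` is a hypothetical helper, and the `capture` closure stands in for whatever actually grabs the frame (in an iOS app, plausibly a call to `AVCapturePhotoOutput.capturePhoto(with:delegate:)`; the project's real capture code is not shown in the writeup).

```swift
import Foundation

/// Hypothetical helper: calls `capture` every `interval` seconds
/// on the run loop of the thread that calls `start()`.
final class PeriodicCapturer {
    let interval: TimeInterval
    private let capture: () -> Void
    private var timer: Timer?

    init(interval: TimeInterval, capture: @escaping () -> Void) {
        self.interval = interval
        self.capture = capture
    }

    func start() {
        stop()  // never leave two timers firing at once
        // [weak self] avoids a retain cycle: the run loop retains the timer,
        // and the timer retains its block.
        timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { [weak self] _ in
            self?.capture()
        }
    }

    func stop() {
        timer?.invalidate()
        timer = nil
    }

    deinit { stop() }
}
```

In a view controller, `start()` would typically be called when the camera view appears and `stop()` when it disappears, so no captures are taken off-screen. Making `interval` a stored property keeps "n" configurable per user.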

Integrating APIs from different providers and making them all work with each other.
Designing a user interface that is friendlier to visually impaired users.
Keeping the response time reasonable to prevent lag.
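The writeup does not say how lag was kept down. One common technique, swapped in here as an assumption, is to allow only one caption request in flight at a time and drop any capture ticks that arrive while it is still running, so a slow network never builds a backlog of stale descriptions. `InFlightGate` is a hypothetical name for this minimal sketch.

```swift
import Foundation

/// Hypothetical gate allowing at most one caption request at a time.
/// Ticks arriving while a request is in flight are dropped, not queued.
final class InFlightGate {
    private let lock = NSLock()
    private var busy = false
    private var droppedCount = 0

    /// Number of ticks skipped because a request was still running.
    var dropped: Int {
        lock.lock(); defer { lock.unlock() }
        return droppedCount
    }

    /// Returns true if the caller may start a request now.
    func tryEnter() -> Bool {
        lock.lock(); defer { lock.unlock() }
        if busy {
            droppedCount += 1
            return false
        }
        busy = true
        return true
    }

    /// Call once the request (and its speech) finishes, on success or failure.
    func leave() {
        lock.lock(); defer { lock.unlock() }
        busy = false
    }
}
```

In the periodic capture callback, one would `guard gate.tryEnter() else { return }` before uploading a frame and call `gate.leave()` in the network completion handler. Dropping rather than queuing trades completeness for freshness, which suits a live description of surroundings.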

Scraping the internet for captioned images to train a stronger model.
Adding depth perception to warn users about hazards to their safety and security.
Enabling data collection by uploading images with their provided labels to Firebase, to improve the model.
Improving language support (it currently supports 7 languages, as mentioned on https://cloud.google.com/text-to-speech/docs/voices).

Leave feedback in the comments!
