Inspiration
The inspiration behind this project is to help visually impaired people perceive their surroundings through sound mappings. These mappings draw on the idea of sensory substitution, or cross-modal mapping of images to sound. The app can assist the visually impaired by encoding a video stream into a sound pattern, recruiting the same visual brain areas for auditory analysis.
What it does
The workflow is presented as a web application that lets the user access the camera in real time and capture the streaming video. The important regions in an image or video are translated into sound waves for the visually impaired user, helping them analyse the 2D environment in front of them. The regions of interest (ROIs) are handled either as generalised scenic views, for which some metadata is provided, or as specific regions, for which only sound variations describing the object's shape and design are provided.
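The two-branch handling above can be sketched as a small dispatcher. This is only an illustration of the routing logic; the function names, the ROI dictionary layout, and the placeholder descriptions are all hypothetical, not part of the actual app.

```python
def describe_scene(tags):
    # Placeholder for the scenic-view branch: in the real app, sounds
    # matching the scene's metadata tags would be played back.
    return "playing sounds for: " + ", ".join(tags)

def describe_object(shape_signature):
    # Placeholder for the specific-region branch: in the real app, sound
    # variations would be derived from the object's shape/design.
    return "tone sweep over %d shape points" % len(shape_signature)

def sonify_roi(roi):
    """Route an ROI down one of the two branches described above."""
    if roi["kind"] == "scene":
        return describe_scene(roi["tags"])
    return describe_object(roi["shape"])
```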
How we built it
A locally hosted web application provides an interface for capturing pictures and videos in real time. Machine learning / deep learning models were trained on the captured data to predict regions of interest, for example cars or trees in a cluttered road picture. Each region of interest is then subclassified either as a scenic view (e.g. in a painting) or as a plain ROI. For scenic views we apply transfer learning, tagging the captured scene and playing similar sounds to the subject. For plain ROIs, a sliding-window image-to-sound mapping is performed for each subject. The resulting sensory maps are played back to the subject in real time.
Challenges we ran into
Understanding the science behind sensory mapping between two senses was the most difficult part. Learning several new technologies was also challenging: we had to pick up Flask and image processing from scratch, and using TensorFlow and Keras took time as well.
Accomplishments that we're proud of
We are proud that we were able to implement some of the use cases for our idea.
What we learned
We learnt TensorFlow and Keras.
What's next for Visionary
We plan to work more deeply on the sensory mapping parts to help the visually impaired get a much better understanding of the environment.