Inspiration
Around 140 million people worldwide are visually impaired, of whom about 40 million are totally blind. While rapid technological growth has advanced many fields of human interaction, comparatively little of it has been directed toward the welfare of the visually impaired. This project aims to ease the difficulties the blind face in perceiving and understanding their surroundings, especially in demanding situations such as crossing a road or reading a book, by harnessing the power of AI and cloud technology. The proposed solution provides complete visual assistance through a device already readily accessible to them: their mobile phone. It takes the form of a multi-lingual mobile application with which a visually impaired user takes a picture of whatever they need to perceive, be it the environment around them or a piece of text they need to read without a tactile writing system such as braille. The native mobile application uses Azure Cloud APIs and App Services to bring this cloud and AI technology to bear on the problem.
What it does
The solution eases the difficulties the blind face in perceiving and understanding their surroundings, especially in situations such as crossing or walking along a road, or reading a book. Its main goal is to provide a sense of vision to the visually impaired, which is of immense value to those who wish to keep pace with ever-growing technology. The application is named “Vision” as an indication of the cause it sets out to serve. With its help, the target audience can perceive their surroundings entirely on their own, something they previously could not do, and comprehend text without having to use braille or any other tactile writing system. Location assistance lets them determine their current location by themselves, and multi-lingual voice narration further improves the application's accessibility. These functionalities are built on Microsoft Azure: Computer Vision, Custom Vision, Azure Translator, Azure Maps, and the Azure Speech SDK. The application describes the picture taken by the user, analyzes the image in depth for features such as faces and for custom scenarios, and recites the result aloud. It also identifies text with the OCR feature and recites it, and reports the user's location relative to a nearby landmark. Offering the application in the user's native language, with an option to change languages, makes it accessible and usable to everyone.
How we built it
The proposed solution brings together a wide range of technologies. The system uses Image Captioning, Optical Character Recognition (OCR), Text to Speech, and Machine Translation provided by Microsoft Azure Cloud Services, along with custom-trained deep learning models. Image Captioning converts an image into relatable text, which Text to Speech then converts into voice, enabling the blind to comprehend their visual surroundings and giving them a sense of vision. OCR lets the blind understand text without a tactile writing system such as braille, and machine translation converts that text into any supported language seamlessly. The modules are integrated into a single mobile application powered by JavaScript (React Native). The final solution is a voice-enabled mobile application that can:

- Take pictures with a single touch/tap
- Describe the captured picture in comprehensible language
- Analyze text in the image, recite it to the user, and summarize long texts
- Support multiple languages (both regional and worldwide): currently 69 languages, with native fluency in 30
- Change languages via voice commands
- Perform in-depth image analysis for specific use cases such as identifying currency notes or walking along a road
- Provide location assistance based on the user's current GPS position relative to the nearest landmark
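The captioning step above can be sketched in plain JavaScript. This is a minimal illustration, not the project's actual code: the helper names (`buildAnalyzeRequest`, `extractCaption`) are hypothetical, and `endpoint`/`key` are placeholders for an Azure Computer Vision resource; only the REST URL shape and the response fields follow the Computer Vision v3.2 "analyze" API.

```javascript
// Build the fetch options for an Azure Computer Vision "analyze" call that
// requests an image description. Hypothetical helper; endpoint/key are
// placeholders for the app's own Azure resource.
function buildAnalyzeRequest(endpoint, key) {
  return {
    url: `${endpoint}/vision/v3.2/analyze?visualFeatures=Description`,
    options: {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream", // raw image bytes in the body
      },
    },
  };
}

// Pull the highest-confidence caption out of the analyze response so it can
// be handed to Text to Speech; returns null if no caption was produced.
function extractCaption(analyzeResponse) {
  const captions = (analyzeResponse.description || {}).captions || [];
  if (captions.length === 0) return null;
  return captions.reduce((best, c) => (c.confidence > best.confidence ? c : best))
    .text;
}

// Abbreviated example of the response shape returned by the service:
const sample = {
  description: {
    captions: [{ text: "a person crossing a street", confidence: 0.92 }],
  },
};
console.log(extractCaption(sample)); // "a person crossing a street"
```

In the app itself, the returned caption string would then be translated (Azure Translator) and spoken aloud (Azure Speech SDK) in the user's chosen language.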
Accomplishments that we're proud of
“Vision” began with a small survey of the hardships blind students face in keeping pace with their daily routine, which eventually led to a solution to help them overcome those difficulties. At each milestone in development, “Vision” was rigorously tested with visually impaired users, gathering inputs and feedback directly from the people it was built for. To further optimize the solution for its use case, Vision was tested iteratively with visually impaired students from a blind school in Chennai. The solution restores a sense of vision that its users were previously deprived of. Though initially targeted at school students across the country, Vision now has the potential to serve as a visual assistant to the visually impaired across the world. On the cost front, compared with previously existing solutions, Vision carries a very low monthly cost owing to its backend architecture and efficient use of available technology. Thus, Vision can provide all of its features at a very low cost!
What's next for Vision - the Visual Assistant for the Blind
- Personalize the application to the user's context to give user-customized image descriptions.
- Use advanced geospatial coordinate indices to give location assistance tailored to the user's needs.
- Use 3-D image mapping to fully assist the user outdoors or in harsh terrain.