Make sure to use headphones while running the demo. The app and code are available via the Drive link, and a video has been recorded for reference. The audio in the video may be poor, as the phones do not support internal recording.
According to the WHO, there are about 39 million blind people worldwide. For the visually challenged, many basic daily tasks are difficult, and navigating their surroundings is one of them. With advances in Computer Vision and VR technologies, modern tools can be designed that leverage machine learning and cloud computing to help the visually impaired navigate their surroundings.
What it does
The idea behind the application is to take an object's position relative to the user and encode that information in an audio signal that notifies the user of the object. This is done by leveraging Computer Vision models trained on diverse datasets. Here we use a pre-trained YOLO v2 model, trained on 100 classes, and produce stereo audio using the Android support for Google VR technologies.
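The core mapping described above can be sketched as a small pure function: the detection box's horizontal position becomes a stereo pan value, and its apparent size becomes a loudness cue. This is a minimal illustration of the idea, not the app's actual code; the function name and the size-to-gain heuristic are assumptions.

```python
def position_to_audio_cue(box_center_x, box_area, frame_width, frame_area):
    """Map a detection to a stereo audio cue.

    Returns (pan, gain):
      pan  -- -1.0 = hard left, 0.0 = center, +1.0 = hard right,
              derived from the box center's horizontal position.
      gain -- 0.0..1.0, larger (closer) objects sound louder;
              the 0.5 * frame_area normalizer is an illustrative choice.
    """
    pan = 2.0 * (box_center_x / frame_width) - 1.0
    gain = min(1.0, box_area / (0.5 * frame_area))
    return pan, gain
```

In the real app, the pan/gain pair would be fed to a spatialized sound object in the VR audio engine rather than applied directly to a raw stream.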
How I built it
The model has already been pre-trained for object detection on the diverse ImageNet dataset, which allows for a wide variety of classes. The model currently runs entirely on the edge device and does not require any internet access. The computer vision model takes one image at a time and returns the list of objects in the frame, and the user is then notified of each object.
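Since only one object can sensibly be announced at a time, each frame's detections need to be reduced to a single notification. A simple, assumed heuristic (not taken from the project's code) is to announce the detection whose confidence-weighted size is largest:

```python
def pick_announcement(detections):
    """Choose one object to announce from a frame's detections.

    detections -- list of (label, confidence, box_area) tuples, as a
                  stand-in for the detector's per-frame output.
    Returns the label of the most prominent detection (largest
    confidence * area), or None if nothing was detected.
    """
    if not detections:
        return None
    best = max(detections, key=lambda d: d[1] * d[2])
    return best[0]
```

The real pipeline would call this once per camera frame and pass the chosen label (plus its position) to the audio layer.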
Challenges I ran into
Configuring the model and parametrizing the VR audio models were the two most sensitive parts of the project. I am now working on adding more features, such as OCR, a tracking mode, and an emergency button. The app would then use Google Assistant to understand which task needs to be performed and carry it out. For instance, the user could ask the Assistant to start track mode; the model would then lock onto the person in front of the camera and provide directions for the blind user to follow that person.
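The track-mode guidance step described above can be sketched as turning the tracked person's horizontal offset into a spoken cue. This is a hypothetical sketch, assuming a dead zone around the frame center within which the user is told to keep going straight:

```python
def track_direction(target_center_x, frame_width, dead_zone=0.1):
    """Convert the tracked person's position into a direction cue.

    target_center_x -- horizontal center of the tracked person's box.
    dead_zone       -- fraction of the frame (assumed value) treated
                       as "close enough to center".
    Returns "left", "right", or "ahead".
    """
    offset = target_center_x / frame_width - 0.5
    if offset < -dead_zone:
        return "left"
    if offset > dead_zone:
        return "right"
    return "ahead"
```

The cue could then be rendered either as speech or as the same panned stereo audio used for object notifications.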
Accomplishments that I'm proud of
Getting an end-to-end ML model running on an Android device in a day is a challenging job. We ran into many issues while configuring the model to run smoothly on the device, and working through them clarified a lot of concepts for us.
What I learned
We had very little experience with Android application development. This project helped us understand how extensive Android programming is and the different considerations that need to be taken into account when deploying an ML model on the Android platform.
What's next for 3-D Vision Assist
We would love to complete the application and add full Google Assistant support, along with Google Firebase cloud support for running more extensive models. We also plan to add an SOS service that would use the Twilio API to alert an emergency contact with an SMS containing a link to the person's video stream, so they can receive remote support. Another planned feature is text-reading support, which would segment text boxes and read them aloud.
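The planned SOS flow could be sketched as composing the alert message and handing it to Twilio's SMS API. The message format, names, and URL below are placeholders; only the final `client.messages.create(...)` call (shown as a comment) reflects Twilio's actual Python client:

```python
def build_sos_message(user_name, stream_url):
    """Compose the SMS body for the planned SOS feature.

    user_name and stream_url are placeholders supplied by the app;
    the wording here is an assumption, not the shipped format.
    """
    return (f"SOS: {user_name} needs assistance. "
            f"Live video stream: {stream_url}")

# Sending via Twilio would then look roughly like:
#   from twilio.rest import Client
#   client = Client(account_sid, auth_token)
#   client.messages.create(body=build_sos_message(name, url),
#                          from_=twilio_number, to=emergency_contact)
```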