Non-profit organization: Vision Aid (Wildcard non-profit)
This organization aims to provide assistance to the visually impaired.
An estimated 285 million people worldwide are visually impaired: 39 million are blind and 246 million have low vision, and these numbers keep increasing over time. You might wonder how these people deal with everyday challenges.
A blind or low-vision user may need help with anything from checking expiry dates to reading instructions or navigating new surroundings. These simple tasks can become quite difficult, so we set out to find a solution.
This is how we came up with our idea, “A-Eye”, which provides visual assistance to these users to make their lives a little easier at every step.
What it does
A-Eye is an AI-powered application that aims to provide visual assistance through its features:
- One challenge a low-vision person may face is transaction fraud: they may be unable to tell apart something as simple as a 10 and a 20 bill, and can fall victim to fraud as a result. So we decided to add a feature that targets this very problem. Currency Recognizer - with just a voice command, this feature instantly recognizes a banknote and speaks its denomination, enabling blind and visually impaired people to identify bills quickly and easily.
- Every day we come across a lot of text that we need to read, be it an e-mail, a newspaper article, a legal document, or simply an SMS. For people with poor vision, reading such documents can be difficult, and this inspired our next feature. Document Reader - converts printed text into high-quality speech, providing accurate, fast, and efficient access to long, difficult-to-read documents.
- One of the hardest challenges for the visually impaired is getting around, especially in an unfamiliar environment. To help with this, we built a feature on top of deep-learning image captioning. Speak to me - describes the user's surroundings in properly formed English sentences, spoken out loud.
We designed our Android application with our target users in mind. All the features are accessible through simple voice commands, and to make the app more user-friendly, we added vibrations and audio cues that help the user interact with it.
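To make the voice-command idea concrete, here is a minimal sketch, assuming the transcript has already been produced by a speech-to-text engine such as Android's `SpeechRecognizer`. It routes the transcript to one of the three features by whole-word keyword matching. `Feature`, `VoiceCommandRouter`, and the keyword lists are hypothetical names chosen for illustration, not A-Eye's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// The three A-Eye features, plus a fallback for unrecognized commands.
enum Feature { CURRENCY_RECOGNIZER, DOCUMENT_READER, SPEAK_TO_ME, UNKNOWN }

// Hypothetical keyword router: the keyword lists are illustrative assumptions,
// not A-Eye's actual command vocabulary.
final class VoiceCommandRouter {
    // LinkedHashMap keeps insertion order, so earlier features win when keywords overlap.
    private final Map<Feature, List<String>> keywords = new LinkedHashMap<>();

    VoiceCommandRouter() {
        keywords.put(Feature.CURRENCY_RECOGNIZER, Arrays.asList("currency", "money", "note", "bill", "cash"));
        keywords.put(Feature.DOCUMENT_READER, Arrays.asList("read", "document", "text"));
        keywords.put(Feature.SPEAK_TO_ME, Arrays.asList("speak", "describe", "surroundings"));
    }

    // Maps a speech-to-text transcript to a feature by whole-word keyword matching.
    Feature route(String transcript) {
        Set<String> words = new HashSet<>(Arrays.asList(
                transcript.toLowerCase(Locale.ROOT).split("[^a-z]+")));
        for (Map.Entry<Feature, List<String>> entry : keywords.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (words.contains(keyword)) {
                    return entry.getKey();
                }
            }
        }
        return Feature.UNKNOWN;
    }
}
```

For example, `new VoiceCommandRouter().route("Please read this document")` returns `Feature.DOCUMENT_READER`. An `UNKNOWN` result is where the app could play an error cue and ask the user to repeat the command.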
How we built it
A-Eye is coded in Java, with automation and deep learning at the heart of its code.
Our project consists of three major components: the AI models, the server, and the Android application.
The AI part consists of three models, each targeting a different problem. We first built and trained the image captioning model while working in parallel on the currency recognition and document reader models. The image captioning model was trained on the Flickr8k dataset, whose images were chosen from six different Flickr groups, tend not to contain any well-known people or locations, and were manually selected to depict a variety of scenes and situations. For currency recognition, we trained the Xception network on an Indian currency dataset.
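As an illustration of how a classifier's output could become the spoken denomination, the sketch below takes the class probabilities from a model like the Xception network described above and picks the most likely class. The class order, the set of denominations, and the 0.6 confidence threshold are all assumptions; the real model's output layer may be ordered differently.

```java
// Hypothetical sketch: converts a currency classifier's softmax output into a sentence
// for text-to-speech. The class order and the 0.6 threshold are assumptions.
final class CurrencyLabeler {
    // Assumed class order of the model's output layer (Indian rupee denominations).
    static final int[] DENOMINATIONS = {10, 20, 50, 100, 200, 500, 2000};
    // Below this probability the app asks for a better photo instead of guessing.
    static final double MIN_CONFIDENCE = 0.6;

    static String toSpeech(double[] probabilities) {
        if (probabilities.length != DENOMINATIONS.length) {
            throw new IllegalArgumentException(
                    "expected " + DENOMINATIONS.length + " class probabilities");
        }
        // Arg-max: index of the most probable class.
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        if (probabilities[best] < MIN_CONFIDENCE) {
            return "I am not sure. Please hold the note closer to the camera.";
        }
        return "This is a " + DENOMINATIONS[best] + " rupee note.";
    }
}
```

Refusing to answer below the threshold is a deliberate choice in this sketch: for a feature meant to prevent transaction fraud, asking for another photo is safer than announcing a wrong denomination.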
We used Flask to build our own custom APIs for accessing the AI models, and used ngrok to expose the locally running Flask server through a public URL.
The application was designed in Figma and developed in Android Studio. We used Retrofit to make API calls to the server and retrieve the results.
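The team made these calls with Retrofit. As a dependency-free sketch of the same request shape, the code below uses the JDK's `java.net.http` package (Java 11+) to build a POST that uploads a camera image to a Flask endpoint exposed through ngrok. The `/predict/currency` path, the raw `image/jpeg` body, and the example tunnel URL are hypothetical; a Flask route could read such a body with `request.get_data()`, whereas a multipart upload would need a different request.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical sketch of the app-to-server call using java.net.http instead of Retrofit.
// The endpoint path and the raw image/jpeg body are assumptions, not A-Eye's actual API.
final class ModelApiRequests {
    // Builds (but does not send) a POST carrying the JPEG bytes of a camera frame.
    static HttpRequest currencyRequest(String baseUrl, byte[] jpegBytes) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/predict/currency"))
                .timeout(Duration.ofSeconds(30)) // deep learning inference can be slow
                .header("Content-Type", "image/jpeg")
                .POST(HttpRequest.BodyPublishers.ofByteArray(jpegBytes))
                .build();
    }
}
```

The request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Note that `java.net.http` is not part of the Android SDK, which is one reason Android apps typically use Retrofit (built on OkHttp) for this job, as A-Eye does.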
Challenges we ran into
Designing for the visually impaired was a challenge in itself, but by following accessibility best practices, we worked toward an accessible mobile app experience. We discussed at length how to make our app genuinely user-friendly rather than just functional. We eventually decided to add audio cues and vibrations that tell the user which action the application is performing, and we built the application so that it can be easily controlled with voice commands. With these choices, we aimed to build something our users can truly benefit from.
We also faced issues hosting our API on platforms like Heroku because of the large size of our deep learning models. To work around this, we used ngrok, which temporarily exposes our local server to the web through a public URL.
We also ran into problems making API calls: the approaches we first tried did not work with our setup, so we switched to Retrofit.
Another challenge came up during model testing, when we found that the document reader could not produce satisfactory results. We tried several Python OCR modules, but they were all too slow and their results were still not good enough. In the end, we used Firebase ML Kit's document text recognition, which gave very good results quickly and suited our application well.
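Once ML Kit has recognized the text, a long document still has to be spoken in pieces, because a text-to-speech engine limits how much text one utterance may contain (on Android, `TextToSpeech.getMaxSpeechInputLength()` reports that limit). The sketch below is one assumed way to do this, not the team's code: it splits recognized text at sentence boundaries and packs sentences into chunks no longer than `maxChars`, hard-splitting any single sentence that is too long. `SpeechChunker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: packs OCR output into chunks of at most maxChars characters
// so each chunk can be queued separately on a text-to-speech engine.
final class SpeechChunker {
    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Split after sentence-ending punctuation followed by whitespace.
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            // Hard-split any single sentence that is longer than the limit.
            while (sentence.length() > maxChars) {
                flush(current, chunks);
                chunks.add(sentence.substring(0, maxChars));
                sentence = sentence.substring(maxChars);
            }
            // +1 accounts for the space joining this sentence to the current chunk.
            int extra = current.length() == 0 ? sentence.length() : sentence.length() + 1;
            if (current.length() + extra > maxChars) {
                flush(current, chunks);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(sentence);
        }
        flush(current, chunks);
        return chunks;
    }

    // Moves the chunk being built into the result list, if it is non-empty.
    private static void flush(StringBuilder current, List<String> chunks) {
        if (current.length() > 0) {
            chunks.add(current.toString());
            current.setLength(0);
        }
    }
}
```

Each chunk could then be queued with `textToSpeech.speak(chunk, TextToSpeech.QUEUE_ADD, null, utteranceId)` so the document is read continuously. The hard split can cut a word in half, so a more polished version might back off to the previous space.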
Initially, the code was in bits and pieces since everyone was building individual features. Combining everything into a single, unified application took a lot of patience and teamwork, but every member worked tirelessly to complete the project.
Accomplishments that we're proud of
We had never taken on a development project this big, so achieving this much on our first one made us immensely proud. After hours of discussion, research on the web, and building all the different components, it was all smiles when the entire thing worked without a hitch.
After facing so many challenges, we were finally able to build a working prototype. The API calls gave us a huge headache, but successfully getting results back from our API brought an equal measure of relief.
Apart from the technical accomplishments, we are immensely proud of how well our team worked together throughout the entire project. Everyone helped one another by answering questions and explaining the concepts behind their part of the code.
What we learned
This was our first project integrating deep learning into an application, and we came across many new deep learning techniques along the way. The project also gave us a conceptual understanding of Android development. We learned a lot about different Android libraries, some of which, like Retrofit and Firebase ML Kit, we had never even heard of.
Integrating this project was one of the biggest challenges we faced, so we learned a lot from it.
This project also built our confidence in our ability to lead, listen, and make the right decisions. And of course, the most important thing we learned about was our target users. We all know there are people living with such disabilities, but working on this project helped us see many things from their perspective.
What's next for A-Eye
Our application turned out pretty well considering how quickly it was built, but that also means there is plenty of room for improvement.
- To begin with, we can make the AI models much more accurate and reliable.
- The Currency Recognizer could be extended to detect multiple bills of any denomination at once and announce their total amount, not just the denomination of a single bill.
- For Speak to me, we can train the model on a much larger dataset and generate spoken descriptions of the surroundings in real time from the video captured by the device's camera.
- We can host our API on much more capable servers to make our application respond faster.
- Ultimately, our goal is to make this service accessible on many different platforms, not just phones.
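The multi-bill idea from the Currency Recognizer item would need an object detector that returns one detection per bill, which a single-image classifier does not do. Assuming such a detector exists, the sketch below shows how its output could be turned into a spoken total. `BillTotaler`, `Detection`, and the confidence threshold are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the proposed multi-bill feature. It assumes a detector that returns
// one Detection per bill; neither that detector nor this class exists in A-Eye today.
final class BillTotaler {
    static final class Detection {
        final int denomination;   // e.g. 100 for a 100-rupee note
        final double confidence;  // detector score in [0, 1]

        Detection(int denomination, double confidence) {
            this.denomination = denomination;
            this.confidence = confidence;
        }
    }

    // Counts confident detections per denomination and builds a sentence with the total.
    static String summarize(List<Detection> detections, double minConfidence) {
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted by denomination
        int total = 0;
        for (Detection d : detections) {
            if (d.confidence < minConfidence) {
                continue; // skip uncertain detections rather than risk a wrong total
            }
            counts.merge(d.denomination, 1, Integer::sum);
            total += d.denomination;
        }
        if (counts.isEmpty()) {
            return "No notes detected.";
        }
        StringBuilder sentence = new StringBuilder();
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            if (sentence.length() > 0) {
                sentence.append(", ");
            }
            int n = entry.getValue();
            sentence.append(n).append(n == 1 ? " note of " : " notes of ").append(entry.getKey());
        }
        return sentence.append(". Total: ").append(total).append(" rupees.").toString();
    }
}
```

For instance, two confident 100-rupee detections and one confident 500-rupee detection produce "2 notes of 100, 1 note of 500. Total: 700 rupees."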
So, stay tuned for A-Eye Pro :)