We were inspired by how underused machine learning is in modern applications, and by how it could play a significant role in aiding the lives of the visually impaired when combined with other technologies.
What it does
The user simply points the camera in a general direction and taps to take a picture. The app scans the image and sends it to a server, which uses Tesseract (an OCR library that uses machine learning and pre-trained data) in Python to convert the image to text. If no text is found, the app keeps taking images. If it receives text from the server, it uses a text-to-audio library to make it audible for the user.
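The flow above (capture, recognise, retry when the server returns no text, then speak) can be sketched roughly as follows. This is only an illustration in Python, not the app's Flutter code: `scan_until_text`, its three callables and the `max_attempts` cap are hypothetical names and assumptions introduced here so the loop can be read and tested on its own.

```python
from typing import Callable, Optional


def scan_until_text(
    capture: Callable[[], bytes],
    recognise: Callable[[bytes], str],
    speak: Callable[[str], None],
    max_attempts: int = 5,  # assumption: a cap so the sketch cannot loop forever
) -> Optional[str]:
    """Capture images until one yields text, then read that text aloud."""
    for _ in range(max_attempts):
        image = capture()                # e.g. a camera frame as raw bytes
        text = recognise(image).strip()  # e.g. a request to the OCR server
        if text:
            speak(text)                  # e.g. hand the text to a TTS engine
            return text
    return None                          # nothing readable was found
```

In the real app, `capture` would wrap the camera, `recognise` would wrap the request to the server, and `speak` would wrap the text-to-speech call; passing them in as functions keeps the retry logic independent of any one library.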
How we built it
We used Android Studio with the Flutter plugin, along with a Flask server running a Python script that calls the Tesseract OCR library. Google's TTS is used for the text-to-speech conversion.
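To give a rough idea of what such a server can look like, here is a minimal sketch of a Flask endpoint that passes an uploaded image to Tesseract through the `pytesseract` wrapper. It is not the attached server code: the `/ocr` route, the `image` form field, the JSON response shape and the `normalise_ocr_text` helper are all assumptions made for illustration, and it presumes Flask, Pillow, pytesseract and the Tesseract binary are installed.

```python
import io


def normalise_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace so the client receives one clean line to speak."""
    return " ".join(raw.split())


def create_app(ocr=None):
    """Build the Flask app; `ocr` can be replaced with a stub for testing."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request
    from PIL import Image

    if ocr is None:
        import pytesseract
        ocr = pytesseract.image_to_string  # runs Tesseract on a PIL image

    app = Flask(__name__)

    @app.route("/ocr", methods=["POST"])  # hypothetical route name
    def ocr_endpoint():
        upload = request.files.get("image")  # hypothetical form field name
        if upload is None:
            return jsonify(error="no image uploaded"), 400
        image = Image.open(io.BytesIO(upload.read()))
        # An empty string tells the client that no text was found.
        return jsonify(text=normalise_ocr_text(ocr(image)))

    return app
```

Calling `create_app().run()` would start Flask's development server. The client can then treat an empty `text` field as "nothing detected" and take another picture.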
Challenges we ran into
-Development started in Python using the Kivy framework. However, because of its lack of iPhone support, poor documentation and other gaps in support, development was switched to Android Studio using Flutter. This allowed for greater flexibility and gave us better documentation.
-Time was a huge factor. The app was initially planned to connect to the Google Maps API and give the user directions, but this would have taken too long.
-We attempted to train the machine learning algorithm ourselves, but it is a huge task: it took hours just to learn 3 fonts.
-We also planned to make the app take photos periodically and speak if any text was detected. This was done in Python, but Flutter does not support this behaviour.
Accomplishments that we're proud of
The server provides backend support for the app, and it took a lot of time to set up. It can be expanded to provide additional functionality in the future.
What we've learned
Machine learning has a lot more support than we initially thought. It is extremely easy for any individual to set it up with their own training data, and it is extremely versatile: it can be used for a wide variety of purposes.
What's next for App for the Visually Impaired
The app can still be optimised. The auto-scan function can be implemented to reduce the inconvenience for the user. More test cases can be included inside the app so that it doesn't rely on an external server (which has more test cases) and can therefore be used without an Internet connection. The Google Maps API can be incorporated into the app so that it works in conjunction with the camera to tell users what shop they are facing and help them find shops.
PLEASE NOTE - This program requires a server. The server is public and is currently running. The code for the server has been attached so it can be examined, but no download of it is required.