Auxiliary Vision

Inspiration

Based on the hackathon’s themes and theme examples, one of our members wanted to build a project to help visually impaired people: something that scans and reads text from a camera. This reminded another member of a drama about an app that uses object recognition for the visually impaired. To make our project more original, we combined these ideas into an app with multiple functions that visually impaired people can use in their education and daily lives.

What it does

The project aims to provide both accessibility functions and educational functions. Text and object recognition are the baseline functions. An additional feature reads an uploaded PDF document aloud, which is useful for educational documents from other sources and for documents in general.

We intended the app to have more than three functions. Currently, however, it has only three: text recognition from images, object recognition from images, and document reading from file uploads.

How we built it

Starting with the backend, we used Python and the Google Cloud Vision API to implement classes whose functions localize objects and detect text. The functions take in a JPEG file that is stored in a specific media folder accessible to both client and server. They output a .json file containing either the objects or the text. This file is read by the TTS class, which creates an .mp3 file and saves it under media/sounds.
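A minimal sketch of this flow is shown below, under several assumptions. The function names (`detect_text`, `localize_objects`, `write_result`, `result_to_sentence`) and the JSON layout are illustrative and are not taken from our repository. The Vision calls come from the `google-cloud-vision` client library and need valid Google Cloud credentials to run. The text-to-speech step is left out because it depends on which TTS library is used; the sketch stops at the sentence that a TTS class could turn into an .mp3 file.

```python
import json
from pathlib import Path


def detect_text(image_path):
    """Ask Cloud Vision for the text in a JPEG (requires credentials)."""
    # Imported lazily so the helpers below work without the library installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=Path(image_path).read_bytes())
    annotations = client.text_detection(image=image).text_annotations
    # When text is found, the first annotation holds the full detected text.
    text = annotations[0].description if annotations else ""
    return {"type": "text", "text": text}


def localize_objects(image_path):
    """Ask Cloud Vision for the objects in a JPEG (requires credentials)."""
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=Path(image_path).read_bytes())
    found = client.object_localization(image=image).localized_object_annotations
    objects = [{"name": obj.name, "score": float(obj.score)} for obj in found]
    return {"type": "objects", "objects": objects}


def write_result(result, json_path):
    """Save a detection result as a .json file (hypothetical layout)."""
    path = Path(json_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return path


def result_to_sentence(json_path):
    """Turn a saved result into the sentence a TTS step would read aloud."""
    result = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if result["type"] == "text":
        return result["text"] or "No text was detected."
    names = [obj["name"] for obj in result["objects"]]
    if not names:
        return "No objects were detected."
    return "I can see: " + ", ".join(names) + "."
```

Writing the result to a file between detection and speech mirrors the description above, where the .json output is what the TTS class reads. It also means the speech step can be tested with a hand-written .json file, without calling the Vision API.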

Challenges we ran into

The major challenges we ran into were creating a React Native app and linking the front-end with the back-end using an API framework like FastAPI or Flask. At this moment, we haven’t gotten to the latter part due to time constraints and our current level of experience creating APIs. However, we aim to complete this portion independently of the hackathon at a later time.

Accomplishments that we're proud of

We are proud of being able to connect all the external APIs and get them working cohesively in the back-end, as well as being able to create a React Native app within 36 hours. Most importantly, we are proud of how much we learned in this experience.

What we learned

We learned how to authenticate and connect external APIs to our project, how to create a server to connect the front-end and back-end, and how to create different functions in our React Native app.

What's next for Auxiliary Vision

The project’s next major steps are connecting the backend with the front end, which means learning more about creating APIs, and implementing voice commands. Voice control would make the app easier for visually impaired people to use.

Other possible features include a machine learning model that automatically summarizes the contents of uploaded educational documents, and a voice-based note-taking feature.

Links

In the links, we have included our demo video, our GitHub repository, and a Google Drive containing our output files, for viewers who are interested in hearing the output audio.