Inspiration
According to the World Health Organization, at least 2.2 billion people have a near or distance vision impairment. For many of them, reading small pieces of text is difficult. With more and more information appearing on small screens in recent years, vision impairment is rapidly becoming a major accessibility problem. At MetHacks, we wanted to build a solution to this ever-growing unseen epidemic.
What it does
Users can point their phone at text and press the capture button. The app uploads the image to a server, where it is processed by an optical character recognition (OCR) program and, if requested, by Cohere. An "advanced" mode can be toggled to expose more text-recognition options, including language detection and a keyword finder.
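The keyword finder in advanced mode could be approximated by a simple frequency ranking over the recognized text. A minimal sketch of that idea (the `find_keywords` name and the tiny stop-word list are illustrative, not the app's actual code):

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real implementation would use a larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def find_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return the top_n most frequent non-stop-words in OCR'd text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]
```

For example, `find_keywords("OCR reads text. OCR text is noisy text.", top_n=2)` ranks "text" first and "ocr" second.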
How we built it
The mobile app is built with React Native using Expo. Expo handles the camera, file system, image manipulation, and text-to-speech. The backend, built with Django and Django REST Framework, handles the integration between the app and external APIs such as Cohere.
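At a high level, the backend endpoint receives an image, runs OCR on it, and optionally forwards the result for enrichment. A rough, stdlib-only sketch of that control flow, with the OCR and Cohere calls stubbed out as injected callables (`process_capture`, `run_ocr`, `summarize`, and the response shape are our illustrative names, not the project's actual implementation):

```python
from typing import Callable, Optional

def process_capture(
    image_bytes: bytes,
    run_ocr: Callable[[bytes], str],        # e.g. a wrapper around an OCR engine
    summarize: Optional[Callable[[str], str]] = None,  # e.g. a Cohere client call
    advanced: bool = False,
) -> dict:
    """Mirror the view logic: OCR first, then optional enrichment steps."""
    text = run_ocr(image_bytes)
    result = {"text": text}
    if summarize is not None:
        result["summary"] = summarize(text)
    if advanced:
        # Advanced mode would add extras such as detected language and keywords.
        result["language"] = "unknown"  # placeholder for a real detector
    return result
```

Injecting the OCR and summarization functions keeps the flow testable without a camera or an API key.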
Challenges we ran into
- Working with computer vision for the first time
- Getting Cohere to produce accurate output with fewer hallucinations
- Preprocessing the image so the OCR engine can read it reliably
- Cleaning and sorting the text
- Finding an effective text-to-speech service that can clearly read the given text
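For the text-cleaning challenge, one workable approach is to collapse stray whitespace and drop lines that are mostly non-alphanumeric OCR noise. A minimal sketch along those lines (illustrative only, with a hypothetical `clean_ocr_text` helper and an assumed 50% threshold):

```python
import re

def clean_ocr_text(raw: str, min_clean_ratio: float = 0.5) -> str:
    """Collapse whitespace and drop lines that are mostly OCR noise."""
    kept = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue
        # Keep a line only if most characters are letters, digits, or spaces.
        clean = sum(ch.isalnum() or ch.isspace() for ch in line)
        if clean / len(line) >= min_clean_ratio:
            kept.append(line)
    return "\n".join(kept)
```

For example, `clean_ocr_text("Hello   world\n~~|..#%")` keeps "Hello world" and drops the noise line.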
Accomplishments that we're proud of
We worked well together and stayed focused. We divided the tasks so that every member had something new to learn, and each member completed their goals efficiently.
What we learned
This was everyone's first time working with computer vision and optical character recognition. We also deepened our knowledge of the tech stacks we used.
What's next for narratorRL
- Better accuracy (in both OCR and Cohere output)
- More accessibility features (e.g., voice commands)
- More intuitive UI/UX patterns