Inspiration

We came across several videos and articles showing how blind people need assistance with daily chores such as sorting their groceries and reading the ingredient lists on packaged food. The current approach to this problem is to have volunteers hop on video calls to assist them. This got us thinking about ways to build an open-platform service that uses computer vision and speech-generation tools to better address the issues blind people face.

What it does

The app reads video input from a mobile device, which is then preprocessed on the backend server for object detection. We use the bounding-box coordinates to determine where the detected object sits in the frame and prompt the user to move the camera or the object until the target is centered. The target image is then captured and scanned for text, which is read out loud by a text-to-speech engine.
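
As an illustration, here is a minimal Python sketch of that centering logic. The frame size, the 10% tolerance, and the (x1, y1, x2, y2) box format are assumptions made for the example, not the project's exact values.

```python
# A minimal sketch of the centering guidance described above.
# The tolerance and box format (x1, y1, x2, y2) are assumptions.

def centering_instruction(box, frame_w, frame_h, tolerance=0.1):
    """Return an instruction that moves the detected object
    toward the center of the frame."""
    x1, y1, x2, y2 = box
    box_cx = (x1 + x2) / 2
    box_cy = (y1 + y2) / 2

    # Offsets from the frame center, as a fraction of frame size.
    dx = (box_cx - frame_w / 2) / frame_w
    dy = (box_cy - frame_h / 2) / frame_h

    instructions = []
    if dx > tolerance:
        instructions.append("move the camera right")
    elif dx < -tolerance:
        instructions.append("move the camera left")
    if dy > tolerance:
        instructions.append("move the camera down")
    elif dy < -tolerance:
        instructions.append("move the camera up")

    return " and ".join(instructions) if instructions else "hold steady"

# Example: an object in the upper-left of a 640x480 frame.
print(centering_instruction((50, 40, 150, 120), 640, 480))
# -> "move the camera left and move the camera up"
```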

How we built it

We built our computer vision pipeline around the state-of-the-art YOLOv8 model, which was an essential component of our predictions. The detector can be retargeted to different datasets by swapping the weights file it loads, allowing a level of customization based on what the user prefers. For the scope of this project, we used iCloud connectivity to link the devices together. The detector's outputs were passed into a text detector followed by a text-to-speech engine to read out what the model sees.
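
Below is a hedged sketch of that detect-then-read pipeline using the ultralytics package. The pytesseract OCR step and the pyttsx3 speech engine are stand-ins for whichever text detector and text-to-speech engine a deployment uses, and the file names are placeholders.

```python
# A minimal sketch of the detect -> OCR -> speech pipeline. The weights
# file and the captured image path below are placeholders.
import cv2
import pytesseract
import pyttsx3
from ultralytics import YOLO

# Swapping the weights file retargets the detector, e.g. a checkpoint
# fine-tuned on grocery packaging instead of the default COCO classes.
model = YOLO("yolov8n.pt")  # or a custom .pt checkpoint

frame = cv2.imread("captured_target.jpg")  # hypothetical captured frame
results = model(frame)

# Crop the highest-confidence detection and run OCR on it.
boxes = results[0].boxes
if len(boxes) > 0:
    best = boxes.conf.argmax()
    x1, y1, x2, y2 = map(int, boxes.xyxy[best].tolist())
    crop = frame[y1:y2, x1:x2]

    text = pytesseract.image_to_string(crop).strip()
    if text:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
```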

Challenges we ran into

We ran into cloud connectivity issues: Azure proved hard to work with due to its poor documentation and interface. For now, we rely on iCloud's built-in file syncing to connect our front-end and back-end systems, but we intend to switch to a browser-based HTTP connection in future implementations.
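
For illustration, here is a hypothetical sketch of what that HTTP hand-off could look like on the backend. Flask and the /frame route are assumed names for the example; this describes a possible future direction, not our current code.

```python
# A hypothetical sketch of the planned HTTP hand-off: the phone POSTs a
# frame to the backend instead of syncing files through iCloud.
import cv2
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/frame", methods=["POST"])
def receive_frame():
    # Decode the uploaded JPEG bytes into an OpenCV image.
    data = np.frombuffer(request.data, dtype=np.uint8)
    frame = cv2.imdecode(data, cv2.IMREAD_COLOR)
    if frame is None:
        return jsonify(error="could not decode frame"), 400
    # ... run detection/OCR on `frame` here ...
    return jsonify(status="received")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```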

Accomplishments that we're proud of

Working with object centering and tracking was challenging, but we laid out the logic to make the program print the appropriate instruction based on where the object was detected on screen. When working on the website, we initially had trouble activating the webcam locally, but got it working by tinkering with the HTML file.

What we learned

We explored a range of backend frameworks while determining which one best fit our needs. Connectivity was another topic we dug into thoroughly, along with the limits of local-system computation.

What's next for Accessible Vision

Up next, we hope to build a smoother web interface along with more efficient file handling.
