Inspiration
In September 2024, Apple announced that their AirPods ($200) would offer hearing-aid capabilities, making these services far more affordable than dedicated hearing aids ($4,000-$8,000). This got me thinking about other existing products that could help people. The Meta Ray-Ban glasses are an amazing product with built-in microphones, a camera, touch controls, speakers, and Bluetooth connectivity. The goal, then, was to create a compelling, feature-rich app that helps a visually impaired person navigate the world better.
What it does
In its current state, it is a web app that showcases some use cases. Users have access to four features: object identification, text to speech, facial recognition, and proximity warnings. There are two live video feeds: one is a live depth map; the other detects faces, outlines them with a red box, and labels them on screen. In the top left, text continuously tries to identify whatever is centered in the webcam. Pressing 'r' triggers text to speech, and the program reads aloud whatever text it detects. Pressing 'f' makes the program read the names of detected people. Saying 'Identify' makes the program say what object it identifies. Finally, any object within one foot of the webcam makes the program beep.
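Both depth-related behaviours above (the live depth map feed and the proximity beep) reduce to simple array operations once a depth map exists. Here is a minimal sketch, assuming the map arrives as a NumPy array; note that the one-foot threshold below stands in for a calibrated value, since monocular models typically output relative rather than metric depth:

```python
import numpy as np

def depth_to_image(depth: np.ndarray) -> np.ndarray:
    """Scale a depth map to 0-255 uint8 so it can be rendered as the
    greyscale 'live depth map' feed."""
    lo, hi = float(depth.min()), float(depth.max())
    if hi == lo:                        # flat map: avoid divide-by-zero
        return np.zeros(depth.shape, dtype=np.uint8)
    return ((depth - lo) / (hi - lo) * 255).astype(np.uint8)

def should_beep(depth_m: np.ndarray, threshold: float = 0.3048,
                min_fraction: float = 0.01) -> bool:
    """Beep when at least min_fraction of the pixels are closer than the
    threshold (one foot expressed in metres). Requiring a fraction of the
    frame rather than a single pixel filters out depth noise."""
    return np.count_nonzero(depth_m < threshold) / depth_m.size >= min_fraction
```

In the real app these would run per frame, with `should_beep` gating the audio warning and `depth_to_image` feeding the video stream.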
How we built it
We broke down the project into four main parts: Depth Sense, Image Classification, Text to Speech, and Facial Recognition.
- Face Recognition was the easiest, as there is a library that supports real-time face detection and recognition. You can add anyone's photo to the folder named "faces", register the person using the template we created in the code, and the program will then identify faces from the webcam using OpenCV tools.
- Image Classification was just as easy, since OpenAI had already published a pretrained model. We used it with a carefully curated list of relevant objects that we thought would showcase the model's capabilities well. This feature lets the user navigate the world with confidence.
- Depth Sensing was achieved with a community-contributed model (see credits): Depth-Anything-V2, a monocular depth estimation model. It is both lightweight and fast, making it a great fit for us. This feature matters because it identifies objects getting too close to the wearer's face; whether it's a wall or a pole, it can warn the person that something is right in front of them.
- Text to Speech, arguably one of the most important features, proved the hardest to get right. Just as we were about to give up on it, we discovered that one of Azure's services offers a highly lightweight and accurate model. It took a lot of work to learn how to use the API, but in the end we got a feature that works great!
Challenges we ran into
As mentioned above, Text to Speech was incredibly challenging to get right. We tried many options: pyocr with pytesseract, PaddleOCR, gTTS, EasyOCR, and more. These models all struggled to be consistent. While debugging, we discovered that they all had trouble identifying where the text was and what it said. The post-processed frames (black and white and denoised) were hard even for a human to make out. The model we ended up using still isn't perfect, but as an MVP and a proof of concept, we believe this feature works and has lots of room for improvement.
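The kind of black-and-white preprocessing that tripped up the OCR models can be sketched in a few lines; this is illustrative of what such a pipeline does, not our exact code:

```python
import numpy as np

def preprocess_for_ocr(frame: np.ndarray, thresh: int = 128) -> np.ndarray:
    """Convert an RGB frame to a binary black-and-white image of the kind
    fed to OCR models: luminance greyscale, then a hard global threshold.
    Over-aggressive binarisation like this is exactly what can leave the
    text unreadable for both the models and a human."""
    grey = frame @ np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 weights
    return np.where(grey > thresh, 255, 0).astype(np.uint8)
```

A fixed global threshold fails under uneven lighting, which is one reason adaptive methods (or a service that handles preprocessing itself, as Azure's does) fare better on webcam footage.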
Accomplishments that we're proud of
Our biggest accomplishment is taking an idea we had and pursuing it to the best of our abilities. We learned so many tools and techniques that we are sure to use on future projects. With the help of AI, we were able to break down problems and quickly understand how to tackle them. It was an amazing journey from start to finish; through all the headaches and tears, we are proud to have made a web app that "functions" and has the potential to help a lot of people.
What we learned
We learned so much from planning, coding, debugging, and deploying a project. It forced us to learn and apply industry best practices to keep the project organized and manageable. We learned a lot from the models we used and from new frameworks like Flask. But reflecting as a group, we agreed it was the small things we picked up that mattered most, and we are glad to have gone through the painful but rewarding process of completing a project. To list a couple: through trial and massive error, we came to appreciate virtual environments, and we have used them countless times since. Having multiple people working and making many changes also got us familiar with version control, using services like Git and GitHub.
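For anyone following the same path, the virtual-environment setup we now reach for is just a couple of commands (the package names in the comment are examples, not a full dependency list):

```shell
# Create an isolated environment so project dependencies
# don't leak into (or break) the system Python
python3 -m venv .venv
. .venv/bin/activate

# Dependencies are then installed per project, e.g.:
#   pip install flask opencv-python
```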
What's next for FocusForward
Next, we want to optimize the code and publish a working app on Meta's store. It is a long journey ahead, but we believe in this idea and would like to see how far we can take it!
