Inspiration

We came together as a team over a shared passion for building high-impact products for the healthcare sector. One member described coming across visually impaired influencers on social media who detailed the hardships they face daily—how difficult it can be to complete a task as simple as navigating around an obstacle. Watching the videos she shared, we came to share her empathy and decided to build a product specifically targeted at improving ease of mobility for the visually impaired. Starting with a smartphone app, we hope to make walking outside a safer, more practical task.

What it does

Walking through a bustling environment is often dangerous for a person who is visually impaired, and sometimes even for those who are not. With this mobile application, users take photos of their surroundings with their phone. The photos are processed with image-recognition and depth-estimation algorithms to identify obstacles, and an LLM then turns those quantified detections into personalized safety suggestions that help blind users navigate their environment.
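The photo-to-suggestion flow described above can be sketched roughly as follows. This is a minimal illustration only: the stage functions and their return shapes are hypothetical stand-ins for the real model calls, with hard-coded example values.

```python
# Sketch of the app's photo -> detection -> suggestion flow.
# All three functions are hypothetical placeholders, not the real models.

def detect_obstacles(photo: bytes) -> list[dict]:
    # Stand-in for image recognition + depth estimation:
    # returns detected obstacles with estimated distances in meters.
    return [{"label": "bicycle", "distance_m": 1.8}]

def suggest_route(obstacles: list[dict]) -> str:
    # Stand-in for the LLM call: turns quantified obstacles
    # into a short, speakable safety suggestion.
    if not obstacles:
        return "Path ahead looks clear."
    nearest = min(obstacles, key=lambda o: o["distance_m"])
    return f"Caution: {nearest['label']} about {nearest['distance_m']:.0f} meters ahead."

def process_photo(photo: bytes) -> str:
    # End-to-end: one photo in, one spoken-style suggestion out.
    return suggest_route(detect_obstacles(photo))

print(process_photo(b"..."))  # → Caution: bicycle about 2 meters ahead.
```

In the real app the output string would then be handed to text-to-speech rather than printed.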

How we built it

This full-stack project integrates a robust backend with an accessible frontend, plus a few features that quietly enhance the user experience.

For the backend, we chose FastAPI for its speed and flexibility. After the user's photo is processed, we verbalize the surroundings through OpenAI API calls with carefully engineered prompts, using techniques such as Chain-of-Thought. When an obstacle is detected, we estimate its distance from the user: since no single highly accurate depth-measuring model exists, we combine Depth Anything V2 for depth mapping with the Segment Anything Model (SAM) for object segmentation, producing a precise obstacle mask that improves the distance estimate.

On the frontend, we used FlutterFlow, which let us send the user's photos to the backend through Firebase. We also integrated Eleven Labs for text-to-speech, the Whisper API for speech-to-text, and LangChain for advanced language processing, providing an extra degree of surrounding information for those who need it.
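The mask-plus-depth combination can be sketched with plain arrays: take the per-pixel depth map (as Depth Anything V2 would produce) and pool the depth values inside the obstacle's segmentation mask (as SAM would produce). The function name and the synthetic values below are our own illustration, and in practice the model's relative depth output would need calibration to real-world units.

```python
import numpy as np

def estimate_obstacle_distance(depth_map: np.ndarray, mask: np.ndarray) -> float:
    """Estimate an obstacle's distance by pooling depth inside its mask.

    depth_map: HxW array of per-pixel depth (illustrative meters here;
               Depth Anything V2 actually emits relative depth that
               must be calibrated).
    mask:      HxW boolean array marking the obstacle's pixels
               (as a SAM segmentation would provide).
    Returns the median depth over the masked region, which is robust
    to stray pixels at the mask boundary.
    """
    masked_depths = depth_map[mask]
    if masked_depths.size == 0:
        raise ValueError("empty mask: no obstacle pixels to measure")
    return float(np.median(masked_depths))

# Synthetic 4x4 scene: background ~10 m away, obstacle ~2 m away in the center.
depth = np.full((4, 4), 10.0)
depth[1:3, 1:3] = 2.0
obstacle_mask = np.zeros((4, 4), dtype=bool)
obstacle_mask[1:3, 1:3] = True

print(estimate_obstacle_distance(depth, obstacle_mask))  # → 2.0
```

Pooling with the median rather than the mean is the reason the mask helps: without segmentation, background pixels inside a bounding box would drag the estimate toward the distant background.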

Challenges we ran into

Some of the biggest challenges we ran into were indecisiveness in mapping out project benchmarks, individual technical hurdles that prompted us to reach out to mentors and other hackers, and, arguably the largest of all, racing to beat the deadline while still trying to perfect our project.

Accomplishments that we're proud of

Having two new hackers on the team, along with our decision to explore a variety of tools and methods we had never used before, was itself a major challenge. We're all really proud of how well we collaborated and pushed our limits. We're especially proud that our work resulted in an app that contributes to accessibility and addresses important social issues, making a meaningful impact beyond just the technical side.

What we learned

Through this process, we learned that time is the most valuable resource, especially in a hackathon. That meant improving our time management, finding new ways to brainstorm ideas, and staying highly adaptable. We also learned to quickly familiarize ourselves with novel technologies and apply them to problems that meant a lot to us.

What's next for Navigating the Unknown

The next challenges we want to tackle are fine-tuning the LLM-based image recognition to produce more precise, better-quantified responses that give the user pinpoint information, and adding a translated text-to-speech feature to make foreign environments easier to understand. There are several directions in which this project could advance, but we feel the combination of technologies we used demonstrates the power of fusing cutting-edge resources to create impactful developments in healthcare.

Built With

  • dart
  • depth-anything2
  • elevenlabs
  • fastapi
  • firebase
  • flutterflow
  • langchain
  • ngrok
  • python
  • pytorch
  • segment-anything
  • whisperapi