Inspiration
Our inspiration for Visi came from the fact that people who are visually impaired often miss critical social cues, which can make everyday interactions feel awkward or difficult. We wanted to help visually impaired individuals feel more comfortable and confident in their environment.
What it does
Our project uses an AI-powered camera to interpret the user's environment and other people's social cues. The camera can recognize common hand gestures like waves and handshakes, track people's faces and moods, and provide specific descriptions of the current camera view to give context about the surrounding environment.
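To give a feel for how this works end to end, here is a minimal sketch of turning a detected event into spoken feedback. The event names, phrases, and `announce` helper are illustrative placeholders, not the actual Visi code.

```python
# Illustrative sketch (hypothetical names): mapping detected events
# to short spoken announcements for the user.

EVENT_PHRASES = {
    "wave": "Someone is waving at you.",
    "handshake": "Someone is offering a handshake.",
    "smile": "The person in front of you is smiling.",
}

def announce(event: str, speak) -> None:
    """Look up a phrase for a detected event and hand it to a speech callback."""
    phrase = EVENT_PHRASES.get(event)
    if phrase:
        speak(phrase)

# Example: announce("wave", print) prints the wave phrase;
# in the real app the callback would be a text-to-speech call instead of print.
```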
How we built it
We built the core backend in Python, using OpenCV to stream and process frames from a camera. To detect hand, face, and gaze landmarks, we integrated MediaPipe, writing custom routing logic for specific gestures like waves or handshakes. For realistic and immersive auditory feedback, we used ElevenLabs to speak the relevant text aloud, so information reaches the user without them needing to see anything. MongoDB is used to store face embeddings: when a new person is added to the database, their name and a 190-dimensional embedding vector are stored in our collection. During the live video loop, each detected face is compared against the in-memory cache using cosine similarity.
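As a rough sketch of the frame loop, the following uses OpenCV and the MediaPipe Hands solution API; the wave heuristic (watching the wrist oscillate side to side) is our illustration here, not the exact gesture-routing logic Visi uses.

```python
# Minimal capture-loop sketch, assuming MediaPipe's Hands solution API.
import cv2
import mediapipe as mp
from collections import deque

mp_hands = mp.solutions.hands
wrist_xs = deque(maxlen=30)  # recent wrist x-positions (normalized 0..1)

def looks_like_wave(xs) -> bool:
    """Rough heuristic: enough side-to-side wrist movement in the recent window."""
    return len(xs) == xs.maxlen and (max(xs) - min(xs)) > 0.15

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            wrist = results.multi_hand_landmarks[0].landmark[mp_hands.HandLandmark.WRIST]
            wrist_xs.append(wrist.x)
            if looks_like_wave(wrist_xs):
                print("wave detected")  # real app: route this event to the TTS layer
                wrist_xs.clear()
cap.release()
```

For the face database, a minimal sketch of enrollment and lookup might look like the following, assuming pymongo and NumPy; the database, collection, and field names and the 0.6 similarity threshold are assumptions for illustration.

```python
# Sketch of face enrollment and in-memory cosine-similarity lookup (hypothetical names).
import numpy as np
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["visi"]["faces"]

def enroll(name: str, embedding: np.ndarray) -> None:
    """Store a person's name and face embedding vector in the collection."""
    collection.insert_one({"name": name, "embedding": embedding.tolist()})

def load_cache() -> list:
    """Pull all embeddings into memory once so the video loop avoids DB round-trips."""
    return [(doc["name"], np.array(doc["embedding"])) for doc in collection.find()]

def identify(embedding: np.ndarray, cache, threshold: float = 0.6):
    """Return the best-matching cached name by cosine similarity, or None."""
    best_name, best_score = None, threshold
    for name, known in cache:
        score = float(np.dot(embedding, known) /
                      (np.linalg.norm(embedding) * np.linalg.norm(known)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```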
Challenges we ran into
Some of the major challenges we ran into involved discerning between different hand gestures with the camera. Gestures would often get mixed up, like mistaking a thumbs up for a handshake, because they looked very similar to the model. We also needed to tune the detection confidence of the system so that random hands in the camera frame would not be picked up as one of the gestures coded into the program.
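The kind of tuning described above boils down to a couple of real MediaPipe Hands parameters; the specific values below are placeholders, not the ones Visi finally settled on.

```python
# Confidence-tuning sketch using MediaPipe Hands parameters (example values only).
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    max_num_hands=2,
    min_detection_confidence=0.75,  # raise so faint background hands are ignored
    min_tracking_confidence=0.6,    # keep tracking stable once a hand is locked on
)
```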
Accomplishments that we're proud of
We are very proud of our final product, as it fulfills what we envisioned. We were able to build a truly real-time pipeline that feels responsive and useful for a visually impaired individual in social settings. We are glad to have created a product that makes everyday life easier for people with visual impairments.
What we learned
Through this project, we learned a lot about real-time computer vision and the frustrations and struggles of building predictive software. Outcomes can often be unpredictable, and building takes a lot of trial and error. We also learned how to effectively integrate multiple tools into a single functioning application, from large language models like Gemini to AI voice tools like ElevenLabs.
What's next for Visi
Our ultimate goal for Visi is to integrate this software into devices like Meta Ray-Ban glasses, where users can easily rely on our software to feel more confident and make their lives easier. We want to integrate our tool as seamlessly as possible into visually impaired individuals' lives.