Inspiration
Our inspiration was our good friend Sourish.
What it does
It's a glasses add-on that takes video and audio input to answer a blind person's questions about their surroundings. It can also save and identify people's faces.
How we built it
We built it by combining embedded engineering and software. We used an ESP32S3 Sense to stream video to a Google Colab notebook, which ran a vision LLM behind a Gradio interface to describe the environment. We used InterSystems IRIS Vector Search to save and identify people.
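The save-and-identify step can be sketched as a nearest-neighbor lookup over face embeddings. This is a minimal in-memory illustration of the idea only: it does not use the InterSystems IRIS API, and the `FaceStore` class, the cosine-similarity metric, the 0.8 threshold, and the toy 3-dimensional embeddings are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


class FaceStore:
    """Hypothetical in-memory stand-in for a vector search database."""

    def __init__(self, threshold=0.8):
        # Assumed cutoff: matches scoring below it are reported as unknown.
        self.threshold = threshold
        self._entries = []  # list of (name, embedding) pairs

    def save(self, name, embedding):
        """Store a face embedding under a person's name."""
        self._entries.append((name, list(embedding)))

    def identify(self, embedding):
        """Return the best-matching name, or None if nobody is close enough."""
        best_name, best_score = None, -1.0
        for name, stored in self._entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None


store = FaceStore()
store.save("friend_a", [1.0, 0.0, 0.0])
store.save("friend_b", [0.0, 1.0, 0.0])
print(store.identify([0.9, 0.1, 0.0]))  # closest to friend_a
print(store.identify([0.0, 0.0, 1.0]))  # no close match
```

A real deployment would get its embeddings from a face recognition model and let the database do the similarity search; the linear scan here only shows the logic.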
Challenges we ran into
We planned to use the Intel AI PCs to train and run the vision LLM, since we didn't have enough VRAM to run it locally. Unfortunately, we also needed them to act as servers and were concerned that we would not be allowed to use them that way, so we pivoted to Google Colab. That led to many problems with communication between public and private IP addresses, which we were fortunately able to resolve. Finally, we had also planned to use the microphone on the microcontroller. However, we discovered that using the onboard camera already generated too much heat, so using the microphone as well could lead to even more problems.
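The public/private IP problem can be illustrated with a short check. A device on a home or campus network usually has a private address that a cloud notebook cannot connect to directly, so the device typically has to open the connection outward to a public endpoint. The sketch below uses only Python's standard `ipaddress` module. The function name and the sample addresses are assumptions for illustration, not the addresses or the fix we actually used.

```python
import ipaddress


def is_publicly_routable(address):
    """Return True if the address can be reached from the public internet."""
    return ipaddress.ip_address(address).is_global


# Hypothetical addresses: a microcontroller on a local Wi-Fi network
# and a public server.
for label, addr in [("device", "192.168.1.42"), ("server", "8.8.8.8")]:
    kind = "public" if is_publicly_routable(addr) else "private"
    print(f"{label} {addr} is {kind}")
```

When the device side is private and the server side is public, the device needs to initiate the connection, or a tunnel or relay has to sit in between.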
Accomplishments that we're proud of
We are very proud of the end product. Being able to hear it describe the room without our glasses is incredible.
What we learned
We learned a lot about the limitations of hardware and software, as well as how to use various APIs to work around communication issues.
What's next for SOUR
Our next goal is to make the product more viable and have it respond to spoken questions using voice recognition. We also want to add face detection to save family and friends for identification.