SOUR | Devpost

Inspiration

Our inspiration was our good friend Sourish.

What it does

It's a glasses add-on that takes video and audio input to answer a blind person's questions about their surroundings. It can also save and identify people's faces.

How we built it

We built it using a combination of embedded engineering and software. We used an ESP32S3 Sense to stream video to a Colab notebook, which ran a vision LLM and Gradio to describe the environment. We used InterSystems IRIS Vector Search to save and identify people.
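As a rough illustration of how saving and identifying people by vector search works, here is a minimal in-memory sketch. It does not use the InterSystems IRIS API; the `FaceMemory` class, the 0.8 threshold, and the three-dimensional toy vectors are all assumptions made for illustration. A real system would get its embeddings from a face-embedding model and let the database perform the similarity search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class FaceMemory:
    """Hypothetical stand-in for a vector store of face embeddings."""

    def __init__(self, threshold=0.8):
        # Matches scoring below this similarity are treated as unknown.
        self.threshold = threshold
        self.entries = []  # list of (name, embedding) pairs

    def save(self, name, embedding):
        self.entries.append((name, list(embedding)))

    def identify(self, embedding):
        """Return (name, score) for the closest stored face, or (None, score)."""
        best_name, best_score = None, -1.0
        for name, stored in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < self.threshold:
            return None, best_score
        return best_name, best_score


# Toy embeddings; real ones would have hundreds of dimensions.
memory = FaceMemory()
memory.save("Alice", [0.9, 0.1, 0.0])
memory.save("Bob", [0.0, 0.2, 0.9])
match, score = memory.identify([0.8, 0.2, 0.1])
print(match, round(score, 2))
```

The threshold keeps the lookup from naming a stranger just because some stored face happens to be the nearest one.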

Challenges we ran into

We planned to use the Intel AI PCs to train and run the vision LLM, since we didn't have enough VRAM to run it locally. Unfortunately, we also needed to use them as servers, and we were concerned that we would not be allowed to do so. So we pivoted to Google Colab. However, this caused many communication issues between public and private IP addresses, which we were fortunately able to resolve. Finally, we had also planned to use the microphone on the microcontroller. However, we discovered that running the onboard camera generated too much heat, so adding the microphone could have led to even more problems.
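To illustrate why public and private IPs get in the way: a hosted notebook lives on the public internet, while a camera board on local Wi-Fi usually has a private address that outside machines cannot connect to directly. The sketch below uses Python's standard `ipaddress` module to show the distinction. The function name and the example addresses are hypothetical, not our actual network setup, and this is not a description of how we resolved the issue.

```python
import ipaddress


def is_directly_reachable_from_cloud(address: str) -> bool:
    """True only for globally routable addresses.

    Private LAN and loopback addresses are not routable from the public
    internet, so a cloud notebook cannot open a connection to them
    without some form of tunnel or relay.
    """
    return ipaddress.ip_address(address).is_global


# Hypothetical addresses for illustration only.
camera_board = "192.168.1.50"  # typical home or campus LAN address
public_host = "8.8.8.8"        # a well-known public address

print(camera_board, is_directly_reachable_from_cloud(camera_board))
print(public_host, is_directly_reachable_from_cloud(public_host))
```

In a setup like this, the usual options are to have the device initiate the connection outward to a public endpoint, or to expose the device through a tunnel.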

Accomplishments that we're proud of

We are very proud of the end product. Being able to hear it describe the room without our glasses is incredible.

What we learned

We learned a lot about hardware and software limitations, as well as how to use various APIs to work around communication issues.

What's next for SOUR

Our next goal is to make the product more viable and have it respond to spoken questions using voice recognition. Furthermore, we want to add face detection to save family and friends for identification.

Built With

arduino
c/c++
flash
gradio
grok
intersystems
moondream
python

Submitted to

HackTX 2024
- Winner Best use of GenAI using InterSystems IRIS Vector Search

Created by

I developed the Gradio user interface and used Torch to run inference on the vision LLM and Whisper models for visual question answering and audio-to-text transcription. Additionally, I created the "Commit to Memory" section, where I handle GET and POST requests to a database developed by Caleb.

thang truong
I researched the InterSystems database API to build a backend memory server. This server saves pictures of faces to compare against new pictures. It then uses IRIS Vector Search to find the most likely candidate for whose face it is.

Caleb Devon
Vivek Keval
Anik Patel

Updates

Vivek Keval started this project — Nov 03, 2024 06:01 AM EST
