Categories

DEI, AI/ML, Hardware, First-time hackers

Inspiration

We were inspired to create Visionaria after seeing how Toph Beifong from Avatar: The Last Airbender, though blind, could "see" by sensing vibrations.

What it does

Visionaria uses OpenAI's GPT-4o model to analyze the view in front of the user and describe it back to them. It also accepts special requests, e.g., asking where a specific object is.
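A minimal sketch of how such a request could be assembled, assuming the OpenAI Python SDK's chat-completions format with a base64-encoded JPEG frame (`build_vision_request` and its exact prompt wording are hypothetical, not the project's actual code):

```python
import base64

def build_vision_request(image_bytes, special_request=None):
    """Build a GPT-4o chat payload that describes the scene and, optionally,
    answers a special request such as locating a specific object."""
    prompt = "Briefly describe the scene in front of me."
    if special_request:
        prompt += f" Also answer this request: {special_request}"
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # GPT-4o accepts images as data URLs inside image_url parts
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }
```

The resulting dict would be passed to `client.chat.completions.create(**payload)`; building it in one place makes it easy to vary the prompt per request.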

How we built it

We built Visionaria using a Raspberry Pi, microphone, webcam, and headset. The code on the Raspberry Pi listens through the microphone for the wake word "Jarvis", then takes a picture with the webcam and analyzes it.
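The loop described above can be sketched roughly as follows. This is an assumed structure, not the project's actual code: `listen`, `capture`, `analyze`, and `speak` are hypothetical stand-ins for the speech-to-text, webcam, GPT-4o, and headset-audio steps.

```python
KEYWORD = "jarvis"

def heard_keyword(transcript: str) -> bool:
    """Return True if the wake word appears anywhere in a speech transcript."""
    return KEYWORD in transcript.lower()

def run_once(listen, capture, analyze, speak):
    """One pass of the assist loop: listen for the wake word, then
    photograph the scene, analyze it, and read the answer back."""
    transcript = listen()          # speech-to-text from the microphone
    if heard_keyword(transcript):
        image = capture()          # grab a frame from the webcam
        speak(analyze(image, transcript))  # describe the scene via headset
```

Passing the four steps in as callables keeps the control flow testable without real hardware.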

Challenges we ran into

The model would sometimes get confused when asked certain questions about the image, but we were able to fix this through prompt engineering.

Accomplishments that we're proud of

We were able to support special requests by altering our API call based on the user's transcribed speech.

What we learned

We learned a lot about how powerful OpenAI's vision models can be. We also gained a lot of experience with prompt engineering.

What's next for Visionaria

We hope to reduce the time it takes to process each image, and to shrink the device's overall size and the amount of wiring it requires.

Built With
