Categories
DEI, AI/ML, Hardware, First time hackers
Inspiration
We were inspired to create Visionaria after seeing how Toph Beifong from Avatar: The Last Airbender could "see" by feeling vibrations despite being blind.
What it does
Visionaria uses OpenAI's GPT-4o model to analyze the view in front of the user and describe it back to them. It even allows for special requests, e.g., asking where a specific object is.
How we built it
We built Visionaria with a Raspberry Pi, a microphone, a webcam, and a headset. The code on the Raspberry Pi listens through the microphone for the wake word "Jarvis", then takes a picture with the webcam and analyzes it.
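The listen → capture → analyze loop might be sketched roughly like this. Names such as `wait_for_wake_word`, `capture_frame`, and `speak` are illustrative placeholders, not our actual device code; only the message shape for a GPT-4o vision request is taken from OpenAI's chat completions format:

```python
import base64
import json

def encode_image(jpeg_bytes: bytes) -> str:
    # GPT-4o accepts images inline as base64-encoded data URLs
    return "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()

def build_vision_messages(prompt: str, image_data_url: str) -> list:
    # A chat message that pairs the spoken request with the webcam frame
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_data_url}},
        ],
    }]

if __name__ == "__main__":
    # On the Pi, the loop would look roughly like (placeholders commented out):
    #   wait_for_wake_word("jarvis")          # microphone listening
    #   frame = capture_frame()               # webcam snapshot as JPEG bytes
    #   messages = build_vision_messages("Describe what is in front of me.",
    #                                    encode_image(frame))
    #   reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    #   speak(reply.choices[0].message.content)  # text-to-speech into the headset
    demo = build_vision_messages("Describe the scene.", encode_image(b"\xff\xd8"))
    print(json.dumps(demo)[:72])
```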
Challenges we ran into
The model would sometimes get confused when asked certain questions about the image, but we were able to fix this through prompt engineering.
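A typical prompt-engineering fix for this kind of confusion is a system prompt that keeps the model grounded in what the image actually shows. The wording below is a hypothetical example of that technique, not our exact prompt:

```python
# Hypothetical system prompt illustrating constraints that help a vision
# model stay grounded in the image instead of guessing.
SYSTEM_PROMPT = (
    "You are a sighted assistant for a blind user. "
    "Describe only what is visible in the image. "
    "If asked about an object that is not visible, say so plainly "
    "instead of guessing. Keep answers short and easy to speak aloud."
)

def build_system_message() -> dict:
    # Sent as the first message in every chat.completions request
    return {"role": "system", "content": SYSTEM_PROMPT}
```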
Accomplishments that we're proud of
We were able to support special requests by altering our API call based on the user's speech.
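One way to fold the special request into the API call is to splice the transcribed speech into the prompt, falling back to a general description when the user only says the wake word. This is a minimal sketch; `transcript` stands in for whatever the speech recognizer returned:

```python
DEFAULT_PROMPT = "Describe the scene in front of me."

def prompt_from_speech(transcript: str) -> str:
    # Drop the wake word and use the remainder as the request, if any;
    # otherwise fall back to a general scene description.
    words = transcript.lower().split()
    if words and words[0] == "jarvis":
        words = words[1:]
    request = " ".join(words).strip()
    return request.capitalize() if request else DEFAULT_PROMPT
```

For example, `prompt_from_speech("Jarvis where are my keys")` yields "Where are my keys", while a bare "Jarvis" falls back to the default description prompt.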
What we learned
We learned a lot about how powerful OpenAI's vision models can be. We also gained a lot of experience with prompt engineering.
What's next for Visionaria
We hope to reduce the time it takes to process the images and reduce the total size and amount of wires associated with the device.
Built With
- openai
- python
- raspberry-pi