Inspiration

When we first learned about the different tracks at this hackathon, the accessibility challenge caught our eye. While brainstorming, we realized we all wear glasses and have trouble seeing without them. So we began to consider people with more severe impairments that prevent them from perceiving the world around them. This inspired us to make PictureThis, an application that describes a scene live from a video feed, helping people identify objects and their surroundings.

What it does

PictureThis is a voice-AI-powered web app for visually impaired users. The user simply says "scan". The app captures what's in the camera frame, sends it to an AI model (Gemini for image recognition), and reads out a description of what it sees (via ElevenLabs speech generation).
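The "scan" flow above can be sketched in browser JavaScript. This is our illustration, not the project's actual code: it assumes a `<video id="camera">` element already fed by `getUserMedia`, the prefixed `webkitSpeechRecognition` API, and a hypothetical `/describe` backend endpoint.

```javascript
// Decide whether a speech transcript contains the "scan" command.
function isScanCommand(transcript) {
  return transcript.trim().toLowerCase().includes("scan");
}

// Browser-only wiring, guarded so this file can also load under Node.
if (typeof window !== "undefined" && "webkitSpeechRecognition" in window) {
  const recognition = new webkitSpeechRecognition();
  recognition.continuous = true;
  recognition.onresult = (event) => {
    const latest = event.results[event.results.length - 1][0].transcript;
    if (isScanCommand(latest)) {
      // Grab the current video frame onto a canvas and send it to the server.
      const video = document.getElementById("camera");
      const canvas = document.createElement("canvas");
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext("2d").drawImage(video, 0, 0);
      canvas.toBlob((blob) => {
        fetch("/describe", { method: "POST", body: blob }); // hypothetical endpoint
      }, "image/jpeg");
    }
  };
  recognition.start();
}
```

Keeping the command check in its own small function makes the voice trigger easy to test without a browser.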

How we built it

Tech stack:

  • Kiro for rapid prototyping and base features
  • APIs: ElevenLabs for speech generation and Gemini for image recognition
  • Backend server: Node.js
  • Frontend: HTML, CSS, JavaScript
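A minimal sketch of how the Node.js backend might shape a request to Gemini's `generateContent` REST endpoint, assuming Node 18+ (global `fetch`), a base64-encoded JPEG frame, and a `GEMINI_API_KEY` loaded from the environment. The model name, prompt, and function names are illustrative.

```javascript
// Build the JSON body Gemini's generateContent endpoint expects:
// a text prompt plus the camera frame as inline base64 image data.
function buildGeminiRequest(base64Jpeg) {
  return {
    contents: [
      {
        parts: [
          { text: "Describe this scene for a visually impaired user." },
          { inline_data: { mime_type: "image/jpeg", data: base64Jpeg } },
        ],
      },
    ],
  };
}

// Post the frame and pull the description text out of the response.
async function describeFrame(base64Jpeg) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-1.5-flash:generateContent?key=${process.env.GEMINI_API_KEY}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGeminiRequest(base64Jpeg)),
  });
  const data = await res.json();
  return data.candidates[0].content.parts[0].text;
}
```

The returned description can then be passed to ElevenLabs for speech synthesis and played back to the user.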

Challenges we ran into

  • We wanted to run this on a Raspberry Pi, potentially as an independently operating device. However, we ran into two issues: the Pi's OS became corrupted, and we did not have a microphone for voice recognition, which limited the accessibility features.
  • At first, we used OpenAI for vision recognition, but unfortunately we ran out of resources for the AI to keep operating for the project. So we switched to Gemini for the remainder of the project and found it to be much more descriptive. In fact, it was able to recognize that a plant was artificial rather than naturally grown (which scared us a little bit)!

Accomplishments that we're proud of

We are proud of learning how to use API keys for artificial intelligence features. We also learned that it is not wise to push API keys to the repository (whoops), so we provided an example environment file for others to fill in their personal API keys. This is our first time using any sort of artificial intelligence in a full stack application, with APIs such as ElevenLabs and Google Gemini. By being able to use two artificial intelligence tools at once, we already feel like winners in our hearts!! :D
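An example environment file like the one described might look like this (the variable names are illustrative; the real file is committed as a template, while the filled-in `.env` stays gitignored):

```shell
# .env.example — copy to .env and fill in your own keys; never commit .env
GEMINI_API_KEY=your-gemini-key-here
ELEVENLABS_API_KEY=your-elevenlabs-key-here
```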

What we learned

We learned how to host full stack applications in a web browser and to use API keys responsibly in our projects. Additionally, we learned how to use agentic AI tools such as Kiro for debugging and rapid prototyping. We also learned about important accessibility features for projects like this, such as screen reader compatibility and full keyboard navigation controls.
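Screen reader compatibility for an app like this can be sketched with an ARIA live region, so that each new description is spoken by the user's screen reader as well as by the audio playback. The helper and element names here are our illustration, not the project's actual code.

```javascript
// Prefix descriptions so screen-reader users know where the text came from.
function formatAnnouncement(description) {
  return `Scene description: ${description}`;
}

// Browser-only wiring, guarded so the file also loads under Node.
if (typeof document !== "undefined") {
  const region = document.createElement("div");
  region.setAttribute("aria-live", "polite"); // screen readers speak updates
  region.className = "sr-only";               // visually hidden via CSS
  document.body.appendChild(region);

  window.announce = (description) => {
    region.textContent = ""; // clear first so identical text re-announces
    setTimeout(() => {
      region.textContent = formatAnnouncement(description);
    }, 50);
  };
}
```

`aria-live="polite"` waits for the screen reader to finish its current utterance, which avoids cutting off other navigation feedback.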

What's next for PictureThis

We believe that PictureThis could run on a standalone device. It could have its own dedicated mechanical buttons for its features and dedicate the entire screen to the camera feed. We could also extend PictureThis to support people with a variety of visual impairments, such as colorblindness, through different filters and enhancements. For attractions such as museums and interactive exhibitions, it could be a valuable tool for increasing accessibility: no one should be excluded from participating in these experiences because of a disability.
