Inspiration
We live in a world that's increasingly visual, yet not everyone can interact with this visual information easily. Our inspiration came from the realization that visually impaired individuals often miss out on the richness of visual data. We wanted to create a tool that makes the visual world accessible and interactive for everyone, including those with visual impairments.
What it does
Viz - Your Visual Assistant allows users to capture images in real time and ask questions about those images. Whether you're curious about the objects in a photo, the colors present, or any other visual element, Viz provides real-time, accurate answers. For the visually impaired, Viz serves as an extra set of eyes, offering a richer understanding of their surroundings.
How we built it
We combined computer vision models for image recognition with natural language processing for query understanding. The backend is written in Python and uses PyTorch and Hugging Face libraries for image analysis and text processing. A set of vision models extracts information from the captured image, and that information is passed to a language model that answers the user's query. Input is captured through a microphone in real time and converted with speech-to-text; the answer is played through a speaker using text-to-speech.
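A minimal sketch of that flow, assuming each stage is a plain Python callable. The class name `VisualAssistant`, its field names, and the `fake_*` stand-in functions are hypothetical and only illustrate how the stages connect; they are not the project's actual code, and a real build would replace the stand-ins with speech, vision, and language models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VisualAssistant:
    """Chains the four stages: speech-to-text, vision, language model, text-to-speech."""
    transcribe: Callable[[bytes], str]   # audio question -> text question
    describe: Callable[[bytes], str]     # image -> text description of the scene
    answer: Callable[[str, str], str]    # (description, question) -> text answer
    speak: Callable[[str], bytes]        # text answer -> audio

    def ask(self, image: bytes, audio_question: bytes) -> bytes:
        question = self.transcribe(audio_question)
        description = self.describe(image)
        reply = self.answer(description, question)
        return self.speak(reply)


# Stand-in stages (assumptions) so the sketch runs without models or audio hardware.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_describe(image: bytes) -> str:
    return "a red mug on a wooden desk"


def fake_answer(description: str, question: str) -> str:
    return f"Q: {question} | I can see {description}."


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VisualAssistant(fake_transcribe, fake_describe, fake_answer, fake_speak)
spoken = assistant.ask(image=b"<jpeg bytes>", audio_question=b"What is on the desk?")
print(spoken.decode("utf-8"))
```

Keeping each stage behind a simple callable means a heavier or lighter model can be swapped in for one stage without touching the others.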
Challenges we ran into
One of the biggest challenges was making the tool accessible in practice, not just in theory. We had to account for voice commands, the latency of processing images and generating text, and the need for models efficient enough to run on limited computing resources. Another challenge was optimizing the machine learning models to give accurate responses in real time.
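One generic way to see where latency accumulates is to time each stage separately. The sketch below is an illustration under assumptions, not the measurement setup used for Viz: the helper `timed`, the stage names, and the `time.sleep` calls standing in for model inference are all hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, store):
    """Add the wall-clock time spent inside the `with` block to store[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[stage] = store.get(stage, 0.0) + (time.perf_counter() - start)


timings = {}

# time.sleep stands in for real model calls (assumption for illustration).
with timed("speech_to_text", timings):
    time.sleep(0.01)
with timed("vision", timings):
    time.sleep(0.02)
with timed("language_model", timings):
    time.sleep(0.03)
with timed("text_to_speech", timings):
    time.sleep(0.01)

# Print the stages from slowest to fastest.
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage}: {seconds * 1000:.1f} ms")
```

A per-stage breakdown like this shows which model is worth replacing with a smaller one first.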
Accomplishments that we're proud of
Above all, we're proud of building such a complex product and showing that this kind of technology could become part of daily life. We're particularly proud of the tool's accessibility features, which we tested and refined manually. We're also proud of the speed and accuracy of our image analysis, which make the tool practical for real-world use.
What we learned
We learned a great deal about accessibility in software design, the intricacies of image recognition, and the challenges of natural language understanding and generation. Most importantly, we learned that technology can be a powerful tool for inclusivity.
What's next for Viz - Your Visual Assistant
We plan to expand the range of questions that Viz can answer and improve its integration with other platforms and devices. We also plan to reduce Viz's response time through further optimization.
Built With
- computer-vision
- deep-learning
- huggingface
- natural-language-processing
- operating-systems
- python
- pytorch
- transformers

