Viz - Your Visual Assistant

Inspiration

We live in a world that's increasingly visual, yet not everyone can interact with this visual information easily. Our inspiration came from the realization that visually impaired individuals often miss out on the richness of visual data. We wanted to create a tool that makes the visual world accessible and interactive for everyone, including those with visual impairments.

What it does

Viz - Your Visual Assistant allows users to capture images in real time and ask questions about those images. Whether you're curious about the objects in a photo, the colors present, or any other visual element, Viz provides real-time, accurate answers. For the visually impaired, Viz serves as an extra set of eyes, offering a richer understanding of their surroundings.

How we built it

We combined machine learning and deep learning models for image recognition with natural language processing for query understanding. The backend is written in Python and uses libraries such as PyTorch and Hugging Face for image analysis and text processing. A combination of visual models interprets each image, and the extracted information is passed to a language model that answers the query. Questions are captured in real time through a microphone and converted with speech-to-text; answers are played back through a speaker using text-to-speech.
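The flow described above (speech in, image understanding, language-model answer, speech out) can be sketched as a chain of four interchangeable stages. This is only an illustrative outline, not our actual code: the class name `VisualAssistant`, its field names, and the `fake_*` stand-in functions are all hypothetical, and the stand-ins return canned text so the sketch runs without any models, microphone, or speaker. In a real build, each stage would wrap a PyTorch or Hugging Face model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VisualAssistant:
    """Chains four stages; each stage is a plain callable so it can be swapped."""
    transcribe: Callable[[bytes], str]   # speech-to-text: audio -> question text
    describe: Callable[[bytes], str]     # visual model(s): image -> description
    answer: Callable[[str, str], str]    # language model: (description, question) -> reply
    speak: Callable[[str], bytes]        # text-to-speech: reply text -> audio

    def ask(self, image: bytes, question_audio: bytes) -> bytes:
        question = self.transcribe(question_audio)
        description = self.describe(image)
        reply = self.answer(description, question)
        return self.speak(reply)


# Hypothetical stand-in stages (assumptions for illustration only).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_describe(image: bytes) -> str:
    return "a red mug on a wooden desk"


def fake_answer(description: str, question: str) -> str:
    return f"Q: {question} | Scene: {description}"


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VisualAssistant(fake_transcribe, fake_describe, fake_answer, fake_speak)
audio_out = assistant.ask(b"<image bytes>", b"What color is the mug?")
print(audio_out.decode("utf-8"))
```

Keeping each stage behind a simple callable makes it easy to replace one model with a smaller or faster one without touching the rest of the chain.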

Challenges we ran into

One of the biggest challenges was making the tool accessible in practice, not just in theory. We had to account for voice commands, the latency of processing images and generating text, and the need for models efficient enough to run on limited computing resources. Another challenge was tuning the machine learning models so that responses are both accurate and fast enough to feel real-time.
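One simple way to see where latency comes from is to time each stage separately. The snippet below is a minimal sketch under stated assumptions: `timed`, `slow_caption`, and `fast_answer` are hypothetical names, and the 50 ms sleep merely imitates a slow vision model rather than reflecting any measurement from Viz.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(stage: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run one stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = stage(*args)
    return result, time.perf_counter() - start


# Hypothetical stand-ins: the sleep imitates a slow vision model.
def slow_caption(image: bytes) -> str:
    time.sleep(0.05)
    return "a person holding an umbrella"


def fast_answer(description: str, question: str) -> str:
    return f"Based on the image ({description}): yes."


timings: Dict[str, float] = {}
description, timings["vision"] = timed(slow_caption, b"<image bytes>")
reply, timings["language"] = timed(fast_answer, description, "Is it raining?")

# The stage with the largest elapsed time is the first candidate for optimization.
slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}")
```

Per-stage timings like these indicate whether effort is better spent on a smaller visual model, a faster language model, or the speech components.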

Accomplishments that we're proud of

Mainly, we're proud of building such a complex product and showing that this kind of technology could become part of daily life in the future. We're particularly proud of the tool's accessibility features, which we tested and refined manually. We're also proud of the speed and accuracy of our image analysis models, which make the tool practical for real-world use.

What we learned

We learned a great deal about accessibility in software design, the intricacies of image recognition, and the challenges of natural language understanding and generation. Most importantly, we learned that technology can be a powerful tool for inclusivity.

What's next for Viz - Your Visual Assistant

We plan to expand the range of questions that Viz can answer and improve its integration with other platforms and devices. We also plan to increase the response speed of Viz through optimization.

Built With

computer-vision
deep-learning
huggingface
natural-language-processing
operating-systems
python
pytorch
transformers

Submitted to

hackUMBC Fall 2023

Created by

I worked on the machine learning aspect of the project. My tasks mainly involved developing visual models to process and understand images, and language models to answer questions about images.

Vamshi Krishna
I love to solve problems | AI/ML expert
I worked on the image capture & testing.

Indah N
Alex Busch