Inspiration

Seeing elderly people struggle to use digital platforms because of visual impairments gave me an idea. I developed a web platform where users can upload documents or images. Gemini AI processes these uploads and communicates the content to the user through speech. I believe that if I continue to build on this project, the tool can help many visually impaired people, as well as anyone who has trouble reading the fine print in documents.

What it does

This website takes in PDF or DOCX documents and extracts their text using JavaScript libraries. It then uses that text, alongside the user's voice input, as the prompt for the AI. The user can also ask questions about specific parts of the text just by talking. If an image is uploaded, the code automatically uses the Gemini-Pro-Vision model to "look" at it, and the user can then ask questions about it.
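As a rough illustration of the flow described above, the sketch below combines extracted document text with a transcribed voice question into one prompt, and reads a reply aloud when the browser supports speech synthesis. The function names `buildPrompt` and `speak`, the prompt layout, and the default question are assumptions made for this example, not the project's actual code.

```javascript
// Hypothetical helper: merge extracted document text and the user's
// transcribed voice question into a single text prompt.
function buildPrompt(documentText, spokenQuestion) {
  const doc = String(documentText ?? "").trim();
  const question = String(spokenQuestion ?? "").trim();
  const parts = [];
  if (doc) {
    parts.push(`Document:\n${doc}`);
  }
  // Assumed fallback: ask for a summary when the user has not spoken yet.
  parts.push(question ? `Question: ${question}` : "Question: Please summarize this document.");
  return parts.join("\n\n");
}

// Hypothetical helper: read text aloud with the browser's Web Speech API.
// Returns false (and does nothing) where speech synthesis is unavailable.
function speak(text) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return false;
  }
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

Keeping the prompt assembly in a plain function means it can be checked without a browser or an API key, while `speak` simply reports whether speech output was possible.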

How I built it

This was built using JavaScript, HTML, and CSS, and is intended to be hosted online as a static website.

Challenges I ran into

Using a brand-new technology came with multiple errors and bugs. One notable challenge was getting the program to switch models automatically depending on the type of input the user provided. Most of those errors are now resolved, and the program should work as intended.
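One way to approach the model switching mentioned above is to choose the model from the uploaded file's MIME type. This is only a minimal sketch: the helper name `pickModel` and the text-only model name `gemini-pro` are assumptions, while `gemini-pro-vision` is the image model this write-up names.

```javascript
// Assumed model identifiers; adjust to whatever the API currently offers.
const TEXT_MODEL = "gemini-pro";          // assumption: text-only model
const VISION_MODEL = "gemini-pro-vision"; // image-capable model named in this write-up

// Hypothetical helper: images go to the vision model, and everything else
// (PDF, DOCX, or no file at all) goes to the text model.
function pickModel(mimeType) {
  if (typeof mimeType === "string" && mimeType.toLowerCase().startsWith("image/")) {
    return VISION_MODEL;
  }
  return TEXT_MODEL;
}
```

In a browser, the value passed in would typically be the `type` property of the uploaded `File` object, for example `pickModel(file.type)`.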

Accomplishments that we're proud of

I learned how to use the Gemini API.

What I learned

I learned how the power of AI can be harnessed, including how to switch models and how to provide different types of input.

What's next for VoiceAid

With the newer Gemini models, many more features are sure to come that improve the end user's quality of life, from larger file uploads to video recording uploads and more.
