EYEVISION


Inspiration

Millions of visually impaired people face daily challenges in navigating their environment and accessing visual information. We built EYEVISION, an AI-powered assistant that provides real-time image analysis and spoken feedback, to give blind and visually impaired users greater independence. Our goal is to bridge the accessibility gap using AI and intuitive design.

What It Does

EYEVISION captures images using a device’s camera, analyzes them with AI, and provides concise spoken descriptions. Users can ask follow-up questions about the image, receiving detailed and relevant answers through speech. The app also helps users understand colors, objects, and surroundings with minimal effort.

How We Built It

We developed EYEVISION using:

Next.js & React for the front-end interface
Google Gemini AI API for image analysis and question-answering
Web Speech API for text-to-speech and speech recognition
MediaStream API for real-time camera access

The system processes captured images, extracts key visual details, summarizes them for easy understanding, and responds to user queries in natural language.
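
The analysis step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (parseDataUrl, buildGeminiRequest, extractText) and the prompt wording are assumptions, and only the request and response shapes follow the public Gemini generateContent REST format. It assumes the captured frame is available as a base64 data URL, for example from canvas.toDataURL("image/jpeg").

```typescript
// Hypothetical helpers for illustration; names and prompts are assumptions.
interface InlineImage {
  mimeType: string;
  base64: string;
}

type Part =
  | { text: string }
  | { inline_data: { mime_type: string; data: string } };

interface GeminiRequest {
  contents: { parts: Part[] }[];
}

interface GeminiResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Split a base64 data URL into its MIME type and payload.
function parseDataUrl(dataUrl: string): InlineImage {
  const prefix = "data:";
  const marker = ";base64,";
  const idx = dataUrl.indexOf(marker);
  if (!dataUrl.startsWith(prefix) || idx === -1) {
    throw new Error("Expected a base64 data URL");
  }
  return {
    mimeType: dataUrl.slice(prefix.length, idx),
    base64: dataUrl.slice(idx + marker.length),
  };
}

// Build the JSON body for a generateContent call: one text part, one image part.
// With no question, ask for a short description; otherwise ask the follow-up.
function buildGeminiRequest(image: InlineImage, question?: string): GeminiRequest {
  const trimmed = (question ?? "").trim();
  const prompt =
    trimmed !== ""
      ? `Answer briefly for a blind user: ${trimmed}` // assumed prompt wording
      : "Describe this image in two short sentences for a blind user.";
  return {
    contents: [
      {
        parts: [
          { text: prompt },
          { inline_data: { mime_type: image.mimeType, data: image.base64 } },
        ],
      },
    ],
  };
}

// Join the text parts of the first candidate into one string to be spoken.
function extractText(response: GeminiResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("").trim();
}
```

The returned object would be sent as the JSON body of a POST to the generateContent endpoint of whichever Gemini model the app uses, ideally from a Next.js server route so the API key is not exposed to the browser. The string from extractText can then be handed to the Web Speech API.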

Challenges We Ran Into

Ensuring accurate image-to-text conversion for complex environments
Tuning AI responses to prioritize the details users care about. For example, given a picture of a kitchen with ingredients, the AI should describe the scene and also be able to suggest recipes based on the ingredients it identifies.
Speech synthesis inconsistencies across different devices and browsers
Handling real-time camera flipping and optimizing performance on mobile
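
Two of the challenges above, camera flipping and inconsistent speech voices, can be illustrated with small pure helpers. This is a sketch under stated assumptions rather than the team's implementation: flipFacing, cameraConstraints, pickVoice, and the VoiceLike type are hypothetical names, and the 1280x720 target is an arbitrary example. The constraints object is shaped like the argument to navigator.mediaDevices.getUserMedia, and VoiceLike mirrors the fields of SpeechSynthesisVoice that the helper reads.

```typescript
// Hypothetical helpers for illustration; names and defaults are assumptions.
type FacingMode = "user" | "environment";

// Toggle between the front ("user") and rear ("environment") camera.
function flipFacing(current: FacingMode): FacingMode {
  return current === "user" ? "environment" : "user";
}

// Build constraints for getUserMedia. "ideal" lets devices with only one
// camera fall back instead of rejecting the request.
function cameraConstraints(facing: FacingMode) {
  return {
    audio: false,
    video: {
      facingMode: { ideal: facing },
      width: { ideal: 1280 }, // assumed target resolution
      height: { ideal: 720 },
    },
  };
}

// The subset of SpeechSynthesisVoice fields used below.
interface VoiceLike {
  name: string;
  lang: string;
  default: boolean;
}

// Pick a voice for the requested language, falling back step by step:
// exact tag, same base language, the browser default, then any voice.
function pickVoice<V extends VoiceLike>(voices: readonly V[], lang: string): V | undefined {
  // Normalize tags such as "en_US" to "en-us" before comparing.
  const normalize = (tag: string) => tag.toLowerCase().replace("_", "-");
  const wanted = normalize(lang);
  const base = wanted.split("-")[0];
  return (
    voices.find((v) => normalize(v.lang) === wanted) ??
    voices.find((v) => normalize(v.lang).split("-")[0] === base) ??
    voices.find((v) => v.default) ??
    voices[0]
  );
}
```

In a browser, the current stream's tracks would typically be stopped before calling getUserMedia with cameraConstraints(flipFacing(current)). The voice list from speechSynthesis.getVoices() may be empty until the voiceschanged event fires, so pickVoice is best called after that event.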

Accomplishments That We're Proud Of

This was the team's first time working with AI, and we successfully integrated AI-powered image analysis with real-time speech feedback. We also tuned responses to prioritize relevant image details, at least to some extent. Finally, we are proud of making the app work across multiple devices, so it stays accessible on different platforms.