A-Eye Project

Inspiration

Navigating the world safely can be challenging for visually impaired people. We wanted to create a technology that provides real-time information about the environment, helping users recognize objects and understand distances without relying on sight.

What it does

A-Eye is a smart camera system that performs the following tasks:

  • Detects objects in front of the user.
  • Announces the type and count of each detected object.
  • Provides the distance to the closest object.

$$ \text{inches}_{\text{total}} = \text{distance (m)} \times 39.37, \quad \text{feet} = \left\lfloor \frac{\text{inches}_{\text{total}}}{12} \right\rfloor, \quad \text{inches} = \text{inches}_{\text{total}} \bmod 12 $$

  • Responds to voice commands such as "What is in front of me?"
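The conversion formula above can be sketched in a few lines of Python. Rounding the total inches before splitting them is our addition (not part of the original formula); it avoids announcing an awkward result like "3 feet 12 inches":

```python
def meters_to_feet_inches(meters: float) -> tuple[int, int]:
    """Convert a distance in meters to whole feet and inches for speech."""
    total_inches = round(meters * 39.37)  # 1 m = 39.37 in
    return total_inches // 12, total_inches % 12

feet, inches = meters_to_feet_inches(1.5)  # → (4, 11)
```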

How we built it

  • Object detection: YOLOv8 model to detect multiple objects in real time.
  • Audio output: the 'pyttsx3' text-to-speech library for spoken announcements.
  • Voice input: 'SpeechRecognition' + microphone input for processing user commands.
  • Object counting: Python with the 'inflect' library to generate natural-language pluralization.
  • Distance measurement: Simulated in testing; converts meters to feet/inches for real-world feedback.
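The object-counting step can be illustrated with a short sketch. The project itself uses the `inflect` library for pluralization; this stand-alone version substitutes a minimal hand-rolled pluralizer so the counting logic is visible without extra dependencies:

```python
from collections import Counter

# Minimal stand-in for the `inflect` pluralizer used in the project.
IRREGULAR = {"person": "people", "mouse": "mice", "knife": "knives"}

def pluralize(noun: str, count: int) -> str:
    if count == 1:
        return noun
    return IRREGULAR.get(noun, noun + "s")

def describe(labels: list[str]) -> str:
    """Turn raw detector class labels into a spoken-style summary."""
    counts = Counter(labels)  # preserves first-encounter order
    parts = [f"{n} {pluralize(label, n)}" for label, n in counts.items()]
    return "I see " + ", ".join(parts) + "."

describe(["person", "person", "car"])  # → "I see 2 people, 1 car."
```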

Challenges we ran into

  • Accurately capturing voice commands while minimizing interference from ambient noise.
  • Effectively managing duplicate objects and ensuring their correct counting in natural language.
  • Seamlessly integrating real-time detection with prompt audio output, eliminating delays.
  • Transforming numerical distances into units that are easily understood in natural speech.

Accomplishments that we're proud of

We are proud of building a fully functional prototype that identifies objects and announces them in real time. We are also proud of the range of features packed into the smart camera, such as natural-sounding speech and working voice commands.

What we learned

  • How to integrate computer vision, natural language processing, and audio output in a single Python system.
  • Techniques to handle pluralization and natural speech generation for dynamic object counts.
  • Challenges of real-time processing and avoiding delays in user feedback.
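The delay problem noted above is commonly handled by letting the speech side drop stale announcements instead of reading out a growing backlog. A minimal sketch (the `freshest` helper is our illustration, not the project's code):

```python
import queue

def freshest(announcements: queue.Queue) -> str:
    """Take the newest announcement, discarding any backlog.

    If detection produces summaries faster than text-to-speech can read
    them, speaking only the latest one keeps audio in step with the camera.
    """
    phrase = announcements.get()  # block until something is available
    while True:
        try:
            phrase = announcements.get_nowait()
        except queue.Empty:
            return phrase
```

In a full pipeline, the detection loop would `put()` each frame's summary and the speaking thread would call `freshest()` before handing the phrase to pyttsx3.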

What's next for A-Eye

  • Combine actual camera input with YOLOv8 for practical testing.
  • Enhance distance precision through the use of depth sensors or stereo vision.
  • Broaden voice command capabilities to accommodate more complex questions.
  • Refine performance for quicker responses and more compact hardware to ensure portability.
