About the project: Blind Man’s Eye (BME)

Built by a team of high school students driven by curiosity and empathy, BME proves that impactful assistive tech can be created early, with rigor, creativity, and user-first design.

Inspiration

Navigating unfamiliar spaces is effortless for sighted people but risky and exhausting for those with visual impairments. Traditional aids like white canes offer only proximal feedback and miss obstacles beyond reach or at head height. We were inspired, as high schoolers, to create a wearable, real-time system that translates the visual world into clear, intuitive audio cues, so independence isn’t limited by environment, lighting, or complexity.

What we built

BME is a lightweight, camera-based assistant that converts a live video stream into spatialized audio signals:

Real-time monocular depth estimation to understand scene geometry
Obstacle detection and prioritization based on proximity and direction
Intuitive 3D audio cues (left/right, near/far) for fast decision-making
A minimal, comfortable form factor designed for continuous use

At its core, BME takes a single RGB frame, produces a dense depth map, identifies hazardous regions, and emits directional audio to “point” the user around obstacles.

How it works (system overview)

Perception
- Single-board camera captures frames at 30+ FPS
- Depth estimation produces a relative depth map $$D(x, y)$$
Understanding
- Thresholding and region analysis extract obstacles $$O_i$$
- Each $$O_i$$ gets a direction $$\theta_i$$ and proximity $$d_i$$
Audio mapping
- Spatial audio engine encodes $$\theta_i$$ as interaural time and level differences
- Proximity maps to pitch, volume, and cadence to convey urgency
Feedback
- Real-time cues guide micro-adjustments in heading and speed
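As a rough illustration of the audio-mapping stage, the sketch below renders one short stereo tone burst from an interaural time difference and a pair of channel gains. The function name `render_cue`, the 880 Hz tone, the 50 ms duration, and the 44.1 kHz sample rate are all assumptions chosen for the example, not values from the BME prototype. Playback (for instance through an audio output stream) is left out.

```python
import numpy as np

def render_cue(itd_s, left_gain, right_gain,
               freq_hz=880.0, dur_s=0.05, sample_rate=44100):
    """Return an (n, 2) float32 stereo buffer for one directional beep.

    itd_s > 0 means the source is to the right, so the left ear
    (the far ear) receives the sound later. All defaults are
    illustrative assumptions.
    """
    n = int(round(dur_s * sample_rate))
    t = np.arange(n) / sample_rate
    # Hann window avoids clicks at the start and end of the burst.
    tone = np.sin(2.0 * np.pi * freq_hz * t) * np.hanning(n)

    # Approximate the ITD as a whole-sample delay on the far ear.
    delay = int(round(abs(itd_s) * sample_rate))
    out = np.zeros((n + delay, 2), dtype=np.float32)
    left_start = delay if itd_s > 0 else 0
    right_start = delay if itd_s < 0 else 0
    out[left_start:left_start + n, 0] = left_gain * tone
    out[right_start:right_start + n, 1] = right_gain * tone
    return out

# Example: an obstacle to the right (left ear delayed and quieter).
buf = render_cue(itd_s=0.0005, left_gain=0.5, right_gain=1.0)
```

A whole-sample delay is a coarse approximation; a fractional-delay filter would give smoother direction changes at the cost of more computation.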

Mathematical intuition

Depth normalization:
For raw depth $$r(x, y)$$, compute normalized depth:
$$ D(x, y) = \frac{r(x, y) - r_{\min}}{r_{\max} - r_{\min} + \varepsilon} $$
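The normalization above can be sketched in a few lines of NumPy. The function name `normalize_depth` and the default `eps` are assumptions for illustration. The sketch also assumes that larger raw values mean farther away; some monocular models output inverse depth, in which case the map would need to be flipped first.

```python
import numpy as np

def normalize_depth(raw, eps=1e-6):
    """Min-max normalize a raw depth map into [0, 1].

    eps keeps the division finite when the frame is flat
    (r_max == r_min), in which case the result is all zeros.
    """
    raw = np.asarray(raw, dtype=np.float64)
    r_min = raw.min()
    r_max = raw.max()
    return (raw - r_min) / (r_max - r_min + eps)

raw = np.array([[2.0, 4.0],
                [6.0, 10.0]])
D = normalize_depth(raw)
```

Because the minimum and maximum are taken per frame, the result is relative: the same value of `D` can correspond to different physical distances in different frames.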
Obstacle scoring (per connected component $$C$$):
Proximity score:
$$ s_C = 1 - \text{median}\left(D(x, y) \mid (x, y) \in C\right) $$
Direction (in image plane):
$$ \theta_C = \operatorname{atan2}(x_C - x_0,\; f) $$
where $$x_C$$ is the horizontal coordinate of the component centroid, $$x_0$$ is the horizontal image center, and $$f$$ is a focal-length proxy in pixels.
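The scoring step can be sketched as follows, assuming a normalized map `D` where smaller values are nearer. The names `label_components` and `score_obstacles`, the `near_threshold` of 0.3, the 4-connectivity, and the choice of image width as the default focal proxy are all hypothetical. A pure-Python flood fill stands in for whatever connected-component routine the prototype actually uses, and would be too slow for full-resolution frames.

```python
import math
from collections import deque
import numpy as np

def label_components(mask):
    """Label 4-connected True regions; returns (labels, count)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            count += 1
            labels[sy, sx] = count
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

def score_obstacles(D, near_threshold=0.3, focal_px=None):
    """Return one dict per near region: proximity score s_C and angle theta_C."""
    h, w = D.shape
    x0 = (w - 1) / 2.0                                  # horizontal image center
    f = float(w) if focal_px is None else float(focal_px)  # assumed focal proxy
    labels, count = label_components(D < near_threshold)
    obstacles = []
    for i in range(1, count + 1):
        ys, xs = np.nonzero(labels == i)
        score = 1.0 - float(np.median(D[ys, xs]))       # s_C
        theta = math.atan2(float(xs.mean()) - x0, f)    # theta_C in radians
        obstacles.append({"score": score, "theta": theta, "area": int(xs.size)})
    return obstacles
```

With this sign convention, negative angles are to the left of the image center and positive angles are to the right.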
Audio mapping (ITD/ILD-inspired):
$$ \text{ITD}(\theta) \propto \sin(\theta),\quad \text{ILD}(\theta) \propto k\,(1 - \cos(\theta)) $$
Volume $$v \propto s_C$$, cadence $$c \propto s_C$$ for urgency.
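A minimal sketch of this mapping follows. The constants are assumptions picked for illustration: a maximum ITD of 0.66 ms, `k` of 10 dB, and a beep rate between 1 and 8 Hz. Since $$1 - \cos(\theta)$$ is the same for left and right angles, the sketch also assumes that the sign of $$\theta$$ decides which ear is attenuated; the original write-up does not spell this out.

```python
import math

def audio_cue(theta, score, max_itd_s=6.6e-4, k_db=10.0,
              min_rate_hz=1.0, max_rate_hz=8.0):
    """Map direction theta (radians) and proximity score to cue parameters."""
    score = min(max(score, 0.0), 1.0)

    # ITD proportional to sin(theta); positive means the source is to the right.
    itd_s = max_itd_s * math.sin(theta)

    # ILD magnitude k * (1 - cos(theta)), in dB, applied to the far ear.
    ild_db = k_db * (1.0 - math.cos(theta))
    far_gain = 10.0 ** (-ild_db / 20.0)

    volume = score  # v proportional to s_C
    if theta >= 0:
        left_gain, right_gain = volume * far_gain, volume
    else:
        left_gain, right_gain = volume, volume * far_gain

    # Cadence rises linearly with proximity for urgency.
    cadence_hz = min_rate_hz + (max_rate_hz - min_rate_hz) * score
    return {"itd_s": itd_s, "ild_db": ild_db,
            "left_gain": left_gain, "right_gain": right_gain,
            "cadence_hz": cadence_hz}

cue = audio_cue(theta=math.pi / 2, score=1.0)  # close obstacle, hard right
```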

Tech stack

Computer vision: monocular depth estimation (fast variant for edge devices)
Signal processing: obstacle clustering, direction and proximity scoring
Audio: binaural spatialization with ITD/ILD-inspired panning and dynamic gain
Runtime: Python-based prototype; portable to embedded platforms

What we learned

Depth from a single camera is remarkably robust for navigation when paired with conservative heuristics and temporal smoothing.
Audio UX matters as much as perception accuracy: users prefer fewer, clearer cues over dense, continuous sound.
Latency is a safety feature: ultra-low-overhead decisions (what to play and when) are as important as low-latency inference.
As high school builders, we learned to validate ideas with quick prototypes, run user tests early, and turn complex research into practical features.

Challenges

Balancing speed and stability: achieving 30+ FPS while avoiding jitter required careful buffering and interpolation choices.
Avoiding audio overload: multiple obstacles can overwhelm the user; we built salience filters to limit cues to the most relevant hazards.
Real-world variability: lighting, motion blur, reflective surfaces, and crowded scenes needed robust thresholding and fallback logic.
Direction perception: translating 2D image coordinates into convincing spatial audio required iterative tuning and user testing.
Resource limits: with student budgets and hardware, we optimized models and pipelines aggressively without sacrificing safety.
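A salience filter of the kind mentioned above could look like the following sketch: keep the highest-scoring obstacles, drop weak ones, and skip any that point in nearly the same direction as a cue already chosen. The function name `select_cues` and the thresholds (two cues, score of at least 0.5, 0.35 rad separation) are assumptions for illustration only.

```python
def select_cues(obstacles, max_cues=2, min_score=0.5, min_separation_rad=0.35):
    """Pick the few most relevant hazards to sonify.

    obstacles: iterable of dicts with 'score' (0..1) and 'theta' (radians).
    """
    chosen = []
    for ob in sorted(obstacles, key=lambda o: o["score"], reverse=True):
        if ob["score"] < min_score:
            break  # everything after this is even less urgent
        # Skip hazards that would sound like a duplicate of an existing cue.
        if all(abs(ob["theta"] - c["theta"]) >= min_separation_rad for c in chosen):
            chosen.append(ob)
        if len(chosen) == max_cues:
            break
    return chosen

detections = [
    {"score": 0.90, "theta": -0.40},
    {"score": 0.85, "theta": -0.35},  # almost the same direction as the first
    {"score": 0.70, "theta": 0.50},
    {"score": 0.30, "theta": 0.00},   # too far away to matter
]
cues = select_cues(detections)
```

Here the second detection is suppressed because it is within 0.05 rad of a stronger one, so the user hears one cue on the left and one on the right.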

Impact

BME aims to:

Reduce collision risk (especially above cane height)
Lower cognitive load by externalizing spatial reasoning into audio
Increase confidence and independence in unfamiliar or dynamic environments
Demonstrate that high school teams can build serious, user-centered assistive technology

Next steps

Add semantic cues (e.g., crosswalks, doors, stairs) to complement geometry
Personalize audio profiles to user preferences and hearing asymmetries
Integrate haptics for silent or noisy environments
Battery-optimized embedded build for all-day wear
Conduct more user studies with vision-impaired participants and iterate on audio UX

Team reflections

We set out, as high school students, to make the invisible audible. Building BME taught us that accessibility is not just about detecting obstacles; it’s about communicating the right information, at the right time, in the most intuitive way possible. The biggest lesson: impactful assistive tech isn’t about age or resources; it’s about empathy, persistence, and thoughtful design.

Built With

c++
cuda
itd
magma
midas
numpy
opencv
pyaudio
pytest
python
pytorch
virtualenv

Updates

Ekveer Sahoo started this project — Aug 23, 2025 08:43 PM EDT
