About the project: Blind Man’s Eye (BME)
Built by a team of high school students driven by curiosity and empathy, BME proves that impactful assistive tech can be created early, with rigor, creativity, and user-first design.
Inspiration
Navigating unfamiliar spaces is effortless for sighted people but risky and exhausting for those with visual impairments. Traditional aids like white canes offer only proximal feedback and miss obstacles beyond reach or at head height. We were inspired, as high schoolers, to create a wearable, real-time system that translates the visual world into clear, intuitive audio cues, so independence isn’t limited by environment, lighting, or complexity.
What we built
BME is a lightweight, camera-based assistant that converts a live video stream into spatialized audio signals:
- Real-time monocular depth estimation to understand scene geometry
- Obstacle detection and prioritization based on proximity and direction
- Intuitive 3D audio cues (left/right, near/far) for fast decision-making
- A minimal, comfortable form factor designed for continuous use
At its core, BME takes a single RGB frame, produces a dense depth map, identifies hazardous regions, and emits directional audio to “point” the user around obstacles.
How it works (system overview)
Perception
- Single-board camera captures frames at 30+ FPS
- Depth estimation produces a relative depth map $$D(x, y)$$
Understanding
- Thresholding and region analysis extract obstacles $$O_i$$
- Each $$O_i$$ gets a direction $$\theta_i$$ and proximity $$d_i$$
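The thresholding and region-analysis step can be sketched as a flood fill over "near" pixels. This is a minimal illustration in pure Python, not the project's actual implementation: the function name `extract_obstacles`, the threshold of 0.4, and the minimum region size are all assumptions chosen for the example, and the depth map is assumed to be normalized so that smaller values mean closer.

```python
from collections import deque

def extract_obstacles(depth, near_threshold=0.4, min_pixels=2):
    """Group 'near' pixels into 4-connected components.

    depth: 2D list of normalized depth values in [0, 1], smaller = closer.
    near_threshold and min_pixels are illustrative values, not tuned ones.
    Returns a list of components, each a list of (x, y) pixel coordinates.
    """
    height, width = len(depth), len(depth[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or depth[y0][x0] >= near_threshold:
                continue
            # Breadth-first flood fill from this unvisited near pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and depth[ny][nx] < near_threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Drop tiny regions, which are more likely noise than obstacles.
            if len(pixels) >= min_pixels:
                components.append(pixels)
    return components

# Toy 5x4 depth map: one blob on the left, one on the right, one stray pixel.
depth_map = [
    [0.9, 0.9, 0.9, 0.9, 0.9],
    [0.2, 0.2, 0.9, 0.9, 0.3],
    [0.2, 0.9, 0.9, 0.9, 0.3],
    [0.9, 0.9, 0.1, 0.9, 0.9],
]
obstacles = extract_obstacles(depth_map)
```

On a real frame, a library routine for connected-component labeling would likely be much faster than this loop; the sketch only shows the logic.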
Audio mapping
- Spatial audio engine encodes $$\theta_i$$ as interaural time and level differences
- Proximity maps to pitch, volume, and cadence to signal urgency
Feedback
- Real-time cues guide micro-adjustments in heading and speed
Mathematical intuition
Depth normalization:
For raw depth $$r(x, y)$$, compute normalized depth:
$$ D(x, y) = \frac{r(x, y) - r_{\min}}{r_{\max} - r_{\min} + \varepsilon} $$
Obstacle scoring (per connected component $$C$$):
Proximity score:
$$ s_C = 1 - \text{median}\left(D(x, y) \mid (x, y) \in C\right) $$
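As a rough sketch of the two formulas above, the following pure-Python snippet normalizes a raw depth map and scores one component. The function names and the toy values are assumptions made for illustration; the raw depth is assumed to grow with distance, so a score near 1 means "very close".

```python
from statistics import median

def normalize_depth(raw, eps=1e-6):
    """Min-max normalize a 2D list of raw depths into [0, 1)."""
    flat = [value for row in raw for value in row]
    r_min, r_max = min(flat), max(flat)
    scale = r_max - r_min + eps  # eps avoids division by zero on flat frames
    return [[(value - r_min) / scale for value in row] for row in raw]

def proximity_score(depth, component):
    """s_C = 1 - median of normalized depth over the component's pixels."""
    return 1.0 - median(depth[y][x] for x, y in component)

# Toy 2x2 frame; the component covers the two pixels in the top row.
raw_depth = [[2.0, 4.0],
             [6.0, 10.0]]
depth = normalize_depth(raw_depth)
score = proximity_score(depth, [(0, 0), (1, 0)])
```

Using the median rather than the mean keeps a few outlier pixels inside a region from shifting its score.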
Direction (in image plane):
$$ \theta_C = \operatorname{atan2}(x_C - x_0, f) $$
where $$x_C$$ is component centroid, $$x_0$$ is image center, and $$f$$ is a focal proxy.
Audio mapping (ITD/ILD-inspired):
$$ \text{ITD}(\theta) \propto \sin(\theta),\quad \text{ILD}(\theta) \propto k\,(1 - \cos(\theta)) $$
Volume $$v \propto s_C$$, cadence $$c \propto s_C$$ for urgency.
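The direction and audio-mapping formulas can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: the 0.7 ms ITD ceiling, the 10 dB ILD ceiling, and the 8 beeps-per-second cadence ceiling are placeholder scale factors, not values from the project. One further assumption: because $$1 - \cos(\theta)$$ is the same for left and right, the sketch gives the ILD the sign of $$\theta$$ so that the level difference also indicates the side.

```python
import math

def direction(x_centroid, x_center, focal_proxy):
    """Horizontal angle in radians; negative = left of center, positive = right."""
    return math.atan2(x_centroid - x_center, focal_proxy)

def audio_cue(theta, score, max_itd_s=0.0007, max_ild_db=10.0, max_beeps_per_s=8.0):
    """Map a direction and proximity score to illustrative cue parameters."""
    # ITD proportional to sin(theta): sign says which ear hears the cue first.
    itd_s = max_itd_s * math.sin(theta)
    # ILD magnitude proportional to (1 - cos(theta)); sign copied from theta.
    ild_db = math.copysign(max_ild_db * (1.0 - math.cos(theta)), theta)
    # Volume and cadence both scale with the proximity score, clamped to [0, 1].
    urgency = min(max(score, 0.0), 1.0)
    return {
        "itd_s": itd_s,
        "ild_db": ild_db,
        "volume": urgency,
        "beeps_per_s": max_beeps_per_s * urgency,
    }

# An obstacle centered 100 px right of the image center, with f = 100 px.
theta = direction(x_centroid=420, x_center=320, focal_proxy=100)
cue = audio_cue(theta, score=0.75)
```

A positive ITD or ILD here means the cue favors the right ear; mirroring the obstacle to the left flips both signs.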
Tech stack
- Computer vision: monocular depth estimation (fast variant for edge devices)
- Signal processing: obstacle clustering, direction and proximity scoring
- Audio: binaural spatialization with ITD/ILD-inspired panning and dynamic gain
- Runtime: Python-based prototype; portable to embedded platforms
What we learned
- Depth from a single camera is remarkably robust for navigation when paired with conservative heuristics and temporal smoothing.
- Audio UX matters as much as perception accuracy: users prefer fewer, clearer cues over dense, continuous sound.
- Latency is a safety feature: ultra-low overhead decisions (what to play and when) are as important as low-latency inference.
- As high school builders, we learned to validate ideas with quick prototypes, run user tests early, and turn complex research into practical features.
Challenges
- Balancing speed and stability: achieving 30+ FPS while avoiding jitter required careful buffering and interpolation choices.
- Avoiding audio overload: multiple obstacles can overwhelm the user; we built salience filters to limit cues to the most relevant hazards.
- Real-world variability: lighting, motion blur, reflective surfaces, and crowded scenes needed robust thresholding and fallback logic.
- Direction perception: translating 2D image coordinates into convincing spatial audio required iterative tuning and user testing.
- Resource limits: with student budgets and hardware, we optimized models and pipelines aggressively without sacrificing safety.
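Two of the ideas above, damping frame-to-frame jitter and limiting cues to the most relevant hazards, can be illustrated with a short sketch. It is one plausible approach rather than the team's actual filter: the exponential moving average, the function names, the smoothing factor, the score cutoff, and the two-cue limit are all assumptions.

```python
def smooth(previous, current, alpha=0.5):
    """Exponential moving average; lower alpha = steadier but slower to react."""
    return alpha * current + (1.0 - alpha) * previous

def select_salient(obstacles, max_cues=2, min_score=0.3):
    """Keep only the highest-scoring obstacles so the user hears a few clear cues.

    obstacles: list of dicts with at least a "score" key (higher = closer).
    """
    candidates = [o for o in obstacles if o["score"] >= min_score]
    candidates.sort(key=lambda o: o["score"], reverse=True)
    return candidates[:max_cues]

# Smooth one obstacle's score across two consecutive frames.
smoothed_score = smooth(previous=0.2, current=0.8)

# Four detections in one frame; only the two most urgent become audio cues.
detections = [
    {"name": "pole", "score": 0.9},
    {"name": "far wall", "score": 0.1},
    {"name": "bench", "score": 0.6},
    {"name": "bin", "score": 0.4},
]
cues = select_salient(detections)
```

Smoothing trades a little responsiveness for stability, so a safety-critical build would probably want the smoothing factor tuned against measured latency.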
Impact
BME aims to:
- Reduce collision risk (especially above cane height)
- Lower cognitive load by externalizing spatial reasoning into audio
- Increase confidence and independence in unfamiliar or dynamic environments
- Demonstrate that high school teams can build serious, user-centered assistive technology
Next steps
- Add semantic cues (e.g., crosswalks, doors, stairs) to complement geometry
- Personalize audio profiles to user preferences and hearing asymmetries
- Integrate haptics for silent or noisy environments
- Ship a battery-optimized embedded build for all-day wear
- Conduct more user studies with vision-impaired participants and iterate on audio UX
Team reflections
We set out, as high school students, to make the invisible audible. Building BME taught us that accessibility is not just about detecting obstacles; it’s about communicating the right information, at the right time, in the most intuitive way possible. The biggest lesson: impactful assistive tech isn’t about age or resources; it’s about empathy, persistence, and thoughtful design.