Inspiration

My inspiration for Vision comes from a powerful story in the Bible where Jesus heals a blind man. His joy in regaining sight made me realize how precious vision is, not just for seeing beauty but for feeling safe and independent. That moment motivated me to build a tool that lets visually impaired users “see” their environment in a new way: through sound.

What it does

Vision is a mobile app that uses a phone’s rear camera and AI to detect nearby obstacles and convert them into intelligent audio cues. It:

  • Classifies object height: low, waist-high, tall, or flat (ignorable).
  • Estimates distance: closer objects sound louder, distant ones quieter.
  • Uses text-to-speech to identify the object by name.

This enables users to navigate spaces safely with sound-based spatial awareness.
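
Concretely, each detection is reduced to a small audio cue. The shape below is illustrative, not the app’s exact data model:

```javascript
// One detection, as the app treats it (illustrative shape):
const cue = {
  name: 'chair',        // spoken aloud via text-to-speech
  height: 'waist-high', // low | waist-high | tall | flat (flat is skipped)
  volume: 0.8,          // closer objects play louder
};
```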

How we built it

  • Built using React Native (Expo) for cross-platform compatibility.
  • Captured camera frames with the Expo Camera API.
  • Used the Google Vision API to detect and classify objects (the capture-and-detect loop is sketched below).
  • Estimated each object's height category from the bounding box's position in the frame.
  • Calculated approximate distance from bounding box size, then scaled volume accordingly (both heuristics are sketched after this list):
    $$ \text{Volume} \propto \frac{1}{\text{Distance}} $$
  • Played categorized sounds using Expo AV.
  • Displayed a dropdown UI of detected objects.
  • Computed an accuracy score (also sketched below) factoring in:
    • Detection correctness
    • Volume scaling
    • Position logic
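
To make those heuristics concrete, here is a minimal sketch. The thresholds and weights are illustrative stand-ins, and the helper names (`boxMetrics`, `classifyHeight`, `estimateVolume`, `accuracyScore`) are ours for this write-up, not necessarily the shipped code:

```javascript
// Illustrative sketch: thresholds, weights, and helper names are
// assumptions for this write-up, not the exact shipped values.
// Google Vision's OBJECT_LOCALIZATION returns normalizedVertices
// in [0, 1], with y = 0 at the top of the frame (coordinates equal
// to 0 are omitted by the API, hence the ?? 0 fallback).

function boxMetrics(normalizedVertices) {
  const xs = normalizedVertices.map((v) => v.x ?? 0);
  const ys = normalizedVertices.map((v) => v.y ?? 0);
  return {
    top: Math.min(...ys),
    bottom: Math.max(...ys),
    width: Math.max(...xs) - Math.min(...xs),
    height: Math.max(...ys) - Math.min(...ys),
  };
}

// Height category from where the box sits in the frame: a box reaching
// the upper quarter reads as "tall", one hugging the bottom edge as "low".
function classifyHeight({ top, bottom, height }) {
  if (height < 0.05) return 'flat'; // sliver of a box: ignorable
  if (top < 0.25) return 'tall';
  if (bottom > 0.85 && top > 0.55) return 'low';
  return 'waist-high';
}

// Apparent linear size scales with 1/distance, so sqrt(box area) is a
// proxy for closeness (Volume ∝ 1/Distance); clamping keeps cues audible.
function estimateVolume({ width, height }) {
  return Math.min(1, Math.max(0.1, Math.sqrt(width * height)));
}

// Accuracy score: a weighted mix of detection confidence, the volume
// scaling, and whether the position logic produced a usable category.
function accuracyScore(annotation, metrics) {
  const detection = annotation.score; // Vision's confidence, 0..1
  const volume = estimateVolume(metrics);
  const position = classifyHeight(metrics) === 'flat' ? 0 : 1;
  return 0.5 * detection + 0.3 * volume + 0.2 * position;
}
```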

All functionality runs inside Expo Go, with no native code or builds needed.
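
For illustration, the whole capture-and-detect loop fits in one plain JavaScript function. `GOOGLE_VISION_KEY` is a placeholder for however the key is actually supplied, and `cameraRef` is assumed to be a ref to an expo-camera component:

```javascript
const GOOGLE_VISION_KEY = 'YOUR-API-KEY'; // placeholder, not the real config

// Grab one frame from the rear camera and ask Cloud Vision for
// localized objects; everything here runs under Expo Go.
async function detectObjects(cameraRef) {
  const photo = await cameraRef.current.takePictureAsync({
    base64: true,
    quality: 0.5,
  });

  const response = await fetch(
    `https://vision.googleapis.com/v1/images:annotate?key=${GOOGLE_VISION_KEY}`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        requests: [
          {
            image: { content: photo.base64 },
            features: [{ type: 'OBJECT_LOCALIZATION', maxResults: 10 }],
          },
        ],
      }),
    }
  );

  const json = await response.json();
  // Each annotation carries a name, a confidence score, and a
  // boundingPoly with normalizedVertices.
  return json.responses?.[0]?.localizedObjectAnnotations ?? [];
}
```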

Challenges we ran into

  • Creating dynamic volume scaling based on object distance without extra hardware or native modules (see the sketch after this list).
  • Designing logic to estimate height categories from 2D bounding boxes.
  • Preventing the dropdown UI from breaking when multiple detections occurred.
  • Working entirely within Expo, since we had no MacBook or Xcode for iOS builds.
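
For the first challenge, distance-scaled volume can be done entirely through expo-av's JavaScript API, roughly as below; the `cueFiles` asset map is hypothetical, and `classifyHeight`/`estimateVolume` are the heuristics sketched earlier:

```javascript
import { Audio } from 'expo-av';
import * as Speech from 'expo-speech';

// Hypothetical asset map: one short sound per height category.
const cueFiles = {
  low: require('./assets/low.mp3'),
  'waist-high': require('./assets/waist.mp3'),
  tall: require('./assets/tall.mp3'),
};
const cues = {};

// Preload each cue once so playback is instant.
async function loadCues() {
  for (const [category, file] of Object.entries(cueFiles)) {
    const { sound } = await Audio.Sound.createAsync(file);
    cues[category] = sound;
  }
}

async function announce(annotation, metrics) {
  const category = classifyHeight(metrics);
  if (category === 'flat') return; // ignorable: stay silent

  const sound = cues[category];
  await sound.setVolumeAsync(estimateVolume(metrics)); // closer => louder
  await sound.replayAsync();

  Speech.speak(`${annotation.name}, ${category}`); // name the object
}
```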

Accomplishments that we're proud of

  • Integrated the Google Vision API for advanced object detection.
  • Built a multi-layer audio system for spatial feedback.
  • Designed a custom accuracy algorithm to assess feedback quality.
  • Delivered a fully functional prototype entirely in JavaScript/Expo.

What we learned

  • Designing for accessibility means thinking through every sound, delay, and UI choice.
  • Real-world detection requires fallback systems and smart filtering.
  • Expo is surprisingly capable when used creatively.
  • Even without native tools, assistive tech can be built effectively and meaningfully.

What's next for Vision

  • 🧭 Add left/right spatial awareness using horizontal bounding box positions (sketched below).
  • 📸 Implement auto-frame capture so users don’t have to tap.
  • 🔊 Improve audio feedback with more natural sounds.
  • 🔭 Enable longer-range detection and zoom features.
  • 💡 Explore offline ML models to speed up detection and remove API dependency.
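
For the first item, a rough first cut could bucket the box's horizontal center into zones; the thresholds here are guesses:

```javascript
// Rough cut at left/right awareness from the horizontal box center;
// the 1/3 and 2/3 thresholds are illustrative.
function horizontalZone(normalizedVertices) {
  const xs = normalizedVertices.map((v) => v.x ?? 0);
  const center = (Math.min(...xs) + Math.max(...xs)) / 2;
  if (center < 1 / 3) return 'left';
  if (center > 2 / 3) return 'right';
  return 'ahead';
}

// Folded into the spoken cue, e.g.:
// Speech.speak(`${annotation.name}, ${category}, ${horizontalZone(vertices)}`);
```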

Built With

  • React Native (Expo)
  • Expo Camera / Expo AV
  • Google Vision API
  • JavaScript
