Inspiration
My inspiration for Vision comes from a powerful story in the Bible where Jesus heals a blind man. His joy in regaining sight made me realize how precious vision is, not just for seeing beauty, but for feeling safe and independent. That moment motivated me to build a tool that lets visually impaired users “see” their environment in a new way: through sound.
What it does
Vision is a mobile app that uses a phone’s rear camera and AI to detect nearby obstacles and convert them into intelligent audio cues. It:
- Classifies object height: low, waist-high, tall, or flat (ignorable).
- Estimates distance: closer objects sound louder, distant ones quieter.
- Uses text-to-speech to identify the object by name.
This enables users to navigate spaces safely with sound-based spatial awareness.
How we built it
- Built using React Native (Expo) for cross-platform compatibility.
- Captured camera frames with the Expo Camera API.
- Used the Google Vision API to detect and classify objects (capture-and-detect sketch below).
- Estimated each object's height category from its bounding-box position (sketched below).
- Calculated approximate distance from bounding-box size, then scaled volume inversely:

$$ \text{Volume} \propto \frac{1}{\text{Distance}} $$

- Played categorized sounds with Expo AV (playback sketch below).
- Displayed a dropdown UI of detected objects.
- Computed an accuracy score (sketched below) factoring in:
  - Detection correctness
  - Volume scaling
  - Position logic
All functionality runs inside Expo Go, with no native code or builds needed.
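For reference, here is a minimal sketch of the capture-and-detect step. It assumes a `cameraRef` from expo-camera and a `GOOGLE_VISION_KEY` placeholder for your own Cloud Vision key; `detectObjects` is a hypothetical helper name, not our exact code.

```javascript
const GOOGLE_VISION_KEY = 'YOUR_API_KEY'; // placeholder for your own key
const VISION_URL = `https://vision.googleapis.com/v1/images:annotate?key=${GOOGLE_VISION_KEY}`;

// Capture one frame and ask Cloud Vision to localize the objects in it.
async function detectObjects(cameraRef) {
  // Low quality keeps the base64 payload small enough for a fast round trip.
  const photo = await cameraRef.current.takePictureAsync({
    base64: true,
    quality: 0.4,
  });

  const res = await fetch(VISION_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      requests: [{
        image: { content: photo.base64 },
        features: [{ type: 'OBJECT_LOCALIZATION', maxResults: 10 }],
      }],
    }),
  });
  const json = await res.json();

  // Each annotation carries a name, a confidence score, and a boundingPoly
  // whose normalizedVertices use 0–1 image coordinates.
  return json.responses?.[0]?.localizedObjectAnnotations ?? [];
}
```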
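The height and distance heuristics then work off those normalized vertices. A sketch with illustrative thresholds (the cutoffs and the `describeObject` name are assumptions, not our exact values):

```javascript
// Turn one Vision annotation into a height category and a playback volume.
// Normalized coords: 0 = top/left of the frame, 1 = bottom/right.
function describeObject(annotation) {
  const verts = annotation.boundingPoly.normalizedVertices;
  const ys = verts.map((v) => v.y ?? 0); // the API omits zero-valued fields
  const xs = verts.map((v) => v.x ?? 0);
  const top = Math.min(...ys);
  const boxHeight = Math.max(...ys) - top;
  const boxWidth = Math.max(...xs) - Math.min(...xs);

  // Height category from where the box sits vertically in the frame.
  let height;
  if (boxHeight < 0.05) height = 'flat';      // ignorable, e.g. floor markings
  else if (top < 0.2) height = 'tall';        // reaches near the top of the frame
  else if (top < 0.55) height = 'waist-high';
  else height = 'low';

  // A bigger box means a closer object, so box area serves as a crude
  // inverse-distance proxy; then Volume ∝ 1/Distance, clamped for Expo AV.
  const area = boxHeight * boxWidth;          // 0–1, larger = closer
  const distance = 1 / Math.max(area, 0.01);  // arbitrary units
  const volume = Math.min(1, 1 / distance);

  return { name: annotation.name, height, volume };
}
```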
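Playback then maps each description to a category cue at the scaled volume and speaks the object's name. This sketch uses expo-av plus expo-speech (one way to do TTS in Expo Go); the asset paths are placeholders.

```javascript
import { Audio } from 'expo-av';
import * as Speech from 'expo-speech';

// Placeholder cue files for each height category.
const CATEGORY_SOUNDS = {
  low: require('./assets/low.mp3'),
  'waist-high': require('./assets/waist.mp3'),
  tall: require('./assets/tall.mp3'),
};

async function playCue({ name, height, volume }) {
  if (height === 'flat') return; // flat objects are ignorable

  const { sound } = await Audio.Sound.createAsync(CATEGORY_SOUNDS[height], {
    shouldPlay: true,
    volume, // 0–1, from the inverse-distance scaling above
  });

  // Free the native resource once the cue finishes.
  sound.setOnPlaybackStatusUpdate((status) => {
    if (status.isLoaded && status.didJustFinish) sound.unloadAsync();
  });

  Speech.speak(name); // announce the object by name
}
```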
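Finally, a toy version of the accuracy score; the weights here are made up for the sketch.

```javascript
// Blend detection confidence, how closely the played volume matched the
// expected inverse-distance volume, and whether the height category agreed
// with the box position. Returns a 0–1 score.
function accuracyScore({ apiConfidence, playedVolume, expectedVolume, heightOk }) {
  const volumeFit = 1 - Math.abs(playedVolume - expectedVolume);
  const positionFit = heightOk ? 1 : 0;
  return 0.5 * apiConfidence + 0.3 * volumeFit + 0.2 * positionFit;
}
```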
Challenges we ran into
- Creating dynamic volume scaling from estimated object distance without depth-sensing hardware or native modules.
- Designing logic to estimate height categories from 2D bounding boxes.
- Preventing the dropdown UI from breaking when multiple detections occurred.
- Working entirely within Expo, since we had no Mac or Xcode for native iOS builds.
Accomplishments that we're proud of
- Integrated the Google Vision API for advanced object detection.
- Built a multi-layer audio system for spatial feedback.
- Designed a custom accuracy algorithm to assess feedback quality.
- Delivered a fully functional prototype entirely in JavaScript/Expo.
What we learned
- Designing for accessibility means thinking through every sound, delay, and UI choice.
- Real-world detection requires fallback systems and smart filtering.
- Expo is surprisingly capable when used creatively.
- Even without native tools, assistive tech can be built effectively and meaningfully.
What's next for Vision
- 🧭 Add left/right spatial awareness using horizontal bounding box positions.
- 📸 Implement auto-frame capture so users don’t have to tap.
- 🔊 Improve audio feedback with more natural sounds.
- 🔭 Enable longer-range detection and zoom features.
- 💡 Explore offline ML models to speed up detection and remove API dependency.
Built With
- expo-camera
- expo-go
- expo.io
- google-cloud
- google-vision-api
- javascript
- react
- react-native