Inspiration
NaviSound was born from a deeply personal moment. My friend recently suffered temporary blindness and suddenly faced a harsh reality: navigating the world without vision meant constant dependence on others. What struck me most wasn't just the physical challenge; it was the emotional toll. He broke down describing how asking for help made him feel like a burden, stripping away his independence and confidence.
That conversation changed everything. I realized the problem wasn't that navigation is hard; it's that existing solutions treat blind and low-vision users as passive recipients of help rather than autonomous agents. We could do better.
What it does
NaviSound is an AI-powered spatial audio navigation system that gives users true independence. By combining:
- Real-time scene understanding using Google's Gemini Vision API
- Spatial audio rendering that places sounds in 3D space (left, right, up, down)
- Intelligent navigation agents coordinating hazard detection, pathfinding, and contextual awareness
- Multi-modal sensor fusion from cameras, gyroscopes, and accelerometers

Users get a continuous, immersive audio map of their surroundings. Instead of passive directions, they hear their environment: obstacles appear as spatial warnings, landmarks as audio beacons, and routes as intuitive sonic guidance.
How we built it
Architecture:
- Frontend: Electron app with React + TypeScript for accessibility-first UI and real-time sensor capture
- Backend: Python orchestrator with Redis caching, coordinating pre-specialized LLM agents
- Cloud: Google Cloud Run, Vertex AI, and the Gemini Vision API for scalable, low-latency inference
- Real-time engine: WebSocket-based communication and spatial audio processing via the Web Audio API
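To make the real-time loop concrete, here is a minimal sketch of the kind of messages that could flow over the WebSocket between the frontend and the Python orchestrator. All class and field names here are illustrative assumptions, not the actual NaviSound schema:

```python
# Hypothetical message shapes for the sensor-in / audio-cue-out loop.
import json
from dataclasses import dataclass, asdict

@dataclass
class SensorFrame:
    """One frontend -> backend message: a camera frame plus IMU readings."""
    timestamp_ms: int
    jpeg_b64: str            # base64-encoded camera frame
    gyro: tuple              # (x, y, z) angular velocity, rad/s
    accel: tuple             # (x, y, z) acceleration, m/s^2

@dataclass
class AudioCue:
    """One backend -> frontend message: where to render a sound in 3D."""
    label: str               # e.g. "obstacle", "landmark"
    azimuth_deg: float       # -90 (hard left) .. 90 (hard right)
    elevation_deg: float     # negative = below ear level
    distance_m: float        # drives volume / reverb on the client

def encode(cue: AudioCue) -> str:
    """Serialize a cue for the WebSocket (JSON keeps the client simple)."""
    return json.dumps(asdict(cue))
```

The frontend only needs to parse `AudioCue` JSON and hand the coordinates to the Web Audio API's panner; all scene reasoning stays server-side.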
Key innovation: Instead of a monolithic model, we designed an agent-based system where each component (hazard detection, navigation, scene context) runs independently with Redis-backed state, enabling faster iteration and more reliable degradation.
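The agent pattern described above can be sketched in a few lines. This is an illustrative stand-in, not the production code: a plain dict stands in for the Redis-backed state store, and the agent names are hypothetical.

```python
# Sketch of independent agents with shared state and graceful degradation.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Orchestrator:
    state: dict = field(default_factory=dict)   # stand-in for Redis
    agents: dict = field(default_factory=dict)  # name -> callable

    def register(self, name: str, fn: Callable):
        self.agents[name] = fn

    def tick(self, frame):
        """Run every agent on the latest frame. A failing agent degrades
        gracefully: we fall back to its last Redis-backed result instead
        of letting one crash take down the whole pipeline."""
        results = {}
        for name, fn in self.agents.items():
            try:
                results[name] = fn(frame)
                self.state[name] = results[name]      # persist latest state
            except Exception:
                results[name] = self.state.get(name)  # last known good value
        return results
```

Because each agent is just a callable keyed by name, swapping one out (or adding a new one, like hazard detection v2) never touches the others.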
Challenges we ran into
Latency: Vision-based navigation demands sub-500ms round-trip inference. We solved this through Redis caching, batch processing, and strategic fallbacks to lightweight models for non-critical tasks.
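The cache-plus-fallback strategy looks roughly like this. This is a simplified sketch under assumed names (`remote` for the Gemini call, `fallback` for the lightweight model), with an in-memory dict standing in for Redis:

```python
# Sketch: cache vision results by frame hash; fall back to a lightweight
# model when the remote call fails or blows the latency budget.
import hashlib
import time

class CachedVision:
    def __init__(self, remote, fallback, ttl_s=2.0):
        self.remote, self.fallback, self.ttl_s = remote, fallback, ttl_s
        self.cache = {}  # stand-in for Redis: key -> (timestamp, result)

    def describe(self, frame_bytes: bytes, budget_ms: float = 500):
        key = hashlib.sha256(frame_bytes).hexdigest()
        hit = self.cache.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl_s:
            return hit[1]                         # cache hit: no network call
        start = time.monotonic()
        try:
            result = self.remote(frame_bytes)
            if (time.monotonic() - start) * 1000 > budget_ms:
                raise TimeoutError                # too slow for navigation
        except Exception:
            result = self.fallback(frame_bytes)   # lightweight local model
        self.cache[key] = (time.monotonic(), result)
        return result
```

Hashing the frame means a user standing still (near-identical frames) costs one remote call instead of dozens, which is where most of the latency win comes from.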
Spatial audio realism: Creating convincing 3D audio positioning while handling hardware variability across devices required building custom HRTF processing and extensive testing.
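The core ideas behind HRTF-style positioning can be illustrated with two classic approximations: interaural time difference (Woodworth's formula) and constant-power panning for interaural level difference. This is a simplified sketch of the underlying math, not NaviSound's actual HRTF pipeline:

```python
# Simplified spatial-audio math: time and level differences between ears
# for a source at a given azimuth (0 = straight ahead, +90 = hard right).
import math

SPEED_OF_SOUND = 343.0   # m/s, in air at ~20 C
HEAD_RADIUS = 0.0875     # m, average adult head radius

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's approximation of interaural time difference:
    how much later the sound reaches the far ear."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def ear_gains(azimuth_deg: float):
    """Constant-power pan: per-ear gains whose squares sum to 1,
    so perceived loudness stays constant as the source moves."""
    pan = (azimuth_deg + 90) / 180 * (math.pi / 2)  # [-90, 90] -> [0, pi/2]
    return math.cos(pan), math.sin(pan)              # (left, right)
```

Real HRTFs also encode elevation and spectral cues from the ear's shape, which is why the custom processing and per-device testing were necessary; delay and gain alone only localize sources on the horizontal plane.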
Deployment complexity: Coordinating microservices across GCP (Cloud Run, Redis Memorystore, VPC connectors) while keeping costs reasonable was harder than expected. Load testing revealed bottlenecks, which we solved with connection pooling and smarter resource allocation.
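The connection-pooling fix follows a standard pattern: create a fixed set of connections up front and reuse them, instead of paying connection setup on every request. A generic sketch (illustrative only; in practice a client library's built-in pool, such as redis-py's, does this for you):

```python
# Generic connection pool: bounded reuse instead of connect-per-request.
import queue

class Pool:
    def __init__(self, factory, size: int = 8):
        """Eagerly create `size` connections via `factory`."""
        self._q = queue.Queue(maxsize=size)
        for _ in range(size):
            self._q.put(factory())

    def acquire(self, timeout: float = 1.0):
        """Borrow a connection; blocks (up to timeout) if all are in use,
        which naturally backpressures load spikes instead of exhausting
        the server's connection limit."""
        return self._q.get(timeout=timeout)

    def release(self, conn):
        """Return a connection for reuse."""
        self._q.put(conn)
```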
Real-world validation: Testing with actual users revealed that theoretical accessibility doesn't always match lived experience; we pivoted to more contextual, less overwhelming audio output.
Accomplishments that we're proud of
- Sub-400ms latency from camera capture to spatial audio output, fast enough for real-time navigation
- Multi-modal agent orchestration that gracefully degrades if one service fails
- Accessibility-first design validated with user testing; ranked highly for WCAG compliance
- Production-ready deployment on GCP with auto-scaling and cost-efficient serverless architecture
- Comprehensive testing suite covering unit, integration, and real-world latency scenarios
What we learned
The biggest lesson: accessibility isn't a feature; it's a design philosophy. Every technical decision cascaded into user experience. We also learned that speed matters more than perfection: users prefer fast, slightly imperfect audio cues over delayed, pristine ones.
What's next for NaviSound
- Mobile app: Native iOS/Android with on-device ML for privacy and offline capability
- Outdoor expansion: GPS + real-time map integration for broader navigation
- Haptic feedback: Combining spatial audio with vibration patterns for richer feedback
- Community feedback loop: Building a user advisory board to continuously refine the experience
Built With
- docker
- googlecloudplatform
- googlegeminivisionapi
- node.js
- postgresql
- python
- react-native
- redis
- spatialaudiowebaudioapi
- typescript
- vertexai
- websocket