Inspiration

Traditional navigation aids for visually impaired people, such as walking canes, are highly effective but have clear physical limitations: they detect obstacles only at ground level and only within physical reach. Our team wanted to build an intelligent, hands-free spatial companion that fills these gaps using modern technology. We designed it around a core tenet: privacy and speed. By running machine learning on-device, we sought to convert live camera feeds into an intuitive, non-visual sensory map, using spatialized stereo audio, voice synthesis, haptic feedback, and voice commands to give users a "second sight" of their surroundings.

🔍 What it does

The application serves as a real-time spatial monitor that maps the local environment, flags immediate safety hazards, and communicates them dynamically:

Real-time Object Recognition: Detects and classifies everyday obstacles (people, cars, chairs, bags, cups, transit symbols) completely offline using an on-device neural network.
Smart Outdoor & Traffic Autofill: Starts up pre-configured to watch for critical traffic hazards such as pedestrians, vehicles, motorcycles, stop signs, and traffic lights, protecting the user during outdoor commutes.
Virtual Floor-to-Wall LiDAR: Uses a vertical multi-channel scanline matrix to locate the points where the floor meets the wall, estimating real-world distances without requiring a physical depth-sensing camera.
Stereo Panoramic Soundscapes: Plays spatialized audio alerts that match the threat's location, panning clicks left, center, or right and speeding up the click rate as an object nears.
Text-to-Speech Vocalizer: Summarizes active threats aloud, stating the name of the hazard and its computed distance (e.g., "Warning: Car, 5 feet away" or "Stop Sign, 10 feet").
Calibration & Sensory Monitor: Displays an inline diagnostic HUD with dashed threshold bounding boxes, crosshair coordinates, and active stream trackers to help calibrate sensor sensitivity.
Voice Command Navigation: Supports hands-free activation. Users can speak short voice commands such as "START" or "STOP" to toggle the scanning engines.

🛠️ How we built it

We architected a fully client-side, lightweight application optimized for speed and portability:

The AI Core: Built on TensorFlow.js using the lightweight pre-trained COCO-SSD model with a MobileNet backbone. It runs entirely in the browser, which means zero cloud backend costs, full privacy, and no network round-trip latency.
The Sound Engine: We developed a custom spatializer on the Web Audio API. It synthesizes sine and triangle oscillator waves, filters frequencies, and adjusts gain nodes and stereo panners in real time to place each alert in the stereo field.
The Spatial Solver: We created trigonometric depth-scaling heuristics that translate 2D bounding boxes and their aspect ratios into real-world distance estimates (feet or meters), along with a custom contrast-gradient filter that maps vertical wall structures.
The Frontend: Built with React, TypeScript, and Tailwind CSS, featuring high-contrast dark slate palettes, smooth micro-animations powered by Motion, and a clean retro-functional layout.

⚠️ Challenges we ran into

CPU Bottlenecks on High-Frequency AI Loops: Feeding high-definition media streams through neural networks in standard mobile browsers can quickly cause heavy lag. We solved this by using React refs to avoid re-rendering components on every frame, throttling frames, and isolating state dependencies to maintain a stable, fluid 30+ FPS.
Acoustical Overload & Noise Fatigue: Announcing multiple overlapping targets simultaneously can create a confusing wall of noise.
To address this, we implemented a dynamic priority queue scheduler that ensures only the most immediate hazard triggers a spoken warning, backed by a global audio cooldown gate.
Sandboxed WebRTC Constraints: Browsers restrict camera and microphone access inside nested frames. This led us to build clear guidance and graceful fallbacks, so the app prompts users to open the preview in a standalone browser tab when they need full microphone and camera capabilities.

🏆 Accomplishments that we're proud of

Zero-Server Backend Architecture: We deployed an advanced spatial processing application that requires no backend servers, preserving user privacy with 100% on-device processing.
True Stereo Integration: We built an acoustic synthesizer from scratch using raw Web Audio nodes that alerts users to an obstacle's direction without relying on a screen.
Sensory Calibration Interface: We designed a distinct, highly readable HUD that acts as a diagnostic overlay for developers and users calibrating camera perspectives in real time.

📚 What we learned

Designing with Accessibility-First Intent: Building for visually impaired users requires a complete shift in UX/UI principles. You must evaluate how touch targets, speech recognition, sound cues, and pacing can fully substitute for standard visual layouts.
Efficient Memory & State Hygiene: Working with continuously looping canvas and pixel arrays in React taught us strict cleanup habits: event listeners, audio nodes, and animation frames must all be released properly to prevent memory leaks.

🔮 What's next for obstacle detector

Hardware LiDAR API Integration: Incorporating depth data from the hardware sensors in newer phones and iPads to capture far more accurate spatial maps.
Acoustic Humming Landscapes: Transforming binary warning beeps into a rich, pleasant ambient hum that builds a continuous mental map of a room's physical shape.
Custom Object Registration: Allowing users to upload a single photo of an item (e.g., "my keys", "my workspace charger") so the on-device AI can identify personal objects in their own space.
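
As a rough illustration of the detection-to-alert pipeline described under "How we built it" and the "Acoustical Overload" challenge, here is a minimal TypeScript sketch. It is not the project's actual code. The `Detection` shape is loosely modeled on COCO-SSD output, the object heights, the 50-degree vertical field of view, the 30 ft range, and the click intervals are all illustrative assumptions, and the pinhole-camera formula stands in for whatever depth heuristic the app really uses. Names such as `toCue` and `HazardAnnouncer` are hypothetical.

```typescript
// Hypothetical detection shape, loosely modeled on COCO-SSD output,
// where bbox is [x, y, width, height] in pixels.
interface Detection {
  label: string;
  bbox: [number, number, number, number];
}

interface Cue {
  label: string;
  distanceFt: number;
  pan: number; // -1 (far left) to 1 (far right)
  intervalMs: number; // gap between clicks; shorter means closer
}

// Assumed typical object heights in feet (illustrative values only).
const ASSUMED_HEIGHT_FT: Record<string, number> = {
  person: 5.6,
  car: 4.8,
  chair: 3.0,
};

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

// Pinhole-camera estimate: distance = realHeight * focalLengthPx / boxHeightPx.
function estimateDistanceFt(d: Detection, frameH: number, vFovDeg = 50): number | null {
  const realHeight = ASSUMED_HEIGHT_FT[d.label];
  const boxHeight = d.bbox[3];
  if (realHeight === undefined || boxHeight <= 0) return null;
  const focalPx = frameH / 2 / Math.tan((vFovDeg * Math.PI) / 360);
  return (realHeight * focalPx) / boxHeight;
}

// Horizontal box center mapped to a stereo pan value.
function panFor(d: Detection, frameW: number): number {
  const centerX = d.bbox[0] + d.bbox[2] / 2;
  return clamp((centerX / frameW) * 2 - 1, -1, 1);
}

// Click spacing shrinks linearly from 1000 ms at 30 ft to 100 ms at 0 ft.
function intervalFor(distanceFt: number): number {
  return 100 + 900 * clamp(distanceFt / 30, 0, 1);
}

function toCue(d: Detection, frameW: number, frameH: number, vFovDeg = 50): Cue | null {
  const distanceFt = estimateDistanceFt(d, frameH, vFovDeg);
  if (distanceFt === null) return null;
  return { label: d.label, distanceFt, pan: panFor(d, frameW), intervalMs: intervalFor(distanceFt) };
}

// Speaks only the nearest hazard, and at most once per cooldown window.
class HazardAnnouncer {
  private lastSpokenAt = -Infinity;
  private readonly cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  next(cues: Cue[], nowMs: number): string | null {
    if (cues.length === 0 || nowMs - this.lastSpokenAt < this.cooldownMs) return null;
    const nearest = cues.reduce((a, b) => (b.distanceFt < a.distanceFt ? b : a));
    this.lastSpokenAt = nowMs;
    const name = nearest.label.charAt(0).toUpperCase() + nearest.label.slice(1);
    return `Warning: ${name}, ${Math.round(nearest.distanceFt)} feet away`;
  }
}
```

In a browser, `pan` could drive the `pan` parameter of a Web Audio `StereoPannerNode`, `intervalMs` could schedule the clicks, and the returned phrase could be passed to `speechSynthesis.speak(new SpeechSynthesisUtterance(phrase))`. Keeping the math in pure functions like these makes the heuristics testable without a camera or audio device.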
