India is home to the world's largest visually impaired population. While traditional white canes help detect obstacles on the ground, they fail to warn users about hanging wires, low-clearance boards, or upcoming stairs. Navigating crowded Indian streets remains a stressful and dependent experience. We wanted to build something beyond simple sensors—an empathetic, intelligent companion that acts as a real-time visual interpreter, giving visually impaired individuals the confidence to walk independently.

NavAssist is an AI-powered multimodal copilot that transforms a standard wearable or smartphone camera into an intelligent navigation assistant.

Real-Time Visual Guidance: It streams camera frames and uses AI to give instant, natural audio instructions in Hinglish (e.g., "Aage right side me obstacle hai, thoda baayein se chaliye").

Guardian Dashboard: A live web dashboard featuring a dynamic GPS tracker where family members can remotely monitor the user's route and see contextual hazards (Barriers, Crosswalks, Stairs) update in real-time.

Dual-Layer Emergency SOS: In a panic situation, a simple voice command ("Help") or button press automatically shares the user's exact high-accuracy Google Maps location to a relative's WhatsApp and initiates a direct telephone call.

We developed NavAssist using a robust, modern full-stack architecture:

Frontend: Built with React.JS and styled using Tailwind CSS. We utilized the HTML5 Video Capture API for live streaming and integrated React-Leaflet for the interactive map interface, applying Linear Interpolation (LERP) to achieve fluid, seamless marker movement between GPS points.

Backend: A lightweight, high-performance Flask (Python) server acts as the central router, handling secure cross-origin requests via Flask-CORS.

AI Engine (The Core): We integrated Gemini 2.5 Flash for its exceptional multimodal speeds. We engineered custom system instructions to force action-oriented, short replies and fine-tuned parameters like temperature: 0.3 for predictable, safety-first navigation and max_output_tokens to optimize API computation speeds to under a second.

Audio Pipeline: Leveraging browser-native webkitSpeechRecognition for voice inputs and SpeechSynthesisUtterance configured with localized Indian accent engines (hi-IN) for clear text-to-speech feedback.

API Response Latency: Initially, sending raw high-resolution images caused a bottleneck, leading to late responses. We resolved this by compressing the HTML5 canvas frame to JPEG 0.50 quality on the frontend and capping Gemini's output tokens, bringing down the end-to-end response time drastically.

Leaflet Map Jitter: The fast frame tickers caused the map center to continuously freeze or lag. We fixed this by shifting from aggressive flyTo methods to smooth panTo handlers paired with coordinate interpolation.

Browser Autoplay & Audio Blocks: Modern browsers block automated speech synthesis without user interaction. We bypass this by silently initializing and unblocking the browser's audio engine on the very first user interaction or voice trigger.

Successfully creating a functioning end-to-end Multimodal AI pipeline that processes vision and voice concurrently.

Building a highly descriptive Guardian Dashboard that updates contextually without lagging the system.

Creating a direct safety feature (WhatsApp + Call Integration) that adds massive real-world social impact to the project.

We mastered Multimodal Prompt Engineering and understood how hyper-parameters like temperature and token caps can dramatically affect the latency of LLMs in critical real-time systems.

Learned how to manage hardware-abstracted browser APIs like Geolocation, Speech-to-Text, and Audio synthesis in a synchronized loop.

Understood the importance of designing software with extreme empathy for accessibility standards.

Edge AI Integration: Migrating the core vision pipeline from cloud APIs to local deployment on hardware like Raspberry Pi or Jetson Nano using Google MediaPipe / MobileNet for 100% offline obstacle detection.

Hardware Form Factor: Integrating the setup into physical 3D-printed smart glasses equipped with a wide-angle camera and bone-conduction earphones.

Advanced Cloud Telephony: Upgrading the emergency call system by integrating Twilio Voice API to send automated, server-side cellular calls to relatives even if the user's phone has no active network pack.

Built With

Share this project:

Updates