EchoSight

🌟 About the Project

🧭 Inspiration

EchoSight was inspired by the daily challenges faced by visually impaired individuals when navigating unfamiliar environments. While most navigation tools rely heavily on visual interfaces, few solutions provide real-time spatial awareness and natural voice interaction. We wanted to leverage Apple’s LiDAR technology and modern generative AI to create a tool that helps users “see” through sound, speech, and touch — enabling greater independence and safety.

🏗️ How We Built It

EchoSight was developed as an iOS AR application using SwiftUI, ARKit, and RealityKit for LiDAR scene reconstruction and depth sensing.

On-device object recognition was powered by FastViT (Core ML), which runs inference locally and avoids cloud round-trip latency.
For intelligent scene understanding, we integrated Google Gemini 2.5 Flash Lite through OpenRouter and Google AI Studio, enabling conversational Q&A and concise environment summaries.
Speech recognition and TTS (Text-to-Speech) were handled via Apple’s native frameworks, while Core Haptics delivered strong tactile feedback for nearby obstacles.
We optimized performance with background threading, autorelease pools, and frame throttling to balance real-time inference with battery efficiency.
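
To illustrate the frame throttling mentioned above, here is a minimal sketch, not the app's actual code. It assumes each camera frame carries a timestamp in seconds (as `ARFrame.timestamp` does) and that heavy inference should run at most a fixed number of times per second. The `FrameThrottle` type and the 5 fps cap are hypothetical choices for illustration.

```swift
import Foundation

/// Hypothetical helper: accepts a frame only if enough time has passed
/// since the last accepted frame.
struct FrameThrottle {
    let minInterval: TimeInterval
    private var lastAccepted: TimeInterval?

    init(maxFramesPerSecond: Double) {
        self.minInterval = 1.0 / maxFramesPerSecond
    }

    /// Returns true when the frame at `timestamp` (seconds) should be processed.
    mutating func shouldProcess(at timestamp: TimeInterval) -> Bool {
        if let last = lastAccepted, timestamp - last < minInterval {
            return false // too soon after the previous accepted frame: skip it
        }
        lastAccepted = timestamp
        return true
    }
}

// Usage: cap inference at an assumed 5 frames per second.
var throttle = FrameThrottle(maxFramesPerSecond: 5)
let timestamps: [TimeInterval] = [0.0, 0.05, 0.1, 0.25, 0.3, 0.5]
let processed = timestamps.filter { throttle.shouldProcess(at: $0) }
```

In an AR session delegate, the same check could guard the call into the Core ML model so that skipped frames cost almost nothing.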

💡 What We Learned

We learned how to combine multimodal AI (vision + language) with AR-based spatial understanding on mobile devices.
Integrating LiDAR depth data and visual recognition taught us the importance of sensor fusion and real-time processing pipelines.
Working with Google Gemini via OpenRouter gave us valuable experience in prompt engineering for accessibility-focused responses.
We also deepened our understanding of Apple’s speech and haptic frameworks, ensuring responsive and intuitive feedback for users.

⚙️ Challenges We Faced

Latency and synchronization between speech recognition, TTS, and AI responses, especially preventing echo interference, where the microphone picks up the app's own spoken output.
Balancing performance between continuous LiDAR scanning and real-time object detection on-device.
Prompt optimization for Gemini to generate concise and contextually accurate visual scene descriptions.
User accessibility testing, ensuring the app remains simple and intuitive despite the complexity of underlying systems.
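
One possible way to handle the echo problem from the first challenge is to ignore microphone transcripts while the app is speaking, plus a short guard window afterwards. The sketch below is an assumption about how this could work, not a description of EchoSight's implementation. `EchoGate` and the 0.3 second default are hypothetical, and the start and finish calls would need to be driven by whatever callbacks the speech synthesizer provides.

```swift
import Foundation

/// Hypothetical gate that decides whether microphone input should be trusted.
struct EchoGate {
    /// Extra time to keep ignoring the microphone after speech output ends.
    let tailGuard: TimeInterval
    private var isSpeaking = false
    private var speechEndedAt: TimeInterval?

    init(tailGuard: TimeInterval = 0.3) {
        self.tailGuard = tailGuard
    }

    /// Call when text-to-speech output begins.
    mutating func speechDidStart() {
        isSpeaking = true
    }

    /// Call when text-to-speech output ends, passing the current time in seconds.
    mutating func speechDidFinish(at time: TimeInterval) {
        isSpeaking = false
        speechEndedAt = time
    }

    /// True when a transcript captured at `time` is unlikely to be the app's own voice.
    func shouldAcceptTranscript(at time: TimeInterval) -> Bool {
        if isSpeaking { return false }
        if let ended = speechEndedAt, time - ended < tailGuard { return false }
        return true
    }
}
```

The trade-off is that the user cannot interrupt the app mid-sentence. A more elaborate design would need real echo cancellation instead of a simple gate.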

🧮 A Note on the Technology

Our system can be modeled as a hybrid feedback loop:

\[ f(x) = \text{VLM}(\text{ARKit}(x)) + \text{Speech}(x) + \text{Haptics}(x) \]

where \( x \) represents live environmental input. This continuous multimodal loop creates adaptive sensory feedback that enhances the user’s spatial awareness.
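
As a concrete illustration of the haptics term, the sketch below maps an obstacle distance (for example, one taken from LiDAR depth) to a feedback strength between 0 and 1, the range Core Haptics uses for intensity. The linear ramp and the 0.5 m and 3.0 m thresholds are assumptions chosen for illustration, not values from the project.

```swift
import Foundation

/// Hypothetical mapping from obstacle distance (meters) to haptic intensity in 0...1.
/// Obstacles at or inside `near` give full intensity; at or beyond `far` give none.
func hapticIntensity(forDistance meters: Double,
                     near: Double = 0.5,
                     far: Double = 3.0) -> Double {
    // Treat invalid depth readings (NaN or infinite) as "no obstacle".
    guard meters.isFinite else { return 0.0 }
    // Linear ramp: 1.0 at `near`, 0.0 at `far`, clamped outside that range.
    let ramp = (far - meters) / (far - near)
    return min(1.0, max(0.0, ramp))
}

// Usage: an obstacle 1.75 m away lands halfway along the ramp.
let midIntensity = hapticIntensity(forDistance: 1.75)
```

A non-linear curve might feel more natural in practice, since urgency tends to rise sharply at close range. The linear form is used here only because it is the simplest to reason about.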

🌍 Impact

EchoSight demonstrates how AI + AR + Accessibility can intersect to make a meaningful difference. It’s not just an app — it’s a vision of how technology can extend human perception.