🌟 About the Project
🧭 Inspiration
EchoSight was inspired by the daily challenges faced by visually impaired individuals when navigating unfamiliar environments. While most navigation tools rely heavily on visual interfaces, few solutions provide real-time spatial awareness and natural voice interaction. We wanted to leverage Apple’s LiDAR technology and modern generative AI to create a tool that helps users “see” through sound, speech, and touch — enabling greater independence and safety.
🏗️ How We Built It
EchoSight was developed as an iOS AR application using SwiftUI, ARKit, and RealityKit for LiDAR scene reconstruction and depth sensing.
- On-device object recognition was powered by FastViT (Core ML), providing fast and accurate detection without cloud latency.
- For intelligent scene understanding, we integrated Google Gemini 2.5 Flash Lite through OpenRouter and Google AI Studio, enabling conversational Q&A and concise environment summaries.
- Speech recognition and TTS (Text-to-Speech) were handled via Apple’s native frameworks, while Core Haptics delivered strong tactile feedback for nearby obstacles.
- We optimized performance with background threading, autorelease pools, and frame throttling to balance real-time inference with battery efficiency.
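To make the frame-throttling and obstacle-haptics ideas above concrete, here is a minimal sketch in plain Swift. It is not taken from the EchoSight source: `FrameThrottler`, `hapticIntensity(forDistance:near:far:)`, the 5 fps budget, and the 0.3 m / 2.0 m thresholds are all hypothetical names and values chosen for illustration. In an app, the timestamp would plausibly come from `ARFrame.timestamp`, and the resulting intensity could be passed to a Core Haptics `CHHapticEventParameter` with the `.hapticIntensity` parameter ID.

```swift
import Foundation

/// Accepts a frame only when enough time has passed since the last accepted one.
/// Hypothetical helper for illustration; not the app's actual implementation.
struct FrameThrottler {
    let minInterval: TimeInterval
    private var lastAccepted: TimeInterval?

    init(maxFramesPerSecond: Double) {
        minInterval = 1.0 / maxFramesPerSecond
    }

    /// `timestamp` is in seconds (assumed to come from the camera frame).
    mutating func shouldProcess(at timestamp: TimeInterval) -> Bool {
        if let last = lastAccepted, timestamp - last < minInterval {
            return false  // too soon: skip inference for this frame
        }
        lastAccepted = timestamp
        return true
    }
}

/// Maps an obstacle distance in metres to a haptic intensity in 0...1.
/// At or below `near` the result is 1 (strongest); at or beyond `far` it is 0.
/// The thresholds are assumed values, not measured ones.
func hapticIntensity(forDistance distance: Double,
                     near: Double = 0.3,
                     far: Double = 2.0) -> Double {
    guard distance.isFinite, far > near else { return 0 }
    let t = (far - distance) / (far - near)
    return min(max(t, 0), 1)
}

// Usage: simulate five frame timestamps with a 5 fps inference budget.
var throttler = FrameThrottler(maxFramesPerSecond: 5)
let timestamps: [TimeInterval] = [0.0, 0.1, 0.25, 0.3, 0.5]
let accepted = timestamps.filter { throttler.shouldProcess(at: $0) }
print(accepted)
print(hapticIntensity(forDistance: 1.15))
```

Dropping frames by timestamp rather than by frame count keeps the inference rate stable even if the camera frame rate changes, which is one way to trade responsiveness against battery use.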
💡 What We Learned
- We learned how to combine multimodal AI (vision + language) with AR-based spatial understanding on mobile devices.
- Integrating LiDAR depth data and visual recognition taught us the importance of sensor fusion and real-time processing pipelines.
- Working with Google Gemini via OpenRouter gave us valuable experience in prompt engineering for accessibility-focused responses.
- We also deepened our understanding of Apple’s speech and haptic frameworks, ensuring responsive and intuitive feedback for users.
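As an illustration of the sensor fusion and prompt engineering described above, the sketch below pairs detected labels with depth readings and packs them into a chat-style request body. Everything here is an assumption made for the example: the type names, the system prompt wording, the 60-token cap, and the model identifier string are hypothetical, and the JSON shape follows the common chat-completions layout that OpenRouter accepts rather than anything confirmed from the project. No network call is made.

```swift
import Foundation

struct ChatMessage: Codable {
    let role: String
    let content: String
}

struct ChatRequest: Codable {
    let model: String
    let messages: [ChatMessage]
    let maxTokens: Int

    enum CodingKeys: String, CodingKey {
        case model, messages
        case maxTokens = "max_tokens"
    }
}

/// Fuses recognition labels with depth readings (metres) into a short prompt.
/// Hypothetical helper; the prompt text is an illustrative guess.
func makeSceneRequest(detections: [(label: String, meters: Double)],
                      question: String) -> ChatRequest {
    let scene = detections
        .sorted { $0.meters < $1.meters }  // nearest objects first
        .map { item -> String in
            let distance = String(format: "%.1f", item.meters)
            return item.label + " at " + distance + " m"
        }
        .joined(separator: "; ")

    let system = "You assist a blind user. Answer in one short sentence. "
        + "Mention the nearest obstacle first."
    let user = "Scene: " + scene + ". Question: " + question

    return ChatRequest(
        model: "google/gemini-2.5-flash-lite",  // assumed model identifier
        messages: [
            ChatMessage(role: "system", content: system),
            ChatMessage(role: "user", content: user)
        ],
        maxTokens: 60  // assumed cap to keep spoken answers brief
    )
}

// Usage: build the request and print the JSON body that would be sent.
let request = makeSceneRequest(
    detections: [(label: "door", meters: 2.5), (label: "chair", meters: 0.8)],
    question: "What is in front of me?"
)
if let data = try? JSONEncoder().encode(request),
   let json = String(data: data, encoding: .utf8) {
    print(json)
}
```

Sorting by distance and capping the response length are two simple levers for keeping answers short enough to be spoken aloud without delaying the next interaction.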
⚙️ Challenges We Faced
- Latency and synchronization between speech recognition, TTS, and AI responses, especially avoiding echo interference, where the microphone picks up the app's own spoken output.
- Balancing performance between continuous LiDAR scanning and real-time object detection on-device.
- Prompt optimization for Gemini to generate concise and contextually accurate visual scene descriptions.
- User accessibility testing, ensuring the app remains simple and intuitive despite the complexity of underlying systems.
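One common way to handle the echo problem listed above is to gate the microphone with a small state machine so that recognition never runs while the app is speaking. The sketch below shows that idea under stated assumptions: `VoiceState` and `VoiceGate` are hypothetical names, and this is not necessarily how EchoSight solves it. In an app, `speechFinished()` would plausibly be called from the `AVSpeechSynthesizerDelegate` callback `speechSynthesizer(_:didFinish:)`.

```swift
/// Turn-taking states for one voice interaction (hypothetical model).
enum VoiceState {
    case idle, listening, thinking, speaking
}

/// Keeps the microphone closed whenever the app is thinking or speaking,
/// so speech recognition cannot transcribe the app's own TTS output.
struct VoiceGate {
    private(set) var state: VoiceState = .idle

    /// The recognizer should only receive audio while this is true.
    var microphoneEnabled: Bool { state == .listening }

    /// Returns false if the request is rejected (e.g. TTS is still playing).
    @discardableResult
    mutating func startListening() -> Bool {
        guard state == .idle else { return false }
        state = .listening
        return true
    }

    mutating func userFinishedSpeaking() {
        if state == .listening { state = .thinking }  // waiting on the model
    }

    mutating func responseReady() {
        if state == .thinking { state = .speaking }   // TTS playback begins
    }

    mutating func speechFinished() {
        if state == .speaking { state = .idle }       // safe to listen again
    }
}

// Usage: one full turn, with an attempt to listen during playback.
var gate = VoiceGate()
gate.startListening()
gate.userFinishedSpeaking()
gate.responseReady()
let allowedDuringPlayback = gate.startListening()
gate.speechFinished()
let allowedAfterPlayback = gate.startListening()
print(allowedDuringPlayback, allowedAfterPlayback)
```

The cost of this design is that the user cannot interrupt a spoken response; supporting barge-in would need a different approach, such as echo cancellation on the audio input.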
🧮 A Note on the Technology
Our system can be modeled as a hybrid feedback loop: \[ f(x) = \text{VLM}(\text{ARKit}(x)) + \text{Speech}(x) + \text{Haptics}(x) \] where \( x \) represents live environmental input. This continuous multimodal loop creates adaptive sensory feedback that enhances the user’s spatial awareness.
🌍 Impact
EchoSight demonstrates how AI + AR + Accessibility can intersect to make a meaningful difference. It’s not just an app — it’s a vision of how technology can extend human perception.
Built With
- arkit
- asr
- avfoundation
- fastvit
- framework
- gemini
- google-ai-studio
- haptics
- ios
- lidar
- ml
- openrouter
- propertylist
- realitykit
- swift-5.9
- swiftui
- text-to-speech
- xcode
