🌟 About the Project

🧭 Inspiration

EchoSight was inspired by the daily challenges faced by visually impaired individuals when navigating unfamiliar environments. While most navigation tools rely heavily on visual interfaces, few solutions provide real-time spatial awareness and natural voice interaction. We wanted to leverage Apple’s LiDAR technology and modern generative AI to create a tool that helps users “see” through sound, speech, and touch — enabling greater independence and safety.

🏗️ How We Built It

EchoSight was developed as an iOS AR application using SwiftUI, ARKit, and RealityKit for LiDAR scene reconstruction and depth sensing.

  • On-device object recognition was powered by FastViT (Core ML), providing fast and accurate detection without cloud latency.
  • For intelligent scene understanding, we integrated Google Gemini 2.5 Flash Lite through OpenRouter and Google AI Studio, enabling conversational Q&A and concise environment summaries.
  • Speech recognition and TTS (Text-to-Speech) were handled via Apple’s native frameworks, while Core Haptics delivered strong tactile feedback for nearby obstacles.
  • We optimized performance with background threading, autorelease pools, and frame throttling to balance real-time inference with battery efficiency.

💡 What We Learned

  • We learned how to combine multimodal AI (vision + language) with AR-based spatial understanding on mobile devices.
  • Integrating LiDAR depth data and visual recognition taught us the importance of sensor fusion and real-time processing pipelines.
  • Working with Google Gemini via OpenRouter gave us valuable experience in prompt engineering for accessibility-focused responses.
  • We also deepened our understanding of Apple’s speech and haptic frameworks, ensuring responsive and intuitive feedback for users.

⚙️ Challenges We Faced

  1. Latency and synchronization between speech recognition, TTS, and AI responses — especially avoiding echo interference.
  2. Balancing performance between continuous LiDAR scanning and real-time object detection on-device.
  3. Prompt optimization for Gemini to generate concise and contextually accurate visual scene descriptions.
  4. User accessibility testing, ensuring the app remains simple and intuitive despite the complexity of underlying systems.

🧮 A Note on the Technology

Our system can be modeled as a hybrid feedback loop: [ f(x) = \text{VLM}(\text{ARKit}(x)) + \text{Speech}(x) + \text{Haptics}(x) ] where ( x ) represents live environmental input. This continuous multimodal loop creates adaptive sensory feedback that enhances the user’s spatial awareness.

🌍 Impact

EchoSight demonstrates how AI + AR + Accessibility can intersect to make a meaningful difference. It’s not just an app — it’s a vision of how technology can extend human perception.

Built With

  • arkit
  • asr
  • avfoundation
  • fastvit
  • framework
  • gemini
  • google-ai-studio
  • haptics
  • ios
  • lidar
  • ml
  • openrouter
  • propertylist
  • realitykit
  • swift-5.9
  • swiftui
  • text-to-speech
  • xcode
Share this project:

Updates