Inspiration

Navigating indoor spaces feels effortless for sighted people, but for the visually impaired, it's a constant challenge. Crowded hallways, unmarked rooms, and unexpected obstacles are everyday barriers that most people never think about. We wanted to build something that could bridge that gap using just a smartphone, making independence accessible to blind and low-vision people without any specialized hardware.

What it does

SceneAssist turns your phone into a pair of eyes for visually impaired users. It uses your camera to understand the environment and converts everything into short, clear voice descriptions. From detecting obstacles like chairs, tables, and doors to reading signs and room numbers, SceneAssist gives users the awareness they need to move through any space with confidence.

How we built it

We built SceneAssist as an iOS app in Swift, using Claude AI for computer vision and for extracting meaningful insights from the environment. On top of that, we integrated an image captioning model for scene understanding, object detection for obstacle identification, and OCR for reading signs and text. All visual input is processed into structured, natural language, which is then delivered to the user as text-to-speech audio output.
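In code, that pipeline reduces to merging the three model outputs into one short utterance. A minimal sketch of the merging step, with hypothetical types and function names (not the actual app code):

```swift
import Foundation

// Simplified stand-ins for the three analysis results
// (hypothetical types, not the real app's models).
struct SceneAnalysis {
    let caption: String          // image-captioning output
    let obstacles: [String]      // object-detection output
    let signText: String?        // OCR output, if any text was found
}

// Merge the model outputs into one short, structured sentence
// that can be handed to text-to-speech.
func spokenDescription(for scene: SceneAnalysis) -> String {
    var parts = [scene.caption]
    if !scene.obstacles.isEmpty {
        parts.append("Obstacles ahead: \(scene.obstacles.joined(separator: ", ")).")
    }
    if let text = scene.signText {
        parts.append("Sign reads: \(text).")
    }
    return parts.joined(separator: " ")
}

let scene = SceneAnalysis(
    caption: "A hallway with a door on the right.",
    obstacles: ["chair", "table"],
    signText: "Room 204"
)
print(spokenDescription(for: scene))
// A hallway with a door on the right. Obstacles ahead: chair, table. Sign reads: Room 204.
```

On iOS, the resulting string would then be handed to `AVSpeechSynthesizer` for audio output.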

Challenges we ran into

  • Deciding how much to include in each voice description was tricky. Too much information overwhelms the user, while too little leaves them lost, and finding that balance required careful tuning.
  • Even small delays in audio feedback can disorient a visually impaired user, so we had to trade speed against accuracy, and both are critical.
  • Crafting the right prompts for Claude AI took a lot of experimentation. We needed responses that were accurate, structured, and natural-sounding, and getting that right was a highly iterative process.
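For the verbosity challenge in the first bullet, one simple lever is a hard word budget per spoken update. A sketch of that idea; the helper and the threshold are illustrative, not our shipped tuning:

```swift
import Foundation

// Trim a description to at most `maxWords` words so a single spoken
// update stays short (hypothetical helper for illustration).
func capped(_ description: String, maxWords: Int = 20) -> String {
    let words = description.split(separator: " ")
    guard words.count > maxWords else { return description }
    return words.prefix(maxWords).joined(separator: " ")
}

let long = "A wide corridor with several chairs along the left wall, a water fountain, two doors, and a sign pointing toward the elevators"
print(capped(long, maxWords: 12))
```

In practice we tuned the budget by listening: long enough to orient the user, short enough that the next update is never far away.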

Accomplishments that we're proud of

  • Shipping a fully functional iOS app within a hackathon timeframe, complete with real-time AI processing and voice output, is something we are genuinely proud of.
  • We built bilingual support right from the start: SceneAssist delivers accurate scene descriptions, obstacle detection, and sign reading in both English and Mandarin, making it accessible to a wider range of users.
  • Getting obstacle detection to reliably report approximate position and deliver safety warnings in real time was a meaningful milestone for real-world usability.
  • As a team with both tech and business backgrounds, we blended our strengths to build something that was not only technically solid but also thoughtfully designed around real user needs.
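The position reporting in the obstacle-detection bullet comes down to mapping a detection's bounding box to a coarse direction the user can act on. A minimal sketch, with illustrative thresholds and names rather than the app's actual logic:

```swift
import Foundation

// Map a detection's horizontal center (normalized 0...1 across the
// camera frame) to a coarse spoken direction. Thresholds are illustrative.
func direction(forCenterX x: Double) -> String {
    switch x {
    case ..<0.33: return "on your left"
    case ..<0.66: return "ahead"
    default:      return "on your right"
    }
}

// Build a short safety warning from an object label and its position.
func warning(object: String, centerX: Double) -> String {
    "Caution: \(object) \(direction(forCenterX: centerX))."
}

print(warning(object: "chair", centerX: 0.2))
// Caution: chair on your left.
```

Keeping the directions coarse was deliberate: three zones are easy to act on instantly, where precise angles would slow the listener down.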

What we learned

We learned that designing for accessibility requires a fundamentally different mindset. Every decision, from how long a description is to the order in which information is spoken, directly impacts usability. We also deepened our understanding of incorporating AI into a real-world product, from choosing the right models for each task to engineering prompts that produce accurate, concise, and natural-sounding responses. Bringing multiple AI models together into a single cohesive pipeline was a challenge in itself, and navigating that taught us a lot about how to build with AI intentionally and effectively.

What's next for SceneAssist

  • Expand beyond indoor environments and improve obstacle detection to include accurate distance measurement.
  • Add support for more global languages to make SceneAssist accessible to a wider audience.
  • Fine-tune the model for specific known layouts like hospitals, airports, and offices for more optimized and context-aware guidance.
  • Shift to an on-device model for faster processing and better privacy.
  • Explore wearable integration and add user personalization so people can control the level of detail in their descriptions.

Built With

  • Swift (iOS)
  • Claude AI
  • Image captioning, object detection, and OCR models
  • Text-to-speech