Inspiration
Most AI assistants today can see the world, but they don’t understand urgency. For visually impaired users, the real challenge isn’t knowing what exists, but knowing what matters right now.
A laptop nearby is harmless. A person approaching is not. A loud warning sound could be critical.
We realized that existing tools focus on detection, not decision-making.
That’s where SightSense was born: an assistant that doesn’t just see the world, but interprets it for safety.
What it does
SightSense is a real-time AI safety assistant that provides situational awareness, not just object detection.
It:
- Detects objects using on-device AI
- Understands position and distance
- Tracks motion over time (e.g., “approaching”)
- Detects warning sounds
- Combines everything into prioritized alerts

Instead of overwhelming users, it filters:
- Laptop nearby: LOW (ignored)
- Person approaching: HIGH (urgent)
- Loud sound: HIGH (override)

It communicates through:
- Voice (TTS)
- Haptics
- Minimal, accessible UI

And it runs fully on-device, ensuring:
- Low latency
- Privacy
- Offline reliability
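The filtering described above can be sketched as a simple priority mapping. This is a minimal illustration, not the shipped code: the `Priority` enum, the object lists, and `prioritize` are hypothetical names we chose for the sketch.

```kotlin
// Alert priority levels, from ignorable to urgent.
enum class Priority { LOW, MEDIUM, HIGH }

// Hypothetical label sets; the real system's lists may differ.
val harmless = setOf("laptop", "bottle", "chair")
val dangerous = setOf("person", "car", "bicycle")

// Map a detection plus context into an alert priority.
// A loud warning sound overrides everything else.
fun prioritize(label: String, approaching: Boolean, loudSound: Boolean): Priority = when {
    loudSound -> Priority.HIGH                          // audio override
    label in dangerous && approaching -> Priority.HIGH  // e.g., person approaching
    label in dangerous -> Priority.MEDIUM               // present but not closing in
    label in harmless -> Priority.LOW                   // ignored
    else -> Priority.MEDIUM                             // unknown objects: be cautious
}
```

With this mapping, a nearby laptop yields LOW (and is suppressed), while an approaching person or a loud sound yields HIGH.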
How we built it
We designed SightSense as a multimodal edge AI pipeline:

Vision Layer
- LiteRT / TensorFlow Lite model (.tflite)
- Runs on NPU/GPU/CPU via delegate fallback
- Extracts:
  - object label
  - position (left/center/right)
  - distance (near/medium/far)

Audio Layer
- Lightweight signal-based detection (RMS + amplitude)
- Identifies warning-level sounds in real time

Fusion Engine (Core Innovation)
- Deterministic reasoning system
- Combines:
  - object type
  - spatial context
  - motion trends
  - audio signals
- Outputs LOW / MEDIUM / HIGH priority alerts

Motion Intelligence
- Tracks distance changes across frames: far -> medium -> near = approaching

Accessibility Layer
- Voice-first interaction (TTS)
- Haptic alerts
- Minimal UI

Optimization
- LiteRT delegate fallback (NPU -> GPU -> CPU)
- Cooldown system to prevent alert spam
- Confidence filtering for stability
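The delegate fallback can be sketched generically. This is a simplified stand-in: `selectBackend` and the `BackendInit` alias are our names, and in the real app each candidate would construct a LiteRT interpreter with the corresponding NPU/GPU delegate before falling back to plain CPU.

```kotlin
// Hypothetical backend initializer: returns a handle/name on success,
// throws if the accelerator is unavailable on this device.
typealias BackendInit = () -> String

// Try each backend in priority order (e.g., NPU -> GPU -> CPU) and
// return the first that initializes; CPU should never fail.
fun selectBackend(candidates: List<Pair<String, BackendInit>>): String {
    for ((name, init) in candidates) {
        try {
            return init()  // first working backend wins
        } catch (e: Exception) {
            println("$name unavailable, falling back: ${e.message}")
        }
    }
    error("no backend available")
}
```

Keeping the fallback order in one place makes the accelerator choice explicit and easy to log, which helps when debugging per-device performance differences.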
Challenges we ran into
Avoiding alert overload (UX challenge)

Initial versions produced too many alerts (“bottle detected”, “laptop detected”). Users got overwhelmed, important alerts were buried, and the system became unusable.

Solution: we redesigned the system into a priority-based alert engine:
- LOW -> harmless objects
- MEDIUM -> obstacles
- HIGH -> immediate danger
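One way to implement the cooldown that tames alert spam is to suppress repeats per object label within a time window. This is a sketch under our own naming, not the actual implementation; the 3-second default is an assumption.

```kotlin
// Suppresses repeated alerts for the same key (e.g., an object label)
// within a cooldown window, so users are not spammed.
class AlertCooldown(private val cooldownMs: Long = 3000) {
    private val lastFired = mutableMapOf<String, Long>()

    // Returns true if the alert for this key should fire now.
    fun shouldFire(key: String, nowMs: Long): Boolean {
        val last = lastFired[key]
        if (last != null && nowMs - last < cooldownMs) return false  // still cooling down
        lastFired[key] = nowMs
        return true
    }
}
```

Because the cooldown is keyed per label, a HIGH-priority “person approaching” alert is not blocked just because a LOW-priority “laptop” alert fired recently.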
From detection -> decision-making (core difficulty)

Most AI models only provide object labels and bounding boxes, but we needed context, urgency, and motion understanding. Detectors have no built-in notion of “danger” and no understanding of motion like “approaching”.

Solution: we built a custom fusion engine that:
- derives position (left/center/right)
- estimates distance (near/medium/far)
- tracks temporal changes across frames
- detects motion patterns like: far -> medium -> near = approaching
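The fusion engine's spatial reasoning can be sketched from a normalized bounding box. The thresholds below (0.33/0.66 for position, 0.08/0.25 for area) are illustrative assumptions, not the shipped cut-offs.

```kotlin
// Normalized bounding box (0..1 coordinates) from the detector.
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)

// Horizontal position from the box center.
fun position(b: Box): String = when {
    (b.left + b.right) / 2 < 0.33f -> "left"
    (b.left + b.right) / 2 > 0.66f -> "right"
    else -> "center"
}

// Distance bucket from box area: a bigger box roughly means a closer object.
fun distance(b: Box): String {
    val area = (b.right - b.left) * (b.bottom - b.top)
    return when {
        area > 0.25f -> "near"
        area > 0.08f -> "medium"
        else -> "far"
    }
}

// Motion trend across frames: a monotone far -> medium -> near
// history means the object is approaching.
fun isApproaching(history: List<String>): Boolean {
    val rank = mapOf("far" to 0, "medium" to 1, "near" to 2)
    return history.size >= 2 &&
        history.zipWithNext().all { (a, b) -> rank[b]!! >= rank[a]!! } &&
        rank[history.last()]!! > rank[history.first()]!!
}
```

Keeping these rules deterministic is what makes the alerts explainable: every HIGH alert can be traced back to a concrete position, distance, and trend.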
Accomplishments that we're proud of
- Built a fully working real-time system, not just a prototype
- Achieved multimodal fusion (vision + audio + motion)
- Implemented priority-based intelligence, not just detection
- Designed for actual visually impaired usability, not demo-only
- Created a system that is fast, stable, and explainable

Most importantly: we moved from AI that sees to AI that understands.
What we learned
What's next for SightSense
We see SightSense evolving into a full situational intelligence platform:
Near-term
- YAMNet-based sound classification (sirens, horns)
- Multi-object tracking with IDs
- Auto Danger Mode (context-aware sensitivity)

Mid-term
- Personalized risk models
- Navigation assistance
- Environment classification (indoor/outdoor)

Long-term
- Integration with wearables (glasses, earbuds)
- Fully context-aware AI assistant
- Large-scale deployment for accessibility
GitHub release: https://github.com/sravan1023/Qualcomm-Google-Hackathon/releases/tag/v0.1.0
Built With
- android
- camerax
- kotlin
- litert
- tensorflowlite
- texttospeech