Inspiration
Jia was inspired by the daily friction visually impaired people face when navigating unfamiliar spaces, especially when they need quick, contextual help that goes beyond static screen readers. The goal was to build something that feels less like a tool and more like a calm companion that can notice changes, warn about hazards, and respond naturally in real time.
What it does
Jia is a voice-first AI visual assistant that uses live camera input plus conversational AI to:
- Describe surroundings in natural language
- Answer scene-aware questions
- Proactively warn about potential hazards
- Support interruptible voice interaction for hands-free use
- Speak responses with a more natural browser voice and low latency
How we built it
We built Jia as a React + Vite web app with:
- A camera pipeline for capturing live frames
- A streaming chat endpoint to OpenAI for real-time responses
- Browser speech recognition for low-latency voice input
- Browser speech synthesis for instant voice output
- A conversation state machine (listening → sending → thinking → speaking) to control UX flow and interruptions
- A proactive monitor loop that checks for scene changes and speaks when something important appears
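The state machine above can be sketched as a small transition table. This is a minimal illustration under assumptions, not Jia's actual implementation; the class and method names (`ConversationMachine`, `advance`, `interrupt`) are hypothetical:

```typescript
// The four voice states named above.
type VoiceState = "listening" | "sending" | "thinking" | "speaking";

// Allowed forward transitions; anything else is rejected.
const transitions: Record<VoiceState, VoiceState[]> = {
  listening: ["sending"],
  sending: ["thinking"],
  thinking: ["speaking"],
  speaking: ["listening"],
};

class ConversationMachine {
  state: VoiceState = "listening";

  // Move to the next state only if the transition table allows it.
  advance(next: VoiceState): boolean {
    if (transitions[this.state].includes(next)) {
      this.state = next;
      return true;
    }
    return false; // illegal transitions are ignored, preventing UI flicker
  }

  // A user barge-in can interrupt any state and return straight to listening.
  interrupt(): void {
    this.state = "listening";
  }
}
```

Centralizing transitions in one table makes interruptions explicit: the mic, the API call, and TTS each consult the machine instead of flipping booleans independently.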
Challenges we ran into
- Managing race conditions between voice states (especially sending vs thinking vs listening)
- Preventing accidental mic pickup during API transitions
- Balancing responsiveness with stability under hackathon time pressure
- Making TTS sound less robotic without adding delay
- Keeping interruptibility intuitive while avoiding false triggers
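One common guardrail for the race conditions described above is a generation counter: each new user utterance bumps the counter, and any async response tagged with an older generation is dropped instead of mutating state. This is a hedged sketch of that pattern, not Jia's codebase; `RequestGate` and its methods are illustrative names:

```typescript
// Drops stale async results so an interrupted request can't flip the UI
// back to a state the user has already left.
class RequestGate {
  private generation = 0;

  // Call when a new utterance starts; invalidates all in-flight work.
  begin(): number {
    return ++this.generation;
  }

  // A response is applied only if no newer request has started since.
  isCurrent(gen: number): boolean {
    return gen === this.generation;
  }
}
```

In practice the streaming response handler would check `isCurrent(gen)` before speaking or changing state, which addresses both accidental mic pickup during transitions and state flicker after interruptions.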
Accomplishments that we're proud of
- Built a fully voice-first, real-time accessibility assistant in a short timeframe
- Added proactive scene awareness instead of only reactive Q&A
- Improved speech output quality using better voice selection logic
- Fixed critical interaction bugs quickly (including a sending → listening flicker)
- Shipped production-ready iterations rapidly during the event
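The voice-selection logic mentioned above typically means ranking the voices the browser exposes instead of taking the default. A sketch of that heuristic, assuming a `SpeechSynthesisVoice`-shaped input; the preferred-voice names here are illustrative examples, not Jia's actual list:

```typescript
// Minimal shape of a browser SpeechSynthesisVoice entry.
interface Voice {
  name: string;
  lang: string;
  localService: boolean;
}

// Prefer a known natural-sounding English voice, then any local English
// voice (lower latency than network voices), then any English voice at all.
function pickVoice(voices: Voice[]): Voice | undefined {
  const english = voices.filter((v) => v.lang.startsWith("en"));
  const preferred = ["Google US English", "Samantha", "Microsoft Aria"];
  return (
    english.find((v) => preferred.some((p) => v.name.includes(p))) ??
    english.find((v) => v.localService) ??
    english[0]
  );
}
```

In a real page this would run over `speechSynthesis.getVoices()`, which can populate asynchronously, so the selection is usually re-run on the `voiceschanged` event.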
What we learned
- Voice UX depends more on state timing and transitions than on model quality alone
- Tiny async ordering issues can break the entire conversational feel
- Proactive behavior needs strict guardrails to avoid speaking at the wrong time
- Browser-native speech tools are powerful when orchestrated carefully
- Fast iteration + immediate user feedback is essential in assistive AI products
What's next for Jia
- Add stronger hazard detection and path guidance prompts
- Personalize voice style and verbosity per user preference
- Add multilingual support and offline fallbacks
- Improve proactive intelligence with better scene-diff logic
- Add analytics and evaluation loops for safety/reliability
- Deploy mobile-optimized PWA flows for daily real-world usage