Inspiration
Jia was inspired by the daily friction visually impaired people face when navigating unfamiliar spaces, especially when they need quick, contextual help that goes beyond static screen readers. The goal was to build something that feels less like a tool and more like a calm companion that can notice changes, warn about hazards, and respond naturally in real time.
What it does
Jia is a voice-first AI visual assistant that uses live camera input plus conversational AI to:
- Describe surroundings in natural language
- Answer scene-aware questions
- Proactively warn about potential hazards
- Support interruptible voice interaction for hands-free use
- Speak responses with a more natural browser voice and low latency
How we built it
We built Jia as a React + Vite web app with:
- A camera pipeline for capturing live frames
- A streaming chat endpoint to OpenAI for real-time responses
- Browser speech recognition for low-latency voice input
- Browser speech synthesis for instant voice output
- A conversation state machine (listening → sending → thinking → speaking) to control UX flow and interruptions
- A proactive monitor loop that checks for scene changes and speaks when something important appears
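The state machine above can be sketched as a small transition table. This is a minimal illustration under assumptions, not Jia's actual implementation; the class and method names (`ConversationMachine`, `advance`, `interrupt`) are hypothetical:

```typescript
// The four voice states named above.
type VoiceState = "listening" | "sending" | "thinking" | "speaking";

// Allowed forward transitions; anything else is rejected.
const transitions: Record<VoiceState, VoiceState[]> = {
  listening: ["sending"],
  sending: ["thinking"],
  thinking: ["speaking"],
  speaking: ["listening"],
};

class ConversationMachine {
  state: VoiceState = "listening";

  // Move to the next state only if the transition table allows it.
  advance(next: VoiceState): boolean {
    if (transitions[this.state].includes(next)) {
      this.state = next;
      return true;
    }
    return false; // illegal transitions are ignored, preventing UI flicker
  }

  // A user barge-in can interrupt any state and return straight to listening.
  interrupt(): void {
    this.state = "listening";
  }
}
```

Centralizing transitions in one table makes interruptions explicit: the mic, the API call, and TTS each consult the machine instead of flipping booleans independently.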
Challenges we ran into
- Managing race conditions between voice states (especially sending vs thinking vs listening)
- Preventing accidental mic pickup during API transitions
- Balancing responsiveness with stability under hackathon time pressure
- Making TTS sound less robotic without adding delay
- Keeping interruptibility intuitive while avoiding false triggers
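One common guardrail for the race conditions described above is a generation counter: each new user utterance bumps the counter, and any async response tagged with an older generation is dropped instead of mutating state. This is a hedged sketch of that pattern, not Jia's codebase; `RequestGate` and its methods are illustrative names:

```typescript
// Drops stale async results so an interrupted request can't flip the UI
// back to a state the user has already left.
class RequestGate {
  private generation = 0;

  // Call when a new utterance starts; invalidates all in-flight work.
  begin(): number {
    return ++this.generation;
  }

  // A response is applied only if no newer request has started since.
  isCurrent(gen: number): boolean {
    return gen === this.generation;
  }
}
```

In practice the streaming response handler would check `isCurrent(gen)` before speaking or changing state, which addresses both accidental mic pickup during transitions and state flicker after interruptions.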
Accomplishments that we're proud of
- Built a fully voice-first, real-time accessibility assistant in a short timeframe
- Added proactive scene awareness instead of only reactive Q&A
- Improved speech output quality using better voice selection logic
- Fixed critical interaction bugs quickly (including a sending → listening flicker)
- Shipped production-ready iterations rapidly during the event
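The voice-selection logic mentioned above typically means ranking the voices the browser exposes instead of taking the default. A sketch of that heuristic, assuming a `SpeechSynthesisVoice`-shaped input; the preferred-voice names here are illustrative examples, not Jia's actual list:

```typescript
// Minimal shape of a browser SpeechSynthesisVoice entry.
interface Voice {
  name: string;
  lang: string;
  localService: boolean;
}

// Prefer a known natural-sounding English voice, then any local English
// voice (lower latency than network voices), then any English voice at all.
function pickVoice(voices: Voice[]): Voice | undefined {
  const english = voices.filter((v) => v.lang.startsWith("en"));
  const preferred = ["Google US English", "Samantha", "Microsoft Aria"];
  return (
    english.find((v) => preferred.some((p) => v.name.includes(p))) ??
    english.find((v) => v.localService) ??
    english[0]
  );
}
```

In a real page this would run over `speechSynthesis.getVoices()`, which can populate asynchronously, so the selection is usually re-run on the `voiceschanged` event.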
What we learned
- Voice UX depends more on state timing and transitions than on model quality alone
- Tiny async ordering issues can break the entire conversational feel
- Proactive behavior needs strict guardrails to avoid speaking at the wrong time
- Browser-native speech tools are powerful when orchestrated carefully
- Fast iteration + immediate user feedback is essential in assistive AI products
What's next for Jia
- Add stronger hazard detection and path guidance prompts
- Personalize voice style and verbosity per user preference
- Add multilingual support and offline fallbacks
- Improve proactive intelligence with better scene-diff logic
- Add analytics and evaluation loops for safety/reliability
- Deploy mobile-optimized PWA flows for daily real-world usage