Inspiration
Over 2.2 billion people live with vision impairment. For many of them, simple tasks such as reading a label, picking an outfit, or taking a photo often mean asking someone for help. We wanted to build a companion that sees the world with you and talks like a friend. When we saw the Gemini Live API's real-time audio and video streaming, we knew we could make it happen.
What it does
Visionary is a voice-first AI assistant for visually impaired users. No buttons, no screens — everything by speech.
- See — "What's in front of me?" Live scene description from the camera feed.
- Read — Point at a menu or sign and it reads the text aloud.
- Photo Director — AI guides framing by voice: "Move left... tilt up... perfect!" Then auto-captures.
- Create — Generate stylized images from voice prompts.
- Share — Post to Bluesky or hear your feed read aloud, entirely hands-free.
How we built it
Google AI Studio App Builder, Gemini CLI, Claude Code
Challenges we ran into
- Audio timing — Browsers keep an AudioContext suspended unless it is created or resumed during a user gesture, so initialization order was critical.
- The hallucination gap — The AI described scenes before any camera frames had arrived. We fixed this with system instructions enforcing honesty, so it now says "I'm waiting for the camera to activate."
- Bluesky uploads failing — Camera photos exceeded the 1MB limit. We now re-encode each JPEG at progressively lower quality until it fits.
- Silent deployment failure — Our first Cloud Run deploy shipped with an empty API key: .env was excluded by .dockerignore and the key was never passed as a build arg. The app connected, then immediately disconnected.
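
The Bluesky fix above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the `JpegEncoder` type, the `encodeUnderLimit` name, the quality range, and the 1,000,000-byte value used for the 1MB limit are all assumptions made for the example. The encoder is passed in as a function so the loop stays independent of how the JPEG is actually produced (a canvas, a native module, and so on).

```typescript
// Hypothetical encoder: returns JPEG bytes for a quality in (0, 1].
// How the bytes are produced (canvas, native module, ...) is up to the caller.
type JpegEncoder = (quality: number) => Uint8Array;

interface FitResult {
  bytes: Uint8Array;
  quality: number;
}

// Assumed byte value for the 1MB upload limit; adjust to the platform's real limit.
const MAX_UPLOAD_BYTES = 1_000_000;

function encodeUnderLimit(
  encode: JpegEncoder,
  maxBytes: number = MAX_UPLOAD_BYTES,
  startQuality: number = 0.9,
  minQuality: number = 0.3,
  step: number = 0.1,
): FitResult | null {
  // Count whole steps up front so repeated subtraction cannot drift
  // (0.9 - 0.1 - 0.1 ... accumulates floating-point error).
  const steps = Math.floor((startQuality - minQuality) / step + 1e-9);
  for (let i = 0; i <= steps; i++) {
    const quality = Number((startQuality - i * step).toFixed(2));
    const bytes = encode(quality);
    if (bytes.byteLength <= maxBytes) {
      return { bytes, quality };
    }
  }
  // Nothing fit, even at the lowest allowed quality.
  return null;
}
```

When the function returns `null`, lowering quality alone was not enough; a caller would typically downscale the image dimensions and try again rather than drop below the minimum quality.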
Accomplishments that we're proud of
We found the hackathon three days before the deadline and still shipped something that works. :)
Watching someone who can't see get voice-guided to frame a perfect photo — "move left, tilt up, perfect!" — then post it to social media without touching a button. That moment proved this matters. Also: one codebase for iOS + web, and full infrastructure provisioned with a single terraform apply.
What we learned
- Gemini Live's native audio is a step change from TTS — it feels like a real conversation.
What's next for Visionary
Android support, sign language translation, more social platforms, and an App Store release.