Inspiration
Over 1 billion people worldwide live with disabilities that make traditional keyboard-and-mouse interfaces difficult or impossible to use. We asked: What if anyone could create stunning AI art using just their hands and voice?
Muse was born from the belief that creative expression should be accessible to everyone — regardless of physical ability.
What it does
Muse is a hands-free AI art studio that turns hand gestures and voice commands into stunning images and videos, powered entirely by Google's Gemini API ecosystem.
Core Interaction Model
- ✊ Fist → Start voice input (describe what you want)
- 👌 OK → Generate AI image from your description
- ✌️ Peace → Generate video from the image
- 🖐️ Open Palm → Get creative inspiration
- 🤙 Shaka → Start real-time voice conversation with AI
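Under the hood, each recognized gesture maps to a studio action, and (as noted under Challenges) an action fires only after the gesture persists for 3 consecutive frames. A minimal TypeScript sketch of this mapping and filter — names like `GestureFilter` are illustrative, not Muse's actual identifiers:

```typescript
type Gesture = "fist" | "ok" | "peace" | "openPalm" | "shaka";
type Action =
  | "startVoiceInput"
  | "generateImage"
  | "generateVideo"
  | "getInspiration"
  | "startLiveChat";

// One action per gesture, matching the interaction model above.
const gestureActions: Record<Gesture, Action> = {
  fist: "startVoiceInput",
  ok: "generateImage",
  peace: "generateVideo",
  openPalm: "getInspiration",
  shaka: "startLiveChat",
};

// Fire an action only once the same gesture has been seen on
// `required` consecutive frames, suppressing one-frame false positives.
class GestureFilter {
  private last: Gesture | null = null;
  private streak = 0;
  private required: number;

  constructor(required = 3) {
    this.required = required;
  }

  // Call once per camera frame with the detected gesture (or null).
  onFrame(g: Gesture | null): Action | null {
    if (g === null || g !== this.last) {
      this.last = g;
      this.streak = g === null ? 0 : 1;
      return null;
    }
    this.streak++;
    // Fire exactly once, on the frame the streak reaches the threshold.
    return this.streak === this.required ? gestureActions[g] : null;
  }
}
```

Returning the action only on the exact frame the streak reaches the threshold keeps a held gesture from re-triggering generation every frame.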
Gemini Models Used
| Model | Purpose |
|---|---|
| Gemini 3 Flash Preview | AI Art Director — refines prompts with streaming thought chains |
| Gemini 2.5 Flash Image | Image generation from refined prompts |
| Veo 3.1 Fast | Video generation from images with motion |
| Gemini 2.5 Flash Native Audio | Real-time bidirectional voice conversations (Live API) |
| Gemini 2.0 Flash | Social copy generation & image analysis |
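Keeping this per-task model routing in one place makes the pipeline easy to follow; a possible shape for such a config, where the exact model ID strings are assumptions rather than Muse's actual values:

```typescript
// Per-task model routing for the pipeline described in the table above.
// ID strings are illustrative; check the Gemini API model list for the
// currently valid dated variants.
const MODELS = {
  artDirector: "gemini-3-flash-preview",       // prompt refinement with thought streaming
  image: "gemini-2.5-flash-image",             // image generation
  video: "veo-3.1-fast",                       // image-to-video with motion
  liveAudio: "gemini-2.5-flash-native-audio",  // real-time voice (Live API)
  socialCopy: "gemini-2.0-flash",              // social copy + image analysis
} as const;
```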
Accessibility Features
- 8 hand gesture controls via camera
- 15+ voice commands in 7 languages
- Full keyboard shortcut support
- Audio announcements for screen readers
- Haptic feedback on mobile
- Button text labels toggle
How we built it
Frontend-only architecture — no backend server needed:
- React + TypeScript + Vite for the UI
- @google/genai SDK for ALL AI interactions (no other AI providers)
- MediaPipe Holistic (WASM) for real-time hand & face tracking in the browser
- Web Speech API for voice input
- Tailwind CSS for responsive design across mobile/tablet/desktop
- Firebase Auth for user authentication
- Web Audio API for generative ambient music during loading
The entire AI pipeline runs client-side, calling the Gemini API directly with no middleware.
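A hedged sketch of this direct browser-to-Gemini call path using the `@google/genai` SDK — the function name, model ID, and prompt text are illustrative; the call shape follows the SDK's public API:

```typescript
import { GoogleGenAI } from "@google/genai";

// The API key is used directly in the client for now; "What's next"
// mentions moving to ephemeral tokens for secure client-side access.
async function refinePrompt(apiKey: string, userPrompt: string) {
  const ai = new GoogleGenAI({ apiKey });
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash", // illustrative model ID
    contents: `Refine this art prompt for image generation: ${userPrompt}`,
  });
  return res.text;
}
```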
Challenges we ran into
- Gesture reliability — Hand gesture detection needed careful tuning of thresholds and a persistence filter (3 consecutive frames) to avoid false positives
- Fist→OK transition — When switching from recording (fist) to generating (OK), we had to implement `stopAndCapture()` to ensure the voice transcript was fully delivered before triggering generation
- API overload handling — Gemini 503/UNAVAILABLE errors during streaming required wrapping the entire stream read (not just the connection) in retry logic with exponential backoff
- Accessible UX — Balancing a visually rich interface with true accessibility required aria-labels on every button, keyboard shortcuts for all actions, and i18n across 7 languages
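The retry strategy for the overload handling above can be sketched as a wrapper that retries the whole operation — the entire stream consumption, not just opening the connection — with exponential backoff. A minimal version, with illustrative names and delay values:

```typescript
// Retry `fn` with exponential backoff. The caller passes the *entire*
// stream read inside `fn`, so a mid-stream 503 restarts the whole read
// rather than leaving a half-consumed stream behind.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up after maxRetries
      const delay = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A production version would also check that the error is actually a retryable 503/UNAVAILABLE before looping, rather than retrying every failure.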
Accomplishments that we're proud of
- Zero-keyboard creative workflow: Users can go from idea to AI-generated art to video using only hand gestures and voice
- 6 Gemini models integrated into a single coherent experience
- Real-time AR effects on the camera feed during generation (constellation particles, energy rings, pulsing vignette)
- Generative ambient music using Web Audio API synthesis — evolving chord pads, random melodic notes, and filtered noise textures that change per generation stage
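The ambient pads can be sketched with equal-temperament chord math feeding Web Audio oscillators through the filter-and-delay chain mentioned in "What we learned". This is a minimal sketch with assumed parameter values, not Muse's actual synthesis code:

```typescript
// Equal temperament: each semitone multiplies frequency by 2^(1/12).
function chordFrequencies(rootHz: number, semitones: number[]): number[] {
  return semitones.map((s) => rootHz * 2 ** (s / 12));
}

// Start a sustained major-triad pad: oscillators → lowpass → delay → out.
// Runs only in a browser AudioContext; all parameter values are assumptions.
function startPad(ctx: AudioContext, rootHz: number) {
  const filter = ctx.createBiquadFilter();
  filter.type = "lowpass";
  filter.frequency.value = 800; // soften the raw sine stack

  const delay = ctx.createDelay(1.0);
  delay.delayTime.value = 0.4; // gentle echo for ambience
  filter.connect(delay).connect(ctx.destination);

  for (const f of chordFrequencies(rootHz, [0, 4, 7])) { // major triad
    const osc = ctx.createOscillator();
    osc.type = "sine";
    osc.frequency.value = f;
    osc.connect(filter);
    osc.start();
  }
}
```

Swapping the semitone array per generation stage (e.g. a minor triad `[0, 3, 7]` while waiting, major on success) is one way the texture could evolve as the doc describes.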
What we learned
- MediaPipe's WASM-based hand tracking is remarkably accurate but requires careful frame-rate management to avoid UI jank
- The Gemini Live API (`bidiGenerateContent`) requires dated model variants — non-dated aliases don't work
- React state updaters inside async chains can cause timing bugs — using refs (`studioStateRef`) for cross-async-boundary state reads solved this
- Web Audio API can create surprisingly musical ambient soundscapes with just oscillators, filters, and delay nodes
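The ref-versus-captured-state timing bug above can be shown framework-free: a value captured when an async chain starts goes stale if state changes during an `await`, while a mutable ref-like box always yields the latest write. Names here are illustrative:

```typescript
// Minimal stand-in for React's useRef return value.
interface Ref<T> {
  current: T;
}

function makeRef<T>(initial: T): Ref<T> {
  return { current: initial };
}

// Simulates an async chain (e.g. awaiting a transcript) that later needs
// to read state. `capturedState` was closed over at call time; the ref
// read happens after the await and sees any writes made in between.
async function runAfterDelay(stateRef: Ref<string>, capturedState: string) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { stale: capturedState, fresh: stateRef.current };
}
```

In React, the ref is kept in sync with state by a `useEffect` (or written alongside `setState`), so async handlers read `ref.current` instead of a possibly stale closure value.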
What's next for Muse
- 3D Model Generation and Genie 3 World Generation (UI ready, awaiting API access)
- Personalized AI Memory for Pro users — Muse remembers your creative style
- Cloud deployment with ephemeral tokens for secure client-side Gemini access
- Community gallery for sharing creations
Built With
- firebase-auth
- gemini-2.5-flash-image
- gemini-3-flash
- gemini-live-api
- google-genai-sdk
- mediapipe
- react
- tailwind-css
- typescript
- veo-3.1
- vite
- web-audio-api
- web-speech-api