Inspiration
Voice Canvas is a tool created to empower people who may not have fine motor control to express themselves visually through voice alone. By combining real-time speech recognition with pitch detection, we imagined a new way of making art built entirely around audio.
What it does
Voice Canvas enables users to create digital art using nothing but their voice. Users can:
- Draw using sustained vocal pitch (like humming or singing)
- Change brush color or size with spoken commands (the command list is available on GitHub)
- Navigate the interface using only their voice and the spacebar (mouse input is also supported)
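Spoken commands reach the app as Whisper transcripts, so they have to be matched against a command list. The real list lives in the project's GitHub repository and is not reproduced here. The sketch below uses a hypothetical vocabulary (`COLORS`, `SIZES`, and a `clear` command) and a hypothetical `parseCommand` helper to show one simple keyword-matching approach.

```javascript
// Hypothetical command vocabulary; the app's real commands are listed on GitHub and may differ.
const COLORS = ["red", "orange", "yellow", "green", "blue", "purple", "black", "white"];
const SIZES = new Map([
  ["small", 4],
  ["medium", 10],
  ["large", 20],
]);

// Turn a Whisper transcript into a command object, or null if nothing matched.
function parseCommand(transcript) {
  // Transcripts usually carry capitalization and punctuation, so normalize to bare lowercase words.
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);

  // Color words take priority, then size words, then the clear command.
  for (const word of words) {
    if (COLORS.includes(word)) return { type: "color", value: word };
  }
  for (const word of words) {
    if (SIZES.has(word)) return { type: "size", value: SIZES.get(word) };
  }
  if (words.includes("clear")) return { type: "clear" };
  return null;
}
```

Keyword matching like this tolerates filler words ("change the color to blue, please"), which matters because Whisper transcribes whatever the user actually says.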
How we built it
- Frontend: React + Vite, chosen because it is the framework we have the most experience building websites with
- Voice Commands: Transcribed in real-time using OpenAI’s Whisper API
- Pitch-Based Drawing: Implemented with pitchy, a lightweight pitch detection library
- Server: A minimal Express backend to proxy API calls during local testing
- Deployment: Hosted live via Vercel for easy access and sharing
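On the pitch side, pitchy's `PitchDetector.findPitch(input, sampleRate)` returns a `[pitch, clarity]` pair for each audio frame. How Voice Canvas turns that pitch into brush movement is not described here, so the following is one plausible mapping with an assumed 100 to 800 Hz range and a hypothetical `pitchToY` helper. Frequency is placed on the canvas height using a logarithmic scale, so every octave covers the same vertical distance.

```javascript
// Assumed vocal range; these bounds are illustrative, not the app's actual values.
const MIN_HZ = 100;
const MAX_HZ = 800;

// Map a detected frequency (Hz) to a canvas y-coordinate.
// Higher pitch => smaller y => higher on screen (canvas y grows downward).
function pitchToY(frequency, canvasHeight) {
  // Clamp so out-of-range notes stick to the top or bottom edge instead of leaving the canvas.
  const clamped = Math.min(Math.max(frequency, MIN_HZ), MAX_HZ);
  // Position within the range on a log scale, normalized to 0..1.
  const t = Math.log2(clamped / MIN_HZ) / Math.log2(MAX_HZ / MIN_HZ);
  return canvasHeight * (1 - t);
}
```

In the browser, `frequency` would come from calling `findPitch` on samples read with an `AnalyserNode`'s `getFloatTimeDomainData`, once per animation frame. A log scale is used because pitch is perceived logarithmically; a linear mapping would squeeze low notes together and spread high notes apart.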
Challenges we ran into
- Separating pitch-based drawing from spoken commands required careful design to avoid overlap or interference.
- Latency from Whisper transcription had to be managed so the app still felt smooth and real-time.
- Mic permissions and audio context limitations in the browser required us to think through fallback and initialization strategies.
- Creating a natural-feeling art experience with sound was non-trivial and involved fine-tuning input sensitivity and behavior mapping.
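Fine-tuning input sensitivity usually comes down to deciding which frames are allowed to draw and smoothing the ones that are. The sketch below is an assumption rather than the project's code: it drops frames whose clarity score or loudness (RMS) falls below illustrative thresholds, and applies an exponential moving average so small pitch wobbles do not make the stroke jitter. `createPitchFilter` and the threshold constants are hypothetical.

```javascript
// Illustrative thresholds; the values the app actually uses are not documented.
const MIN_CLARITY = 0.9; // clarity score in [0, 1], as returned alongside the pitch
const MIN_RMS = 0.01;    // loudness floor for a Float32 time-domain buffer
const SMOOTHING = 0.3;   // weight given to the newest reading in the moving average

// Root-mean-square amplitude of one audio frame.
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Returns a stateful filter: call it once per frame with (pitch, clarity, samples).
// It returns a smoothed pitch, or null when the frame should not draw.
function createPitchFilter() {
  let smoothed = null;
  return function filter(pitch, clarity, samples) {
    if (clarity < MIN_CLARITY || rms(samples) < MIN_RMS) {
      smoothed = null; // reset so the next stroke does not start from a stale value
      return null;
    }
    // Exponential moving average: move a fraction of the way toward the new reading.
    smoothed = smoothed === null ? pitch : smoothed + SMOOTHING * (pitch - smoothed);
    return smoothed;
  };
}
```

Raising `MIN_CLARITY` rejects more breathy or noisy input at the cost of responsiveness, and a larger `SMOOTHING` tracks pitch changes faster but passes through more jitter.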
Accomplishments that we're proud of
- Building a fully functional, voice-only art interface
- Rainbow mode, which draws over itself to create more colors (somewhat like a spectrogram)
- Supporting real-time drawing through pitch detection
- Deploying a working demo within a limited time
- A relatively mobile-friendly interface
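The write-up does not say how rainbow mode picks its colors. One hypothetical way to get a similar effect is to advance the hue on every stroke segment and draw with partial opacity, so overlapping passes blend into new colors. `createRainbowBrush` and its defaults are illustrative only.

```javascript
// Hypothetical rainbow brush: each call returns the next color in a hue cycle.
// Partial alpha lets overlapping strokes mix into colors that were never drawn directly.
function createRainbowBrush(hueStep = 5, alpha = 0.5) {
  let hue = 0;
  return function nextColor() {
    const color = `hsla(${hue}, 100%, 50%, ${alpha})`;
    hue = (hue + hueStep) % 360; // wrap around the color wheel
    return color;
  };
}
```

Each returned string is a valid CSS color, so it can be assigned directly to a canvas context's `strokeStyle` before drawing each segment.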
What we learned
- How to work with real-time audio streams and browser-based audio APIs
- Practical usage of the Whisper API for command interpretation
- Techniques to improve accessibility and voice-first design
- How to reconcile dual input modes (speech and pitch) in a single interface
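Because the current interface relies on the spacebar (a fully voice-only mode is listed as future work), one way to reconcile the two input modes is push-to-talk. The sketch below assumes that holding the spacebar switches from pitch drawing to command capture and that releasing it hands the recording to Whisper; `createInputRouter` and its return values are hypothetical.

```javascript
// Hypothetical router that keeps speech and pitch input apart.
// While the spacebar is held, pitch frames are ignored so spoken commands never become stray strokes.
function createInputRouter() {
  let mode = "draw"; // "draw" | "command"
  return {
    get mode() {
      return mode;
    },
    onKeyDown(key) {
      // KeyboardEvent.key is " " for the spacebar; key repeat just re-sets the same mode.
      if (key === " ") mode = "command";
    },
    onKeyUp(key) {
      if (key === " " && mode === "command") {
        mode = "draw";
        return "send-to-whisper"; // signal that the captured clip should be transcribed
      }
      return null;
    },
    shouldDraw() {
      return mode === "draw";
    },
  };
}
```

Keeping the modes mutually exclusive sidesteps the interference problem entirely, at the cost of needing a physical key; replacing the key with VAD or keyword detection would swap only the code that triggers the mode change.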
What's next for Voice Canvas
- A fully voice-only UI, which will require some form of voice activity detection (VAD) or a keyword-detection model
- Multilingual support for non-English commands
- Voice-based shape tools (e.g., "draw circle," "make a spiral")
- Save/share features for artwork created in the app
- Gallery mode for users to view and save creations
- Advanced pitch tools like pitch-based brush effects or filters
Built With
- express.js
- javascript
- node.js
- react
- vercel
- vite
- whisper