Voice Canvas

Inspiration

Voice Canvas was created to help people with limited fine motor control express themselves visually using voice alone. By combining real-time speech recognition with pitch detection, we imagined a new way of making art built entirely around audio.

What it does

Voice Canvas enables users to create digital art using nothing but their voice. Users can:

Draw using sustained vocal pitch (like humming or singing)
Change brush color or size with spoken commands (the full command list is available on GitHub)
Navigate the interface using only their voice and the spacebar (mouse input is also supported)
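
As a rough illustration of how a transcribed phrase could be turned into a brush change, here is a minimal sketch. The vocabulary (COLORS, "bigger", "larger", "smaller"), the step size, and the parseCommand helper are all hypothetical; the project's actual commands are the ones listed in its GitHub repository.

```javascript
// Hypothetical command vocabulary, used only for illustration.
const COLORS = ["red", "green", "blue", "black", "white"];

// Turn a transcript string into a brush command, or null if nothing matches.
function parseCommand(transcript) {
  // Lowercase, replace punctuation with spaces, and split into words.
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);

  // A color word anywhere in the phrase takes priority.
  const color = words.find((w) => COLORS.includes(w));
  if (color) return { type: "color", value: color };

  // Size changes are expressed as a relative step (assumed to be 2 here).
  if (words.includes("bigger") || words.includes("larger")) {
    return { type: "size", delta: 2 };
  }
  if (words.includes("smaller")) return { type: "size", delta: -2 };

  return null;
}
```

Matching on individual words rather than whole phrases keeps the parser tolerant of filler such as "make it red, please", which matters when the input is a speech transcript.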

How we built it

Frontend: React + Vite, chosen because it is the stack we have the most experience with for building websites
Voice Commands: Transcribed in real time using OpenAI’s Whisper API
Pitch-Based Drawing: Implemented with pitchy, a lightweight pitch detection library
Server: A minimal Express backend to proxy API calls during local testing
Deployment: Hosted live via Vercel for easy access and sharing
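
To make the pitch-based drawing step more concrete, here is a small sketch of one way a detected frequency could be mapped to a vertical canvas position. It assumes the detector supplies a pitch in Hz (pitchy reports a pitch together with a clarity score); the 80 to 1000 Hz range, the logarithmic scale, and the pitchToY name are our assumptions for illustration, not necessarily what the app does.

```javascript
// Assumed vocal range for illustration; the app's real bounds may differ.
const MIN_HZ = 80;
const MAX_HZ = 1000;

// Map a pitch in Hz to a y coordinate, with higher notes nearer the top.
function pitchToY(pitchHz, canvasHeight) {
  // Clamp so out-of-range pitches stick to the canvas edges.
  const clamped = Math.min(MAX_HZ, Math.max(MIN_HZ, pitchHz));
  // Logarithmic scale: each octave covers the same vertical distance.
  const t = Math.log2(clamped / MIN_HZ) / Math.log2(MAX_HZ / MIN_HZ);
  // Canvas y grows downward, so invert.
  return (1 - t) * canvasHeight;
}
```

With a 400-pixel canvas, pitchToY(80, 400) returns 400 (the bottom edge) and pitchToY(1000, 400) returns 0 (the top edge). A logarithmic scale is used because perceived pitch is roughly logarithmic in frequency, so equal musical intervals move the brush by equal distances.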

Challenges we ran into

Separating pitch-based drawing from spoken commands required careful design to avoid overlap or interference.
Latency from Whisper transcription had to be managed so that spoken commands still felt smooth and responsive.
Mic permissions and audio context limitations in the browser required us to think through fallback and initialization strategies.
Creating a natural-feeling art experience with sound was non-trivial and involved fine-tuning input sensitivity and behavior mapping.
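
The first and last challenges above can be sketched together: gate the pitch stream so it only draws when the user is not issuing a command, and smooth what remains. This is a minimal sketch under stated assumptions, not the app's actual logic. It assumes a clarity score between 0 and 1 (as pitchy reports), a commandMode flag (for example, set while the spacebar is held), and hypothetical default values for minClarity and smoothing.

```javascript
// Create a stateful gate that filters and smooths per-frame pitch readings.
function createPitchGate({ minClarity = 0.9, smoothing = 0.3 } = {}) {
  let smoothed = null; // last smoothed pitch, or null when not drawing

  // Returns a smoothed pitch in Hz to draw with, or null to skip the frame.
  return function next(pitchHz, clarity, commandMode) {
    // Skip frames while a command is being spoken or the tone is unclear.
    if (commandMode || clarity < minClarity) {
      smoothed = null; // reset so the next stroke starts fresh
      return null;
    }
    // Exponential smoothing to damp jitter between frames.
    smoothed =
      smoothed === null ? pitchHz : smoothed + smoothing * (pitchHz - smoothed);
    return smoothed;
  };
}
```

Raising minClarity makes the brush ignore speech and background noise more aggressively, while lowering smoothing makes the line steadier at the cost of slower response to pitch changes.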

Accomplishments that we're proud of

Building a fully functional, voice-driven art interface
Rainbow mode, which draws over itself to create more colors (somewhat like a spectrogram)
Supporting real-time drawing through pitch detection
Deploying a working demo within a limited time
Making the interface relatively mobile-friendly

What we learned

How to work with real-time audio streams and browser-based audio APIs
Practical usage of the Whisper API for command interpretation
Techniques to improve accessibility and voice-first design
How to reconcile dual input modes (speech and pitch) in a single interface

What's next for Voice Canvas

A completely voice-only UI, which would require some form of voice activity detection (VAD) or a keyword detection model
Multilingual support for non-English commands
Voice-based shape tools (e.g., "draw circle," "make a spiral")
Save/share features for artwork created in the app
Gallery mode for users to view and save creations
Advanced pitch tools like pitch-based brush effects or filters

Built With

express.js
javascript
node.js
react
vercel
vite
whisper

Updates

Private user started this project — Jul 19, 2025 12:22 AM EDT
