Inspiration

Voice Canvas was created to empower people who may not have fine motor control to express themselves visually through voice alone. By combining real-time speech recognition and pitch detection, we imagined a new form of art built entirely around audio.

What it does

Voice Canvas enables users to create digital art using nothing but their voice. Users can:

  • Draw using sustained vocal pitch (like humming or singing)
  • Change brush color or size with spoken commands (the full command list is on GitHub)
  • Navigate the interface using only their voice and the spacebar (mouse input is also supported)
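
As a rough illustration of how a transcribed phrase might be turned into a brush change, here is a minimal sketch. The command vocabulary, the `Command` type, and `parseCommand` are hypothetical names chosen for this example; the project's actual command list is the one published on GitHub.

```typescript
// Hypothetical command vocabulary, for illustration only.
type Command =
  | { kind: "color"; value: string }
  | { kind: "size"; value: number }
  | { kind: "unknown"; raw: string };

const COLORS = new Set<string>(["red", "green", "blue", "black", "white"]);
const SIZES = new Map<string, number>([
  ["small", 4],
  ["medium", 10],
  ["large", 20],
]);

function parseCommand(transcript: string): Command {
  // Lowercase and drop punctuation that speech-to-text output often contains.
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);

  // The first recognized keyword wins; colors are checked before sizes.
  for (const w of words) {
    if (COLORS.has(w)) return { kind: "color", value: w };
  }
  for (const w of words) {
    const px = SIZES.get(w);
    if (px !== undefined) return { kind: "size", value: px };
  }
  return { kind: "unknown", raw: transcript };
}
```

Matching on keywords rather than exact phrases means the parser tolerates filler words such as "change the color to" around the keyword.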

How we built it

  • Frontend: React + Vite, because it is the stack we have the most experience with for building websites
  • Voice Commands: Transcribed in real time using OpenAI’s Whisper API
  • Pitch-Based Drawing: Implemented with pitchy, a lightweight pitch detection library
  • Server: A minimal Express backend to proxy API calls during local testing
  • Deployment: Hosted live via Vercel for easy access and sharing
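
To make the pitch-based drawing idea concrete, the sketch below maps a detected frequency to a vertical canvas position. It assumes the pitch detector supplies a frequency in hertz; the function name `pitchToY`, the logarithmic mapping, and the default 80 to 1000 Hz range are assumptions made for this example, not necessarily what Voice Canvas does.

```typescript
// Map a detected pitch (Hz) to a y coordinate on a canvas of the given height.
// The default range is an assumed rough span for humming or singing.
function pitchToY(
  frequencyHz: number,
  canvasHeight: number,
  minHz: number = 80,
  maxHz: number = 1000
): number {
  // Clamp so out-of-range pitches stick to the canvas edges.
  const clamped = Math.min(Math.max(frequencyHz, minHz), maxHz);
  // Position in [0, 1] on a log scale, so each octave spans the same distance.
  const t = Math.log(clamped / minHz) / Math.log(maxHz / minHz);
  // Canvas y grows downward, so invert: higher pitch means smaller y.
  return (1 - t) * canvasHeight;
}
```

A logarithmic scale is used here because perceived pitch is roughly logarithmic in frequency, so equal musical intervals move the brush by equal distances.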

Challenges we ran into

  • Separating pitch-based drawing from spoken commands required careful design to avoid overlap or interference.
  • Latency from Whisper transcription had to be managed to feel smooth and real-time.
  • Mic permissions and audio context limitations in the browser required us to think through fallback and initialization strategies.
  • Creating a natural-feeling art experience with sound was non-trivial and involved fine-tuning input sensitivity and behavior mapping.
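
One possible way to keep speech from triggering brush strokes is to draw only after the pitch has been clear for several consecutive frames, and never while a command is being captured. The sketch below illustrates that idea under stated assumptions: the class name, the thresholds, and the `commandMode` flag (for example, true while a push-to-talk key is held) are all hypothetical, and the clarity value is assumed to be a 0 to 1 confidence score from the pitch detector.

```typescript
// Gate that only opens for sustained, clearly pitched sound.
class SustainedPitchGate {
  private stableFrames = 0;
  private readonly minClarity: number;
  private readonly minFrames: number;

  constructor(minClarity: number = 0.9, minFrames: number = 5) {
    this.minClarity = minClarity;
    this.minFrames = minFrames;
  }

  // Feed one analysis frame; returns true when drawing should be active.
  update(clarity: number, commandMode: boolean): boolean {
    if (commandMode || clarity < this.minClarity) {
      // Speech, noise, or command capture resets the streak.
      this.stableFrames = 0;
      return false;
    }
    this.stableFrames += 1;
    return this.stableFrames >= this.minFrames;
  }
}
```

Raising `minFrames` makes the brush less likely to react to short syllables, at the cost of a slightly delayed stroke start.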

Accomplishments that we're proud of

  • Building a fully functional, voice-only art interface
  • Rainbow mode, which draws over itself to create more colors (somewhat like a spectrogram)
  • Supporting real-time drawing through pitch detection
  • Deploying a working demo within a limited time
  • Keeping the interface relatively mobile-friendly

What we learned

  • How to work with real-time audio streams and browser-based audio APIs
  • Practical usage of the Whisper API for command interpretation
  • Techniques to improve accessibility and voice-first design
  • How to reconcile dual input modes (speech and pitch) in a single interface

What's next for Voice Canvas

  • A completely voice-only UI, which requires some form of voice activity detection (VAD) or a keyword detection model
  • Multilingual support for non-English commands
  • Voice-based shape tools (e.g., "draw circle," "make a spiral")
  • Save/share features for artwork created in the app
  • Gallery mode for users to view and save creations
  • Advanced pitch tools like pitch-based brush effects or filters
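
For the voice activity detection item, the simplest baseline is an energy threshold on each audio frame. The sketch below shows that baseline only; it is not a trained VAD model, and the function names and the 0.02 threshold are assumptions chosen for illustration. Samples are assumed to be floats in the range -1 to 1.

```typescript
// Root-mean-square level of one audio frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  frame.forEach((sample) => {
    sumSquares += sample * sample;
  });
  return Math.sqrt(sumSquares / frame.length);
}

// Treat a frame as "voice" when its energy exceeds a fixed threshold.
// A fixed threshold is an assumption; real rooms need calibration.
function isVoiceActive(frame: Float32Array, threshold: number = 0.02): boolean {
  return rms(frame) > threshold;
}
```

An energy gate like this cannot tell speech from other loud sounds, which is why a dedicated VAD or keyword detection model would still be needed for a fully hands-free interface.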
