Inspiration

While working on a 2D clipart animation research project at the University of Edinburgh's Graphics Lab, I was blown away by nano-banana - one of the most capable image generation models around right now - which can take a clipart and a stick figure and generate that clipart in a brand-new pose. This sparked a vision: what if anyone could sketch freely in mid-air and have AI instantly understand it and transform it into professional digital art?

That’s how AirCanvas AI was born. I wanted to build an intuitive, keyboard-free creative tool that blends real-time gesture interaction with cutting-edge generative AI, making digital art creation as natural as waving your hand.


What It Does

AirCanvas AI is an interactive air-drawing system powered by your webcam and hand gestures:

  • Draw in mid-air using your index finger as a virtual brush
  • Switch modes with a fist hold (DRAW ↔ AI)
  • In AI mode:
    • Wave left → Get AI-generated drawing inspiration
    • Wave right → AI describes your sketch and generates a high-quality digital artwork using Stable Diffusion
  • Fully gesture-controlled - no keyboard, no mouse
  • Auto-saves raw sketches and AI-generated art

It turns a 10-second doodle into a professional-grade illustration - all through the magic of hand gestures and AI.
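
One way to picture the control flow: mode switching is a tiny state machine driven by the recognized gesture. Here's an illustrative sketch - the class, the one-second hold threshold, and the gesture labels are simplifications, not the actual source:

```python
import time
from enum import Enum, auto

class Mode(Enum):
    DRAW = auto()
    AI = auto()

FIST_HOLD_SECS = 1.0  # how long a fist must be held to toggle modes (assumed value)

class ModeController:
    """Toggles DRAW <-> AI on a held fist and routes swipes while in AI mode."""

    def __init__(self):
        self.mode = Mode.DRAW
        self._fist_since = None

    def update(self, gesture):
        """gesture: one of 'fist', 'point', 'swipe_left', 'swipe_right', 'none'."""
        now = time.time()
        if gesture == "fist":
            if self._fist_since is None:
                self._fist_since = now
            elif now - self._fist_since >= FIST_HOLD_SECS:
                self.mode = Mode.AI if self.mode == Mode.DRAW else Mode.DRAW
                self._fist_since = None  # require a fresh hold for the next toggle
                return "mode_toggled"
        else:
            self._fist_since = None

        if self.mode == Mode.AI:
            if gesture == "swipe_left":
                return "get_inspiration"   # wave left -> AI drawing prompt
            if gesture == "swipe_right":
                return "generate_artwork"  # wave right -> describe + generate
        return None
```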


How I Built It

I built this solo during Durhack using Python and a modular architecture:

  • Hand Tracking: MediaPipe Hands
  • Gesture Recognition: custom logic for fist, finger count, and swipe detection (sketched below)
  • Canvas & Overlay: OpenCV + NumPy
  • AI Assistant: Google Gemini API (vision + text)
  • Image Generation: Stable Diffusion via ModelsLab API
  • File Handling: timestamped auto-save system
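
The finger-count part of the gesture layer is easy to illustrate: compare each fingertip against its PIP joint in MediaPipe's hand landmarks. A minimal sketch (the labels and thresholds are simplifications of my actual logic):

```python
import mediapipe as mp

# Landmark indices from the MediaPipe Hands model (thumb ignored for simplicity)
FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky tips
FINGER_PIPS = [6, 10, 14, 18]   # corresponding PIP joints

def count_extended_fingers(hand_landmarks) -> int:
    """A finger is 'extended' when its tip sits above its PIP joint
    (smaller y, since normalized image coordinates grow downward)."""
    lm = hand_landmarks.landmark
    return sum(lm[tip].y < lm[pip].y for tip, pip in zip(FINGER_TIPS, FINGER_PIPS))

def classify(hand_landmarks) -> str:
    n = count_extended_fingers(hand_landmarks)
    return "fist" if n == 0 else "point" if n == 1 else "open"

# hand_landmarks comes from MediaPipe, e.g.:
#   results = mp.solutions.hands.Hands(max_num_hands=1).process(rgb_frame)
#   results.multi_hand_landmarks[0]
```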

Evolution of the Project:

  1. v1: Keyboard controls (s, a, d, q), clunky
  2. v2: Full gesture-only control, more immersive
  3. v3: Split into DRAW and AI modes to avoid gesture conflicts
  4. Final: Removed Neural Style Transfer → replaced with real image generation (more impactful)

The core loop runs in main.py, with clean separation of concerns across /modules/ and /utils/.
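
The repo layout isn't reproduced here, but the heart of that loop - track the index fingertip, paint onto a persistent NumPy canvas, composite it over the webcam feed - can be sketched self-contained (brush colour, thickness, and window name are my own picks):

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

def run():
    cap = cv2.VideoCapture(0)
    canvas = None       # persistent drawing layer, same size as the frame
    prev_point = None   # last fingertip position, for connecting strokes
    with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            frame = cv2.flip(frame, 1)  # mirror for natural drawing
            if canvas is None:
                canvas = np.zeros_like(frame)
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                tip = results.multi_hand_landmarks[0].landmark[8]  # index fingertip
                h, w = frame.shape[:2]
                point = (int(tip.x * w), int(tip.y * h))
                if prev_point is not None:
                    cv2.line(canvas, prev_point, point, (255, 0, 255), 4)
                prev_point = point
            else:
                prev_point = None  # lift the brush when the hand disappears
            overlay = cv2.addWeighted(frame, 0.7, canvas, 1.0, 0)
            cv2.imshow("AirCanvas AI", overlay)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    run()
```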


Challenges I Ran Into

  1. OpenCV Threading Deadlocks
    When the OpenCV window had focus, it blocked the main thread, freezing gesture detection and AI triggers. I fixed it with careful timing, non-blocking checks, and by keeping heavy operations out of the main loop (see the threading sketch after this list).

  2. Solo Development Under Time Pressure
    My teammate dropped out the morning of the hackathon. I had to replan the entire timeline, prioritize MVP, and ruthlessly cut features to deliver a polished experience.

  3. API Rate Limits
    Gemini and ModelsLab APIs hit rate limits quickly during testing. I reduced gesture sensitivity, added cooldowns, and used lightweight prompts to stay under quotas.
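
For challenge 1, one way to structure the fix (a minimal sketch, not my exact code): keep the OpenCV loop pumping on the main thread and push any slow API call onto a worker thread, polling a queue for the result. Here call_ai() is a hypothetical stand-in for the real Gemini/ModelsLab requests:

```python
import threading
import queue
import time

results_q: "queue.Queue[str]" = queue.Queue()
ai_busy = threading.Event()

def call_ai(sketch_path: str) -> str:
    """Hypothetical stand-in for a blocking Gemini/ModelsLab request."""
    time.sleep(5)
    return f"artwork for {sketch_path}"

def trigger_ai(sketch_path: str) -> None:
    """Run the slow call on a worker thread so the OpenCV loop never blocks."""
    if ai_busy.is_set():
        return  # one request in flight at a time
    ai_busy.set()

    def worker():
        try:
            results_q.put(call_ai(sketch_path))
        finally:
            ai_busy.clear()

    threading.Thread(target=worker, daemon=True).start()

# In the main loop, poll without blocking:
#   try:
#       artwork = results_q.get_nowait()
#   except queue.Empty:
#       pass
```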
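
The cooldown piece from challenge 3 is simple enough to sketch too (the 10-second window below is illustrative, not my exact tuning):

```python
import time

class Cooldown:
    """Rejects actions that fire again before `seconds` have elapsed."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self._last = 0.0

    def ready(self) -> bool:
        now = time.monotonic()
        if now - self._last >= self.seconds:
            self._last = now
            return True
        return False

generate_cooldown = Cooldown(10.0)  # assumed: at most one generation per 10 s

# In the gesture handler:
#   if gesture == "swipe_right" and generate_cooldown.ready():
#       trigger_ai(sketch_path)
```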


Accomplishments That I'm Proud Of

  • 100% gesture-controlled interface - no keyboard, no training needed
  • Seamless AI integration: from sketch → description → professional artwork in <30s
  • Robust hand tracking in varied lighting (thanks to MediaPipe)
  • Auto-save pipeline with clean folder structure (raw/ + generated/) - sketched below
  • Completed solo in <48 hours after teammate dropout
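
That save pipeline is small; here's a minimal sketch (the output/ root and filename pattern are illustrative, while the raw/ and generated/ split matches the structure above):

```python
from datetime import datetime
from pathlib import Path
import cv2
import numpy as np

OUTPUT_DIR = Path("output")  # assumed root folder

def save_image(image: np.ndarray, kind: str) -> Path:
    """Save to output/raw/ or output/generated/ with a timestamped filename."""
    folder = OUTPUT_DIR / kind           # kind: "raw" or "generated"
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = folder / f"sketch_{stamp}.png"
    cv2.imwrite(str(path), image)
    return path
```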

What I Learned

  • Gesture UX design: Small delays and visual feedback are critical for natural interaction
  • API resilience: Always assume rate limits and build fallbacks
  • Modular code saves lives in hackathons
  • AI prompt engineering: Shorter, focused prompts = better, faster results
  • OpenCV + threading = danger zone - use threading wisely

What's Next for AirCanvas AI

  • [ ] Local Stable Diffusion (no API, offline use)
  • [ ] Voice feedback via ElevenLabs ("Great job! I see a dragon!")
  • [ ] Multi-hand support (collaborative drawing)
  • [ ] Undo/redo with gesture (pinch to undo)
  • [ ] Animated GIF export of drawing process
  • [ ] Web version using WebRTC + MediaPipe
  • [ ] Clipart pose transfer (like nano-banana) - turn your sketch into animated characters

Built with passion, gestures, and a lot of coffee.
Durhack 2025 | Solo Developer
