Live Art Coach

Figure: Live Art Coach architecture

Inspiration

As an artist and art educator, I have met many people who want to learn art but cannot access art education because of age, location, or financial limitations. I began to wonder whether an AI system could help provide basic guidance when a human teacher is not available. My goal was never to replace human teachers, but to explore whether AI could support the learning process while someone is actively drawing. This idea led me to create Live Art Coach.

What it does

Live Art Coach is a multimodal AI art coaching system that gives near real-time feedback from camera input and voice or text interaction. While drawing, users point their camera at their artwork and ask questions by voice or text. The system observes the drawing process and offers short, practical guidance on structure, proportion, or shading. It is meant to be a basic skills tutor that responds at the moment someone is actively drawing.

How I built it

The backend is implemented in Python using FastAPI and deployed on Google Cloud Run. I integrated Gemini models through the Google GenAI SDK. In live mode, the browser streams camera frames, audio, and text through WebSockets while the backend maintains a Gemini Live session to generate context-aware feedback. The frontend is a lightweight HTML and JavaScript interface that uses browser camera streams, Web Audio APIs, and speech features to support multimodal interaction.
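As a rough illustration of the live-mode flow described above, the sketch below shows one way a backend might decode incoming WebSocket messages by modality before forwarding them to a Gemini Live session. The JSON envelope (`type`, `data`), the `route_message` function, the `LiveInput` record, and the MIME type strings are all assumptions made for this example, not the project's actual wire format. The Gemini Live session itself is not called here.

```python
import base64
import json
from dataclasses import dataclass

# Hypothetical envelope: {"type": "frame" | "audio" | "text", "data": "..."}
# Binary payloads (frames, audio) are assumed to arrive base64-encoded.
# The MIME types below are illustrative assumptions.
BINARY_TYPES = {"frame": "image/jpeg", "audio": "audio/pcm"}


@dataclass
class LiveInput:
    """One decoded input, ready to be forwarded to the model session."""
    kind: str
    mime_type: str
    payload: bytes


def route_message(raw: str) -> LiveInput:
    """Parse one WebSocket text message and decode it by modality."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    kind = message.get("type")
    data = message.get("data")
    if not isinstance(data, str):
        raise ValueError("message is missing a string 'data' field")
    if kind in BINARY_TYPES:
        # validate=True rejects characters outside the base64 alphabet.
        payload = base64.b64decode(data, validate=True)
        return LiveInput(kind, BINARY_TYPES[kind], payload)
    if kind == "text":
        return LiveInput(kind, "text/plain", data.encode("utf-8"))
    raise ValueError(f"unsupported message type: {kind!r}")
```

In a FastAPI WebSocket handler, each string returned by `await websocket.receive_text()` could pass through a function like this before the decoded payload is sent on to the model session.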

Challenges I ran into

One of the biggest challenges was scope. At the beginning, I had many ideas about what the system could do and spent time exploring different directions. Eventually I realized I needed to step back and focus only on the core functionality. Building a real-time multimodal system also introduced many unexpected technical challenges, including browser audio behavior, camera scene interpretation, and managing user interaction timing. There were moments when the process became exhausting, but simplifying the system and returning to the core idea helped bring the project together.
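One common way to tame interaction timing in a system like this is to throttle camera frames so the model is not flooded while the user is mid-stroke. The sketch below is an illustration only and not necessarily the approach this project uses. `FrameThrottle` and its one-second default interval are hypothetical.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Accept at most one frame per `min_interval` seconds; drop the rest."""

    def __init__(
        self,
        min_interval: float = 1.0,  # assumed default, chosen for illustration
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so the logic can be tested
        self._last_accepted: Optional[float] = None

    def should_forward(self) -> bool:
        """Return True if enough time has passed since the last accepted frame."""
        now = self._clock()
        if (
            self._last_accepted is None
            or now - self._last_accepted >= self.min_interval
        ):
            self._last_accepted = now
            return True
        return False
```

A handler would call `should_forward()` for each incoming frame and skip any frame for which it returns `False`. Audio and text would bypass the throttle so that questions are never dropped.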

Accomplishments that I'm proud of

I built a working prototype of a real-time multimodal art coaching agent. The system can observe a drawing through a camera, process voice and text input, and provide contextual feedback while the user is actively drawing.

What I learned

Through this project, I learned that building a useful AI agent is not only about model capability but also about designing stable interactions between the user, the environment, and the system. Handling real-time multimodal input requires careful system design, especially when combining vision, audio, and conversational AI.

What's next for Live Art Coach

In the future, I would like to expand Live Art Coach into a more robust AI art education system. My long-term vision is to make art education more accessible so that anyone can receive guidance and develop creative confidence, regardless of their location or circumstances.

Built With

fastapi
google-cloud-run
google-gemini
google-genai-sdk
html
javascript
python
web-audio-api
websockets

Updates

Gyunghwa Roh started this project — Mar 16, 2026 01:46 PM EDT
