Motif — Give Your AI a Living Animated Body That Grows

Inspiration

Most AI agents today are invisible. You type a message, get text back, and the AI vanishes until next time. It has no face, no body language, no visual presence. We asked ourselves: what if an AI could have a body, one that moves, expresses emotion, and grows over time through your interactions?

We were inspired by how much of human communication is nonverbal. An often-cited (and much-debated) figure from Albert Mehrabian's studies attributes 55% of emotional meaning to body language rather than words. Yet most conversational AI is still text-only. Motif is our answer: give AI a living, animated body that evolves with every conversation.

What it does

Motif is a digital companion platform where users create unique AI characters with a single sentence. The character comes alive with AI-generated animations powered by Google Veo 3, and responds to conversations with semantically matched body language in real time.

Key features:

One-sentence character creation: Describe your companion, and Imagen 4 generates candidate images to choose from
Generative animation system: Veo 3 creates unique animations for each character state, producing expressive video clips that play directly in the companion's viewport
Semantic animation routing: Every message is analyzed in real time, and the character's animation automatically matches the conversation context: happy topics trigger celebration, serious topics trigger thoughtful poses
Growth system: Characters start with 3 basic animations and unlock new visual motifs through conversation. Each motif is permanently added to their repertoire
Long-term memory: Characters remember your preferences and build a persistent relationship across sessions
Multi-character support: Create and manage multiple independent companions, each with their own animation library and personality
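
As a rough illustration of the growth system described above, here is a minimal Python sketch. The `Character` class, the starter animation names, and the `unlock` method are all hypothetical; the writeup only states that characters begin with 3 basic animations and that unlocked motifs are added permanently.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical starter set: the writeup says "3 basic animations" but does not name them.
STARTER_ANIMATIONS = ("idle", "talk", "listen")


@dataclass
class Character:
    name: str
    # Animation states this character can currently play.
    repertoire: List[str] = field(default_factory=lambda: list(STARTER_ANIMATIONS))

    def unlock(self, motif: str) -> bool:
        """Add a motif permanently; return True only if it was newly unlocked."""
        if motif in self.repertoire:
            return False
        self.repertoire.append(motif)
        return True


companion = Character("Mochi")
companion.unlock("celebrate")  # repertoire now holds the 3 starters plus "celebrate"
```

In a deployed version the repertoire would presumably be persisted per character (the writeup mentions Firestore for character data) rather than held in memory.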

How we built it

Frontend: Vanilla JavaScript + Vite for a lightweight, responsive UI. Video elements with styled viewports display character animations seamlessly within the interface.

Backend: Python FastAPI server with WebSocket for real-time bidirectional communication.

AI Agent: Google ADK (Agent Development Kit) orchestrating a multi-agent system powered by Gemini 2.5 Flash for conversation understanding, task execution, and animation state management.

Character Generation: Google Imagen 4 generates character images from text descriptions, with automatic prompt translation and optimization.

Animation Generation: Google Veo 3 generates 8-second animation videos for each character state. The system uses carefully engineered prompts to maintain character consistency across different actions and emotions, creating unique animations that play directly in the companion's viewport.
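
One way to keep prompts consistent is to route every animation request through a single fixed template. The sketch below is an assumption about how that could look, not the prompts actually used in Motif: the template wording, `CHARACTER_TEMPLATE`, and `build_animation_prompt` are all hypothetical, and no Veo API call is shown.

```python
# Hypothetical template: the same character description is repeated verbatim
# in every prompt, and only the action changes between animation states.
CHARACTER_TEMPLATE = (
    "Character: {description}. "
    "Keep the same face, outfit, colors, and art style in every shot. "
    "Action: {action}. "
    "Start and end in a neutral pose so the clip can loop."
)


def build_animation_prompt(description: str, action: str) -> str:
    """Fill the fixed template; trailing periods are trimmed to avoid '..'."""
    return CHARACTER_TEMPLATE.format(
        description=description.strip().rstrip("."),
        action=action.strip().rstrip("."),
    )


prompt = build_animation_prompt(
    "a small blue fox with a red scarf.", "jumping with joy"
)
```

Because the description and style sentences are identical across states, any two prompts for the same character differ only in the action clause.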

Semantic Routing: Sentence-Transformers with a BGE embedding model creates vector representations of the conversation context, which are matched in real time against animation state descriptions to select the most appropriate visual response.
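
The routing idea can be sketched as a nearest-neighbor lookup by cosine similarity. To stay self-contained, the example below uses a toy bag-of-words embedder in place of the BGE model, so it only matches on shared words; the state names, descriptions, `threshold`, and `fallback` are illustrative assumptions rather than Motif's actual configuration.

```python
import math
from collections import Counter
from typing import Callable, Dict

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (word counts). A real system would use a dense model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(
    message: str,
    states: Dict[str, str],
    embed: Callable[[str], Vector] = toy_embed,
    fallback: str = "idle",
    threshold: float = 0.1,
) -> str:
    """Pick the state whose description is most similar to the message.

    Falls back to `fallback` when no state scores above `threshold`.
    """
    query = embed(message)
    best_state, best_score = fallback, threshold
    for state, description in states.items():
        score = cosine(query, embed(description))
        if score > best_score:
            best_state, best_score = state, score
    return best_state


# Hypothetical animation state descriptions.
STATES = {
    "celebrate": "happy exciting good news celebration party",
    "think": "serious difficult problem question thoughtful",
}

chosen = route("I got the job, such good news", STATES)
```

The threshold matters for the "natural rather than random" feel: without it, every message would trigger some non-idle animation, even when nothing matches well.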

Infrastructure: Docker containerized, deployed on Google Cloud Run. Character data and animation metadata stored in Firestore, animation files persisted in Cloud Storage.

Challenges we ran into

Animation consistency: Ensuring Veo 3 generates visually consistent characters across different animation states required extensive prompt engineering and a strict character description template
Animation presentation: We explored green-screen chroma-key approaches for transparent overlay, but found that direct video playback with a styled viewport provided a cleaner, more reliable visual experience across different devices and browsers
Seamless looping: Making animations loop smoothly required prompting Veo 3 for neutral start/end poses and implementing crossfade transitions on the frontend
Latency management: Veo 3 generation takes time, so we built an asynchronous pipeline that plays existing animations while new ones generate in the background, creating a non-blocking user experience
Semantic matching accuracy: Tuning the embedding-based animation routing to feel natural rather than random required iterative testing across diverse conversation topics
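
The non-blocking behavior described under latency management can be sketched with `asyncio`: a request for a missing animation returns a fallback clip immediately and schedules generation in the background. `generate_clip` is a placeholder for the slow video-generation call, and `AnimationLibrary` and its method names are hypothetical.

```python
import asyncio
from typing import Dict


async def generate_clip(state: str, delay: float = 0.05) -> str:
    """Placeholder for a slow generation call; returns a fake clip name."""
    await asyncio.sleep(delay)
    return f"{state}.mp4"


class AnimationLibrary:
    def __init__(self, clips: Dict[str, str]) -> None:
        self.clips = dict(clips)                        # state -> ready clip
        self._pending: Dict[str, asyncio.Task] = {}     # state -> running job

    def request(self, state: str, fallback: str = "idle") -> str:
        """Return a clip to play now; never waits for generation."""
        if state in self.clips:
            return self.clips[state]
        if state not in self._pending:  # avoid duplicate jobs for one state
            self._pending[state] = asyncio.create_task(self._generate(state))
        return self.clips[fallback]

    async def _generate(self, state: str) -> None:
        self.clips[state] = await generate_clip(state)
        self._pending.pop(state, None)

    async def wait_until_ready(self) -> None:
        """Wait for all background jobs (used here only to make the demo deterministic)."""
        await asyncio.gather(*list(self._pending.values()))


async def demo():
    library = AnimationLibrary({"idle": "idle.mp4"})
    first = library.request("celebrate")    # not generated yet: fallback plays
    await library.wait_until_ready()
    second = library.request("celebrate")   # generated clip is now available
    return first, second
```

Running `asyncio.run(demo())` shows the first request served by the idle clip and the second by the newly generated one.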

Accomplishments that we're proud of

Created a new interaction paradigm, AI with a visible, evolving body, that we have not seen in any existing product
Built a complete growth system where characters start minimal and become richer through use, creating real emotional investment
Achieved real-time semantic animation routing that feels natural and responsive, not scripted
Delivered a fully functional MVP as a solo developer within the hackathon timeframe
The entire pipeline from character creation to animated conversation works end-to-end on Google Cloud

What we learned

Veo 3's video generation capability is far more powerful than expected — with the right prompting, it can maintain surprising character consistency across different actions
Semantic embeddings proved remarkably effective for animation state matching, and in our testing they often outperformed rule-based approaches
The emotional impact of seeing an AI "move" is profound — even simple idle animations dramatically change how users perceive and engage with an AI agent
ADK's multi-agent architecture made it much easier to separate concerns between conversation, animation management, and content generation

What's next for Motif

Voice interaction: Integrate TTS/ASR for spoken conversation (architecture already supports this)
3D characters: Evolve from 2D animated video to real-time 3D rendered characters
Emotion sensing: Camera-based facial expression recognition so characters react to your real mood
Social features: Share characters, visit others' companions, character-to-character interactions
Mobile app: Native iOS/Android experience for always-available AI companionship
Creator economy: Enable users to publish, share, and monetize their character designs

Built With

Google ADK
Google Cloud Run
Cloud Storage
FastAPI
Firestore
Gemini 2.5 Flash
Imagen 4
JavaScript
Python
Sentence-Transformers
Veo 3
Vite
WebSocket

Updates

EZZEASY ZL started this project — Mar 16, 2026 11:18 AM EDT
