Inspiration
Most AI agents today are invisible. You type a message, get text back, and the AI vanishes until next time. It has no face, no body language, no visual presence. We asked ourselves: what if an AI could have a body, one that moves, expresses emotion, and grows over time through your interactions?
We were inspired by how humans communicate: an often-cited (and often oversimplified) estimate attributes about 55% of the emotional content of a message to body language rather than words. Yet AI has been stuck in a text-only paradigm for decades. Motif is our answer: give AI a living, animated body that evolves with every conversation.
What it does
Motif is a digital companion platform where users create unique AI characters with a single sentence. The character comes alive with AI-generated animations powered by Google Veo 3, and responds to conversations with semantically matched body language in real time.
Key features:
- One-sentence character creation: Describe your companion, and Imagen 4 generates candidate images to choose from
- Generative animation system: Veo 3 creates unique animations for each character state, producing expressive video clips that play directly in the companion's viewport
- Semantic animation routing: Every message is analyzed in real time, and the character's animation automatically matches the conversation context: happy topics trigger celebration, serious topics trigger thoughtful poses
- Growth system: Characters start with 3 basic animations and unlock new visual motifs through conversation. Each motif is permanently added to their repertoire
- Long-term memory: Characters remember your preferences and build a persistent relationship across sessions
- Multi-character support: Create and manage multiple independent companions, each with its own animation library and personality
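As a rough illustration of the growth system described above, the repertoire can be modeled as a set that only ever grows. This is a minimal sketch, not Motif's actual data model: the `Character` class, the `unlock` method, and the three starter state names are all assumptions made for the example (the write-up only says characters start with 3 basic animations).

```python
from dataclasses import dataclass, field

# Hypothetical starter states; the real names are not given in the write-up.
STARTER_ANIMATIONS = ("idle", "talking", "thinking")


@dataclass
class Character:
    name: str
    # Maps motif name -> how it was obtained. A dict keeps insertion order
    # and guarantees each motif appears only once.
    repertoire: dict[str, str] = field(
        default_factory=lambda: {state: "starter" for state in STARTER_ANIMATIONS}
    )

    def unlock(self, motif: str, source: str = "conversation") -> bool:
        """Permanently add a motif. Returns True only if it was newly unlocked."""
        if motif in self.repertoire:
            return False
        self.repertoire[motif] = source
        return True


companion = Character("Mochi")
companion.unlock("celebrate")
companion.unlock("celebrate")  # already unlocked, so nothing changes
print(list(companion.repertoire))  # ['idle', 'talking', 'thinking', 'celebrate']
```

Because nothing is ever removed, persisting the repertoire (for example, in a per-character document) is enough to carry growth across sessions.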
How we built it
Frontend: Vanilla JavaScript + Vite for a lightweight, responsive UI. Video elements with styled viewports display character animations seamlessly within the interface.
Backend: Python FastAPI server with WebSocket for real-time bidirectional communication.
AI Agent: Google ADK (Agent Development Kit) orchestrating a multi-agent system powered by Gemini 2.5 Flash for conversation understanding, task execution, and animation state management.
Character Generation: Google Imagen 4 generates character images from text descriptions, with automatic prompt translation and optimization.
Animation Generation: Google Veo 3 generates 8-second animation videos for each character state. The system uses carefully engineered prompts to maintain character consistency across different actions and emotions, creating unique animations that play directly in the companion's viewport.
Semantic Routing: Sentence-Transformers (BGE embedding model) creates vector representations of conversation context, which are matched against animation state descriptions in real time to select the most appropriate visual response.
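The routing idea can be sketched as a nearest-neighbor lookup by cosine similarity. To keep the example self-contained, it swaps in a toy bag-of-words embedder (`toy_embed`) where a real system would call an embedding model; the state names, their descriptions, the `route` function, and the threshold value are all assumptions for illustration, not Motif's actual configuration.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedder: word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical animation states, each with a short text description.
STATES = {
    "celebrate": "happy exciting good news celebration party",
    "thoughtful": "serious difficult problem worried thinking",
    "idle": "neutral calm waiting",
}


def route(message: str, states: dict[str, str] = STATES,
          fallback: str = "idle", threshold: float = 0.1) -> str:
    """Pick the state whose description is most similar to the message."""
    query = toy_embed(message)
    best_state, best_score = fallback, 0.0
    for name, description in states.items():
        score = cosine(query, toy_embed(description))
        if score > best_score:
            best_state, best_score = name, score
    # Fall back when nothing is similar enough, so weak matches do not
    # trigger arbitrary animations.
    return best_state if best_score >= threshold else fallback


print(route("I got good news today!"))      # celebrate
print(route("This is a serious problem."))  # thoughtful
print(route("hello there"))                 # idle
```

The similarity threshold is one plausible knob for keeping the routing from feeling random: below it, the character simply stays in its fallback state.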
Infrastructure: Containerized with Docker and deployed on Google Cloud Run. Character data and animation metadata are stored in Firestore, and animation files are persisted in Cloud Storage.
Challenges we ran into
- Animation consistency: Ensuring Veo 3 generates visually consistent characters across different animation states required extensive prompt engineering and a strict character description template
- Animation presentation: We explored green-screen chroma-key approaches for transparent overlay, but found that direct video playback with a styled viewport provided a cleaner, more reliable visual experience across different devices and browsers
- Seamless looping: Making animations loop smoothly required prompting Veo 3 for neutral start/end poses and implementing crossfade transitions on the frontend
- Latency management: Veo 3 generation takes time, so we built an asynchronous pipeline that plays existing animations while new ones generate in the background, creating a non-blocking user experience
- Semantic matching accuracy: Tuning the embedding-based animation routing to feel natural rather than random required iterative testing across diverse conversation topics
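The non-blocking pattern mentioned under latency management can be sketched with `asyncio`: a request for a missing animation returns a clip that is already available and schedules generation in the background. This is an assumption-laden sketch, not the project's pipeline: `generate_animation` is a placeholder that merely sleeps (it is not the Veo API), and `AnimationLibrary` and its method names are hypothetical.

```python
import asyncio


async def generate_animation(state: str, delay: float = 0.05) -> str:
    """Placeholder for a slow video-generation call (hypothetical, not the Veo API)."""
    await asyncio.sleep(delay)
    return f"{state}.mp4"


class AnimationLibrary:
    def __init__(self, ready: dict[str, str], fallback: str = "idle"):
        self.ready = dict(ready)   # state -> clip that can be played now
        self.fallback = fallback   # state to play while others are generating
        self.pending: dict[str, asyncio.Task] = {}

    def request(self, state: str) -> str:
        """Return a clip to play immediately; generate the missing one in the background."""
        if state in self.ready:
            return self.ready[state]
        if state not in self.pending:  # avoid duplicate generation jobs
            self.pending[state] = asyncio.create_task(self._generate(state))
        return self.ready[self.fallback]

    async def _generate(self, state: str) -> None:
        try:
            self.ready[state] = await generate_animation(state)
        finally:
            self.pending.pop(state, None)


async def demo() -> list[str]:
    library = AnimationLibrary({"idle": "idle.mp4"})
    played = [library.request("celebrate")]  # not generated yet: plays the fallback
    await asyncio.gather(*list(library.pending.values()))  # wait for background work
    played.append(library.request("celebrate"))  # now available
    return played


print(asyncio.run(demo()))  # ['idle.mp4', 'celebrate.mp4']
```

Tracking pending jobs by state means repeated requests for the same missing animation share one generation task instead of starting several.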
Accomplishments that we're proud of
- Created an interaction paradigm (AI with a visible, evolving body) that we have not seen anywhere else in the market
- Built a complete growth system where characters start minimal and become richer through use, creating real emotional investment
- Achieved real-time semantic animation routing that feels natural and responsive, not scripted
- Delivered a fully functional MVP as a solo developer within the hackathon timeframe
- The entire pipeline from character creation to animated conversation works end-to-end on Google Cloud
What we learned
- Veo 3's video generation capability is far more powerful than expected — with the right prompting, it can maintain surprising character consistency across different actions
- Semantic embeddings are remarkably effective for animation state matching, and in our experience often outperform rule-based approaches
- The emotional impact of seeing an AI "move" is profound — even simple idle animations dramatically change how users perceive and engage with an AI agent
- ADK's multi-agent architecture made it much easier to separate concerns between conversation, animation management, and content generation
What's next for Motif
- Voice interaction: Integrate TTS/ASR for spoken conversation (architecture already supports this)
- 3D characters: Evolve from 2D animated video to real-time 3D rendered characters
- Emotion sensing: Camera-based facial expression recognition so characters react to your real mood
- Social features: Share characters, visit others' companions, and enable character-to-character interactions
- Mobile app: Native iOS/Android experience for always-available AI companionship
- Creator economy: Enable users to publish, share, and monetize their character designs
Built With
- adk
- cloud run
- cloud storage
- fastapi
- firestore
- gemini 2.5 flash
- imagen 4
- javascript
- python
- sentence-transformers
- veo 3
- vite
- websocket