Inspiration
Animation is one of the most time-intensive parts of building interactive products. Rive has done an amazing job making that process more accessible with a powerful web-based editor, but there's still a gap between knowing what you want to create and knowing how to get there inside a professional tool. We wanted to close that gap with an AI agent that lives inside the editor, sees what the user sees, and can take action on their behalf. The real inspiration was the idea that animation workflows could be partially automated without sacrificing creative intent.
What it does
Rive Navigator is a Chrome Extension sidePanel copilot for the Rive animation editor. It operates in three modes:
- Ask mode — answers questions about the current editor state using live screenshots
- Collaborative mode — walks the user through tasks step by step with guided instructions
- Agentic mode — takes direct control of the editor via Chrome DevTools Protocol, executing clicks, drags, keystrokes, and paste commands autonomously
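To give a flavor of how agentic mode can stay parseable and safe, here is a minimal sketch of extracting and strictly validating action tags before anything is dispatched. The tag grammar and verb names here are hypothetical illustrations, not the real format; the strict-validation and per-turn hard-cap ideas are the ones described later in this writeup.

```python
import re

# Hypothetical tag grammar for illustration; the real agent's format differs.
ACTION_RE = re.compile(r"\[ACTION:(\w+)\s+x=(\d+)\s+y=(\d+)\]")
ALLOWED_VERBS = {"click", "double_click", "drag_start", "drag_end", "hover"}
MAX_ACTIONS_PER_TURN = 150  # hard cap before anything reaches the browser

def parse_actions(llm_output: str):
    """Extract actions and reject the whole turn on any mismatch,
    so one bad format never produces a partial, half-executed sequence."""
    actions = []
    for verb, x, y in ACTION_RE.findall(llm_output):
        if verb not in ALLOWED_VERBS:
            raise ValueError(f"unknown action verb: {verb}")
        actions.append({"verb": verb, "x": int(x), "y": int(y)})
    if len(actions) > MAX_ACTIONS_PER_TURN:
        raise ValueError("per-turn action cap exceeded")
    return actions
```

Rejecting the entire turn on a single bad tag, rather than skipping it, is what keeps a malformed model response from executing a partially-valid sequence of real clicks.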
It also includes a Nano Banana SVG Generator, a separate pipeline that turns a text prompt into concept art using Gemini Pro Image, traces it into vector format with vtracer, sanitizes the SVG for Rive compatibility, and pastes it directly into the editor for immediate animation.
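The pipeline above can be sketched as a simple chain of stages. The stage functions here are hypothetical placeholders passed in as arguments, not the actual implementation; they stand in for the Gemini Pro Image call, vtracer, the sanitizer, and the extension's clipboard paste.

```python
def run_nano_banana(prompt, generate_image, trace_to_svg, sanitize_svg, paste_into_rive):
    """Chain the pipeline stages: prompt -> raster -> SVG -> sanitized SVG -> editor."""
    png = generate_image(prompt)      # Gemini Pro Image produces a raster PNG
    svg = trace_to_svg(png)           # vtracer traces it into vector format
    clean = sanitize_svg(svg)         # strip SVG features Rive doesn't support
    return paste_into_rive(clean)     # clipboard paste via the extension
```

Keeping the stages as injected functions also makes each step swappable and testable in isolation.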
The entire experience is voice-enabled, with push-to-talk input via the Web Speech API and sentence-pipelined Gemini TTS narration plus subtitle overlays, so you can work hands-free without reading chat.
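Sentence pipelining can be sketched roughly like this: a simplified splitter, assuming sentences end in `.`, `!`, or `?`, that yields each complete sentence to TTS as soon as it arrives rather than waiting for the full response.

```python
import re

def sentence_pipeline(text_stream):
    """Yield complete sentences as soon as they appear in the stream,
    so TTS for sentence 1 starts while later chunks are still generating."""
    buf = ""
    for chunk in text_stream:
        buf += chunk
        while True:
            # A sentence is "done" once terminal punctuation is followed by whitespace.
            m = re.search(r"(.+?[.!?])\s+", buf)
            if not m:
                break
            yield m.group(1)
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush whatever trails the last terminator
```

This is why narration feels responsive: latency is bounded by the first sentence, not the whole reply.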
How we built it
- Chrome Extension (Manifest V3) — sidePanel UI, screenshot capture, CDP action execution, voice input, and SVG clipboard paste
- FastAPI backend — hosted on Google Cloud Run, handles chat, asset generation, vectorization, and TTS
- Google ADK with Gemini models — Flash for fast chat and agentic actions, Pro for complex reasoning, Pro Image for asset generation, Flash TTS for narration
- vtracer for raster-to-SVG tracing and a custom SVG sanitizer to strip unsupported features before Rive import
- Local Rive documentation corpus (285 .mdx files) with section-level scoring and image references for grounded responses
- Chrome DevTools Protocol for trusted browser input, dispatching real input through the debugger API instead of simulating events
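The sanitizer mentioned in the list above can be sketched with a tag allow-list. The supported subset shown here is illustrative, Rive's actual import rules are more nuanced, but the pruning approach is the same: walk the tree and drop anything outside the subset.

```python
import xml.etree.ElementTree as ET

# Illustrative subset; the real allow-list is more complete.
ALLOWED_TAGS = {"svg", "g", "path", "rect", "circle", "ellipse",
                "polygon", "polyline", "line"}

def sanitize_svg(svg_text: str) -> str:
    """Recursively remove elements outside the supported subset
    (filters, masks, gradients, etc.) while keeping the rest intact."""
    root = ET.fromstring(svg_text)

    def prune(parent):
        for child in list(parent):
            tag = child.tag.split("}")[-1]  # strip the XML namespace prefix
            if tag not in ALLOWED_TAGS:
                parent.remove(child)
            else:
                prune(child)

    prune(root)
    return ET.tostring(root, encoding="unicode")
```

Pruning unsupported nodes wholesale trades some visual fidelity for a guarantee that the paste into Rive never fails on an unrecognized element.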
Challenges we ran into
- CDP on Flutter Web canvas — Rive's editor is built on Flutter, so standard DOM interactions don't work. Every click, drag, and keystroke had to go through CDP's Input domain with precise viewport coordinates and timing (400ms pauses for double-clicks, 10-step drag interpolation at 30ms intervals, 500ms hover waits).
- LLM output reliability — getting Flash to produce consistently parseable ACTION and CURSOR tags required stripping prompt mass way down and using per-turn procedure cards instead of one massive system prompt. Small formatting mismatches would break the entire agentic loop.
- SVG compatibility — Gemini Pro Image generates raster PNGs, and vtracer traces them into SVG, but Rive only supports a subset of SVG features. We had to build a sanitizer that strips unsupported elements while preserving visual fidelity.
- Action safety — the agent executes real browser input, so we built guardrails: a 150-action hard cap per turn, settle time between actions, preview timeouts surfaced to the sidebar, and strict validation on every parsed action before execution.
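The drag interpolation described above (10 steps at 30ms intervals) looks roughly like this. Here `dispatch` is a stand-in for CDP's `Input.dispatchMouseEvent`, which in the real extension goes through the `chrome.debugger` API; the sketch only shows the press, interpolated moves, release shape of the gesture.

```python
import time

def drag(dispatch, x0, y0, x1, y1, steps=10, step_delay=0.03):
    """Emit a CDP-style drag: press, `steps` interpolated moves, release.
    The per-step delay gives Flutter's canvas time to register each move."""
    dispatch("mousePressed", x0, y0)
    for i in range(1, steps + 1):
        t = i / steps  # linear interpolation from start to end point
        dispatch("mouseMoved", x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        time.sleep(step_delay)
    dispatch("mouseReleased", x1, y1)
```

A single jump from start to end coordinates would be a legal mouse event, but the canvas never sees it as a drag; the intermediate moves are what make the gesture readable to Flutter.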
Accomplishments that we're proud of
- The agent can genuinely operate the Rive editor rather than just describe what it sees: it clicks through menus, creates artboards, renames layers, drags and pastes objects, and accurately creates animation keyframes
- The Nano Banana SVG pipeline works end-to-end: prompt → image → trace → sanitize → clipboard → paste into Rive → screenshot → hand off to the animation agent
- Voice narration with sentence pipelining feels natural: TTS starts speaking the first sentence while later sentences are still being generated
- The documentation grounding system surfaces relevant Rive docs with section-level precision and visual references instead of dumping entire pages
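The section-level grounding can be approximated with a keyword-overlap sketch. The real scoring over the 285-file .mdx corpus is more involved, but the core idea is the same: rank individual sections, not whole pages, against the query.

```python
def score_sections(query, sections):
    """Rank (heading, body) doc sections by word overlap with the query,
    returning only headings that match at all, best first."""
    query_words = set(query.lower().split())
    scored = []
    for heading, body in sections:
        section_words = set((heading + " " + body).lower().split())
        overlap = len(query_words & section_words)
        if overlap:
            scored.append((overlap, heading))
    return [heading for _, heading in sorted(scored, reverse=True)]
```

Returning sections instead of pages is what keeps the model's context targeted, which also ties into the "less is more" lesson below.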
What we learned
- Smaller models like Flash improve dramatically when prompt mass is reduced and runtime context is more targeted. Less is genuinely more
- UI agents need strict output validation and recovery handling. One bad format breaks the whole loop
- Some core editor operations should always be available as built-in guidance rather than relying entirely on retrieval
- Prompt-to-SVG generation is much more reliable when constrained to simple, flat, vector-friendly assets rather than trying to handle arbitrary complexity
- Trusted browser input via CDP is essential. Simulated DOM events get swallowed by Flutter's rendering layer
What's next for Animation Agent (Rive Navigator)
- Persistent memory — right now sessions are stateless. We want the agent to remember user preferences, project context, and past interactions across sessions
- Multi-step task planning — breaking complex animation workflows into planned sequences the agent can execute and recover from
- Expanded asset pipeline — supporting more complex SVG structures and eventually animated asset templates
- Relationship tracking — adapting the agent's communication style and level of detail based on how experienced the user becomes over time
- Collaborative multi-agent workflows — having specialized agents for different tasks (layout, animation, asset creation) that can hand off to each other
Built With
- gemini
- google-cloud-run
- javascript
- nano-banana
