Inspiration
Animation is one of the most time-intensive parts of building interactive products. Rive has done an amazing job making that process more accessible with a powerful web-based editor, but there's still a gap between knowing what you want to create and knowing how to get there inside a professional tool. We wanted to close that gap with an AI agent that lives inside the editor, sees what the user sees, and can take action on their behalf. The real inspiration was the idea that animation workflows could be partially automated without sacrificing creative intent.
What it does
Rive Navigator is a Chrome Extension sidePanel copilot for the Rive animation editor. It operates in three modes:
- Ask mode — answers questions about the current editor state using live screenshots
- Collaborative mode — walks the user through tasks step by step with guided instructions
- Agentic mode — takes direct control of the editor via Chrome DevTools Protocol, executing clicks, drags, keystrokes, and paste commands autonomously
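To give a flavor of how agentic mode can stay parseable and safe, here is a minimal sketch of extracting and strictly validating action tags before anything is dispatched. The tag grammar and verb names here are hypothetical illustrations, not the real format; the strict-validation and per-turn hard-cap ideas are the ones described later in this writeup.

```python
import re

# Hypothetical tag grammar for illustration; the real agent's format differs.
ACTION_RE = re.compile(r"\[ACTION:(\w+)\s+x=(\d+)\s+y=(\d+)\]")
ALLOWED_VERBS = {"click", "double_click", "drag_start", "drag_end", "hover"}
MAX_ACTIONS_PER_TURN = 150  # hard cap before anything reaches the browser

def parse_actions(llm_output: str):
    """Extract actions and reject the whole turn on any mismatch,
    so one bad format never produces a partial, half-executed sequence."""
    actions = []
    for verb, x, y in ACTION_RE.findall(llm_output):
        if verb not in ALLOWED_VERBS:
            raise ValueError(f"unknown action verb: {verb}")
        actions.append({"verb": verb, "x": int(x), "y": int(y)})
    if len(actions) > MAX_ACTIONS_PER_TURN:
        raise ValueError("per-turn action cap exceeded")
    return actions
```

Rejecting the entire turn on a single bad tag, rather than skipping it, is what keeps a malformed model response from executing a partially-valid sequence of real clicks.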
It also includes a Nano Banana SVG Generator, a separate pipeline that turns a text prompt into concept art using Gemini Pro Image, traces it into vector format with vtracer, sanitizes the SVG for Rive compatibility, and pastes it directly into the editor for immediate animation.
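The pipeline above can be sketched as a simple chain of stages. The stage functions here are hypothetical placeholders passed in as arguments, not the actual implementation; they stand in for the Gemini Pro Image call, vtracer, the sanitizer, and the extension's clipboard paste.

```python
def run_nano_banana(prompt, generate_image, trace_to_svg, sanitize_svg, paste_into_rive):
    """Chain the pipeline stages: prompt -> raster -> SVG -> sanitized SVG -> editor."""
    png = generate_image(prompt)      # Gemini Pro Image produces a raster PNG
    svg = trace_to_svg(png)           # vtracer traces it into vector format
    clean = sanitize_svg(svg)         # strip SVG features Rive doesn't support
    return paste_into_rive(clean)     # clipboard paste via the extension
```

Keeping the stages as injected functions also makes each step swappable and testable in isolation.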
The entire experience is voice-enabled, with push-to-talk input via the Web Speech API and sentence-pipelined Gemini TTS narration plus subtitle overlays, so you can work hands-free without reading chat.
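Sentence pipelining can be sketched roughly like this: a simplified splitter, assuming sentences end in `.`, `!`, or `?`, that yields each complete sentence to TTS as soon as it arrives rather than waiting for the full response.

```python
import re

def sentence_pipeline(text_stream):
    """Yield complete sentences as soon as they appear in the stream,
    so TTS for sentence 1 starts while later chunks are still generating."""
    buf = ""
    for chunk in text_stream:
        buf += chunk
        while True:
            # A sentence is "done" once terminal punctuation is followed by whitespace.
            m = re.search(r"(.+?[.!?])\s+", buf)
            if not m:
                break
            yield m.group(1)
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush whatever trails the last terminator
```

This is why narration feels responsive: latency is bounded by the first sentence, not the whole reply.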
How we built it
- Chrome Extension (Manifest V3) — sidePanel UI, screenshot capture, CDP action execution, voice input, and SVG clipboard paste
- FastAPI backend — hosted on Google Cloud Run, handles chat, asset generation, vectorization, and TTS
- Google ADK with Gemini models — Flash for fast chat and agentic actions, Pro for complex reasoning, Pro Image for asset generation, Flash TTS for narration
- vtracer for raster-to-SVG tracing and a custom SVG sanitizer to strip unsupported features before Rive import
- Local Rive documentation corpus (285 .mdx files) with section-level scoring and image references for grounded responses
- Chrome DevTools Protocol for trusted browser input, dispatching real input through the debugger API instead of simulating events
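The sanitizer mentioned in the list above can be sketched with a tag allow-list. The supported subset shown here is illustrative, Rive's actual import rules are more nuanced, but the pruning approach is the same: walk the tree and drop anything outside the subset.

```python
import xml.etree.ElementTree as ET

# Illustrative subset; the real allow-list is more complete.
ALLOWED_TAGS = {"svg", "g", "path", "rect", "circle", "ellipse",
                "polygon", "polyline", "line"}

def sanitize_svg(svg_text: str) -> str:
    """Recursively remove elements outside the supported subset
    (filters, masks, gradients, etc.) while keeping the rest intact."""
    root = ET.fromstring(svg_text)

    def prune(parent):
        for child in list(parent):
            tag = child.tag.split("}")[-1]  # strip the XML namespace prefix
            if tag not in ALLOWED_TAGS:
                parent.remove(child)
            else:
                prune(child)

    prune(root)
    return ET.tostring(root, encoding="unicode")
```

Pruning unsupported nodes wholesale trades some visual fidelity for a guarantee that the paste into Rive never fails on an unrecognized element.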
Challenges we ran into
- CDP on Flutter Web canvas — Rive's editor is built on Flutter, so standard DOM interactions don't work. Every click, drag, and keystroke had to go through CDP's Input domain with precise viewport coordinates and timing (400ms pauses for double-clicks, 10-step drag interpolation at 30ms intervals, 500ms hover waits).
- LLM output reliability — getting Flash to produce consistently parseable ACTION and CURSOR tags required stripping prompt mass way down and using per-turn procedure cards instead of one massive system prompt. Small formatting mismatches would break the entire agentic loop.
- SVG compatibility — Gemini Pro Image generates raster PNGs, and vtracer traces them into SVG, but Rive only supports a subset of SVG features. We had to build a sanitizer that strips unsupported elements while preserving visual fidelity.
- Action safety — the agent executes real browser input, so we built guardrails: a 150-action hard cap per turn, settle time between actions, preview timeouts surfaced to the sidebar, and strict validation on every parsed action before execution.
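The drag interpolation described above (10 steps at 30ms intervals) looks roughly like this. Here `dispatch` is a stand-in for CDP's `Input.dispatchMouseEvent`, which in the real extension goes through the `chrome.debugger` API; the sketch only shows the press, interpolated moves, release shape of the gesture.

```python
import time

def drag(dispatch, x0, y0, x1, y1, steps=10, step_delay=0.03):
    """Emit a CDP-style drag: press, `steps` interpolated moves, release.
    The per-step delay gives Flutter's canvas time to register each move."""
    dispatch("mousePressed", x0, y0)
    for i in range(1, steps + 1):
        t = i / steps  # linear interpolation from start to end point
        dispatch("mouseMoved", x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        time.sleep(step_delay)
    dispatch("mouseReleased", x1, y1)
```

A single jump from start to end coordinates would be a legal mouse event, but the canvas never sees it as a drag; the intermediate moves are what make the gesture readable to Flutter.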
Accomplishments that we're proud of
- The agent can genuinely operate the Rive editor rather than just describe what it sees: it clicks through menus, creates artboards, renames layers, drags and pastes objects, and accurately creates animation keyframes
- The Nano Banana SVG pipeline works end-to-end: prompt → image → trace → sanitize → clipboard → paste into Rive → screenshot → hand off to the animation agent
- Voice narration with sentence pipelining feels natural: TTS starts speaking the first sentence while later sentences are still being generated
- The documentation grounding system surfaces relevant Rive docs with section-level precision and visual references instead of dumping entire pages
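The section-level grounding can be approximated with a keyword-overlap sketch. The real scoring over the 285-file .mdx corpus is more involved, but the core idea is the same: rank individual sections, not whole pages, against the query.

```python
def score_sections(query, sections):
    """Rank (heading, body) doc sections by word overlap with the query,
    returning only headings that match at all, best first."""
    query_words = set(query.lower().split())
    scored = []
    for heading, body in sections:
        section_words = set((heading + " " + body).lower().split())
        overlap = len(query_words & section_words)
        if overlap:
            scored.append((overlap, heading))
    return [heading for _, heading in sorted(scored, reverse=True)]
```

Returning sections instead of pages is what keeps the model's context targeted, which also ties into the "less is more" lesson below.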
What we learned
- Smaller models like Flash improve dramatically when prompt mass is reduced and runtime context is more targeted. Less is genuinely more
- UI agents need strict output validation and recovery handling. One bad format breaks the whole loop
- Some core editor operations should always be available as built-in guidance rather than relying entirely on retrieval
- Prompt-to-SVG generation is much more reliable when constrained to simple, flat, vector-friendly assets rather than trying to handle arbitrary complexity
- Trusted browser input via CDP is essential. Simulated DOM events get swallowed by Flutter's rendering layer
What's next for Animation Agent (Rive Navigator)
- Persistent memory — right now sessions are stateless. We want the agent to remember user preferences, project context, and past interactions across sessions
- Multi-step task planning — breaking complex animation workflows into planned sequences the agent can execute and recover from
- Expanded asset pipeline — supporting more complex SVG structures and eventually animated asset templates
- Relationship tracking — adapting the agent's communication style and level of detail based on how experienced the user becomes over time
- Collaborative multi-agent workflows — having specialized agents for different tasks (layout, animation, asset creation) that can hand off to each other
Built With
- gemini
- google-cloud-run
- javascript
- nano-banana
