Hyperframes Agent

Inspiration

Short-form video is the dominant content format today, but producing even a 15-second clip requires juggling research, scripting, asset generation, voiceover, and video assembly. We wanted to see if we could compress that entire workflow into a single natural-language prompt — type a topic, get a video. The Arize AI Observability track gave us the perfect excuse to make the pipeline not just autonomous, but self-improving through traced evaluations.

What it does

HyperFrames Agent is a multi-agent video production pipeline built on Google ADK. Given a topic (e.g. "create a 15-second video about the solar system"), it:

Researches the topic using the model's training data
Proposes a structured video concept
Writes a narration script
Plans scenes with visual descriptions
Generates images via Google Imagen and narration via Cloud Text-to-Speech
Reviews and refines all content
Composes a HyperFrames HTML document with GSAP animations and renders it to MP4

After rendering, it evaluates its own output with an LLM-as-a-Judge and logs quality scores to Arize Phoenix — closing the loop between production and observability.

How we built it

Agent framework: Google ADK with a root orchestrator delegating to 7 specialist sub-agents (research, proposal, script, scene plan, assets, edit, compose)
Image generation: Google Imagen (imagen-3.0-generate-002) via the Vertex GenAI SDK
Audio: Google Cloud Text-to-Speech (en-US-Neural2-J)
Video rendering: HyperFrames v0.6.91 — generates HTML5 compositions with GSAP timelines and renders them via Chrome + FFmpeg
Observability: OpenInference auto-instrumentation for ADK and GenAI, sending traces to Arize Phoenix Cloud
Evaluations: LLM-as-a-Judge (same GenAI model) scores each pipeline run on correctness, completeness, and relevance; scores are logged back to Phoenix as span evaluations
Resilience: Exponential backoff with jitter on Vertex AI 429 rate limits, plus ADK's built-in retry config
MCP: Phoenix MCP server configured so agents can introspect their own traces at runtime

Every tool function is ToolContext-aware, routing all file outputs to session-scoped directories.

Challenges we ran into

HyperFrames HTML structure: The LLM could never generate valid HyperFrames compositions — it would omit data-composition-id, break GSAP timeline syntax, or use incorrect attribute names. We solved this by moving HTML generation server-side: generate_composition_html takes structured scene data and produces guaranteed-valid output.
Asset path resolution: HyperFrames' HTTP server is rooted at the project directory, so ../assets/ paths 404'd. Assets must be copied into the project's own assets/ subdirectory with relative paths.
429 rate limits: Vertex AI aggressively throttles. We layered two retry mechanisms — a decorator on individual tool calls and ADK's retry_config on every agent — with exponential backoff, jitter, and invocation-ID-based resumption.
Session isolation: Early versions used a single env var for session paths, which broke under concurrent ADK sessions. Refactoring every tool to accept ToolContext fixed this.
ADK web vs CLI: The pipeline auto-progresses in CLI mode, but the ADK web UI required careful instruction design so agents wouldn't pause asking the user for confirmation at each step.

Accomplishments that we're proud of

End-to-end autonomy: A single prompt produces a complete video — no human in the loop between topic and MP4.
Observability loop: The pipeline doesn't just emit traces; it reads them back to evaluate its own quality and inform future runs.
Resilient design: The 429 retry with invocation-ID resumption means long pipelines survive rate-limit storms without restarting.
Clean architecture: 7 specialist agents, each with a focused skill and tool set, orchestrated by a lightweight root agent. Adding or swapping agents is straightforward.
Arize track coverage: All six requirements met — code-owned runtime (ADK), OpenInference instrumentation, Phoenix Cloud traces, MCP introspection, LLM-as-a-Judge evaluations, and the bonus self-improvement via trace queries.

What we learned

Auto-instrumentation is magic: phoenix.otel.register(auto_instrument=True) instruments ADK, GenAI, and HTTP calls with zero manual span creation.
LLMs can't generate valid HyperFrames HTML reliably: The composition DSL (GSAP timelines, data-composition-id, scene attributes) is too structured for freeform generation. A server-side builder function with validated output is the right approach.
Retry is a systems problem, not just a code problem: Decorators catch individual failures, but ADK's retry_config + run_async with invocation_id handle agent-level failures that span multiple tool calls and sub-agent transfers.
Session isolation matters early: Refactoring from env-var-based to context-based session paths midway was more painful than doing it right upfront.

What's next for HyperFrames Agent

Multi-modal evaluation: Score outputs on visual quality (image coherence, animation smoothness) using a vision-language model judge.
A/B testing via Phoenix experiments: Run multiple prompt/parameter variants and compare trace-grounded evaluation scores.
Persistent learning: Store evaluation results in a dataset and fine-tune agent instructions based on what scores highest.
Voice customization: Support multiple TTS voices and languages.
Batch production: Queue multiple topics and produce a playlist of videos.
Web UI: A dashboard to view past runs, trace visualizations, and evaluation scores — all powered by Phoenix queries.

Built With

python

Updates

Rajesh Saravanan started this project — Jun 11, 2026 02:08 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.