Understudy
Record once. Ship a signed agent. Understudy turns any 60-second screen recording into a signed, production-deployed web agent with a typed federated GraphQL API and persistent memory — no code, no prompts. Show it once, ship it forever.
Inspiration
Every team at every company has three or four workflows that should be agents but never are — the weekly CSV export, the multi-tab reconciliation, the "ping me when that vendor's status page flips." Building each one takes 1–2 weeks even with modern browser-use frameworks. The bottleneck isn't the model. It's the glue: typed API, schema synthesis, memory, deploy target, supply-chain hardening, and governance.
The agentic-browser space is crowded at the consumer surface (Adept, Multion, Rabbit, Claude Computer Use, OpenAdapt). It's thin where enterprises actually live: governance + supply chain + memory. We wanted to collapse the 2-week loop to ~90 seconds — and do it in a way an enterprise would accept: every agent signed, every build attested, every runtime verified before it boots.
That's Understudy. Meta-agentic, not agentic. The deliverable of a 60-second recording is a running, signed, federated web agent — not a chatbot.
What it does
- Record a 60-second screen workflow in your browser.
- Three Gemini models synthesize it in three passes — action detection, intent abstraction, script emission.
- Wundergraph Cosmo Dream Query figures out the agent's ideal GraphQL query shape and emits the SDL delta needed to serve it, validated against live client traffic.
- Chainguard wolfi-base builds the agent image with a build-time SBOM and an SLSA L2 provenance predicate, signed keyless via cosign + Fulcio and anchored in the Rekor transparency log.
- Fly.io Machines deploys the image, but only after a pre-start `cosign verify` passes. A tampered image refuses to boot.
- The running agent exposes a typed federated GraphQL endpoint via Cosmo Router, writes memory to the Redis Agent Memory Server (auto topic + entity extraction) and a per-agent int8 Vector Set, stores structured output in InsForge Postgres via Remote-OAuth MCP, and drives browser sessions via TinyFish's hosted cloud.
- Every run publishes the output to cited.md, and the reusable capability is packaged as a skill for Shipables.dev.
One 60-second recording → one signed, federated, memory-backed agent. Output the agent, not the answer.
🏗️ Architecture (live diagram)
➡️ Open the interactive architecture diagram on mermaid.live
How we built it
🏅 Deep Sponsor Integrations (Our Core Stack)
Ship to Prod judges Tool Use at 20% weight and requires ≥3 integrations. We didn't just sprinkle sponsor APIs into a chatbot. Understudy relies on six deep integrations that form the load-bearing pillars of the platform. Rip any one out, and the system breaks.
🥇 TinyFish — The Agentic Runtime & Execution Cloud
We don't operate our own browser infrastructure pool. The final output script emitted by our pipeline is natively a TinyFish CLI agent with pinned Skills.
- Hosted Browser Cloud: `tinyfish run` executes the generated agents live on TinyFish's managed cloud infrastructure over HTTPS. One API key gives us a stealth session equipped with Web Agent, Search, Fetch, and Browser capabilities.
- Guaranteed Reliability: Because generated agents are native TinyFish scripts, they achieve 2× task completion rates compared to standard MCP setups, giving us enterprise reliability.
- Code Proof: `apps/agent-template/src/tinyfish/cli.ts` (requires pinned Skill version), `apps/synthesis-worker/prompts.py` (Gemini emits `emit_tinyfish_script`).
🥇 Wundergraph Cosmo — The Schema Synthesizer & Federation Router
Understudy knows what the agent wants to query, but it doesn't know how the schema has to change. Cosmo Dream Query inverts that problem exactly.
- Dream Query: We pass the desired GraphQL operation to Dream Query, which returns the SDL delta + a validation report against live client traffic. We did not hand-roll SDL generation; we let Dream Query synthesize the schema.
- Federated Router: The Cosmo Router fronts every generated agent subgraph as a single, unified supergraph, using EDFS (Kafka/NATS) for event-driven fields.
- Code Proof: Wired via a headless MCP driver in `apps/cosmo-mcp-driver/driver.py:128-222`, using `dream_query` and `schema_change_proposal_workflow`.
🔏 Chainguard — SLSA L2 Supply Chain Security
Why would an enterprise run an AI-generated agent? Because its origin and build provenance are cryptographically verifiable.
- Every generated agent image is built `FROM cgr.dev/chainguard/wolfi-base`.
- We use Syft in BuildKit to generate a build-time SBOM and an SLSA L2-compliant provenance predicate.
- Images are signed keyless via cosign + Fulcio, anchored in the Rekor transparency log.
- Our Fly.io pre-start hook runs `cosign verify` and `cosign verify-attestation --type slsaprovenance` before boot. If an image is tampered with, it refuses to boot.
- Code Proof: `infra/chainguard/Dockerfile.wolfi:11`, `infra/fly/fly-start.sh:13-24`.
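To make the pre-start gate concrete, here is a minimal Python sketch of the two checks described above. It is not the contents of `fly-start.sh`. The identity and issuer values are hypothetical placeholders, and the `run` parameter is an illustrative injection point so the logic can be exercised without cosign installed. The flags shown (`--certificate-identity`, `--certificate-oidc-issuer`, `--type slsaprovenance`) are standard cosign options for keyless verification.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical values: the real ones depend on the CI workflow that signed the image.
EXPECTED_IDENTITY = "https://github.com/example-org/example-repo/.github/workflows/release.yml@refs/heads/main"
OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def build_verify_commands(image: str, identity: str, issuer: str) -> List[List[str]]:
    """Return the two cosign invocations a pre-start gate would run."""
    identity_flags = [
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", issuer,
    ]
    return [
        # 1. Verify the image signature against the expected signer identity.
        ["cosign", "verify", *identity_flags, image],
        # 2. Verify the SLSA provenance attestation attached to the image.
        ["cosign", "verify-attestation", "--type", "slsaprovenance", *identity_flags, image],
    ]


def _run_cosign(command: Sequence[str]) -> int:
    """Run one command and return its exit code (requires cosign on PATH)."""
    return subprocess.run(list(command), check=False).returncode


def image_is_trusted(
    image: str,
    identity: str = EXPECTED_IDENTITY,
    issuer: str = OIDC_ISSUER,
    run: Callable[[Sequence[str]], int] = _run_cosign,
) -> bool:
    """True only if every check exits 0; stops at the first failure."""
    return all(run(cmd) == 0 for cmd in build_verify_commands(image, identity, issuer))
```

A start script following this pattern would exit non-zero when `image_is_trusted(...)` returns `False`, so the agent process never starts for an image that fails either check.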
🛢️ InsForge 2.0 — Remote Backend & Inference Gateway
Every generated agent gets an instant backend.
- Remote OAuth MCP: We use InsForge's Remote OAuth MCP (bypassing stdio friction completely) with a 401-refresh-retry loop to handle agent output and structured data storage.
- Model Gateway: The synthesis worker uses InsForge's Model Gateway for automatic failover when Gemini rate limits hit.
- Warm Pooling: Generated agents pick up a warm-pool slot via our provisioning scripts.
- Code Proof: `apps/agent-template/src/insforge/mcp-client.ts:30-102`, `apps/synthesis-worker/gemini_client.py:268-298`.
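The 401-refresh-retry loop mentioned above follows a common pattern: send the request, and if the server answers 401, refresh the token and try again a bounded number of times. The sketch below shows that pattern in Python under stated assumptions. `Response`, `send`, and `refresh` are hypothetical stand-ins, not the InsForge client API or the code in `mcp-client.ts`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Hypothetical minimal response shape for illustration."""
    status: int
    body: str = ""


def call_with_refresh(
    send: Callable[[str], Response],
    refresh: Callable[[], str],
    token: str,
    max_refreshes: int = 1,
) -> Response:
    """Send a request; on 401, refresh the token and retry up to max_refreshes times."""
    response = send(token)
    refreshes = 0
    while response.status == 401 and refreshes < max_refreshes:
        token = refresh()  # obtain a new access token (e.g. via an OAuth refresh grant)
        refreshes += 1
        response = send(token)
    return response
```

Bounding the number of refreshes keeps a revoked credential from turning into an infinite retry loop; the caller still sees the final 401 and can surface it.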
🧠 Redis 8 — The Agent's Memory Substrate
Three distinct, bleeding-edge April-2026 Redis features power the memory architecture:
- Agent Memory Server (AMS): Handles short-term turn buffers (Stream, `MAXLEN=20`), long-term episodic facts, and auto topic/entity extraction.
- Vector Sets (int8 quantization): A per-agent recall index. Quantizing from fp32 to int8 reduces memory footprint by ~75% while maintaining ~99%+ recall accuracy.
- LangCache: A semantic response cache sitting in front of every LLM call, dropping repeat query latency to <50ms.
- Code Proof: `understudy/memory/ams.py`, `understudy/memory/vector.py`.
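The ~75% figure follows directly from storage width: an fp32 component takes 4 bytes and an int8 component takes 1, so the saving is 1 - 1/4 = 0.75 before any per-vector metadata. The sketch below illustrates generic symmetric int8 quantization in plain Python to show where the precision loss comes from. It is an illustration of the general technique, not Redis's internal Vector Set implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(vector: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector quantization: map [-max_abs, max_abs] onto [-127, 127]."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [int(round(x / scale)) for x in vector]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats; each component is off by at most scale / 2."""
    return [c * scale for c in codes]


def bytes_saved_fraction(float_bytes: int = 4, quantized_bytes: int = 1) -> float:
    """Fraction of per-component storage saved by quantizing."""
    return 1 - quantized_bytes / float_bytes
```

Because the step size is `max_abs / 127`, a vector with one large component and many small ones loses relatively more precision on the small ones, which is one plausible reason recall needs to be measured rather than assumed.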
🔺 Gemini 3 / 3.1 (Google) — The Three-Headed Brain
Our pipeline uses three pinned models, each applied where it objectively wins:
- Action Detection (`gemini-3.1-flash-lite`): Multimodal-native. Grabs 5-8 scene-change keyframes to detect UI events and clicks (~10× token reduction vs video).
- Intent Abstraction (`gemini-3.1-pro` with `thinking_level: high`): Best complex reasoning. Lifts raw clicks into a goal, invariants, and an I/O schema.
- Script Emission (`gemini-3-flash`): Best coder. Emits the target TinyFish script and GraphQL target shape, beating 3.1 Pro on SWE-bench at a fraction of the latency and cost.
- Code Proof: Pinned strictly in `understudy/models.py`.
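One way to picture the three passes is as an ordered list of stages, each pinned to a model, with the output of one stage feeding the next. The sketch below is illustrative only and is not the contents of `understudy/models.py`. `Stage`, `PIPELINE`, and `call_model` are hypothetical names, and the model call is injected rather than tied to any real SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Stage:
    """One synthesis pass with its pinned model and options."""
    name: str
    model: str
    options: Tuple[Tuple[str, str], ...] = ()


# Stage order and model names follow the list above.
PIPELINE = (
    Stage("action_detection", "gemini-3.1-flash-lite"),
    Stage("intent_abstraction", "gemini-3.1-pro", (("thinking_level", "high"),)),
    Stage("script_emission", "gemini-3-flash"),
)


def run_pipeline(keyframes: Any, call_model: Callable[[Stage, Any], Any]) -> Any:
    """Thread the payload through each stage in order."""
    payload = keyframes
    for stage in PIPELINE:
        payload = call_model(stage, payload)
    return payload
```

Freezing the dataclass and keeping the pins in one tuple means a model change is a one-line diff that is easy to review.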
Challenges we ran into
- Thought-signature validation on Gemini 3.x multi-turn calls. A mid-pipeline function-call response could invalidate an earlier signature and crash the retry path. Fix: enforce signature re-check at every tool-call boundary in the worker (`gemini_client.py:249-263`).
- Cosmo Dream Query latency spikes during the live-traffic validation phase. Mitigated by pre-warming the supergraph with representative traffic before demo and caching the `dream:{run_id}` slot.
- Selector brittleness on synthesized scripts. Strategy: Gemini emits selector hints (role + visible text), not raw CSS. At runtime the TinyFish resolver uses a priority chain (`data-testid` → accessibility tree → text content → Flash-Lite fallback) so cosmetic DOM changes don't break the agent.
- Keyless signing in CI with the right OIDC subject. Getting Fulcio to issue a cert bound to the exact workflow identity (not just the repo) took several iterations. The final identity string is used verbatim in every verifier (`fly-start.sh`, `verify-self.sh`, `scripts/verify_release.sh`).
- Int8 Vector Set recall calibration. Default symmetric quantization lost enough recall on short embeddings to matter for the AMS recall path. We kept fp32 as a fallback for small agents and back the 99%+ number with a unit test.
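The selector priority chain described in the list above can be sketched as an ordered list of resolvers tried in turn, with a model-backed fallback only when all of them miss. This is a minimal illustration, not the TinyFish resolver. The page snapshot shape, the hint keys (`testid`, `role`, `name`), and the function names are all hypothetical.

```python
from typing import Callable, Dict, Optional

Page = Dict[str, Dict[str, str]]  # hypothetical snapshot: one lookup table per strategy
Hint = Dict[str, str]             # hypothetical hint: optional testid, role, visible name


def by_test_id(page: Page, hint: Hint) -> Optional[str]:
    return page.get("testids", {}).get(hint.get("testid", ""))


def by_accessibility(page: Page, hint: Hint) -> Optional[str]:
    key = f"{hint.get('role', '')}:{hint.get('name', '')}"
    return page.get("a11y", {}).get(key)


def by_text(page: Page, hint: Hint) -> Optional[str]:
    return page.get("text", {}).get(hint.get("name", ""))


def resolve(
    page: Page,
    hint: Hint,
    fallback: Optional[Callable[[Page, Hint], Optional[str]]] = None,
) -> Optional[str]:
    """Try each strategy in priority order; use the fallback only if all miss."""
    for resolver in (by_test_id, by_accessibility, by_text):
        element = resolver(page, hint)
        if element is not None:
            return element
    return fallback(page, hint) if fallback else None
```

Ordering the cheap, stable strategies first keeps the expensive model call off the hot path: it runs only when the page has drifted far enough that none of the structural lookups match.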
Accomplishments that we're proud of
- Cryptographic Boot Refusal: Live `cosign verify` + `cosign verify-attestation` on stage, against the public Rekor log. That beat lands in 20 seconds and it's unforgettable. A tampered image is cryptographically refused at boot.
- Speed: ~90 seconds from mp4 upload to running signed federated agent, end-to-end.
- Hermetic Demo Mode: A full airplane-mode rehearsal passes every network-dependent beat via Redis replay keys (`us:replay:{synth_id}:*`).
- Load-Bearing Integrations: Six sponsor integrations that are structurally necessary, not just checkbox features.
- Schema Synthesis Inversion: Cosmo Dream Query is wired as the core schema-synthesis primitive. We didn't hand-roll SDL; we let the native tool do exactly what it was built for.
- Testing: 74 passing tests across synthesis, Cosmo, memory, and supply-chain surfaces.
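The hermetic demo mode in the list above is a record/replay pattern: online runs store each network-dependent result under a replay key, and offline runs read the stored result instead of calling out. The sketch below uses a plain dict as a stand-in for Redis so it runs anywhere. `ReplayStore`, `beat`, and the JSON encoding are assumptions for illustration; only the `us:replay:{synth_id}:*` key pattern comes from the write-up.

```python
import json
from typing import Any, Callable, Dict, Optional


class ReplayStore:
    """Record live results online; serve them back when offline."""

    def __init__(self, synth_id: str, backend: Optional[Dict[str, str]] = None, offline: bool = False):
        self.synth_id = synth_id
        self.backend = backend if backend is not None else {}  # dict stands in for Redis
        self.offline = offline

    def key(self, beat: str) -> str:
        return f"us:replay:{self.synth_id}:{beat}"

    def call(self, beat: str, live_call: Callable[[], Any]) -> Any:
        key = self.key(beat)
        if self.offline:
            if key not in self.backend:
                raise KeyError(f"no recorded replay for {key}")
            return json.loads(self.backend[key])
        result = live_call()
        self.backend[key] = json.dumps(result)  # record for later offline runs
        return result
```

Raising on a missing key, rather than silently falling through to the network, is what makes an airplane-mode rehearsal a real test: any beat that was never recorded fails loudly.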
What we learned
- The hard part of "agent from recording" isn't the vision model. It's schema synthesis + memory + supply chain. That's where two weeks usually go — and Cosmo Dream Query + Redis AMS + Chainguard collapsed all three into primitives we could call.
- Model pinning matters more than model choice. Using the right Gemini model for each stage (Flash-Lite for multimodal, Pro for reasoning, Flash for code) beats using a single "best" model for everything, by a large margin on both latency and cost.
- Int8 Vector Sets are the reason "hundreds of agents per Fly.io host" is a real number and not a deck claim. The memory math is the constraint, and Redis solved it.
- A live tampering demo (push a modified image at the same tag → preboot hook refuses to boot) is more persuasive than any certificate walkthrough.
- TinyFish's hosted browser model is the right call for an enterprise product: we don't want to own browser infrastructure, and TinyFish wants us to consume theirs.
What's next for Understudy
- Self-healing replays. On DOM drift, invoke 3.1-pro on the live page against the original intent spec and patch the script in place — no human loop.
- Agent marketplace. Every signed agent published to Shipables.dev with its attestation bundle becomes a trustable building block for other teams.
- Paid agent output via cited.md + x402 rails. "$0.05 per enriched row" is a one-line config; the monetization surface is already wired.
- In-org federation. Point Cosmo at an enterprise's existing graph; Understudy generates agents that extend it rather than replace it. Dream Query is perfect for this.
- Wire `memory.recall()` into the agent core loop before every TinyFish action. Today memory is written; tomorrow it's consulted. Closes the last "is-it-really-an-agent?" gap.
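A recall-before-act loop of the kind described in the last item could look roughly like the sketch below. It is a toy under stated assumptions: `ListMemory` is a hypothetical in-process stand-in with naive word-overlap recall, not the Redis Agent Memory Server, and `act` is an injected placeholder for whatever executes a TinyFish action.

```python
from typing import Callable, List


class ListMemory:
    """Toy memory: recall returns stored facts sharing at least one word with the query."""

    def __init__(self) -> None:
        self.facts: List[str] = []

    def recall(self, query: str) -> List[str]:
        words = set(query.lower().split())
        return [fact for fact in self.facts if words & set(fact.lower().split())]

    def write(self, fact: str) -> None:
        self.facts.append(fact)


def run_agent(
    steps: List[str],
    memory: ListMemory,
    act: Callable[[str, List[str]], str],
) -> List[str]:
    """Consult memory before every action, then write the outcome back."""
    outcomes = []
    for step in steps:
        context = memory.recall(step)  # read path: consulted before acting
        outcome = act(step, context)
        memory.write(outcome)          # write path: what already exists today
        outcomes.append(outcome)
    return outcomes
```

The point of the structure is the ordering: each action sees what earlier actions learned, so memory influences behavior instead of only accumulating.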