rapid-agent
A small Gemini-powered research-brief agent with the four pieces of governance you actually want in production: typed output, a budget cap, an egress allowlist, and a per-call trace.
The problem
Most agent demos look great in a notebook and fall over in production. The same failure modes repeat:
- Model returns prose, downstream JSON parse throws, retry loop runs at full cost, bill spikes.
- Tool fetches a URL with a typo, hits an internal host, leaks a header.
- One call costs as much as two hundred others. Nobody saw it. The audit trail is a wall of stdout.
These failure modes are what break the moment a real agent ships.
The approach
rapid-agent is a small Python project that wires a Gemini call into a
useful task and surrounds it with four governance layers. The task is a
research-brief agent: give it a topic and a list of URLs, and it fetches
each URL, sends one prompt to Gemini, and returns a typed Brief object
you can index into. The loop runs in under 90 seconds and the code
fits in roughly 600 lines.
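A minimal sketch of what a typed Brief could look like, assuming the item fields named in the demo description (title, summary, key points). The `topic` and `key_points` field names and the `from_dict` constructor are hypothetical; the repo's actual classes may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class BriefItem:
    title: str
    summary: str
    key_points: List[str]


@dataclass(frozen=True)
class Brief:
    topic: str
    items: List[BriefItem]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Brief":
        """Build a Brief from parsed JSON, raising ValueError on any shape mismatch."""
        topic = data.get("topic")
        raw_items = data.get("items")
        if not isinstance(topic, str):
            raise ValueError("topic must be a string")
        if not isinstance(raw_items, list):
            raise ValueError("items must be a list")
        items = []
        for i, raw in enumerate(raw_items):
            if not isinstance(raw, dict):
                raise ValueError(f"items[{i}] must be an object")
            title, summary = raw.get("title"), raw.get("summary")
            points = raw.get("key_points")
            if not isinstance(title, str) or not isinstance(summary, str):
                raise ValueError(f"items[{i}] needs a string title and summary")
            if not isinstance(points, list) or not all(isinstance(p, str) for p in points):
                raise ValueError(f"items[{i}].key_points must be a list of strings")
            items.append(BriefItem(title=title, summary=summary, key_points=points))
        return cls(topic=topic, items=items)


brief = Brief.from_dict({
    "topic": "HTTP caching",
    "items": [
        {"title": "Cache-Control", "summary": "Sets freshness rules.", "key_points": ["max-age", "no-store"]},
    ],
})
print(brief.items[0].key_points[1])  # → no-store
```

Indexing then looks like `brief.items[0].title`, and a malformed reply fails loudly at construction instead of deep inside downstream code.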
The four layers are not magic. Each is a small primitive that lives on its own. You can pull them apart and use them in any agent.
The four governance layers
Structured output enforcement (cast_json). The model is told to
return JSON. If it returns something else, the cast function builds a
repair prompt that includes the schema and the error, and calls the
model exactly once more. If repair still fails, the agent raises a
typed error instead of returning garbage.
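A minimal sketch of the one-repair policy, assuming the model is any callable from prompt text to reply text. `CastError`, `_parse_object`, and the repair-prompt wording are hypothetical, and this version only checks that the reply is a JSON object; full validation against the schema is left out to keep it dependency-free.

```python
import json
from typing import Any, Callable, Dict


class CastError(Exception):
    """Raised when the reply is still unusable after the single repair call."""


def _parse_object(text: str) -> Dict[str, Any]:
    value = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object at the top level")
    return value


def cast_json(model: Callable[[str], str], prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Call the model; on a bad reply, make exactly one repair call."""
    reply = model(prompt)
    try:
        return _parse_object(reply)
    except ValueError as first_error:
        # The repair prompt carries both the schema and the parse error.
        repair_prompt = (
            "Your previous reply could not be parsed.\n"
            f"Error: {first_error}\n"
            f"Required JSON schema: {json.dumps(schema)}\n"
            f"Previous reply:\n{reply}\n"
            "Return only a JSON object that matches the schema."
        )
    # Second and final attempt; any failure here becomes a typed error.
    repaired = model(repair_prompt)
    try:
        return _parse_object(repaired)
    except ValueError as second_error:
        raise CastError(f"repair failed: {second_error}") from second_error


replies = iter(["Sure! Here is your brief: ...", '{"topic": "HTTP caching", "items": []}'])
prompts = []


def fake_model(prompt: str) -> str:
    prompts.append(prompt)
    return next(replies)


result = cast_json(fake_model, "Summarise these pages as JSON.", {"type": "object"})
print(result["topic"], len(prompts))  # → HTTP caching 2
```

Capping the retry at one keeps the worst case at two model calls, which is what lets the budget layer project cost for a run up front.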
Budget cap (BudgetCap). Before each call, the agent projects a
cost from the prompt length and reserves it against a USD cap. If the
projection would overshoot the cap, the call never happens. After the
call, the actual cost is committed. The cap is shared across the run,
so a second call, such as the cast_json repair, cannot push the total
past the limit.
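A minimal sketch of the reserve-then-commit pattern, under stated assumptions: the 4-characters-per-token heuristic, the worst-case output token count, and the per-1k prices in the usage lines are illustrative and are not Gemini's actual pricing. `project_cost`, `reserve`, `commit`, and `remaining_usd` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised before a call whose projected cost would overshoot the cap."""


def project_cost(prompt: str, max_output_tokens: int,
                 usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Worst-case cost estimate from prompt length (assumes ~4 characters per token)."""
    input_tokens = len(prompt) / 4
    return (input_tokens * usd_per_1k_input + max_output_tokens * usd_per_1k_output) / 1000


@dataclass
class BudgetCap:
    cap_usd: float
    spent_usd: float = 0.0
    reserved_usd: float = 0.0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def reserve(self, projected_usd: float) -> float:
        """Hold the projected cost against the cap, or refuse the call outright."""
        with self._lock:
            if self.spent_usd + self.reserved_usd + projected_usd > self.cap_usd:
                raise BudgetExceeded(
                    f"projected ${projected_usd:.4f} exceeds remaining ${self.remaining_usd:.4f}"
                )
            self.reserved_usd += projected_usd
            return projected_usd

    def commit(self, reservation: float, actual_usd: float) -> None:
        """Release the hold and record what the call really cost."""
        with self._lock:
            self.reserved_usd -= reservation
            self.spent_usd += actual_usd

    @property
    def remaining_usd(self) -> float:
        return self.cap_usd - self.spent_usd - self.reserved_usd


cap = BudgetCap(cap_usd=0.01)
prompt = "x" * 4000  # ~1,000 input tokens under the 4-characters heuristic
hold = cap.reserve(project_cost(prompt, 1000, usd_per_1k_input=0.0001, usd_per_1k_output=0.0004))
cap.commit(hold, actual_usd=0.0003)  # the real agent would use the response's token counts
print(f"{cap.remaining_usd:.4f}")  # → 0.0097

try:
    cap.reserve(0.05)  # an oversized request is refused before any call is made
except BudgetExceeded as exc:
    print("refused:", exc)
```

Counting outstanding reservations alongside committed spend is what stops two overlapping calls from both seeing the same remaining budget; the lock only matters if calls can run concurrently.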
Egress allowlist (EgressAllowlist). The fetcher checks every URL
against a list of allowed hosts before opening a socket. Exact match or
any subdomain is allowed. Anything else raises EgressDenied and the
URL is skipped, with the denial recorded in the trace.
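A minimal sketch of the host check, assuming a hostname-based allowlist as described. `check` and `allows_host` are hypothetical method names, and restricting URLs to the http and https schemes is an added assumption.

```python
from typing import Iterable
from urllib.parse import urlsplit


class EgressDenied(Exception):
    """Raised before any socket is opened for a URL outside the allowlist."""


class EgressAllowlist:
    def __init__(self, hosts: Iterable[str]) -> None:
        # Normalise once: lower-case and drop any trailing dot ("python.org." == "python.org").
        self._hosts = {h.lower().rstrip(".") for h in hosts}

    def allows_host(self, host: str) -> bool:
        host = host.lower().rstrip(".")
        # Exact match, or a true subdomain. The "." prefix stops
        # "evilpython.org" from matching "python.org".
        return any(host == h or host.endswith("." + h) for h in self._hosts)

    def check(self, url: str) -> str:
        parts = urlsplit(url)
        host = parts.hostname  # already lower-cased, with port and user info removed
        # Assumption: only http(s) URLs are fetchable at all.
        if parts.scheme not in ("http", "https") or not host:
            raise EgressDenied(f"unsupported URL: {url!r}")
        if not self.allows_host(host):
            raise EgressDenied(f"host not allowed: {host}")
        return url


allowlist = EgressAllowlist(["python.org", "wikipedia.org"])
for url in ["https://docs.python.org/3/", "http://admin.internal/users", "https://evilpython.org/"]:
    try:
        allowlist.check(url)
        print("fetch", url)
    except EgressDenied as exc:
        print("skip", url, "->", exc)  # the real agent records this in the trace
```

This compares host names only. An allowed name that resolves to an internal address would still pass, so a stricter fetcher might also check the resolved IP before connecting.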
Per-call trace (Trace). Every fetch and every model call writes
an event with start time, duration, input tokens, output tokens, and
USD cost. The trace serializes to JSON. In production you can send it
to Cloud Logging by printing it as a JSON line, or push the cost to
Cloud Monitoring as a custom metric.
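A minimal sketch of a per-call trace, assuming a context-manager style API. `TraceEvent` and `total_usd` are names the write-up uses; the `span` method, the `kind` labels, and the exact field names are hypothetical. Recording inside a `finally` block means a call that raises still leaves an event behind.

```python
import json
import time
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field
from typing import Iterator, List, Optional


@dataclass
class TraceEvent:
    kind: str                 # hypothetical labels: "fetch", "model", "egress_denied"
    name: str                 # URL or model name
    start_time: float         # wall-clock seconds since the Unix epoch
    duration_s: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0
    usd: float = 0.0
    error: Optional[str] = None


@dataclass
class Trace:
    events: List[TraceEvent] = field(default_factory=list)

    @contextmanager
    def span(self, kind: str, name: str) -> Iterator[TraceEvent]:
        """Time a block and append its event, even if the block raises."""
        event = TraceEvent(kind=kind, name=name, start_time=time.time())
        started = time.perf_counter()  # monotonic clock for the duration
        try:
            yield event  # the caller fills in tokens and cost
        except Exception as exc:
            event.error = f"{type(exc).__name__}: {exc}"
            raise
        finally:
            event.duration_s = time.perf_counter() - started
            self.events.append(event)

    @property
    def total_usd(self) -> float:
        return sum(e.usd for e in self.events)

    def to_json(self) -> str:
        return json.dumps({"total_usd": self.total_usd,
                           "events": [asdict(e) for e in self.events]}, indent=2)


trace = Trace()
with trace.span("fetch", "https://docs.python.org/3/"):
    pass  # the HTTP fetch would happen here
with trace.span("model", "gemini") as event:
    event.input_tokens, event.output_tokens, event.usd = 1200, 300, 0.0004
print(len(trace.events), trace.total_usd)  # → 2 0.0004
```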
Demo
The demo runs two scenes back to back. Same task in both. First without governance, then with.
In scene one, an internal admin URL gets requested with no allowlist check. Cost climbs across calls with nothing to stop it. The model returns prose and any downstream code has to grep it. There is no record of what just happened.
In scene two, the same task runs through RapidAgent. The bad URL is
blocked before the socket opens. An oversized budget request is
refused. The model output comes back as a typed Brief with three
items, each with a title, summary, and key points. The trace shows
four events, total cost, total time, and how much budget is left. A
JSON trace file is written for later inspection.
Deploy story
The demo runs locally against the free-tier Gemini API. For production
you swap the client for a Vertex AI client. The four governance layers
do not change. DEPLOY.md in the repo has the full walkthrough: a
20-line vertex_client.py that drops in for GeminiClient, a
Dockerfile, Cloud Run deploy commands with --no-traffic for safe
rollouts, IAM that grants only roles/aiplatform.user, and a
production checklist.
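One way a client swap can stay drop-in is to have the governed agent depend on a small interface rather than on a concrete SDK. The sketch below assumes a hypothetical `generate(prompt) -> str` method; the real GeminiClient's method names are not stated here, no Google SDK call is shown, and `OfflineClient` and `GovernedAgent` are illustrative stand-ins.

```python
import json
from typing import Any, Dict, List, Protocol


class ModelClient(Protocol):
    """Hypothetical interface: the only thing the governed agent needs from a client."""

    def generate(self, prompt: str) -> str: ...


class OfflineClient:
    """Stand-in client for local runs; a Gemini or Vertex AI client would go here."""

    def generate(self, prompt: str) -> str:
        return json.dumps({"prompt_chars": len(prompt)})


class GovernedAgent:
    """The governance code depends only on ModelClient, never on a concrete SDK."""

    def __init__(self, client: ModelClient) -> None:
        self.client = client
        self.calls: List[str] = []  # stand-in for the trace

    def ask(self, prompt: str) -> Dict[str, Any]:
        self.calls.append(type(self.client).__name__)
        return json.loads(self.client.generate(prompt))  # stand-in for cast_json


agent = GovernedAgent(OfflineClient())
print(agent.ask("hello"))  # → {'prompt_chars': 5}
```

Swapping to Vertex AI then means constructing the agent with a different client object; the budget, egress, cast, and trace code paths are untouched.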
Two observability hooks are documented. Printing the trace as a JSON
line sends it to Cloud Logging. Pushing trace.total_usd as a custom
metric gives a daily spend chart per caller. No extra stack is required.
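On Cloud Run, whatever a container writes to stdout is collected by Cloud Logging, and a single-line JSON object becomes a structured entry whose `severity` and `message` fields are treated specially. A minimal sketch of the "print the trace as a JSON line" hook, assuming events are already plain dicts; `log_trace` and the `caller` and `total_usd` keys are hypothetical names, not the repo's.

```python
import json
import sys
from typing import Any, Dict, List, Optional, TextIO


def log_trace(events: List[Dict[str, Any]], caller: str,
              stream: Optional[TextIO] = None) -> Dict[str, Any]:
    """Write one trace as a single JSON line on stdout (or the given stream)."""
    out = stream if stream is not None else sys.stdout
    total_usd = sum(e.get("usd", 0.0) for e in events)
    entry = {
        "severity": "INFO",  # read by Cloud Logging as the entry's severity
        "message": f"rapid-agent run: {len(events)} events, ${total_usd:.4f}",
        # Remaining keys end up in the entry's jsonPayload; the names are assumptions.
        "caller": caller,
        "total_usd": total_usd,
        "events": events,
    }
    # Keep it on one line: each stdout line is treated as a separate log entry.
    out.write(json.dumps(entry, separators=(",", ":")) + "\n")
    out.flush()
    return entry


entry = log_trace(
    [{"kind": "fetch", "usd": 0.0}, {"kind": "model", "usd": 0.0004}],
    caller="demo",
)
```

Carrying a `caller` field on every line is one way to get the per-caller breakdown; the custom-metric push to Cloud Monitoring is not sketched here.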
Arize Phoenix integration (partner MCP)
The per-call trace is exported to Arize Phoenix via OpenTelemetry.
phoenix_export.py converts each TraceEvent into an OTLP span and
forwards it to a Phoenix collector (local at port 6006 or remote via
PHOENIX_COLLECTOR_ENDPOINT). Install with:
pip install "rapid-agent[phoenix]"
Phoenix then shows every fetch and model call as a named span with latency, token counts, USD cost, and the egress-denied events. This is the same data the custom Trace already captures, now queryable in the Phoenix UI and reusable as a Phoenix dataset for evals.
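phoenix_export.py itself is not reproduced here. The sketch below covers only the conversion step, assuming events arrive as serialized dicts carrying the fields from the Trace section. The `openinference.span.kind` and `llm.token_count.*` keys follow the OpenInference conventions Phoenix reads, but which keys rapid-agent actually sets is an assumption; `rapid_agent.usd`, `event_to_span_data`, and `emit` are hypothetical. `emit` relies on the OpenTelemetry API's `start_span(..., start_time=...)` and `span.end(end_time=...)`, both of which take nanosecond timestamps.

```python
from typing import Any, Dict

NS_PER_S = 1_000_000_000

# Assumed mapping onto OpenInference span kinds, which Phoenix uses to label spans.
SPAN_KINDS = {"model": "LLM", "fetch": "TOOL", "egress_denied": "TOOL"}


def event_to_span_data(event: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one serialized trace event into the arguments an OpenTelemetry span needs."""
    start_ns = int(event["start_time"] * NS_PER_S)  # OpenTelemetry timestamps are nanoseconds
    end_ns = start_ns + int(event["duration_s"] * NS_PER_S)
    attributes: Dict[str, Any] = {
        "openinference.span.kind": SPAN_KINDS.get(event["kind"], "CHAIN"),
        "llm.token_count.prompt": int(event.get("input_tokens", 0)),
        "llm.token_count.completion": int(event.get("output_tokens", 0)),
        "rapid_agent.usd": float(event.get("usd", 0.0)),  # hypothetical custom key
    }
    if event.get("error"):
        attributes["rapid_agent.error"] = str(event["error"])
    return {
        "name": f'{event["kind"]}:{event["name"]}',
        "start_time": start_ns,
        "end_time": end_ns,
        "attributes": attributes,
    }


def emit(span_data: Dict[str, Any], tracer: Any) -> None:
    """Replay one event as a finished span on an OpenTelemetry tracer."""
    span = tracer.start_span(
        span_data["name"],
        attributes=span_data["attributes"],
        start_time=span_data["start_time"],
    )
    span.end(end_time=span_data["end_time"])


data = event_to_span_data({"kind": "model", "name": "gemini", "start_time": 10.0,
                           "duration_s": 0.25, "input_tokens": 1200,
                           "output_tokens": 300, "usd": 0.0004})
print(data["end_time"] - data["start_time"])  # → 250000000
```

With the OpenTelemetry SDK configured to export to Phoenix, passing `opentelemetry.trace.get_tracer(__name__)` as `tracer` would replay each event as a span with its original timing rather than the time of export.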
Why it matters
The hackathon is about rapid agents on Google Cloud. Rapid does not have to mean fragile. The four governance layers add roughly 200 lines of code and get the agent ready for real production deployment. Same agent, same prompt, now with a cost cap, a domain allowlist, a typed output, and a trace.
Repository
github.com/MukundaKatta/rapid-agent
24 passing tests that run in 0.2 seconds. The demo runs in 90 seconds and needs no API key. Tested on Python 3.9 through 3.11.
What is next
The Vertex deploy is documented but has not been run yet. Once a
project is configured, the steps in DEPLOY.md take about 20 minutes end to end. A
sandbox project with an alert on BudgetExceeded in Cloud Logging
would be the natural first deploy target. The same shape works for
tool-using agents (replace the URL fetch with a function call loop),
and the four governance layers carry over without changes.
Built With
- arize-phoenix
- cloud-run
- gemini
- google-cloud
- opentelemetry
- python
- vertex-ai