Inspiration
The future will be autonomous, and the most underestimated areas are science and medicine. Lab in the loop is an approach in which computational and physical experiments complement each other in a cycle. But managing data and infrastructure is a huge bottleneck, and doing it all safely is an even bigger one!
What it does
Dry Lab is a starting point for running lab in the loop safely. It gives an agent its own computer and file system in the cloud, and all you need is a SKILL.md. With Fivetran's MCP, agents handle everything from connecting GCS to BigQuery to running SQL, and hand you a research paper with figures and citations. There is even an optional endpoint (Fivetran's Activations) for getting the result validated in a wet lab.
How we built it
Dry Lab is a thin ADK orchestrator on Gemini/Vertex, not a framework: a planner delegates to one sequential pipeline — @Pipeline → @Investigator ⇄ @Critic → @Proposer — and halts at three human gates.
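The @Investigator ⇄ @Critic exchange can be pictured as a bounded revise loop. The sketch below is only an illustration in plain Python, not the ADK wiring: `run_review_loop`, the `Review` type, the `"PASS"` label, and the round cap are all assumed names and values, and the two stub functions stand in for the real agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    verdict: str      # "PASS" or "REVISE" (labels assumed for this sketch)
    notes: str = ""   # feedback handed back to the investigator


def run_review_loop(
    investigate: Callable[[Optional[str]], str],
    critique: Callable[[str], Review],
    max_rounds: int = 3,  # hypothetical cap so the loop always terminates
) -> Tuple[Optional[str], List[str]]:
    """Alternate investigate/critique until the critic passes or rounds run out."""
    feedback: Optional[str] = None
    verdicts: List[str] = []
    for _ in range(max_rounds):
        draft = investigate(feedback)
        review = critique(draft)
        verdicts.append(review.verdict)
        if review.verdict == "PASS":
            return draft, verdicts
        feedback = review.notes
    return None, verdicts  # nothing passed; leave the decision to a human


# Stub agents: the first draft is rejected, the revised one passes.
def stub_investigator(feedback: Optional[str]) -> str:
    return "draft v1" if feedback is None else "draft v2"


def stub_critic(draft: str) -> Review:
    if draft == "draft v2":
        return Review("PASS")
    return Review("REVISE", notes="thresholds not fixed")


result, verdicts = run_review_loop(stub_investigator, stub_critic)
print(result, verdicts)
```

Returning `None` when no draft passes keeps the failure explicit instead of promoting an unreviewed draft to the next stage.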
Fivetran is the data edge the agent runs itself: the baked Fivetran MCP does the EL sync, a scheduled dbt‑on‑Fivetran job builds the BigQuery marts, and Activations reverse‑ETLs the approved experiment back out. Because the managed code path 500'd on session reuse and wouldn't coexist with our skill / BigQuery / search tools, analysis runs in a networkless in‑container Jupyter kernel — we reduce the ~7M‑row matrix in BigQuery first (stage_query → Parquet) so raw data and credentials never reach the model.
Every output is pinned to an immutable GCS object (generation + crc32c) with only a thin runs/evidence index in BigQuery — the artifacts are the record, not a database. It all ships as one same‑origin Cloud Run container — UI, ADK API, the MCP, and the kernels — traced with Cloud Trace.
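To make "generation + crc32c" concrete, here is a minimal sketch of what checking a pin could look like. It uses a pure-Python CRC32C so it runs without any cloud library; the `Pin` class, its field names, and the example URI are assumptions for illustration, not the project's actual schema. The base64, big-endian encoding matches how GCS reports an object's crc32c.

```python
import base64
from dataclasses import dataclass


def _build_table() -> list:
    """Lookup table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Return the CRC32C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_b64(data: bytes) -> str:
    """Encode the checksum as base64 over 4 big-endian bytes."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")


@dataclass(frozen=True)
class Pin:
    uri: str         # hypothetical, e.g. "gs://example-bucket/runs/demo/out.parquet"
    generation: int  # object generation recorded when the artifact was written
    checksum: str    # base64 crc32c recorded at the same time


def matches(pin: Pin, data: bytes) -> bool:
    """True only if the bytes in hand still hash to the pinned checksum."""
    return crc32c_b64(data) == pin.checksum


payload = b"123456789"  # standard CRC check string, not project data
pin = Pin("gs://example-bucket/runs/demo/out.parquet", 1, crc32c_b64(payload))
print(matches(pin, payload), matches(pin, payload + b"!"))
```

The generation number identifies which version of the object was pinned, and the checksum confirms the bytes fetched for that version are unchanged.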
Challenges we ran into
The managed code path fought the loop. Vertex's hosted interpreter 500'd on session reuse — and run → inspect → fix in one live kernel is the whole job — and VertexAiCodeExecutor wouldn't coexist with our SkillToolset + BigQuery + Scout tools (malformed run_code calls). So we merged analyst+scribe into one @Investigator and made execute_python a plain FunctionTool over a persistent in‑container kernel — trading a Google‑managed sandbox for our own container + rlimits + a kernel cap, and getting the stateful loop back.
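The property that matters here is state surviving between calls, so run → inspect → fix can happen in one session. The toy below shows only that idea using `exec` with a shared namespace. It is not a Jupyter kernel, has no sandboxing or resource limits, and `TinyKernel` and its return shape are invented for this sketch.

```python
import contextlib
import io


class TinyKernel:
    """Toy stand-in for a persistent kernel: one namespace shared across calls."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def execute(self, code: str) -> dict:
        """Run code in the shared namespace and capture stdout or the error."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)  # unsafe outside a sandbox
        except Exception as exc:
            return {
                "ok": False,
                "stdout": buffer.getvalue(),
                "error": f"{type(exc).__name__}: {exc}",
            }
        return {"ok": True, "stdout": buffer.getvalue(), "error": None}


kernel = TinyKernel()
kernel.execute("values = [1, 2, 3]")            # run
broken = kernel.execute("print(total / 0)")     # inspect: NameError, state kept
fixed = kernel.execute("total = sum(values); print(total)")  # fix in place
print(broken["error"])
print(fixed["stdout"])
```

A failed cell returns an error instead of raising, so the caller can read it and send a corrected cell against the same state.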
Honesty kept surfacing as our own bugs. An early "clean" run was wrong, and the Benjamini‑Hochberg math gave it away: the model had pasted group stats inline and silently analyzed ~50 probes instead of 2,000, corrupting both the empirical‑Bayes moderation and the correction. The fix wasn't a better prompt — it was architecture: stage_query writes a Parquet the kernel reads, and hand‑typed data is forbidden.
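Why the probe count shows up in the Benjamini-Hochberg numbers: each adjusted p-value is scaled by the total number of tests m, so analyzing ~50 probes instead of 2,000 shrinks the correction. The sketch below is a plain-Python BH adjustment with an explicit count check. The `expected_tests` guard is an illustrative assumption, not the project's fix (that was stage_query plus Parquet), and the p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], expected_tests: int) -> List[float]:
    """Return BH-adjusted p-values in the input order.

    Raises ValueError when the number of p-values differs from the number of
    tests the caller expected (hypothetical guard for this sketch).
    """
    m = len(pvalues)
    if m != expected_tests:
        raise ValueError(f"expected {expected_tests} tests, got {m}")
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


toy = [0.01, 0.04, 0.03, 0.005]  # made-up p-values
print(benjamini_hochberg(toy, expected_tests=4))

try:
    benjamini_hochberg([0.001] * 50, expected_tests=2000)
except ValueError as err:
    print("rejected:", err)
```

For the smallest p-value the adjustment is p × m, so a p of 0.001 adjusts to 0.05 with m = 50 but is capped at 1.0 with m = 2,000.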
Accomplishments that we're proud of
A tamper‑evident chain from a sentence to the bytes — number → code cell → BigQuery mart → Fivetran connector → GEO source → pinned GCS object (generation + crc32c). The auditability the category markets but rarely shows.
An agent that operates a real, governed Fivetran pipeline end to end — ingest, scheduled transform, reverse‑ETL activation — with a human gate on every write and nothing auto‑fired.
Honesty enforced by types, not vibes. Two evidence kinds that can't be mixed (computed = pin + cell, no citation; literature = URL, no checksum), so you literally cannot staple a PubMed link onto a log2FC; the @Critic gates the rest (grounded symbols, fixed thresholds, leakage).
A loop that closes and a critic that bites. On camera the first pass came back REVISE across six gates and the re‑run passed — and we shipped results we didn't dress up: 32 genes at fixed cutoffs, an enrichment null reported as a null, a biomarker ROC‑AUC of 0.70 instead of a leaked 0.95.
Breadth without schema. Eight real, replayable runs across four data types on one runtime — and adding the biomarker panel was one SKILL.md on the existing marts: no new table, connector, or endpoint.
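The two evidence kinds mentioned above (computed = pin + cell, literature = URL) could be modeled as separate types so that mixing them fails at construction time. This is a sketch with assumed class and field names, not the project's actual definitions, and the values are placeholders.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ComputedEvidence:
    claim: str
    pin: str       # reference to the pinned artifact (format assumed)
    cell_id: str   # code cell that produced the number
    # No url field: a citation cannot be attached to a computed value.


@dataclass(frozen=True)
class LiteratureEvidence:
    claim: str
    url: str
    # No pin or checksum field: a citation is not a computed artifact.


Evidence = Union[ComputedEvidence, LiteratureEvidence]


def describe(evidence: Evidence) -> str:
    """Render one evidence item according to its kind."""
    if isinstance(evidence, ComputedEvidence):
        return f"computed: {evidence.claim} [{evidence.pin} / cell {evidence.cell_id}]"
    return f"literature: {evidence.claim} [{evidence.url}]"


ok = ComputedEvidence(claim="example statistic", pin="example-pin", cell_id="7")
print(describe(ok))

try:
    # Attaching a URL to a computed value is rejected by the constructor.
    ComputedEvidence(claim="example statistic", pin="example-pin",
                     cell_id="7", url="https://example.org/paper")
except TypeError as err:
    print("rejected:", type(err).__name__)
```

Because the dataclasses are frozen, an attribute also cannot be bolted on after the object is created.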
What we learned
What's next for Dry Lab
Make the loop continuable. Durable sessions (Cloud SQL), saved runs, and follow‑up questions instead of a single turn — then let scientists bring their own SKILL.md into a runtime that still grounds and gates everything they run.
The horizon the ExperimentRequest is already shaped for — re‑ingest wet‑lab results through Fivetran and let the loop learn from what comes back. Still emitting a structured request; never driving the hardware.
Built With
- fivetran
- gemini
- google-bigquery

