RNA-Arena
A competitive AI arena for computational biology, where every world record is a verified GitLab Merge Request.
Live Leaderboard: anudit.gitlab.io/rna-arena
Agent Demo: rna-arena-api-5rwxym3elq-uc.a.run.app/demo
API (health): rna-arena-api-5rwxym3elq-uc.a.run.app/health
API (records): rna-arena-api-5rwxym3elq-uc.a.run.app/api/records
GitLab Repo: gitlab.com/anudit/rna-arena
The Core Idea
Most agent projects use GitLab as a place to store code. RNA-Arena uses GitLab as the protocol.
Every submission to the global leaderboard is a GitLab Merge Request. It is opened either by the GitLab Duo MCP server during an agent run or by the rna-arena submit CLI through the GitLab REST API. GitLab CI then verifies it and auto-merges it only when a Rust oracle independently confirms that the score beats the current world record. The leaderboard is not a database write; it is a commit history. Tamper with it and CI rejects it.
This is a fundamentally different model from platforms like Kaggle, which rely on a central authority to verify scores. RNA-Arena has no central authority. The repo is the leaderboard. The pipeline is the referee. Any research group can fork it, run their own competition on their own infrastructure, and trust the results — because trust is baked into the Git history and CI configuration, not into an opaque server.
The problem being solved: RNA secondary structure prediction with pseudoknots, an NP-hard computational biology problem with direct applications to drug design, RNA therapeutics, and cancer biology. Four AI agents, powered by Gemini through the Google Cloud Agent Development Kit (ADK), collaborate to push a live scientific record. When they succeed, the GitLab Duo MCP server opens the MR that makes the record official once CI verifies it.
What's Novel
Three ideas that, to our knowledge, have not been combined elsewhere:
1. GitLab as a scientific protocol, not a code host.
Every world record is a verified Merge Request. A leaderboard entry is not a database row. It is a commit whose result any researcher can check out and reproduce independently. The CI pipeline is the referee, and there is no opaque central authority. Kaggle, Papers With Code, and most academic benchmarks ask participants to trust a server. RNA-Arena trusts only the Git history and the open-source oracle.
2. Self-improving agents via real observability data.
The Critic agent reads its own Arize Phoenix OpenInference spans mid-run to answer: "which mutation strategy reduced energy the most in the last 25 iterations?". It uses that answer to retune the Mutator's temperature before the next iteration. This is not a prompt hack — it is an agent reading structured telemetry it produced itself and adjusting its own behavior. The Fivetran/BigQuery integration means it also knows the live world-record energy, so it is always optimizing toward the actual bar, not a stale local file.
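The retuning step described above can be sketched as follows. This is a minimal TypeScript sketch. It assumes the Critic has already reduced its Phoenix spans to simple records. MutationSpan, bestStrategy and retuneTemperature are hypothetical names, and the field names are not the OpenInference schema. The cool-down/heat-up rule is an illustrative policy, not necessarily the project's.

```typescript
// Hypothetical record of one Mutator step, as the Critic might reconstruct it
// from its own trace spans. Field names are illustrative, not the OpenInference schema.
interface MutationSpan {
  strategy: string; // e.g. "swap-helix" (hypothetical strategy name)
  energyBefore: number;
  energyAfter: number;
}

// Return the strategy with the largest total energy reduction over the last
// `window` spans, or undefined if no strategy reduced energy at all.
function bestStrategy(spans: MutationSpan[], window = 25): string | undefined {
  const gain = new Map<string, number>();
  for (const s of spans.slice(-window)) {
    const reduction = s.energyBefore - s.energyAfter; // positive = lower energy
    gain.set(s.strategy, (gain.get(s.strategy) ?? 0) + reduction);
  }
  let best: string | undefined = undefined;
  let bestGain = 0;
  for (const [strategy, g] of Array.from(gain.entries())) {
    if (g > bestGain) {
      best = strategy;
      bestGain = g;
    }
  }
  return best;
}

// Illustrative policy: if some strategy is still paying off, cool down to
// exploit it; if nothing helped, heat up to explore. Clamp to an arbitrary range.
function retuneTemperature(current: number, spans: MutationSpan[]): number {
  const next = bestStrategy(spans) !== undefined ? current * 0.9 : current * 1.2;
  return Math.min(Math.max(next, 0.01), 10);
}
```

The clamp keeps a long run of good or bad luck from driving the temperature to zero or to infinity.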
3. A portable, verifiable benchmark template.
The architecture is domain-agnostic. Replace the RNA oracle with a protein-docking score, a circuit-fidelity function, or a compiler-output metric and the entire system (ADK LoopAgent, 6 MCP servers, GitLab CI auto-merge, Pages leaderboard) works unchanged.
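One way to express the oracle boundary that makes this swap possible is shown below. This is a minimal sketch. It assumes a hypothetical Oracle&lt;C&gt; interface (a validity check plus a deterministic score, lower is better) and a shared beatsRecord gate. Both toy oracles are placeholders, not the project's scoring.

```typescript
type Pair = [number, number];

// Hypothetical contract every domain oracle would satisfy.
interface Oracle<C> {
  readonly name: string;
  validate(candidate: C): boolean; // structural checks only
  score(candidate: C): number; // deterministic; lower is better
}

// The only domain-agnostic decision the pipeline needs: is this a new record?
function beatsRecord<C>(oracle: Oracle<C>, candidate: C, record: number): boolean {
  return oracle.validate(candidate) && oracle.score(candidate) < record;
}

// Toy RNA oracle: real validity checks, placeholder score of -1 per pair.
const toyRnaOracle: Oracle<{ seq: string; pairs: Pair[] }> = {
  name: "rna-toy",
  validate: ({ seq, pairs }) => {
    const used = new Set<number>();
    for (const [i, j] of pairs) {
      if (!(0 <= i && i < j && j < seq.length) || used.has(i) || used.has(j)) return false;
      used.add(i);
      used.add(j);
    }
    return true;
  },
  score: ({ pairs }) => -pairs.length,
};

// Toy compiler-output oracle: shorter emitted code is better.
const toyCodeSizeOracle: Oracle<string> = {
  name: "code-size-toy",
  validate: (asm) => asm.length > 0,
  score: (asm) => asm.length,
};
```

Everything downstream of beatsRecord, such as CI gating and leaderboard publishing, only ever sees a number and a boolean. That is what would let it stay domain-independent.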
How GitLab Powers Every Layer
┌─────────────────────────────────────────────────────────────────┐
│ 1. SCAFFOLD rna-arena init my-solver │
│ Creates solvers/my-solver/ with solver.ts + strategy.md │
│ Edit strategy.md to describe your approach │
├─────────────────────────────────────────────────────────────────┤
│ 2. OPTIMIZE rna-arena run my-solver HDV-ribozyme │
│ 4-agent ADK-JS loop (Proposer→Mutator→Scorer→Critic) │
│ Gemini + 6 MCP tools (GitLab, Elastic, MongoDB, ...) │
│ Goal: beat the current world-record energy │
├─────────────────────────────────────────────────────────────────┤
│ 3. SUBMIT rna-arena submit my-solver HDV-ribozyme │
│ CLI verifies score locally (Rust oracle) │
│ Git push to submit branch → GitLab REST API opens MR │
├─────────────────────────────────────────────────────────────────┤
│ 4. VERIFY GitLab CI pipeline (.gitlab-ci.yml) │
│ Stage 1 — path-guard: rejects diffs outside solvers/<name>/ │
│ Stage 2 — oracle-verify: re-runs Rust oracle independently │
│ checks energy beats previous record │
│ Stage 3 — publish: auto-merges if CI oracle agrees │
├─────────────────────────────────────────────────────────────────┤
│ 5. PUBLISH GitLab Pages │
│ LEADERBOARD.json → live chart at anudit.gitlab.io/rna-arena │
└─────────────────────────────────────────────────────────────────┘
GitLab powers both the submission path (submit opens an MR via the GitLab REST API) and the verification layer (CI pipeline auto-merges only when the oracle confirms the score). The run command uses the GitLab Duo MCP server to open an MR as part of the agent loop when a record is beaten.
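The REST half of the submission path maps onto GitLab's Merge Requests API (POST /projects/:id/merge_requests with source_branch, target_branch and title). The sketch below only builds the request. buildMergeRequest, the branch naming and the title format are hypothetical, and the token is whatever GitLab access token the CLI is configured with.

```typescript
// Plain description of an HTTP request, so it can be inspected before sending.
interface MergeRequestDraft {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

interface MergeRequestOptions {
  baseUrl: string; // e.g. "https://gitlab.com"
  projectPath: string; // "namespace/project"; URL-encoded into the :id segment
  token: string; // GitLab access token (hypothetical configuration)
  sourceBranch: string;
  targetBranch: string;
  title: string;
  description?: string;
}

// Builds a request for GitLab's "create merge request" endpoint.
function buildMergeRequest(o: MergeRequestOptions): MergeRequestDraft {
  const id = encodeURIComponent(o.projectPath); // "anudit/rna-arena" -> "anudit%2Frna-arena"
  return {
    url: `${o.baseUrl}/api/v4/projects/${id}/merge_requests`,
    method: "POST",
    headers: { "PRIVATE-TOKEN": o.token, "Content-Type": "application/json" },
    body: JSON.stringify({
      source_branch: o.sourceBranch,
      target_branch: o.targetBranch,
      title: o.title,
      description: o.description ?? "",
      remove_source_branch: true, // delete the submit branch after merge
    }),
  };
}
```

In Node 18 or later, the draft can be sent with fetch(draft.url, { method: draft.method, headers: draft.headers, body: draft.body }). Keeping construction separate from sending makes the request easy to unit-test without network access.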
Why GitLab CI is the Trust Layer
Any contestant can run the oracle locally and claim any score. RNA-Arena's answer is structural: you cannot write to LEADERBOARD.json directly. The only path is:
- rna-arena submit verifies your structure (Rust oracle, client-side).
- An MR is opened (by submit through the GitLab REST API, or by the GitLab Duo MCP server during an agent run) rather than a direct push to the leaderboard.
- GitLab CI's path-guard stage rejects any diff touching engine/, oracle/, agents/, or leaderboard/; you can only append to LEADERBOARD.json.
- GitLab CI's oracle-verify stage re-runs the Rust oracle on the submitted structure independently.
- CI auto-merges only if both oracle runs agree and the energy beats the current record.
No score reaches the leaderboard without being verified twice by independent oracle runs. Cheating is not prevented by policy — it is prevented by the pipeline.
```yaml
# .gitlab-ci.yml (outline; job scripts omitted)
stages: [guard, verify, merge]
path-guard:     # Fail if diff touches anything outside solvers/<author>/
  stage: guard
oracle-verify:  # Re-run Rust oracle on submitted structure in CI
  stage: verify
publish:        # Auto-merge only if CI_VERIFIED=true and energy < record
  stage: merge
```
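The gating logic these three jobs enforce can be modelled in TypeScript. This is a hypothetical model, not the project's CI scripts. pathGuard and shouldAutoMerge are illustrative names. The changed-path list is assumed to come from something like git diff --name-only against the target branch, and the energy-agreement tolerance is an assumption. Checking that LEADERBOARD.json is only appended to would need a separate diff inspection, which is not shown here.

```typescript
interface SubmissionCheck {
  changedPaths: string[]; // files touched by the MR (assumed to come from a git diff)
  author: string; // solver folder name under solvers/
  localEnergy: number; // energy claimed by the client-side oracle run
  ciEnergy: number; // energy recomputed by the CI oracle run
  recordEnergy: number; // current world record
}

// Stage 1 (path-guard): every changed path must match an allowed entry.
// Entries ending in "/" are directory prefixes; others must match exactly.
function pathGuard(paths: string[], allowed: string[]): boolean {
  return paths.every(
    (p) =>
      !p.split("/").includes("..") &&
      allowed.some((a) => (a.endsWith("/") ? p.startsWith(a) : p === a)),
  );
}

// Stages 2-3 (oracle-verify, publish): both oracle runs must agree and the
// verified energy must be strictly lower than the record.
function shouldAutoMerge(c: SubmissionCheck, extraAllowed: string[] = []): boolean {
  if (!pathGuard(c.changedPaths, [`solvers/${c.author}/`, ...extraAllowed])) return false;
  const oraclesAgree = Math.abs(c.localEnergy - c.ciEnergy) < 1e-9;
  return oraclesAgree && c.ciEnergy < c.recordEnergy;
}
```

Only ciEnergy is compared with the record. The client-side number serves as a consistency check, never as the source of truth.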
The Problem — and Why It Matters
RNA molecules fold into shapes that determine how they function. Get the shape wrong and the drug doesn't bind. Get it right and you can design RNA-based medicines, silence disease genes, or engineer ribozymes that cut viral RNA. Two benchmark sequences in this repo come from exactly this space: the hepatitis delta virus (HDV) ribozyme and telomerase RNA (telomerase is implicated in most cancers).
The specific challenge, predicting RNA secondary structure with pseudoknots, has been a central open problem in computational biology for 25 years. Pseudoknots are crossing base-pair interactions that the standard nested-structure dynamic-programming algorithms (such as Nussinov and Zuker) cannot represent. They are also the structurally critical feature in many medically important RNAs. The problem is proven NP-hard (Lyngsø & Pedersen, 2000), and the hardest benchmark instances were only claimed "solved" in a 2025 research paper. This makes it an active frontier rather than a textbook exercise.
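Concretely, two base pairs (i, j) and (k, l) cross when i < k < j < l. A nested-only dynamic program can never emit such a combination, which is why pseudoknots fall outside it. Below is a minimal TypeScript check. The helper names crosses and crossingPairs are hypothetical.

```typescript
type Pair = [number, number];

// (i, j) and (k, l), each written with the smaller index first, cross when one
// pair starts inside the other and ends outside it. Nested or side-by-side
// pairs do not cross.
function crosses([i, j]: Pair, [k, l]: Pair): boolean {
  return (i < k && k < j && j < l) || (k < i && i < l && l < j);
}

// All crossing combinations of base pairs; an empty result means the structure
// is pseudoknot-free, i.e. representable by a nested-only algorithm.
function crossingPairs(pairs: Pair[]): Array<[Pair, Pair]> {
  const out: Array<[Pair, Pair]> = [];
  pairs.forEach((p, a) => {
    for (const q of pairs.slice(a + 1)) {
      if (crosses(p, q)) out.push([p, q]);
    }
  });
  return out;
}
```

For example, the H-type pseudoknot [[0, 6], [1, 5], [3, 10], [4, 9]] contains four crossing combinations. The nested hairpin [[0, 9], [1, 8]] contains none.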
Why this is also a software infrastructure problem: The field has no agreed-upon open benchmark. Labs publish results in papers, but there's no live, tamper-proof leaderboard that any researcher can submit to and trust. RNA-Arena is that infrastructure — and it's built entirely on GitLab. The MR-as-verified-record pattern is portable to any field that needs reproducible, cheating-resistant benchmarks: protein folding, drug docking, materials discovery.
The scoring is deliberately simple (~40 lines of deterministic arithmetic) so the oracle is fast, portable, and unambiguous:
energy(S, pairs) =
Σ pair_score(i, j) # −3 G-C, −2 A-U, −1 G-U
+ Σ stacking_bonus # −1.0 for adjacent stacked pairs
+ Σ loop_penalty # +0.1 per unpaired loop base
+ pseudoknot_penalty # +0.5 per crossing pair
Lower energy = better structure. A structure is just a list of base-pair indices — no 3D coordinates, no domain expertise needed to write a solver.
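The formula above can be transcribed into TypeScript. This is a sketch under explicit assumptions, not the project's Rust oracle. "Stacked" is taken to mean that (i, j) and (i+1, j−1) are both paired. Every unpaired base counts toward the loop penalty. The pseudoknot penalty is charged once per combination of two crossing base pairs. Non-canonical pairs are rejected rather than scored.

```typescript
type Pair = [number, number];

// Pair scores from the formula; any other base combination is rejected.
const PAIR_SCORE = new Map<string, number>([
  ["GC", -3], ["CG", -3], ["AU", -2], ["UA", -2], ["GU", -1], ["UG", -1],
]);

function energy(seq: string, pairs: Pair[]): number {
  const partner = new Map<number, number>();
  let e = 0;
  // Σ pair_score(i, j), with validity checks.
  for (const [i, j] of pairs) {
    if (!(0 <= i && i < j && j < seq.length)) throw new Error(`bad pair (${i}, ${j})`);
    if (partner.has(i) || partner.has(j)) throw new Error(`base reused in (${i}, ${j})`);
    const s = PAIR_SCORE.get(seq.charAt(i) + seq.charAt(j));
    if (s === undefined) throw new Error(`non-canonical pair at (${i}, ${j})`);
    partner.set(i, j);
    partner.set(j, i);
    e += s;
  }
  // Σ stacking_bonus: -1.0 when the next pair inwards, (i+1, j-1), is also formed.
  for (const [i, j] of pairs) {
    if (i + 1 < j - 1 && partner.get(i + 1) === j - 1) e += -1.0;
  }
  // Σ loop_penalty: +0.1 per unpaired base (assumption: all unpaired bases count).
  e += (seq.length - partner.size) * 0.1;
  // pseudoknot_penalty: +0.5 for each combination of two crossing pairs (i < k < j < l).
  for (let a = 0; a < pairs.length; a++) {
    for (let b = a + 1; b < pairs.length; b++) {
      const [i, j] = pairs[a];
      const [k, l] = pairs[b];
      if ((i < k && k < j && j < l) || (k < i && i < l && l < j)) e += 0.5;
    }
  }
  return e;
}
```

For GGGAAACCC folded as [[0, 8], [1, 7], [2, 6]] this gives 3 × (−3) for the pairs, 2 × (−1.0) for the stacks, and 3 × 0.1 for the unpaired A's, or about −10.7. The i + 1 < j − 1 guard stops a pair such as (i, i+1) from counting as stacked on itself.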
Agent Architecture
Four agents run in a Google Cloud ADK LoopAgent powered by Gemini — the code-first path within the Google Cloud Agent Builder ecosystem. MCP servers are connected via @modelcontextprotocol/sdk StdioClientTransport (the recommended approach; the Agent Platform Studio MCP UI is currently in preview and not yet functional for API-key auth). The MCPRegistry mounts each partner server and the AgentContext wraps each client with a typed interface.
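An MCP client can be given the kind of typed interface described here through a thin facade over callTool. This is a hypothetical sketch, not the project's MCPRegistry or AgentContext. ToolCaller declares only the subset of the @modelcontextprotocol/sdk Client's callTool({ name, arguments }) shape that the sketch uses, so a real Client may need a small adapter. The argument names and the JSON result format are assumptions.

```typescript
// Structural subset of an MCP client's callTool, declared locally so the
// sketch has no dependencies.
interface ToolCaller {
  callTool(request: {
    name: string;
    arguments?: { [key: string]: unknown };
  }): Promise<{ content: Array<{ type: string; text?: string }>; isError?: boolean }>;
}

interface StoredFold {
  sequence: string;
  pairs: Array<[number, number]>;
  energy: number;
}

// Hypothetical typed facade over the Elastic fold-memory server.
class FoldMemory {
  private readonly client: ToolCaller;

  constructor(client: ToolCaller) {
    this.client = client;
  }

  // Recall the k lowest-energy stored folds for sequences similar to `sequence`.
  async recallBest(sequence: string, k = 5): Promise<StoredFold[]> {
    const result = await this.client.callTool({
      name: "search", // tool name as listed for the Elastic server
      arguments: { query: sequence, size: k }, // argument names are assumptions
    });
    if (result.isError) throw new Error("fold-memory search failed");
    // Assumption: the server returns its hits as JSON in a text content item.
    const text = result.content.find((c) => c.type === "text")?.text ?? "[]";
    const folds = JSON.parse(text) as StoredFold[];
    return folds.sort((a, b) => a.energy - b.energy).slice(0, k);
  }
}
```

Because agents depend only on FoldMemory, a stub ToolCaller is enough to test them without launching an MCP server.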
┌──────────────────────────────────────────────────────────────────┐
│ Google Cloud ADK LoopAgent (Gemini) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Proposer │ → │ Mutator │ → │ Scorer │ → │ Critic │ │
│ │ │ │ │ │ │ │ │ │
│ │ Elastic │ │ Simulated│ │ Rust │ │ Phoenix │ │
│ │ recall → │ │ annealing│ │ oracle │ │ traces → │ │
│ │ seed fold│ │ at temp T│ │ (ground │ │ retune │ │
│ │ │ │ │ │ truth) │ │ strategy │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ↑ MongoDB checkpoint │ │
│ └──────────────────── loop ────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
│
┌────────────────────┤ on record beaten
▼ ▼
Fivetran/BigQuery GitLab Duo MCP
(confirm world best) (open the MR)
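The loop in the diagram can be modelled as simulated annealing with pluggable steps. This is a minimal sketch. It assumes hypothetical LoopAgents and LoopHooks interfaces in which each agent is reduced to a plain function. In the real system these would be Gemini-backed ADK agents and MCP calls. The default checkpointEvery = 25 mirrors the MongoDB checkpoint interval given for this project.

```typescript
type Pair = [number, number];

// Hypothetical stand-ins for the four agents, reduced to plain functions.
interface LoopAgents {
  propose(): Pair[]; // Proposer: seed fold
  mutate(pairs: Pair[], temperature: number): Pair[]; // Mutator: one move
  score(pairs: Pair[]): number; // Scorer: oracle energy
  critique(iteration: number, temperature: number): number; // Critic: next T
}

// Hypothetical side-effect hooks (e.g. MongoDB checkpoint, GitLab MR on a record).
interface LoopHooks {
  checkpoint?: (iteration: number, best: Pair[], bestEnergy: number) => void;
  onRecord?: (best: Pair[], bestEnergy: number) => void;
}

interface LoopOptions {
  iterations: number;
  temperature: number;
  record: number; // current world-record energy
  checkpointEvery?: number; // default 25
  random?: () => number; // injectable RNG for reproducibility
}

function runLoop(agents: LoopAgents, hooks: LoopHooks, opts: LoopOptions): { best: Pair[]; bestEnergy: number } {
  const rand = opts.random ?? Math.random;
  const every = opts.checkpointEvery ?? 25;
  let t = opts.temperature;
  let current = agents.propose();
  let currentE = agents.score(current);
  let best = current;
  let bestE = currentE;
  for (let it = 1; it <= opts.iterations; it++) {
    const candidate = agents.mutate(current, t);
    const candidateE = agents.score(candidate);
    // Metropolis rule: accept improvements; accept a worse fold with probability exp(-Δ/T).
    const delta = candidateE - currentE;
    if (delta <= 0 || rand() < Math.exp(-delta / t)) {
      current = candidate;
      currentE = candidateE;
    }
    if (currentE < bestE) {
      best = current;
      bestE = currentE;
    }
    if (it % every === 0) hooks.checkpoint?.(it, best, bestE);
    t = agents.critique(it, t);
  }
  if (bestE < opts.record) hooks.onRecord?.(best, bestE);
  return { best, bestEnergy: bestE };
}
```

Keeping the best-so-far fold separate from the current fold lets annealing wander uphill without losing the record candidate that the hooks checkpoint and publish.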
MCP Servers and their Structural Role
| Server | Package | Role | What It Does |
|---|---|---|---|
| GitLab | @gitlab/duo-mcp-server + REST API | Publisher | rna-arena run → agent opens leaderboard MR via GitLab Duo MCP (or REST API fallback). rna-arena submit → git push + opens MR via GitLab REST API. GitLab CI verifies the oracle. |
| Arize | @arizeai/phoenix-mcp | Critic self-improvement | The Critic reads its own Phoenix spans mid-run: "which mutation strategy reduced energy most?" → retunes Mutator temperature for the next iteration. initTracing() registers OpenInference at startup. |
| MongoDB | mongodb-mcp-server | Session checkpointing | Long annealing runs write best-so-far structure to Atlas every 25 iterations via client.callTool({ name: 'insert', ... }). Allows resuming from crash. |
| Elastic | @elastic/mcp-server | Long-term fold memory | Semantic search over every structure ever scored: recall the 5 lowest-energy folds for similar sequences via client.callTool({ name: 'search', ... }). Seeds the Proposer with real starting points. |
| Dynatrace | @dynatrace/mcp | Runtime health gate | The Critic queries oracle-service latency. If the Cloud Run verifier degrades, it reduces iteration budget to stay within SLO. |
| Fivetran | @fivetran/fivetran-mcp | Live world-record target | Syncs LEADERBOARD.json → BigQuery. The Critic queries the live world-record energy so agents optimize against the actual bar. |
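The Dynatrace health gate comes down to a budget rule. The rule below is an illustrative assumption: it scales the remaining iterations by the ratio of the SLO to the observed p95 latency and applies a floor. adjustBudget is a hypothetical name. The table only states that the Critic reduces the iteration budget when the verifier degrades.

```typescript
// Hypothetical rule: shrink the remaining iteration budget in proportion to how
// far the oracle's observed p95 latency exceeds its SLO, but never below a floor
// (and never above what was left).
function adjustBudget(remaining: number, p95Ms: number, sloMs: number, floor = 10): number {
  if (p95Ms <= sloMs) return remaining; // healthy: keep the full budget
  const scaled = Math.floor(remaining * (sloMs / p95Ms));
  return Math.max(scaled, Math.min(floor, remaining));
}
```

For example, with 200 iterations left and a p95 of 400 ms against a 200 ms SLO, the budget halves to 100.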
Full System Architecture
rna-arena CLI (Rust)
init → create solvers/<name>/{solver.ts,strategy.md,config.yaml}
test → Rust oracle: validate pairs locally
score → Rust oracle: deterministic energy (same result everywhere)
run → Google Cloud ADK LoopAgent: Proposer→Mutator→Scorer→Critic
submit → Rust oracle (local verify) → git push + GitLab REST API: open MR
board → fetch Cloud Run API → render table
Contestant Experience
Contestants edit one folder. Everything else — the engine, oracle, agents, and leaderboard — is locked by the CI path-guard:
solvers/my-solver/
├── solver.ts ← the only file you must write
│ export async function fold(seq, ctx): Promise<Pairs>
├── strategy.md ← describe your approach (GitLab Duo reads this at init)
└── config.yaml ← which agents to enable, iteration budget
Beginners write a 5-line pure algorithm. Advanced teams orchestrate the full stack. Either approach goes through the same CI pipeline.
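As an example of a simple pure-algorithm entry, a solver.ts could wrap the classic Nussinov dynamic program, which maximises the number of canonical base pairs. It is only a baseline: it ignores the arena's energy terms and, being nested-only, can never produce a pseudoknot. Pairs is assumed to be an array of [i, j] tuples. FoldContext, with its minLoop option, is a hypothetical stand-in for whatever ctx the engine passes.

```typescript
type Pairs = Array<[number, number]>;

// Hypothetical context shape; the engine's real ctx may carry other fields.
interface FoldContext {
  minLoop?: number; // minimum unpaired bases enclosed by a hairpin
}

const CAN_PAIR = new Set(["GC", "CG", "AU", "UA", "GU", "UG"]);

export async function fold(seq: string, ctx: FoldContext = {}): Promise<Pairs> {
  const n = seq.length;
  const minLoop = ctx.minLoop ?? 3;
  const ok = (i: number, k: number) =>
    k - i > minLoop && CAN_PAIR.has(seq.charAt(i) + seq.charAt(k));
  // N[i][j] = max number of nested pairs within seq[i..j] (0 when j - i <= minLoop).
  const N: number[][] = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  // Value of "i pairs with k" inside the interval [i, j].
  const withPair = (i: number, k: number, j: number) =>
    N[i + 1][k - 1] + 1 + (k + 1 <= j ? N[k + 1][j] : 0);

  for (let len = minLoop + 1; len < n; len++) {
    for (let i = 0; i + len < n; i++) {
      const j = i + len;
      let best = N[i + 1][j]; // case 1: base i stays unpaired
      for (let k = i + 1; k <= j; k++) {
        if (ok(i, k)) best = Math.max(best, withPair(i, k, j)); // case 2: i pairs with k
      }
      N[i][j] = best;
    }
  }

  // Traceback: re-derive which case produced each optimal value.
  const pairs: Pairs = [];
  const stack: Array<[number, number]> = [[0, n - 1]];
  while (stack.length > 0) {
    const [i, j] = stack.pop()!;
    if (j - i <= minLoop) continue;
    if (N[i][j] === N[i + 1][j]) {
      stack.push([i + 1, j]);
      continue;
    }
    for (let k = i + 1; k <= j; k++) {
      if (ok(i, k) && N[i][j] === withPair(i, k, j)) {
        pairs.push([i, k]);
        stack.push([i + 1, k - 1]);
        if (k + 1 <= j) stack.push([k + 1, j]);
        break;
      }
    }
  }
  return pairs.sort((a, b) => a[0] - b[0]);
}
```

For GGGAAACCC this returns [[0, 8], [1, 7], [2, 6]], a three-pair hairpin. Swapping the pair-count objective for the arena's energy function is the natural next step for a contestant.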
Built With
- adk
- gitlab
- rust
- typescript