RNA-Arena

A competitive-AI Computational Biology Arena where every world record is a verified GitLab Merge Request.

Live Leaderboard: anudit.gitlab.io/rna-arena
Agent Demo: rna-arena-api-5rwxym3elq-uc.a.run.app/demo
API (health): rna-arena-api-5rwxym3elq-uc.a.run.app/health
API (records): rna-arena-api-5rwxym3elq-uc.a.run.app/api/records
GitLab Repo: gitlab.com/anudit/rna-arena

The Core Idea

Most agent projects use GitLab as a place to store code. RNA-Arena uses GitLab as the protocol.

Every submission to the global leaderboard is a GitLab Merge Request: opened by the GitLab Duo MCP server or the GitLab REST API, verified by GitLab CI, and auto-merged only when a Rust oracle independently confirms that the score beats the current world record. The leaderboard is not a database write. It is a commit history. Tamper with it and CI rejects the change.

This is a fundamentally different model from platforms like Kaggle, which rely on a central authority to verify scores. RNA-Arena has no central authority. The repo is the leaderboard. The pipeline is the referee. Any research group can fork it, run their own competition on their own infrastructure, and trust the results — because trust is baked into the Git history and CI configuration, not into an opaque server.

The problem being solved: RNA secondary structure prediction with pseudoknots — an NP-hard computational biology problem with direct applications to drug design, RNA therapeutics, and cancer biology. Four AI agents (powered by Gemini via the Google Cloud Agent Development Kit (ADK)) collaborate to push a live scientific record. When they succeed, the GitLab Duo MCP agent opens the MR that makes it official.

What's Novel

Three ideas that, to our knowledge, have not been combined anywhere else:

1. GitLab as a scientific protocol, not a code host.
Every world record is a verified Merge Request. The leaderboard is not a database row; it is a commit hash that any researcher can reproduce independently. The CI pipeline is the referee; there is no opaque central authority. Kaggle, Papers With Code, and most academic benchmarks rely on trusting a server. RNA-Arena trusts only the Git history and the open-source oracle.

2. Self-improving agents via real observability data.
The Critic agent reads its own Arize Phoenix OpenInference spans mid-run to answer: "which mutation strategy reduced energy the most in the last 25 iterations?" It uses that answer to retune the Mutator's temperature before the next iteration. This is not a prompt hack: it is an agent reading structured telemetry it produced itself and adjusting its own behavior. The Fivetran/BigQuery integration means it also knows the live world-record energy, so it is always optimizing toward the actual bar, not a stale local file.

3. A portable, verifiable benchmark template.
The architecture is domain-agnostic. Replace the RNA oracle with a protein-docking score, a circuit-fidelity function, or a compiler-output metric and the entire system (ADK LoopAgent, 6 MCP servers, GitLab CI auto-merge, Pages leaderboard) works unchanged.
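
One way to picture the domain-agnostic claim in item 3 is an oracle contract that the rest of the system depends on. The sketch below is only an illustration: the `Oracle` interface, `isNewRecord`, and both toy oracles are hypothetical names that do not come from the repo.

```typescript
// Hypothetical oracle contract: a deterministic score where lower is better.
interface Oracle<Candidate> {
  name: string;
  score(candidate: Candidate): number;
}

// Generic record check: works for any oracle that follows the contract above.
function isNewRecord<C>(oracle: Oracle<C>, candidate: C, currentRecord: number): boolean {
  const s = oracle.score(candidate);
  return Number.isFinite(s) && s < currentRecord;
}

// Toy stand-ins (not the real RNA oracle): one rewards more base pairs,
// the other scores a string by its length.
const pairCountOracle: Oracle<Array<[number, number]>> = {
  name: "toy-pair-count",
  score: (pairs) => -pairs.length,
};
const lengthOracle: Oracle<string> = {
  name: "toy-length",
  score: (text) => text.length,
};
```

Swapping the domain then means supplying a different `Oracle` value; the record check itself does not change.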

How GitLab Powers Every Layer

┌─────────────────────────────────────────────────────────────────┐
│  1. SCAFFOLD  rna-arena init my-solver                          │
│     Creates solvers/my-solver/ with solver.ts + strategy.md     │
│     Edit strategy.md to describe your approach                  │
├─────────────────────────────────────────────────────────────────┤
│  2. OPTIMIZE  rna-arena run my-solver HDV-ribozyme              │
│     4-agent ADK-JS loop (Proposer→Mutator→Scorer→Critic)        │
│     Gemini + 6 MCP tools (GitLab, Elastic, MongoDB, ...)        │
│     Aims to beat the current world-record energy                │
├─────────────────────────────────────────────────────────────────┤
│  3. SUBMIT    rna-arena submit my-solver HDV-ribozyme           │
│     CLI verifies score locally (Rust oracle)                    │
│     Git push to submit branch → GitLab REST API opens MR        │
├─────────────────────────────────────────────────────────────────┤
│  4. VERIFY    GitLab CI pipeline (.gitlab-ci.yml)               │
│     Stage 1 — path-guard: rejects diffs outside solvers/<name>/ │
│     Stage 2 — oracle-verify: re-runs Rust oracle independently  │
│              checks energy beats previous record                │
│     Stage 3 — publish: auto-merges if CI oracle agrees          │
├─────────────────────────────────────────────────────────────────┤
│  5. PUBLISH   GitLab Pages                                      │
│     LEADERBOARD.json → live chart at anudit.gitlab.io/rna-arena │
└─────────────────────────────────────────────────────────────────┘

GitLab powers both the submission path (submit opens an MR via the GitLab REST API) and the verification layer (CI pipeline auto-merges only when the oracle confirms the score). The run command uses the GitLab Duo MCP server to open an MR as part of the agent loop when a record is beaten.

Why GitLab CI is the Trust Layer

Any contestant can run the oracle locally and claim any score. RNA-Arena's answer is structural: you cannot write to LEADERBOARD.json directly. The only path is:

1. rna-arena submit verifies your structure locally (Rust oracle, client-side).
2. An MR is opened, never a direct push: rna-arena submit uses the GitLab REST API, and rna-arena run uses the GitLab Duo MCP server.
3. GitLab CI's path-guard stage rejects any diff touching engine/, oracle/, agents/, or leaderboard/; you can only append to LEADERBOARD.json.
4. GitLab CI's oracle-verify stage independently re-runs the Rust oracle on the submitted structure.
5. CI auto-merges only if both oracle runs agree and the energy beats the current record.

No score reaches the leaderboard without being verified twice by independent oracle runs. Cheating is not prevented by policy — it is prevented by the pipeline.
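
The merge decision described above reduces to comparing three numbers. The following is a minimal sketch of that rule, not the pipeline's actual code: `VerificationInput`, `shouldAutoMerge`, and the `EPSILON` tolerance are assumptions introduced for illustration.

```typescript
// Hypothetical shape of the three numbers the pipeline compares.
interface VerificationInput {
  claimedEnergy: number; // energy computed client-side by `rna-arena submit`
  ciEnergy: number;      // energy recomputed by the CI oracle-verify stage
  recordEnergy: number;  // current world record (lower is better)
}

// Assumed tolerance for comparing two floating-point oracle results.
const EPSILON = 1e-9;

function shouldAutoMerge(v: VerificationInput): { merge: boolean; reason: string } {
  if (Math.abs(v.claimedEnergy - v.ciEnergy) > EPSILON) {
    return { merge: false, reason: "client and CI oracle disagree" };
  }
  if (!(v.ciEnergy < v.recordEnergy)) {
    return { merge: false, reason: "energy does not beat the current record" };
  }
  return { merge: true, reason: "verified twice and beats the record" };
}
```

Writing the second check as `!(a < b)` also rejects `NaN`, so a malformed score cannot slip through as a record.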

# .gitlab-ci.yml
stages: [guard, verify, merge]

path-guard:       # Fail if diff touches anything outside solvers/<author>/
oracle-verify:    # Re-run Rust oracle on submitted structure in CI
publish:          # Auto-merge only if CI_VERIFIED=true and energy < record
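
The path-guard rule in the outline above can be expressed as a small pure function over the list of changed paths. This is a sketch under assumptions: `pathGuardViolations` is a hypothetical name, and the way the real CI job collects changed paths (for example from `git diff --name-only`) is not stated in the source.

```typescript
// Returns the changed paths that fall outside solvers/<solverName>/.
// An empty result means the diff passes the guard.
function pathGuardViolations(changedPaths: string[], solverName: string): string[] {
  const allowedPrefix = `solvers/${solverName}/`;
  return changedPaths.filter((p) => {
    // Normalize Windows separators and a leading "./" before comparing.
    const normalized = p.replace(/\\/g, "/").replace(/^\.\//, "");
    // Reject traversal segments so "solvers/a/../../oracle/x" cannot slip through.
    if (normalized.split("/").includes("..")) return true;
    return !normalized.startsWith(allowedPrefix);
  });
}
```

The trailing slash in `allowedPrefix` matters: without it, a folder named `solvers/my-solver-evil/` would pass a guard meant for `my-solver`.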

The Problem — and Why It Matters

RNA molecules fold into shapes that determine how they function. Get the shape wrong and the drug doesn't bind. Get it right and you can design RNA-based medicines, silence disease genes, or engineer ribozymes that cut viral RNA. The benchmark sequences in this repo include the hepatitis delta virus (HDV) ribozyme and telomerase RNA (telomerase is implicated in most cancers).

The specific challenge — predicting RNA secondary structure with pseudoknots — has been a central open problem in computational biology for 25 years. Pseudoknots are crossing base-pair interactions that standard algorithms can't handle. They are also the structurally critical feature in many medically important RNAs. The problem is proven NP-hard (Lyngsø & Pedersen, 2000), and the hardest benchmark instances were only claimed "solved" in a 2025 research paper — making this an active frontier, not a textbook exercise.

Why this is also a software infrastructure problem: The field has no agreed-upon open benchmark. Labs publish results in papers, but there's no live, tamper-proof leaderboard that any researcher can submit to and trust. RNA-Arena is that infrastructure — and it's built entirely on GitLab. The MR-as-verified-record pattern is portable to any field that needs reproducible, cheating-resistant benchmarks: protein folding, drug docking, materials discovery.

The scoring is deliberately simple (~40 lines of deterministic arithmetic) so the oracle is fast, portable, and unambiguous:

energy(S, pairs) =
  Σ pair_score(i, j)      # −3 G-C,  −2 A-U,  −1 G-U
  + Σ stacking_bonus       # −1.0 for adjacent stacked pairs
  + Σ loop_penalty         # +0.1 per unpaired loop base
  + pseudoknot_penalty     # +0.5 per crossing pair

Lower energy = better structure. A structure is just a list of base-pair indices — no 3D coordinates, no domain expertise needed to write a solver.
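
To make the formula concrete, here is a hedged TypeScript sketch of the scoring (the real oracle is written in Rust and may differ in detail). The weights come from the formula above; everything the formula leaves open is an assumption: indices are 0-based with i < j, a pair is "stacked" when (i+1, j−1) is also a pair, a "loop base" is an unpaired base enclosed by at least one pair, and the pseudoknot penalty is charged once per pair that crosses another pair.

```typescript
type Pair = [number, number]; // 0-based indices with i < j (assumed convention)

const PAIR_SCORES: Record<string, number> = {
  GC: -3, CG: -3,
  AU: -2, UA: -2,
  GU: -1, UG: -1,
};

function energy(seq: string, pairs: Pair[]): number {
  const s = seq.toUpperCase();
  const key = (i: number, j: number): string => `${i},${j}`;
  const pairSet = new Set(pairs.map(([i, j]) => key(i, j)));
  const paired = new Set<number>();
  let total = 0;

  for (const [i, j] of pairs) {
    const score = PAIR_SCORES[s.charAt(i) + s.charAt(j)];
    if (i >= j || score === undefined) {
      throw new Error(`invalid pair (${i}, ${j})`);
    }
    total += score;
    // Stacking bonus: the pair directly inside this one is also present.
    if (pairSet.has(key(i + 1, j - 1))) total += -1.0;
    // Pseudoknot penalty: this pair crosses at least one other pair.
    const crosses = pairs.some(
      ([k, l]) => (i < k && k < j && j < l) || (k < i && i < l && l < j),
    );
    if (crosses) total += 0.5;
    paired.add(i);
    paired.add(j);
  }

  // Loop penalty: unpaired bases enclosed by at least one pair.
  for (let b = 0; b < s.length; b++) {
    if (!paired.has(b) && pairs.some(([i, j]) => i < b && b < j)) total += 0.1;
  }
  return total;
}
```

Under these assumptions the hairpin `GGGAAACCC` with pairs (0,8), (1,7), (2,6) scores about −10.7: −9 for three G-C pairs, −2 for two stacks, and +0.3 for the three loop bases. The sketch does not check that each base appears in at most one pair.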

Agent Architecture

Four agents run in a Google Cloud ADK LoopAgent powered by Gemini — the code-first path within the Google Cloud Agent Builder ecosystem. MCP servers are connected via @modelcontextprotocol/sdk StdioClientTransport (the recommended approach; the Agent Platform Studio MCP UI is currently in preview and not yet functional for API-key auth). The MCPRegistry mounts each partner server and the AgentContext wraps each client with a typed interface.

┌──────────────────────────────────────────────────────────────────┐
│            Google Cloud ADK LoopAgent  (Gemini)                   │
│                                                                    │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    │
│  │ Proposer │ →  │  Mutator │ →  │  Scorer  │ →  │  Critic  │    │
│  │          │    │          │    │          │    │          │    │
│  │ Elastic  │    │ Simulated│    │ Rust     │    │ Phoenix  │    │
│  │ recall → │    │ annealing│    │ oracle   │    │ traces → │    │
│  │ seed fold│    │ at temp T│    │ (ground  │    │ retune   │    │
│  │          │    │          │    │  truth)  │    │ strategy │    │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘    │
│       ↑                  MongoDB checkpoint                  │    │
│       └──────────────────── loop ────────────────────────────┘    │
└──────────────────────────────────────────────────────────────────┘
                                   │
              ┌────────────────────┤ on record beaten
              ▼                    ▼
        Fivetran/BigQuery    GitLab Duo MCP
        (confirm world best) (open the MR)
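
The loop in the diagram can be sketched as plain functions. This is an illustrative outline under assumptions, not the ADK implementation: the `LoopDeps` interface, `accept`, `retune`, and `runLoop` are hypothetical names, the acceptance test is the standard Metropolis rule for simulated annealing, and the Critic's retuning rule here (based on acceptance rate over a 25-iteration window) is a stand-in for the telemetry-driven rule the project describes.

```typescript
type Pairs = Array<[number, number]>;

interface LoopDeps {
  propose(): Pairs;                                  // Proposer: seed structure
  mutate(current: Pairs, rng: () => number): Pairs;  // Mutator: candidate move
  score(pairs: Pairs): number;                       // Scorer: oracle energy
  rng: () => number;                                 // uniform in [0, 1)
}

// Metropolis acceptance rule used by simulated annealing.
function accept(delta: number, temperature: number, u: number): boolean {
  return delta <= 0 || u < Math.exp(-delta / temperature);
}

// Hypothetical Critic rule: reheat when almost nothing is accepted, otherwise cool.
function retune(temperature: number, acceptedInWindow: number, windowSize: number): number {
  const rate = acceptedInWindow / windowSize;
  const next = rate < 0.1 ? temperature * 1.5 : temperature * 0.9;
  return Math.min(10, Math.max(0.01, next));
}

function runLoop(deps: LoopDeps, iterations: number, startTemp = 1.0, window = 25) {
  let current = deps.propose();
  let currentEnergy = deps.score(current);
  let best = current;
  let bestEnergy = currentEnergy;
  let temperature = startTemp;
  let accepted = 0;
  for (let i = 1; i <= iterations; i++) {
    const candidate = deps.mutate(current, deps.rng);
    const e = deps.score(candidate);
    if (accept(e - currentEnergy, temperature, deps.rng())) {
      current = candidate;
      currentEnergy = e;
      accepted++;
      if (e < bestEnergy) { best = candidate; bestEnergy = e; }
    }
    // Critic step: retune the temperature once per window.
    if (i % window === 0) {
      temperature = retune(temperature, accepted, window);
      accepted = 0;
    }
  }
  return { best, bestEnergy, temperature };
}
```

Passing the random source in through `deps.rng` keeps the loop deterministic under test and makes a run reproducible from a seed.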

MCP Servers and their Structural Role

| Partner | Package | Role | What It Does |
|---|---|---|---|
| GitLab | `@gitlab/duo-mcp-server` + REST API | Publisher | `rna-arena run` → agent opens leaderboard MR via GitLab Duo MCP (or REST API fallback). `rna-arena submit` → git push + opens MR via GitLab REST API. GitLab CI verifies the oracle. |
| Arize | `@arizeai/phoenix-mcp` | Critic self-improvement | The Critic reads its own Phoenix spans mid-run: "which mutation strategy reduced energy most?" → retunes Mutator temperature for the next iteration. `initTracing()` registers OpenInference at startup. |
| MongoDB | `mongodb-mcp-server` | Session checkpointing | Long annealing runs write best-so-far structure to Atlas every 25 iterations via `client.callTool({ name: 'insert', ... })`. Allows resuming from crash. |
| Elastic | `@elastic/mcp-server` | Long-term fold memory | Semantic search over every structure ever scored: recall the 5 lowest-energy folds for similar sequences via `client.callTool({ name: 'search', ... })`. Seeds the Proposer with real starting points. |
| Dynatrace | `@dynatrace/mcp` | Runtime health gate | The Critic queries oracle-service latency. If the Cloud Run verifier degrades, it reduces iteration budget to stay within SLO. |
| Fivetran | `@fivetran/fivetran-mcp` | Live world-record target | Syncs `LEADERBOARD.json` → BigQuery. The Critic queries the live world-record energy so agents optimize against the actual bar. |
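
As an illustration of the checkpointing row, the sketch below builds the request that would be handed to `client.callTool`. Only the `{ name, arguments }` request shape and the tool name `insert` come from the text; the `collection` and `document` argument names, the `runId` field, and both helper functions are hypothetical, and the real tool schema may differ.

```typescript
type Pairs = Array<[number, number]>;

// Shape of an MCP tool call request: a tool name plus its arguments.
interface ToolCallRequest {
  name: string;
  arguments: Record<string, unknown>;
}

const CHECKPOINT_EVERY = 25; // interval stated in the table

// True on iterations 25, 50, 75, ... and never on iteration 0.
function shouldCheckpoint(iteration: number, every: number = CHECKPOINT_EVERY): boolean {
  return iteration > 0 && iteration % every === 0;
}

// Builds the best-so-far payload. Argument names are assumptions, not the server's schema.
function checkpointRequest(runId: string, iteration: number, pairs: Pairs, energy: number): ToolCallRequest {
  return {
    name: "insert",
    arguments: {
      collection: "checkpoints",
      document: { runId, iteration, energy, pairs },
    },
  };
}
```

Inside the loop this would be used roughly as `if (shouldCheckpoint(i)) await client.callTool(checkpointRequest(runId, i, best, bestEnergy));`, so a crashed run can resume from the last stored document.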

Full System Architecture

rna-arena CLI (Rust)
  init    → create solvers/<name>/{solver.ts,strategy.md,config.yaml}
  test    → Rust oracle: validate pairs locally
  score   → Rust oracle: deterministic energy (same result everywhere)
  run     → Google Cloud ADK LoopAgent: Proposer→Mutator→Scorer→Critic
  submit  → Rust oracle (local verify) → git push → GitLab REST API: open MR
  board   → fetch Cloud Run API → render table
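
The `test` command validates pairs before anything is scored. The source does not list the oracle's validity rules, so the TypeScript sketch below assumes a plausible set: indices are in range with i < j, each base appears in at most one pair, and only the pair types that carry a score (G-C, A-U, G-U) are allowed. `validatePairs` is a hypothetical name, and the real check lives in the Rust oracle.

```typescript
type Pair = [number, number];

const CANONICAL = new Set(["GC", "CG", "AU", "UA", "GU", "UG"]);

// Returns a list of human-readable problems; an empty list means the structure is valid.
function validatePairs(seq: string, pairs: Pair[]): string[] {
  const errors: string[] = [];
  const used = new Set<number>();
  const s = seq.toUpperCase();
  for (const [i, j] of pairs) {
    if (!Number.isInteger(i) || !Number.isInteger(j) || i < 0 || j >= s.length || i >= j) {
      errors.push(`(${i}, ${j}): indices out of range or not ordered`);
      continue;
    }
    if (!CANONICAL.has(s.charAt(i) + s.charAt(j))) {
      errors.push(`(${i}, ${j}): ${s.charAt(i)}-${s.charAt(j)} is not a scored pair`);
    }
    if (used.has(i) || used.has(j)) {
      errors.push(`(${i}, ${j}): base already paired`);
    }
    used.add(i);
    used.add(j);
  }
  return errors;
}
```

Returning every problem at once, instead of stopping at the first, gives a contestant one complete report per local run.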

Contestant Experience

Contestants edit one folder. Everything else — the engine, oracle, agents, and leaderboard — is locked by the CI path-guard:

solvers/my-solver/
├── solver.ts      ← the only file you must write
│                    export async function fold(seq, ctx): Promise<Pairs>
├── strategy.md    ← describe your approach (GitLab Duo reads this at init)
└── config.yaml    ← which agents to enable, iteration budget

Beginners write a 5-line pure algorithm. Advanced teams orchestrate the full stack. Either approach goes through the same CI pipeline.
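
For a sense of what a beginner solver might look like, here is a small sketch that follows the `fold(seq, ctx)` signature from the folder listing. The greedy strategy, the `greedyFold` helper, the minimum hairpin loop of three bases, and the representation of `Pairs` as index tuples are all assumptions made for illustration; in `solver.ts` the `fold` function would be exported.

```typescript
type Pairs = Array<[number, number]>;

// Which bases can pair with which (G-C, A-U, and G-U wobble).
const COMPLEMENT: Record<string, string[]> = {
  G: ["C", "U"],
  C: ["G"],
  A: ["U"],
  U: ["A", "G"],
};

// Greedy outside-in pairing: pair the two ends when they are complementary,
// otherwise drop the right-hand base and try again.
function greedyFold(seq: string): Pairs {
  const pairs: Pairs = [];
  let i = 0;
  let j = seq.length - 1;
  while (j - i > 3) { // keep at least 3 unpaired bases in the hairpin loop (assumed)
    if ((COMPLEMENT[seq.charAt(i)] || []).includes(seq.charAt(j))) {
      pairs.push([i, j]);
      i++;
      j--;
    } else {
      j--;
    }
  }
  return pairs;
}

// Entry point matching the listed signature; the agent context is unused in this sketch.
async function fold(seq: string, _ctx?: unknown): Promise<Pairs> {
  return greedyFold(seq.toUpperCase());
}
```

This only ever produces nested pairs, so it cannot find pseudoknots; it is a starting point that exercises the pipeline, not a competitive solver.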

Built With

adk
gitlab
rust
typescript

Updates

Anudit Nagar started this project — Jun 11, 2026 12:11 PM EDT
