CryoBrain

Inspiration

Quantum computers need fast, reliable error correction to become useful. A big part of that problem is the decoder: the hardware/software path that turns noisy syndrome measurements into corrections before errors compound.
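To make "turning syndrome measurements into corrections" concrete, here is a toy lookup-table decoder for the 3-qubit bit-flip repetition code. It is an illustrative sketch only, not CryoBrain's decoder. The function and table names are hypothetical. Real surface-code decoders such as matching or union-find are far more involved, but they have the same shape: syndrome in, correction out.

```python
from typing import Optional


def measure_syndrome(bits: list[int]) -> tuple[int, int]:
    """Parity checks of the 3-bit repetition code: s0 = q0^q1, s1 = q1^q2."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])


# Each syndrome maps to the single most likely flipped bit (None = no error).
SYNDROME_TABLE: dict[tuple[int, int], Optional[int]] = {
    (0, 0): None,
    (1, 0): 0,
    (1, 1): 1,
    (0, 1): 2,
}


def decode(bits: list[int]) -> list[int]:
    """Apply the correction implied by the syndrome and return corrected bits."""
    flip = SYNDROME_TABLE[measure_syndrome(bits)]
    corrected = list(bits)
    if flip is not None:
        corrected[flip] ^= 1
    return corrected


if __name__ == "__main__":
    # Logical 0 encoded as 000, with a single flip on the middle bit.
    print(decode([0, 1, 0]))  # -> [0, 0, 0]
```

Because the table is fixed and tiny, a decoder like this maps directly onto combinational logic. This is why decoder latency and area become hardware questions.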

We built CryoBrain because most AI-for-hardware demos stop at generated code or proxy scores. We wanted a system where agents could propose hardware improvements, test them against real verification and measurement tools, learn from research, and leave behind artifacts that judges can actually inspect.

What It Does

CryoBrain runs a multi-agent refinement loop for quantum-decoder hardware. Each iteration proposes decoder/FIFO design changes, measures them through a real toolchain, records verification artifacts, and archives the evidence.

The reward is grounded in measured behavior, not in a mocked or synthetic score:

\[ \text{reward} \approx f(\text{logical error suppression},\ \text{latency},\ \text{area},\ \text{validity}) \]
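The formula above does not say what f is. The sketch below shows one hypothetical choice. Validity acts as a hard gate. Logical error suppression is measured on a log scale against a baseline design, and latency and area growth relative to that baseline are subtracted as penalties. The `Measurement` fields, the weights, and the functional form are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    """Measured metrics for one candidate (field names are hypothetical)."""
    valid: bool                 # passed simulation/verification checks
    logical_error_rate: float   # e.g. estimated from quantum-error simulation
    latency_cycles: int         # decoder latency from RTL simulation
    area_cells: int             # cell count from synthesis


def reward(m: Measurement, baseline: Measurement,
           w_supp: float = 1.0, w_lat: float = 0.5, w_area: float = 0.25) -> float:
    """Hypothetical reward: invalid designs can never outrank valid ones;
    valid designs trade error suppression against latency and area growth."""
    if not m.valid:
        return -math.inf
    # Positive when the candidate lowers the logical error rate vs. the baseline;
    # the clamp avoids division by zero if no logical errors were observed.
    suppression = math.log10(baseline.logical_error_rate / max(m.logical_error_rate, 1e-12))
    latency_penalty = m.latency_cycles / baseline.latency_cycles - 1.0
    area_penalty = m.area_cells / baseline.area_cells - 1.0
    return w_supp * suppression - w_lat * latency_penalty - w_area * area_penalty
```

A 10x reduction in logical error rate at equal latency and area scores 1.0 here. Doubling latency for the same gain scores 0.5. Returning `-inf` for invalid designs keeps a broken candidate from ever beating a valid but mediocre one.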

The project includes:

A measured RL/refinement loop
50 archived sponsor-backed training iterations
Verifiable artifacts for each iteration
Offline demo dashboard
Memory A/B tracking with honest parity reporting
Pareto frontier views for hardware tradeoffs
Sponsor integrations for research, model generation, remote measurement, and eval proof
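A Pareto frontier view keeps every candidate that no other candidate beats on all objectives at once. The sketch below shows the dominance test behind such a view. It assumes that every objective (for example logical error rate, latency, and area) is minimized. The function names are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better
    on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: dict[str, tuple[float, ...]]) -> list[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


if __name__ == "__main__":
    # (logical error rate, latency in cycles, area in cells): illustrative values.
    candidates = {
        "A": (1e-3, 10, 900),
        "B": (5e-4, 12, 1000),
        "C": (1e-3, 12, 1000),   # dominated by A
    }
    print(pareto_front(candidates))  # -> ['A', 'B']
```

This pairwise check is O(n²), which is fine for a few dozen archived iterations.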

How We Built It

We built the project as a Python-based hardware-design training environment. The core loop runs agents that propose, generate, verify, measure, score, and archive hardware variants.
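The six stages named above can be sketched as a single loop with pluggable callables. This is a minimal sketch under assumptions: the `Candidate` fields and the stage signatures are hypothetical, and in a real run each callable would wrap a model call, a simulator, or a synthesis tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Candidate:
    """One hardware variant moving through the loop (fields are hypothetical)."""
    iteration: int
    proposal: str
    rtl: str = ""
    verified: bool = False
    metrics: dict = field(default_factory=dict)
    score: Optional[float] = None


def run_loop(
    iterations: int,
    propose: Callable[[int, list], str],
    generate: Callable[[str], str],
    verify: Callable[[str], bool],
    measure: Callable[[str], dict],
    score: Callable[[dict], float],
    archive: Callable[[Candidate], None],
) -> list[Candidate]:
    """Propose -> generate -> verify -> measure -> score -> archive, once per
    iteration. Candidates that fail verification are still archived, but they
    are never measured or scored."""
    history: list[Candidate] = []
    for i in range(iterations):
        c = Candidate(iteration=i, proposal=propose(i, history))
        c.rtl = generate(c.proposal)
        c.verified = verify(c.rtl)
        if c.verified:
            c.metrics = measure(c.rtl)
            c.score = score(c.metrics)
        archive(c)  # every iteration is kept, pass or fail
        history.append(c)
    return history
```

Passing `history` to `propose` is the hook where research context or memory could inform the next proposal.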

The system uses real hardware tooling where possible: Verilator for simulation, Yosys for synthesis-style metrics, Stim for quantum-error simulation, and Python orchestration for the RL loop. We also integrated sponsor platforms: Exa for research context, Fireworks for proposal generation, Modal for remote measurement, and HUD for evaluation gating.
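As an example of turning Yosys output into an area metric, the sketch below pulls the cell count out of a `stat` report. The sample report text and module name are hypothetical. The layout of `stat` output also differs between Yosys versions, so the regular expression here is an assumption to check against the installed release.

```python
import re

# Hypothetical excerpt of output from a command such as:
#   yosys -p "read_verilog decoder.v; synth -top decoder_top; stat"
# (file and module names are illustrative; the layout varies by Yosys version).
SAMPLE_STAT = """
=== decoder_top ===

   Number of wires:                 42
   Number of wire bits:            118
   Number of cells:                 57
     $_AND_                         20
     $_DFF_P_                       12
     $_XOR_                         25
"""

CELL_COUNT_RE = re.compile(r"Number of cells:\s+(\d+)")


def parse_cell_count(stat_text: str) -> int:
    """Return the cell count from the last `Number of cells:` line."""
    matches = CELL_COUNT_RE.findall(stat_text)
    if not matches:
        raise ValueError("no 'Number of cells:' line found in stat output")
    return int(matches[-1])


if __name__ == "__main__":
    print(parse_cell_count(SAMPLE_STAT))  # -> 57
```

Raising an error when no count is found, rather than returning 0, keeps a failed synthesis from looking like a perfect zero-area design.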

For the final proof run, we executed 50 sponsor-backed refinement iterations and archived each cycle under artifacts/marathon_runs/cycle_###, with a summary validator confirming the artifacts are present and measurable.
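A summary validator of this kind can be sketched with `pathlib`. The sketch assumes cycles are numbered `cycle_001` through `cycle_050`. The per-cycle file names in `REQUIRED_FILES` are hypothetical placeholders, because the actual artifact names are not listed here.

```python
import json
from pathlib import Path

# Hypothetical per-cycle artifacts; substitute the real file names.
REQUIRED_FILES = ("proposal.json", "metrics.json", "verify.log")


def validate_runs(root: Path, expected_cycles: int = 50) -> list[str]:
    """Return a list of problems. An empty list means every cycle_### directory
    exists, holds non-empty required files, and has parseable metrics."""
    problems: list[str] = []
    for i in range(1, expected_cycles + 1):
        cycle = root / f"cycle_{i:03d}"
        if not cycle.is_dir():
            problems.append(f"missing {cycle.name}")
            continue
        for name in REQUIRED_FILES:
            path = cycle / name
            if not path.is_file() or path.stat().st_size == 0:
                problems.append(f"{cycle.name}: missing or empty {name}")
        metrics = cycle / "metrics.json"
        if metrics.is_file() and metrics.stat().st_size > 0:
            try:
                json.loads(metrics.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(f"{cycle.name}: metrics.json is not valid JSON")
    return problems


if __name__ == "__main__":
    issues = validate_runs(Path("artifacts/marathon_runs"))
    print("OK" if not issues else "\n".join(issues))
```

Returning every problem, instead of stopping at the first one, gives a complete picture of which cycles need attention.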

Challenges We Faced

The hardest part was keeping the evidence honest. It is easy to make an AI system look like it is improving if the reward is synthetic or if artifacts are overwritten. We had to build checks that preserved every iteration and verified that the data was real.
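One simple way to keep artifacts from being overwritten is to open each record in exclusive-create mode. The sketch below assumes one JSON record per cycle; the `archive_cycle` function and the `record.json` file name are hypothetical.

```python
import json
from pathlib import Path


def archive_cycle(root: Path, cycle: int, record: dict) -> Path:
    """Write one cycle's record exactly once. Mode "x" raises FileExistsError
    instead of silently replacing an earlier iteration's evidence."""
    cycle_dir = root / f"cycle_{cycle:03d}"
    cycle_dir.mkdir(parents=True, exist_ok=True)
    path = cycle_dir / "record.json"  # hypothetical file name
    with path.open("x", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
    return path
```

A rerun that tries to write the same cycle number therefore fails loudly instead of quietly replacing a worse earlier result with a better one.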

We also had to tighten our claims about memory. The memory A/B evidence showed parity, not an advantage, so we changed the demo and validators to stop claiming that memory improved the run unless the endpoint delta is strictly positive.
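The claim rule above fits in one small gate function. The sketch assumes the endpoint metric is "higher is better" and that the delta is the memory arm minus the no-memory arm. The function name and its message strings are hypothetical.

```python
def memory_claim(endpoint_with_memory: float, endpoint_without_memory: float) -> str:
    """Hypothetical claim gate: report a memory advantage only when the endpoint
    delta (with minus without, higher is better) is strictly positive."""
    delta = endpoint_with_memory - endpoint_without_memory
    if delta > 0:
        return f"memory advantage: +{delta:.4f}"
    if delta == 0:
        return "parity"
    return f"memory regression: {delta:.4f}"
```

Because the check is strict, a tie is reported as parity and never rounded up into an advantage.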

Another challenge was making the project submission-clean: removing local developer paths, avoiding committed secrets, keeping sponsor evidence verifiable, and ensuring the dashboard only presents measured artifacts.
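A lightweight pre-submission scan for local paths and hard-coded secrets could look like the sketch below. The patterns are illustrative assumptions only. Dedicated secret scanners use much larger rule sets, and this does not replace them.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions, not an exhaustive rule set).
PATTERNS = {
    "local_path": re.compile(r"(/home/[A-Za-z0-9_.-]+|/Users/[A-Za-z0-9_.-]+|[A-Za-z]:\\Users\\)"),
    "api_key_assignment": re.compile(
        r"(?i)\b[A-Z0-9_]*(API_KEY|SECRET|TOKEN)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_tree(root: Path, suffixes=(".py", ".json", ".md", ".html", ".js", ".css")) -> dict:
    """Scan text files under `root` and map relative paths to their hits."""
    findings = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                findings[str(path.relative_to(root))] = hits
    return findings
```

Reading a key through an environment variable (for example `os.environ["FIREWORKS_API_KEY"]`) does not trigger the secret pattern, because only a literal string assigned to a key-like name matches.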

What We Learned

We learned that AI agents can be useful for hardware design only when their outputs are tied to real verification. The interesting part is not just generating RTL; it is closing the loop between research, candidate generation, simulation, scoring, and auditability.

We also learned that honest negative or neutral results matter. Memory parity is still useful evidence because it prevents overclaiming and makes future improvement measurable.

What's Next

Next, we want to push beyond parity into real memory-driven improvement, expand the decoder search space, add stronger formal verification, and run longer training loops with richer research ingestion.

Built With

ai
css
exa
fireworks
github
hud
javascript
json
markdown
modal
pytest
python-html
stim
uv
verilator
wsl
yosys

Updates

Ayush Ojha started this project — Jun 21, 2026 02:40 PM EDT
