Inspiration
Quantum computers need fast, reliable error correction to become useful. A big part of that problem is the decoder: the hardware/software path that turns noisy syndrome measurements into corrections before errors compound.
We built CryoBrain because most AI-for-hardware demos stop at generated code or proxy scores. We wanted a system where agents could propose hardware improvements, test them against real verification and measurement tools, learn from research, and leave behind artifacts that judges can actually inspect.
What It Does
CryoBrain runs a multi-agent refinement loop for quantum-decoder hardware. Each iteration proposes decoder/FIFO design changes, measures them through a real toolchain, records verification artifacts, and archives the evidence.
The reward is grounded in measured behavior rather than a mock demo:
\[ \text{reward} \approx f(\text{logical error suppression}, \text{latency}, \text{area}, \text{validity}) \]
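The write-up does not spell out the form of f, so the following is only a minimal sketch of one way such a reward could be combined. The field names, the weights, and the choice to gate on validity are all assumptions made for illustration, not CryoBrain's actual scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Measured results for one candidate design (hypothetical field names)."""
    logical_error_suppression: float  # higher is better
    latency_cycles: float             # lower is better
    area_cells: float                 # lower is better
    valid: bool                       # did the candidate pass verification?


def reward(m: Measurement,
           w_suppression: float = 1.0,
           w_latency: float = 0.01,
           w_area: float = 0.001) -> float:
    """Illustrative scalar reward; the weights are placeholder assumptions."""
    if not m.valid:
        # Assumed gating rule: an invalid design ranks below every valid one.
        return float("-inf")
    return (w_suppression * m.logical_error_suppression
            - w_latency * m.latency_cycles
            - w_area * m.area_cells)
```

Gating on validity first means a candidate cannot buy a high score with a small area or low latency if it fails verification.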
The project includes:
- Measured RL/refinement loop
- 50 archived sponsor-backed training iterations
- Verifiable artifacts for each iteration
- Offline demo dashboard
- Memory A/B tracking with honest parity reporting
- Pareto frontier views for hardware tradeoffs
- Sponsor integrations for research, model generation, remote measurement, and eval proof
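As an illustration of what a Pareto frontier view computes, here is a minimal sketch that keeps only the non-dominated candidates. The `Candidate` fields are hypothetical, and it assumes every metric is a cost to be minimized; the project's dashboard may use different metrics or directions.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One measured design point (hypothetical fields, all lower-is-better)."""
    name: str
    logical_error_rate: float
    latency_cycles: float
    area_cells: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every cost and strictly better on one."""
    a_costs, b_costs = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_costs, b_costs))
    strictly_better = any(x < y for x, y in zip(a_costs, b_costs))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

This pairwise check is quadratic in the number of candidates, which is not a concern at the scale of 50 iterations.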
How We Built It
We built the project as a Python-based hardware-design training environment. The core loop runs agents that propose, generate, verify, measure, score, and archive hardware variants.
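The propose, generate, verify, measure, score, archive sequence can be sketched as a single function that takes each stage as a callable. This is a hedged outline, not the project's code: the stage signatures, the `record.json` file name, and the zero-padded `cycle_001` directory naming are assumptions.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def run_iteration(
    index: int,
    propose: Callable[[int], Dict[str, Any]],
    generate: Callable[[Dict[str, Any]], str],
    verify: Callable[[str], bool],
    measure: Callable[[str], Dict[str, float]],
    score: Callable[[Dict[str, float]], float],
    archive_root: Path,
) -> Dict[str, Any]:
    """Run one refinement cycle and archive its record (illustrative only)."""
    proposal = propose(index)   # agent suggests a design change
    rtl = generate(proposal)    # turn the proposal into a hardware description
    valid = verify(rtl)         # gate on verification before measuring
    metrics = measure(rtl) if valid else {}
    record = {
        "iteration": index,
        "proposal": proposal,
        "valid": valid,
        "metrics": metrics,
        "reward": score(metrics) if valid else None,
    }
    cycle_dir = archive_root / f"cycle_{index:03d}"
    # exist_ok=False raises FileExistsError instead of overwriting evidence.
    cycle_dir.mkdir(parents=True, exist_ok=False)
    (cycle_dir / "record.json").write_text(json.dumps(record, indent=2))
    return record
```

Refusing to reuse an existing cycle directory is one simple way to guarantee that an earlier iteration's artifacts are never silently replaced.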
The system uses real hardware tooling where possible: Verilator for simulation, Yosys for synthesis-style metrics, Stim for quantum-error simulation, and Python orchestration for the RL loop. We also integrated sponsor platforms: Exa for research context, Fireworks for proposal generation, Modal for remote measurement, and HUD for evaluation gating.
For the final proof run, we executed 50 sponsor-backed refinement iterations and archived each cycle under artifacts/marathon_runs/cycle_###, with a summary validator confirming the artifacts are present and measurable.
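A summary validator of this kind might look like the sketch below. It assumes each cycle directory holds a `record.json` with a `metrics` object; that file name and schema are hypothetical, and the real validator may check different files.

```python
import json
from pathlib import Path
from typing import List

RECORD_NAME = "record.json"  # hypothetical per-cycle artifact name


def validate_run(root: Path, expected_cycles: int) -> List[str]:
    """Return a list of problems found; an empty list means the run checks out."""
    problems: List[str] = []
    cycle_dirs = sorted(p for p in root.glob("cycle_*") if p.is_dir())
    if len(cycle_dirs) != expected_cycles:
        problems.append(
            f"expected {expected_cycles} cycles, found {len(cycle_dirs)}")
    for cycle_dir in cycle_dirs:
        record_path = cycle_dir / RECORD_NAME
        if not record_path.is_file():
            problems.append(f"{cycle_dir.name}: missing {RECORD_NAME}")
            continue
        try:
            record = json.loads(record_path.read_text())
        except json.JSONDecodeError:
            problems.append(f"{cycle_dir.name}: {RECORD_NAME} is not valid JSON")
            continue
        # Assumed schema: measured values live under a "metrics" object.
        if not isinstance(record, dict) or not isinstance(record.get("metrics"), dict):
            problems.append(f"{cycle_dir.name}: no metrics object")
    return problems
```

For the proof run described above, the call would be along the lines of `validate_run(Path("artifacts/marathon_runs"), expected_cycles=50)`.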
Challenges We Faced
The hardest part was keeping the evidence honest. It is easy to make an AI system look like it is improving if the reward is synthetic or if artifacts are overwritten. We had to build checks that preserved every iteration and verified that the data was real.
We also had to tighten our claims about memory. The memory A/B evidence showed parity, not an advantage, so we changed the demo and validators so that they only claim memory improved the run when the endpoint delta is strictly positive.
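The rule itself is small enough to state in code. The sketch below assumes the endpoint metric is higher-is-better and that the two arms are compared by a plain difference; the function name and the returned labels are hypothetical.

```python
def memory_claim(endpoint_with_memory: float,
                 endpoint_without_memory: float) -> str:
    """Classify a memory A/B result; only a strictly positive delta is a win."""
    delta = endpoint_with_memory - endpoint_without_memory
    if delta > 0:
        return "advantage"
    if delta == 0:
        return "parity"
    return "regression"
```

With this rule, equal endpoints are reported as parity rather than rounded up into an improvement.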
Another challenge was making the project submission-clean: removing local developer paths, avoiding committed secrets, keeping sponsor evidence verifiable, and ensuring the dashboard only presents measured artifacts.
What We Learned
We learned that AI agents can be useful for hardware design only when their outputs are tied to real verification. The interesting part is not just generating RTL; it is closing the loop between research, candidate generation, simulation, scoring, and auditability.
We also learned that honest negative or neutral results matter. Memory parity is still useful evidence because it prevents overclaiming and makes future improvement measurable.
What's Next
Next, we want to push beyond parity into real memory-driven improvement, expand the decoder search space, add stronger formal verification, and run longer training loops with richer research ingestion.
Built With
- ai
- css
- exa
- fireworks
- github
- hud
- javascript
- json
- markdown
- modal
- pytest
- python-html
- stim
- uv
- verilator
- wsl
- yosys