Inspiration
Verification eats 60–70% of every chip project. Armies of engineers hand-write tests, stare at waveforms, and chase coverage checklists — and a single missed corner case can cost a $100M respin. Meanwhile, the loop itself is mechanical: find the gap → write a test → run it → check it → repeat. That's an agent loop wearing a hard hat. We wanted to prove an AI could close that entire loop — not a copilot suggesting testbench snippets, but an autonomous engineer that decides, tests, catches, and explains on its own.
What it does
Vera is an autonomous verification engineer. Point it at Verilog RTL and it:
- Reads the design and identifies which corner cases haven't been exercised (a hand-rolled functional coverage model — six corner cases per design)
- Writes its own tests — an LLM generates stimulus as structured JSON, one op per clock cycle
- Runs them on a real simulator (Icarus Verilog + cocotb — no mocked hardware)
- Judges correctness, not just coverage: every cycle, DUT outputs are compared against a golden Python reference model. Coverage says what we tried; the oracle says whether it was right:
$$\text{coverage} = \frac{\text{points hit}}{6} \times 100\% \quad\text{but}\quad \text{correct} \iff \forall t:\ \text{DUT}(t) = \text{ref}(t)$$
- Catches real bugs: we planted an off-by-one (full = (count == DEPTH-1)) in a FIFO. Vera drives into the corner, catches the mismatch, shrinks the failure to a minimal 7-write repro, points at the offending RTL line, and proposes the one-line fix
- Gets smarter every run: winning test patterns are saved to memory. Run a second design (a round-robin arbiter) and Vera warm-starts from patterns learned on the FIFO — faster climb, "reused N patterns" badge, zero false positives on the clean design
- Explains itself: click any test in the live dashboard for a plain-English explanation; ask the built-in chat anything about the run, grounded in a deep post-run analysis
All streamed live to a mission-control dashboard.
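The split between coverage and correctness can be illustrated with a small sketch. This is not Vera's actual checker: the `FifoModel` class, the depth of 8, and the op names are assumptions made for illustration. Only the planted `DEPTH-1` comparison and the six-point coverage denominator come from the description above.

```python
from dataclasses import dataclass, field

DEPTH = 8  # assumed FIFO depth; the write-up does not state it


@dataclass
class FifoModel:
    """Cycle-level FIFO model; buggy=True mimics the planted off-by-one."""
    buggy: bool = False
    data: list = field(default_factory=list)

    def full(self) -> bool:
        # The planted bug asserts `full` one entry early.
        limit = DEPTH - 1 if self.buggy else DEPTH
        return len(self.data) == limit

    def step(self, op: str, value: int = 0) -> dict:
        """Apply one op ("write", "read", or "idle") and return the visible outputs."""
        if op == "write" and not self.full():
            self.data.append(value)
        elif op == "read" and self.data:
            self.data.pop(0)
        return {"full": self.full(), "empty": not self.data, "count": len(self.data)}


def check(ops, dut, ref):
    """Return the first cycle where DUT and reference disagree, or None."""
    for t, entry in enumerate(ops):
        op, value = entry["op"], entry.get("value", 0)
        if dut.step(op, value) != ref.step(op, value):
            return t
    return None


def coverage(points_hit: set, total: int = 6) -> float:
    """Percentage of coverage points exercised; says nothing about correctness."""
    return len(points_hit) / total * 100


seven_writes = [{"op": "write", "value": i} for i in range(7)]
first_mismatch = check(seven_writes, FifoModel(buggy=True), FifoModel())
```

Under these assumptions, seven consecutive writes are enough for `check` to report a mismatch on the seventh cycle (index 6), while two clean models never disagree regardless of how much coverage the test earns.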
How we built it
Three models, each doing what it's cheapest at, all through Pioneer's single Anthropic-compatible endpoint:
- claude-sonnet-4-6 — the fast brain: high-frequency stimulus generation + per-test explanations
- claude-fable-5 — the heavy brain: post-run engineering analysis and reasoning
- claude-opus-4-8 — interactive Q&A, grounded in Fable 5's analysis + run data + the RTL itself
The stack: Python orchestrator (strategy → generate → run → check → triage agents), Icarus Verilog + cocotb in Docker, FastAPI serving a live state contract, React + Vite dashboard polling at 1 Hz. Tests are pure data (JSON op lists), so all designs share one stimulus vocabulary — write→req0, read→req1 on the arbiter — which is what makes cross-design pattern reuse honest instead of hand-waved. Generated tests that fail to run get their error fed back to the model (max 3 retries), so the loop never stalls. One ./run.sh, or one Docker container deployed on Render.
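As a rough sketch of what "tests are pure data" could look like, the snippet below expands a JSON op list into per-cycle pin values for two designs. The `PIN_MAP` table, the FIFO pin names (`wr_en`, `rd_en`), and the `to_pin_values` helper are hypothetical; only the write→req0 and read→req1 mapping on the arbiter is taken from the text.

```python
import json

# Hypothetical translation of the shared op vocabulary to design pins.
# Only the arbiter row (write -> req0, read -> req1) comes from the write-up;
# the FIFO pin names are assumptions.
PIN_MAP = {
    "fifo": {"write": {"wr_en": 1}, "read": {"rd_en": 1}},
    "arbiter": {"write": {"req0": 1}, "read": {"req1": 1}},
}


def to_pin_values(design: str, test_json: str) -> list:
    """Expand a JSON op list into one dict of pin values per clock cycle."""
    mapping = PIN_MAP[design]
    cycles = []
    for entry in json.loads(test_json):
        # Ops with no mapping (for example "idle") drive no pins that cycle.
        cycles.append(dict(mapping.get(entry["op"], {})))
    return cycles


test_json = '[{"op": "write"}, {"op": "read"}, {"op": "idle"}]'
fifo_cycles = to_pin_values("fifo", test_json)
arbiter_cycles = to_pin_values("arbiter", test_json)
```

Because the same `test_json` string drives both designs, a pattern saved from one run can be replayed on another without being rewritten, which is the property the cross-design reuse depends on.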
Challenges we ran into
- The stale-binary lie. Our buggy FIFO passed with zero mismatches — impossible. make was silently reusing the previous DUT's compiled simulation binary. Per-DUT build dirs fixed it, and it was a visceral lesson in why verification infrastructure must itself be verified.
- The AI was too good. Given the full uncovered list, the model one-shotted a 42-cycle test that closed 17%→100% instantly — impressive, useless for understanding what closed what. We rearchitected so the strategy agent picks one target and the generator writes a minimal, surgical test for exactly it.
- Toolchain hell: host Python was too new for cocotb, no simulator installed — solved by shipping the entire sim stack (Icarus + cocotb + engine) in one container.
- Honesty engineering: keeping coverage ("exercised") strictly separate from the checker ("correct") so the demo never overclaims.
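The one-target-per-test rearchitecture might look something like the sketch below. The coverage point names, `pick_target`, and `build_prompt` are all hypothetical, since the write-up does not list the six corner cases or show the prompt; the only idea taken from the text is that the strategy step hands the generator a single uncovered target.

```python
from typing import Optional

# Hypothetical names for the six corner cases; the write-up does not list them.
COVERAGE_POINTS = [
    "empty_read", "fill_to_full", "write_when_full",
    "simultaneous_rw", "wraparound", "drain_to_empty",
]


def pick_target(hit: set) -> Optional[str]:
    """Strategy step: choose exactly one uncovered point, in a fixed order."""
    for point in COVERAGE_POINTS:
        if point not in hit:
            return point
    return None  # everything is covered


def build_prompt(target: str) -> str:
    """Generator prompt that names one target instead of the full uncovered list."""
    return (
        f"Write the shortest JSON op list that exercises only '{target}'. "
        "Use one op per clock cycle."
    )


target = pick_target({"empty_read"})
prompt = build_prompt(target)
```

Naming a single target per test keeps attribution simple: if coverage moves after the test runs, the point that closed is the one that was asked for.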
Accomplishments that we're proud of
- A fully closed agent loop on a real simulator — not a mockup: AI-written tests climb 0→100% coverage live
- The bug hunt works end-to-end: caught → shrunk from 19 ops to the true minimal 7 writes → root-caused → one-line fix diff, all autonomous
- The flywheel is real: design #2 verifiably warm-starts from design #1's learned patterns, and drops them the moment they stop paying
- Zero false positives on the clean arbiter — Vera doesn't cry wolf
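Shrinking a failing test can be sketched with a greedy one-op-at-a-time reducer, a simplified relative of delta debugging. This is not necessarily the algorithm Vera uses. The `fails` predicate, the depth of 8, and the 19-op input are assumptions picked so the result matches the 7-write repro described above.

```python
DEPTH = 8  # assumed depth, chosen so the minimal repro is 7 writes


def fails(ops: list) -> bool:
    """True if the buggy `full` flag ever disagrees with the reference."""
    dut = ref = 0  # occupancy of the buggy DUT and of the reference model
    for op in ops:
        if op == "write":
            if dut != DEPTH - 1:  # buggy DUT reports full one entry early
                dut += 1
            if ref != DEPTH:
                ref += 1
        elif op == "read":
            dut = max(dut - 1, 0)
            ref = max(ref - 1, 0)
        if (dut == DEPTH - 1) != (ref == DEPTH):
            return True
    return False


def shrink(ops: list) -> list:
    """Drop one op at a time for as long as the test keeps failing."""
    ops = list(ops)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops)):
            candidate = ops[:i] + ops[i + 1:]
            if fails(candidate):
                ops = candidate
                changed = True
                break
    return ops


# A hypothetical 19-op failing test mixing writes, reads, and idle cycles.
failing = [
    "write", "idle", "write", "read", "write", "write", "idle", "write",
    "write", "read", "write", "write", "write", "idle", "write", "write",
    "idle", "write", "write",
]
minimal = shrink(failing)
```

With this predicate, every idle and read can be dropped without hiding the bug, and any writes beyond the seventh are redundant, so the reducer settles on seven writes.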
What we learned
- The oracle is the hard part. Generating tests is easy; knowing what the answer should have been is the entire game
- Cheap-model/expensive-model splits aren't cost optimization theater — they change what's economically possible to run in a loop
- Agents need fallbacks at every layer (retry → seed tests → demo replay) or live demos die
- Build infrastructure lies to you; verify the verifier
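The layered fallback (retry, then seed tests, then demo replay) could be wired together roughly as follows. The function name, its callable parameters, and the returned labels are hypothetical. The limit of three comes from the write-up, although whether it counts the first attempt is not stated, so this sketch simply makes three attempts.

```python
MAX_ATTEMPTS = 3  # the write-up says "max 3 retries"; treated here as three attempts


def run_with_fallbacks(generate, run, seed_tests, replay):
    """Return (source, result), trying LLM tests, then seed tests, then a replay."""
    error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            # The previous error, if any, is fed back to the generator.
            return "llm", run(generate(error))
        except Exception as exc:
            error = str(exc)
    for test in seed_tests:
        try:
            return "seed", run(test)
        except Exception:
            continue
    # Last resort: a recorded run, so the dashboard always has something to show.
    return "replay", replay()


# Minimal stand-ins to exercise each layer.
seen_errors = []


def fake_generate(error):
    seen_errors.append(error)
    return {"ops": []}


def always_broken(test):
    raise RuntimeError("compile error")


outcome = run_with_fallbacks(fake_generate, always_broken, [], lambda: "recorded run")
```

Each layer only runs if the one above it has been exhausted, so a healthy generator never touches the seed tests or the replay.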
What's next for Vera
- Grounded specs: pipe design documents into a verified knowledge base so "expected behavior" comes with a citation trail, not vibes
- Auto-PR the fix: Vera files the bug with repro + patch as a pull request
- Bigger designs (CPU pipelines), constrained-random stimulus, and adaptive fine-tuning of the fast brain on its own failure data
- Same loop, commercial simulators — the tools companies already pay millions for. Today a FIFO; the wedge is the loop.
Built With
- Languages: Python 3.12, JavaScript (JSX), Verilog, JSON
- AI / APIs: Pioneer.ai inference platform (Anthropic-compatible API, single key) serving three models: Claude Sonnet 4.6 (stimulus generation + test explanations), Claude Fable 5 (deep run analysis), Claude Opus 4.8 (grounded Q&A chat); Anthropic Python SDK
- Backend: FastAPI, uvicorn, Python threading (background agent loop), SQL-free (state is JSON contracts)
- Hardware/simulation: Icarus Verilog (open-source simulator), cocotb (Python testbench framework), GNU Make (simulation orchestration)
- Frontend: React 19, Vite, Recharts (coverage chart), Lucide icons, CSS-in-JS with custom keyframe animations
- Platforms / infra: Docker (multi-stage build, full sim toolchain containerized), Colima (local Docker runtime on macOS), Render (cloud deployment via render.yaml blueprint), GitHub
- Testing: Playwright (automated UI verification)