What Inspired Us

Hardware verification is one of the most overlooked stages of chip design. Most student projects stop at "it works for a few inputs", but the inputs that break a design are rarely the ones you think to test. We wanted to build something that thinks adversarially, the way a real verification engineer does, and automates the hardest part: finding those inputs.

What We Built

FaultClaw is a three-agent autonomous pipeline powered by NVIDIA Nemotron running inside a NemoClaw sandbox.

Agent 1 — Spec Reader parses a hardware design file (Verilog, JSON, or YAML) and extracts the full interface — inputs, outputs, bit widths, and design intent — into a validated spec.
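
A minimal sketch of what "a validated spec" could look like, assuming a JSON input. The field names (`module`, `inputs`, `outputs`, `width`, `intent`) and the `Port`, `Spec` and `parse_spec` names are hypothetical, not FaultClaw's actual schema, and Verilog or YAML parsing is out of scope here. The point is that malformed interfaces are rejected before anything is passed downstream.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    width: int  # bit width, always >= 1 after validation


@dataclass(frozen=True)
class Spec:
    module: str
    inputs: tuple[Port, ...]
    outputs: tuple[Port, ...]
    intent: str


def _parse_ports(raw: object, kind: str) -> tuple[Port, ...]:
    """Validate a list of {"name": str, "width": int} entries (hypothetical schema)."""
    if not isinstance(raw, list) or not raw:
        raise ValueError(f"spec must list at least one {kind} port")
    ports = []
    for entry in raw:
        if not isinstance(entry, dict):
            raise ValueError(f"{kind} port must be an object: {entry!r}")
        name, width = entry.get("name"), entry.get("width")
        if not isinstance(name, str) or not name:
            raise ValueError(f"{kind} port is missing a name: {entry!r}")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(width, bool) or not isinstance(width, int) or width < 1:
            raise ValueError(f"{kind} port {name!r} has invalid width {width!r}")
        ports.append(Port(name, width))
    names = [p.name for p in ports]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate {kind} port names: {names}")
    return tuple(ports)


def parse_spec(text: str) -> Spec:
    """Parse a JSON design description and reject it early if it is malformed."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("spec must be a JSON object")
    module = data.get("module")
    if not isinstance(module, str) or not module:
        raise ValueError("spec is missing a module name")
    return Spec(
        module=module,
        inputs=_parse_ports(data.get("inputs"), "input"),
        outputs=_parse_ports(data.get("outputs"), "output"),
        intent=str(data.get("intent", "")),
    )
```

Because a zero-width port or a duplicated name raises here, the later agents can assume every spec they receive is well-formed.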

Agent 2 — Adversarial Test Generator uses Nemotron to reason about where bugs are most likely to hide and generates three tiers of tests: normal functional checks, edge cases at every boundary condition, and adversarial inputs specifically crafted to exploit design assumptions. In Breakdown Mode it sweeps all 256 possible input combinations.
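
The deterministic parts of this step, the normal tier, the boundary-value tier, and the Breakdown Mode sweep, can be sketched without the model; the adversarial tier depends on Nemotron's reasoning and is not reproduced here. This sketch assumes a design with two 4-bit unsigned inputs `a` and `b` (8 input bits, so exactly 2^8 = 256 combinations), and all function names are hypothetical.

```python
import itertools
import random


def edge_values(width: int) -> list[int]:
    """Boundary values for an unsigned input: extremes and the MSB transition."""
    max_val = (1 << width) - 1
    msb = 1 << (width - 1)
    candidates = [0, 1, max_val - 1, max_val, msb - 1, msb]
    # Small widths make several candidates coincide, so dedupe and sort.
    return sorted({v for v in candidates if 0 <= v <= max_val})


def normal_tests(widths: dict[str, int], count: int, seed: int = 0) -> list[dict[str, int]]:
    """Random functional checks, seeded so runs are reproducible."""
    rng = random.Random(seed)
    return [{name: rng.randrange(1 << w) for name, w in widths.items()} for _ in range(count)]


def edge_tests(widths: dict[str, int]) -> list[dict[str, int]]:
    """Cross product of every input's boundary values."""
    names = list(widths)
    pools = [edge_values(widths[n]) for n in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*pools)]


def breakdown_sweep(widths: dict[str, int], max_bits: int = 16) -> list[dict[str, int]]:
    """Exhaustive sweep of the whole input space.

    max_bits is an assumed safety limit: the sweep grows as 2 ** total_bits.
    """
    total_bits = sum(widths.values())
    if total_bits > max_bits:
        raise ValueError(f"{total_bits} input bits is too many to sweep exhaustively")
    names = list(widths)
    ranges = [range(1 << widths[n]) for n in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*ranges)]
```

For `{"a": 4, "b": 4}`, each operand has six boundary values (0, 1, 7, 8, 14, 15), giving 36 edge cases, while the sweep yields all 256 input pairs.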

Agent 3 — Verification Judge runs every test against a Python golden reference model, logs each failure with an exact explanation, computes coverage metrics, and feeds failure zones back to Agent 2. This closes a feedback loop: each iteration steers new tests toward the regions where earlier tests failed.
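
A minimal sketch of the judge step, assuming a 4-bit adder whose golden model is plain `a + b`. Here `buggy_dut` is a hypothetical stand-in for the design under test with a deliberately injected carry-out bug. Coverage is approximated as the fraction of the input space exercised, and a "failure zone" is approximated as the set of `a` values that produced failures; FaultClaw's real metrics and zone definition may differ.

```python
from dataclasses import dataclass, field
from typing import Callable

Inputs = dict[str, int]


def golden_adder4(a: int, b: int) -> int:
    """Golden reference: 4-bit unsigned add with a 5-bit result (carry-out kept)."""
    return a + b


def buggy_dut(a: int, b: int) -> int:
    """Hypothetical design under test with an injected bug: the carry-out is dropped."""
    return (a + b) & 0xF


@dataclass
class Verdict:
    passed: int = 0
    failures: list[str] = field(default_factory=list)
    failure_zones: set[int] = field(default_factory=set)
    coverage: float = 0.0


def judge(
    tests: list[Inputs],
    dut: Callable[..., int],
    golden: Callable[..., int],
    space_size: int,
    zone_key: str = "a",
) -> Verdict:
    """Compare the DUT against the golden model on every test and summarize."""
    verdict = Verdict()
    exercised: set[tuple[tuple[str, int], ...]] = set()
    for test in tests:
        exercised.add(tuple(sorted(test.items())))
        expected, actual = golden(**test), dut(**test)
        if actual == expected:
            verdict.passed += 1
        else:
            verdict.failures.append(f"{test}: expected {expected}, got {actual}")
            # Record which value of zone_key failed so the generator can focus there.
            verdict.failure_zones.add(test[zone_key])
    # Input-space coverage: distinct input vectors exercised / total possible vectors.
    verdict.coverage = len(exercised) / space_size
    return verdict
```

Run against all 256 inputs, the injected bug fails exactly the 120 cases where `a + b` exceeds 15, and every `a` from 1 to 15 appears as a failure zone for the next round of generation.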

Results are delivered in real time via a Telegram bot with /verify, /buggy, and /breakdown commands. The entire pipeline runs sandboxed inside NemoClaw with Landlock, seccomp, and network namespace isolation.

What We Learned

  • How to structure a real multi-agent pipeline where agents have clearly defined interfaces and each one validates its inputs before passing them downstream
  • How to prompt Nemotron for adversarial reasoning — not just generation, but structured thinking about failure modes
  • How NemoClaw's policy engine enforces security boundaries at the network and filesystem level, and why that matters for sensitive IP like hardware designs
  • How quickly integration bugs appear when several people build separate agents simultaneously, and why the team must agree on schemas before anyone writes a line of code
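
The first lesson above, agents that validate their inputs before passing them downstream, can be sketched as a small orchestrator. The three agent functions are hypothetical stand-ins (Agent 2's Nemotron call is replaced by a fixed rule, and the judge stub simply flags sums that overflow 4 bits), and the dictionary keys are assumed contracts rather than FaultClaw's real schema. The loop also shows the feedback path: failure zones from the judge become the generator's focus on the next iteration.

```python
from typing import Any

Payload = dict[str, Any]


def require(payload: Payload, keys: set[str], stage: str) -> Payload:
    """Fail fast if an upstream agent broke the agreed-upon interface."""
    missing = keys - payload.keys()
    if missing:
        raise ValueError(f"{stage}: missing required keys {sorted(missing)}")
    return payload


def spec_reader(source: str) -> Payload:
    # Stand-in for Agent 1: a real reader would parse Verilog, JSON, or YAML.
    return {"module": source, "inputs": {"a": 4, "b": 4}}


def generate_tests(spec: Payload, focus: set[int]) -> Payload:
    require(spec, {"module", "inputs"}, "generate_tests")
    # Stand-in for Agent 2: concentrate tests on previously failing `a` values.
    a_values = sorted(focus) if focus else [0, 15]
    return {"tests": [{"a": a, "b": b} for a in a_values for b in (0, 15)]}


def verification_judge(batch: Payload) -> Payload:
    require(batch, {"tests"}, "verification_judge")
    # Stand-in for Agent 3: pretend the DUT fails whenever a + b overflows 4 bits.
    failing = {t["a"] for t in batch["tests"] if t["a"] + t["b"] > 15}
    return {"failure_zones": failing}


def run_pipeline(source: str, max_iterations: int = 3) -> list[Payload]:
    spec = spec_reader(source)
    focus: set[int] = set()
    history: list[Payload] = []
    for _ in range(max_iterations):
        report = require(
            verification_judge(generate_tests(spec, focus)), {"failure_zones"}, "orchestrator"
        )
        history.append(report)
        if not report["failure_zones"] or report["failure_zones"] == focus:
            break  # no failures, or nothing new learned: stop iterating
        focus = report["failure_zones"]
    return history
```

Checking the contract at every boundary means a schema mismatch between two agents surfaces as a clear error naming the stage, rather than as a confusing failure several steps later.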

How We Built It

We divided into three roles from the start. One member built Agent 1 and the pipeline orchestration. One built Agent 2 and the Nemotron prompting strategy. One built Agent 3, the verification engine, and the NemoClaw sandbox configuration. We used a shared GitHub repo with per-person branches, agreed on JSON interfaces upfront, and integrated continuously throughout the 24 hours.

The stack: Python 3.12, NVIDIA Nemotron via the NIM API, NemoClaw with OpenClaw, python-telegram-bot, and a JSON persistence layer for run history.
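
A minimal sketch of a JSON persistence layer for run history, assuming one JSON file that holds a list of run records. The file layout, record fields, and the `load_history` and `append_run` names are hypothetical. Writing to a temporary file in the same directory and then calling `os.replace` means a crash mid-write leaves the previous history file intact rather than half-written.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def load_history(path: Path) -> list[dict]:
    """Return all stored runs, or an empty list if nothing has been saved yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def append_run(path: Path, record: dict) -> None:
    """Append one run record (with a timestamp) and rewrite the file safely."""
    history = load_history(path)
    history.append({"timestamp": time.time(), **record})
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(history, f, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # don't leave stray temp files behind on failure
        raise
```

Rewriting the whole file on each run is fine at hackathon scale; a larger history would call for an append-only log or a database instead.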

Challenges We Faced

  • NemoClaw sandbox build failures: the sandbox build kept failing because a WeChat plugin tried to reach an unreachable host. We solved it by skipping all messaging channels during onboarding and letting the Telegram bot handle the interface independently.
  • Residential network blocking: api.telegram.org was blocked on our network. We worked around it by using a mobile hotspot for bot testing.
  • Agent integration conflicts: two team members working on overlapping parts of Agent 2 caused merge conflicts and interface mismatches. A mid-hackathon audit identified every conflict, and we resolved them systematically before wiring up the full pipeline.
  • Misunderstanding OpenClaw: we initially planned to use OpenClaw as a Python package but discovered it is a sandboxed agent runtime, not a pip-installable library. We adapted by implementing the agent orchestration pattern directly in Python and running the full stack inside NemoClaw, where OpenClaw operates natively.
