What Inspired Us
Hardware verification is one of the most overlooked stages of chip design. Most student projects stop at "it works for a few inputs", but the inputs that break a design are rarely the ones you think to test. We wanted to build something that thinks adversarially, the way a real verification engineer does, and automates test generation and checking.
What We Built
FaultClaw is a three-agent autonomous pipeline powered by NVIDIA Nemotron running inside a NemoClaw sandbox.
Agent 1 — Spec Reader parses a hardware design file (Verilog, JSON, or YAML) and extracts the full interface — inputs, outputs, bit widths, and design intent — into a validated spec.
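As a rough illustration of the "validated spec" idea, here is a minimal sketch that assumes the design arrives as JSON. The schema and the names `Port`, `Spec`, and `read_spec` are hypothetical, chosen for this example, and are not FaultClaw's actual code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    width: int  # bit width, at least 1


@dataclass(frozen=True)
class Spec:
    name: str
    inputs: tuple[Port, ...]
    outputs: tuple[Port, ...]
    intent: str


def _parse_ports(raw: object, kind: str) -> tuple[Port, ...]:
    """Validate a list of {"name": ..., "width": ...} entries (assumed schema)."""
    if not isinstance(raw, list) or not raw:
        raise ValueError(f"spec needs a non-empty list of {kind}")
    ports = []
    for entry in raw:
        if not isinstance(entry, dict):
            raise ValueError(f"{kind} entry is not an object: {entry!r}")
        name, width = entry.get("name"), entry.get("width")
        if not isinstance(name, str) or not name:
            raise ValueError(f"{kind} entry has no valid name: {entry!r}")
        # bool is a subclass of int, so reject it explicitly
        if isinstance(width, bool) or not isinstance(width, int) or width < 1:
            raise ValueError(f"{kind} '{name}' has invalid width: {width!r}")
        ports.append(Port(name, width))
    names = [p.name for p in ports]
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate {kind} names: {names}")
    return tuple(ports)


def read_spec(text: str) -> Spec:
    """Parse a JSON design description and return a validated Spec."""
    data = json.loads(text)
    if not isinstance(data, dict) or "name" not in data:
        raise ValueError("spec must be a JSON object with a 'name' field")
    return Spec(
        name=str(data["name"]),
        inputs=_parse_ports(data.get("inputs"), "inputs"),
        outputs=_parse_ports(data.get("outputs"), "outputs"),
        intent=str(data.get("intent", "")),
    )
```

Rejecting a malformed spec at this stage means the downstream agents can rely on every port having a name and a positive width.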
Agent 2 — Adversarial Test Generator uses Nemotron to reason about where bugs are most likely to hide and generates three tiers of tests: normal functional checks, edge cases at every boundary condition, and adversarial inputs specifically crafted to exploit design assumptions. In Breakdown Mode it sweeps all 256 possible input combinations.
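The deterministic parts of this step (boundary edge cases and the exhaustive sweep) can be sketched without the language model. The example below assumes unsigned inputs and a design with two 4-bit operands, which is one way to arrive at 256 combinations; the function names are hypothetical and the Nemotron-driven adversarial tier is not shown.

```python
from itertools import product


def boundary_values(width: int) -> list[int]:
    """Values at the edges of an unsigned `width`-bit range, without duplicates."""
    top = (1 << width) - 1
    candidates = [0, 1, top >> 1, (top >> 1) + 1, top - 1, top]
    values: list[int] = []
    for v in candidates:
        if 0 <= v <= top and v not in values:
            values.append(v)
    return values


def edge_case_tests(widths: dict[str, int]) -> list[dict[str, int]]:
    """Cross product of every input's boundary values."""
    names = list(widths)
    per_input = [boundary_values(widths[n]) for n in names]
    return [dict(zip(names, combo)) for combo in product(*per_input)]


def exhaustive_tests(widths: dict[str, int]) -> list[dict[str, int]]:
    """Every possible input combination; only practical for small designs."""
    names = list(widths)
    per_input = [range(1 << widths[n]) for n in names]
    return [dict(zip(names, combo)) for combo in product(*per_input)]


if __name__ == "__main__":
    widths = {"a": 4, "b": 4}  # assumed example design: two 4-bit operands
    print(len(edge_case_tests(widths)), "edge-case tests")
    print(len(exhaustive_tests(widths)), "exhaustive tests")
```

The sweep grows as 2 to the power of the total input width, so an exhaustive mode like this only scales to small interfaces; wider designs have to lean on the targeted tiers instead.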
Agent 3 — Verification Judge runs every test against a Python golden reference model, logs failures with exact explanations, computes coverage metrics, and feeds failure zones back to Agent 2 — creating a closed feedback loop that gets smarter with every iteration.
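A minimal sketch of such a judging loop follows. It assumes the design under test can be called as a Python function, and it uses a hypothetical 4-bit adder with a dropped carry as the buggy design; `judge`, `input_coverage`, and `failure_zones` are illustrative names, and the coverage metric shown (fraction of the input space exercised) is only one possible choice.

```python
from typing import Callable

Test = dict[str, int]


def golden_add(a: int, b: int) -> int:
    """Reference model: 4-bit + 4-bit addition with a 5-bit result."""
    return (a + b) & 0x1F


def buggy_add(a: int, b: int) -> int:
    """Hypothetical buggy design: the carry-out bit is dropped."""
    return (a + b) & 0x0F


def judge(tests: list[Test], dut: Callable[..., int],
          golden: Callable[..., int]) -> dict:
    """Run every test against the golden model and record mismatches."""
    failures = []
    for inputs in tests:
        expected, actual = golden(**inputs), dut(**inputs)
        if actual != expected:
            failures.append({
                "inputs": inputs,
                "expected": expected,
                "actual": actual,
                "explanation": f"{inputs}: expected {expected}, got {actual}",
            })
    total = len(tests)
    return {"total": total, "passed": total - len(failures), "failures": failures}


def input_coverage(tests: list[Test], widths: dict[str, int]) -> float:
    """Fraction of the full input space hit by at least one test."""
    space = 1
    for width in widths.values():
        space *= 1 << width
    distinct = {tuple(t[name] for name in widths) for t in tests}
    return len(distinct) / space


def failure_zones(failures: list[dict]) -> dict[str, tuple[int, int]]:
    """Per-input (min, max) range over failing tests, as feedback for the generator."""
    zones: dict[str, tuple[int, int]] = {}
    for failure in failures:
        for name, value in failure["inputs"].items():
            lo, hi = zones.get(name, (value, value))
            zones[name] = (min(lo, value), max(hi, value))
    return zones
```

In a feedback loop, the ranges returned by `failure_zones` could be passed back to the test generator so that the next iteration concentrates on the regions where mismatches were seen.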
Results are delivered in real time via a Telegram bot with /verify, /buggy, and /breakdown commands. The entire pipeline runs sandboxed inside NemoClaw with Landlock, seccomp, and network namespace isolation.
What We Learned
- How to structure a real multi-agent pipeline where agents have clearly defined interfaces and each one validates its inputs before passing them downstream
- How to prompt Nemotron for adversarial reasoning — not just generation, but structured thinking about failure modes
- How NemoClaw's policy engine enforces security boundaries at the network and filesystem level, and why that matters for sensitive IP like hardware designs
- How quickly integration bugs appear when multiple people build separate agents simultaneously — and how important agreed-upon schemas are before anyone writes a line of code
How We Built It
We divided into three roles from the start. One member built Agent 1 and the pipeline orchestration. One built Agent 2 and the Nemotron prompting strategy. One built Agent 3, the verification engine, and the NemoClaw sandbox configuration. We used a shared GitHub repo with per-person branches, agreed on JSON interfaces upfront, and integrated continuously throughout the 24 hours.
The stack: Python 3.12, NVIDIA Nemotron via NIM API, NemoClaw with OpenClaw, python-telegram-bot, and a JSON persistence layer for run history.
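For the JSON persistence layer, a minimal sketch of an append-only run history might look like the following. The file layout, record fields, and the `append_run` helper are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_run(path: Path, record: dict) -> list[dict]:
    """Append one run record to a JSON history file and return the full history."""
    history = json.loads(path.read_text()) if path.exists() else []
    history.append({"timestamp": datetime.now(timezone.utc).isoformat(), **record})
    # Write to a temporary file first, then swap it in, so an interrupted
    # write does not leave a half-written history file behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(history, indent=2))
    tmp.replace(path)
    return history
```

A single JSON file keeps the history easy to inspect by hand, at the cost of rewriting the whole file on every run.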
Challenges We Faced
- NemoClaw sandbox build failures: the sandbox kept failing on a WeChat plugin trying to reach an unreachable host. We solved it by skipping all messaging channels during onboarding and letting the Telegram bot handle the interface independently.
- Residential network blocking: api.telegram.org was blocked on our network. We worked around it by using a mobile hotspot for bot testing.
- Agent integration conflicts: two team members working on overlapping parts of Agent 2 caused merge conflicts and interface mismatches. A mid-hackathon audit identified every conflict and we resolved them systematically before wiring the full pipeline.
- OpenClaw as a real framework: we initially planned to use OpenClaw as a Python package but discovered it is a sandboxed agent runtime, not a pip-installable library. We adapted by implementing the agent orchestration pattern directly in Python and running the full stack inside NemoClaw, where OpenClaw operates natively.