Inspiration

On April 7, 2026, Anthropic announced Claude Mythos — a model so capable at finding software vulnerabilities that they chose not to release it publicly. The company launched Project Glasswing to give Mythos Preview access to ~50 critical-infrastructure partners so they could patch their own code before attackers caught up.

Reading about Mythos two weeks ago, one question wouldn't leave me alone: what if the same reasoning that finds zero-days in code could find the misconfigurations that open networks to attack?

Network security has a twin problem to software security. Every major organization runs on thousands of lines of hand-written router, switch, and firewall configurations. One misplaced permit, one wildcard mask in the wrong direction, one shadowed rule beneath the wrong permit — and a door that shouldn't be open is open. Static linters catch syntax. They miss semantics. That's exactly the gap Mythos fills for code.

ACLsmith is the narrow, honest demonstration of what that means for network defense. Not Mythos itself — roughly 5% of it, in one specific domain. That humility is the point.

What it does

ACLsmith is an autonomous AI agent that investigates Cisco IOS router configurations the way a senior network engineer would:

  1. Reads the config and forms hypotheses about which rules look suspicious.
  2. Queries specific parts of the configuration using a library of tools (list_acls, inspect_acl, trace_packet).
  3. Verifies each hypothesis by tracing hypothetical packets through the ACL rule chain — e.g., "what happens when a packet spoofing source 10.0.0.99 tries to reach 10.0.0.10:22 on the WAN interface?"
  4. Reports line-numbered findings with severity, a one-line summary, a full explanation of the flaw, and a concrete suggested fix.
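
The report described in step 4 could be modeled roughly as follows. This is a sketch only: the `Finding` field names, the `info` severity level, and the `sortBySeverity` helper are assumptions made for illustration, not ACLsmith's actual schema, and the sample values are placeholders rather than real tool output.

```typescript
// Hypothetical shape of one reported finding; field names are assumptions.
type Severity = "critical" | "warning" | "info";

interface Finding {
  severity: Severity;
  lines: number[];      // 1-based line numbers in the submitted config
  summary: string;      // one-line summary
  explanation: string;  // full explanation of the flaw
  suggestedFix: string; // concrete suggested fix
}

const SEVERITY_ORDER: Record<Severity, number> = { critical: 0, warning: 1, info: 2 };

// Returns a new array ordered most-severe first, then by first referenced line.
function sortBySeverity(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) =>
      SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity] ||
      (a.lines[0] ?? 0) - (b.lines[0] ?? 0),
  );
}

// Illustrative values only, not real ACLsmith output.
const sampleFindings: Finding[] = [
  { severity: "info", lines: [30], summary: "placeholder", explanation: "placeholder", suggestedFix: "placeholder" },
  { severity: "critical", lines: [12], summary: "placeholder", explanation: "placeholder", suggestedFix: "placeholder" },
  { severity: "warning", lines: [18], summary: "placeholder", explanation: "placeholder", suggestedFix: "placeholder" },
];

const sortedFindings = sortBySeverity(sampleFindings);
```

Keeping the line numbers as data, rather than embedding them in the summary text, is what would let each severity card link back to the config view.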

The user drops in a config. The agent streams its reasoning to the screen live — you watch it think. Findings appear as they're confirmed, each linked back to specific lines in the config.

For the demo, ACLsmith ships with three hand-crafted configurations:

  • shadowed.conf — a real-world classic: an edge router with a rule permitting any internal-subnet source on the WAN interface, trivially bypassable via IP spoofing. This shadows the downstream SSH deny.
  • leaky-vlan.conf — ACLs applied in the wrong direction between VLANs, allowing guest→finance traffic the operator thought was blocked.
  • clean.conf — a well-written config. ACLsmith should find nothing of critical severity. This matters: a tool that finds flaws everywhere isn't finding flaws, it's hallucinating.

How we built it

Stack:

  • Claude Opus 4.7 with streaming tool-use and adaptive thinking for the agent loop
  • TypeScript + Vite for the front-end
  • Three.js + GSAP for the landing page motion
  • Pure vanilla — no React, no Tailwind, no UI frameworks

Architecture:

The config parser is a forgiving lexer — structured enough to build a queryable representation, loose enough to handle arbitrary vendor idioms via a rawLines[] escape hatch. Every ACL rule preserves its source line verbatim for display and line-number mapping.
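
A minimal sketch of what such a forgiving parser might look like, assuming named ACLs with indented entries and classic numbered `access-list` lines. Only `rawLines` and the idea of preserving each source line verbatim come from the description above; `AclRule`, `ParsedConfig`, `parseConfig`, and the sample config are hypothetical.

```typescript
interface AclRule {
  line: number;               // 1-based source line, for line-number mapping
  raw: string;                // source line preserved verbatim for display
  action: "permit" | "deny";
  tokens: string[];           // everything after the action, left unparsed here
}
interface RawLine { line: number; text: string }
interface ParsedConfig { acls: Map<string, AclRule[]>; rawLines: RawLine[] }

const isAction = (s: string | undefined): s is "permit" | "deny" =>
  s === "permit" || s === "deny";

function parseConfig(text: string): ParsedConfig {
  const acls = new Map<string, AclRule[]>();
  const rawLines: RawLine[] = [];
  let currentAcl: string | null = null;

  const addRule = (name: string, rule: AclRule): void => {
    const rules = acls.get(name) ?? [];
    rules.push(rule);
    acls.set(name, rules);
  };

  text.split(/\r?\n/).forEach((raw, i) => {
    const line = i + 1;
    const tokens = raw.trim().split(/\s+/).filter(Boolean);
    if (tokens.length === 0) return;
    const indented = /^\s/.test(raw);

    // Named ACL header, e.g. "ip access-list extended WAN_IN"
    if (tokens[0] === "ip" && tokens[1] === "access-list" && tokens.length >= 4) {
      const name = tokens[3];
      currentAcl = name;
      if (!acls.has(name)) acls.set(name, []);
      return;
    }
    // Numbered form, e.g. "access-list 101 permit ip any any"
    const numberedAction = tokens[2];
    if (tokens[0] === "access-list" && isAction(numberedAction)) {
      addRule(tokens[1], { line, raw, action: numberedAction, tokens: tokens.slice(3) });
      return;
    }
    // Entry inside a named ACL, with an optional leading sequence number
    if (currentAcl !== null && indented) {
      const body = /^\d+$/.test(tokens[0]) ? tokens.slice(1) : tokens;
      const action = body[0];
      if (isAction(action)) {
        addRule(currentAcl, { line, raw, action, tokens: body.slice(1) });
        return;
      }
    }
    if (!indented) currentAcl = null;    // a top-level line ends the ACL block
    rawLines.push({ line, text: raw });  // escape hatch: keep what is not understood
  });

  return { acls, rawLines };
}

// Hypothetical sample config for illustration.
const sampleConfig = [
  "hostname edge-1",
  "ip access-list extended WAN_IN",
  " 10 permit ip 10.0.0.0 0.0.0.255 any",
  " 40 deny tcp any host 10.0.0.10 eq 22",
  " remark vendor-specific oddity",
  "interface GigabitEthernet0/0",
  " ip access-group WAN_IN in",
].join("\n");

const parsed = parseConfig(sampleConfig);
```

Anything the parser does not recognize lands in `rawLines` instead of raising an error, so an unfamiliar vendor idiom degrades the analysis rather than blocking it.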

The tool library is the critical leverage: the agent doesn't guess, it proves. The trace_packet tool implements Cisco wildcard-mask matching end-to-end — a bit set to 1 in the wildcard means "don't care" — walking rules in sequence order and returning { matchedRule, action, reason }. When the agent suspects a shadowed rule, it runs a trace. Only confirmed flaws become findings.
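
The first-match behavior described above can be sketched as follows. The `{ matchedRule, action, reason }` result shape comes from the text; the `Rule` and `Packet` types, the sequence numbers, and the exact wording of the deny rule are assumptions for illustration. The sketch ignores protocol matching and only handles an exact destination port.

```typescript
// Assumed shapes: the source only names the { matchedRule, action, reason } result.
interface Rule {
  seq: number;
  action: "permit" | "deny";
  src: string; srcWildcard: string;
  dst: string; dstWildcard: string;
  dstPort?: number; // undefined means "any port"
}
interface Packet { src: string; dst: string; dstPort?: number }
interface TraceResult {
  matchedRule: Rule | null;
  action: "permit" | "deny";
  reason: string;
}

// Dotted quad to unsigned 32-bit integer.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// A wildcard bit of 1 means "don't care", so compare only bits where the wildcard is 0.
function matches(ip: string, base: string, wildcard: string): boolean {
  const care = ~ipToInt(wildcard) >>> 0;
  return ((ipToInt(ip) ^ ipToInt(base)) & care) === 0;
}

// Walk rules in sequence order; the first matching rule decides the outcome.
function tracePacket(rules: Rule[], pkt: Packet): TraceResult {
  const ordered = [...rules].sort((a, b) => a.seq - b.seq);
  for (const rule of ordered) {
    const portOk = rule.dstPort === undefined || rule.dstPort === pkt.dstPort;
    if (
      matches(pkt.src, rule.src, rule.srcWildcard) &&
      matches(pkt.dst, rule.dst, rule.dstWildcard) &&
      portOk
    ) {
      return { matchedRule: rule, action: rule.action, reason: `first match at sequence ${rule.seq}` };
    }
  }
  // IOS ACLs end with an implicit deny.
  return { matchedRule: null, action: "deny", reason: "no rule matched; implicit deny" };
}

// Hypothetical rules modeled on the shadowed.conf scenario.
const wanIn: Rule[] = [
  // permit ip 10.0.0.0 0.0.0.255 any
  { seq: 10, action: "permit", src: "10.0.0.0", srcWildcard: "0.0.0.255", dst: "0.0.0.0", dstWildcard: "255.255.255.255" },
  // deny tcp any host 10.0.0.10 eq 22
  { seq: 40, action: "deny", src: "0.0.0.0", srcWildcard: "255.255.255.255", dst: "10.0.0.10", dstWildcard: "0.0.0.0", dstPort: 22 },
];

const spoofed = tracePacket(wanIn, { src: "10.0.0.99", dst: "10.0.0.10", dstPort: 22 });
const external = tracePacket(wanIn, { src: "203.0.113.5", dst: "10.0.0.10", dstPort: 22 });
const unmatched = tracePacket(wanIn, { src: "203.0.113.5", dst: "10.0.0.20", dstPort: 80 });
```

In this sketch the spoofed packet is permitted at sequence 10 and never reaches the SSH deny at sequence 40, while the same SSH attempt from an external source is denied at sequence 40. That contrast is the kind of evidence a shadowing finding would rest on.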

The agentic loop streams text_delta, tool_use, and tool_result events to the browser in real time. Tool calls execute client-side against the parsed config, results feed back into the next turn, and the final assistant message emits a structured JSON report. A generation counter handles the case where the user swaps configs mid-investigation.
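
One way a generation counter can guard against stale results is sketched below. `GenerationGuard`, `AgentEvent`, and `pump` are hypothetical names; the event kinds mirror the three streamed event types named above, and the real loop may be structured differently.

```typescript
// Each new investigation takes a fresh token; older tokens become stale.
class GenerationGuard {
  private current = 0;

  begin(): number {
    this.current += 1;
    return this.current;
  }

  isCurrent(token: number): boolean {
    return token === this.current;
  }
}

// Hypothetical event type, loosely mirroring the streamed event kinds.
interface AgentEvent {
  kind: "text_delta" | "tool_use" | "tool_result";
  payload: string;
}

// Renders events from one run, stopping as soon as a newer run has begun.
async function pump(
  guard: GenerationGuard,
  events: AsyncIterable<AgentEvent>,
  render: (event: AgentEvent) => void,
): Promise<void> {
  const token = guard.begin();
  for await (const event of events) {
    if (!guard.isCurrent(token)) return; // config was swapped: drop the rest
    render(event);
  }
}

// Usage: starting a second run invalidates the first run's token.
const guard = new GenerationGuard();
const firstRun = guard.begin();
const secondRun = guard.begin();
const firstStillCurrent = guard.isCurrent(firstRun);   // false
const secondStillCurrent = guard.isCurrent(secondRun); // true
```

The check runs before every render, so events from a superseded run that arrive late are dropped instead of being drawn over the new config's findings.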

Design:

Two modes, one codebase.

The landing page (/) is an editorial masthead — giant outlined MYTHOS wordmark, stacked HYPOTHESIZE / QUERY / VERIFY hero, plate-style metadata strips across the top referencing the Glasswing context. It reads like a museum placard for an idea.

The tool (/app) is a live instrument — hazard-yellow cyberpunk chrome, blinking cursor states, a 45° bumblebee hazard stripe, live reasoning stream on the right, config on the left, findings as stacked severity cards. Three demo buttons in the header swap configs mid-run.

Challenges we ran into

Scope honesty. The capabilities described for Mythos fill a 245-page technical document. I had 4.5 hours. The hardest engineering decision of the day was narrowing: one class of artifact (Cisco ACLs), one reasoning pattern (hypothesize → query → verify → report), three flaws, one narrative.

Wildcard-mask matching. Cisco ACLs use inverse subnet masks (0.0.0.255 means "match /24"). Getting the bitwise match right, handling any and host X.X.X.X forms, and making it composable enough that the agent could use it as a first-class tool took the longest of any single component.
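
The three address forms can be normalized into one (base, wildcard) pair, as in the sketch below. `AddrSpec`, `parseAddrSpec`, `specMatches`, and `wildcardToPrefixLength` are hypothetical names for illustration, not the project's actual functions.

```typescript
interface AddrSpec {
  base: number;     // address as an unsigned 32-bit integer
  wildcard: number; // 1 bits are "don't care"
  consumed: number; // how many tokens this address form used
}

// Dotted quad to unsigned 32-bit integer, rejecting malformed input.
function ipToInt(ip: string | undefined): number {
  const parts = (ip ?? "").split(".");
  if (parts.length !== 4 || parts.some((p) => !/^\d{1,3}$/.test(p) || Number(p) > 255)) {
    throw new Error(`not a dotted-quad address: ${ip}`);
  }
  return parts.reduce((acc, p) => ((acc << 8) | Number(p)) >>> 0, 0);
}

// Parses the address at tokens[i]: "any", "host A.B.C.D", or "A.B.C.D W.X.Y.Z".
function parseAddrSpec(tokens: string[], i: number): AddrSpec {
  const first = tokens[i];
  if (first === "any") return { base: 0, wildcard: 0xffffffff, consumed: 1 };
  if (first === "host") return { base: ipToInt(tokens[i + 1]), wildcard: 0, consumed: 2 };
  return { base: ipToInt(first), wildcard: ipToInt(tokens[i + 1]), consumed: 2 };
}

// Compare only the bits the wildcard marks as significant (wildcard bit 0).
function specMatches(spec: AddrSpec, ip: string): boolean {
  const care = ~spec.wildcard >>> 0;
  return ((ipToInt(ip) ^ spec.base) & care) === 0;
}

// Prefix length for a contiguous inverse mask (0.0.0.255 gives 24), else null.
function wildcardToPrefixLength(wildcard: number): number | null {
  const w = wildcard >>> 0;
  if (((w + 1) & w) !== 0) return null; // not of the form 0...01...1
  let ones = 0;
  for (let rest = w; rest !== 0; rest = rest >>> 1) ones += rest & 1;
  return 32 - ones;
}

const anySpec = parseAddrSpec(["any"], 0);
const hostSpec = parseAddrSpec(["host", "10.0.0.10"], 0);
const subnetSpec = parseAddrSpec(["10.0.0.0", "0.0.0.255"], 0);
const slash24 = wildcardToPrefixLength(ipToInt("0.0.0.255"));
const nonContiguous = wildcardToPrefixLength(ipToInt("0.255.0.255"));
```

Treating `any` and `host` as special cases of the same pair means the matcher needs only one code path. A wildcard such as 0.255.0.255 has no prefix-length equivalent, which is why the matcher works on raw bits instead of converting to CIDR first.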

Keeping the agent honest. An early version would produce findings that sounded plausible but weren't verified. The fix was architectural: the system prompt requires every finding to be backed by a trace_packet result. "Report a confirmed finding, not a suspicion" — and the tool output is its only evidence.

Visual restraint. I kept chasing aesthetic references mid-build (Pioneer, Utopia Tokyo, Redline). Every pivot was a threat to shipping. The final direction — editorial masthead for the landing, hazard-yellow instrument for the tool — came from committing to two clear modes instead of one compromise.

Accomplishments that we're proud of

  • The agent actually catches the planted flaw. Opus 4.7 identifies that permit ip 10.0.0.0 0.0.0.255 any shadows rule 40 in 2–3 tool-use turns, with packet-trace evidence inline. It also surfaces findings beyond what was planned (open-resolver risk, missing BCP 38 anti-spoofing, logging asymmetry).
  • Zero hallucinations on the clean config. Pointed at a well-written ACL, the agent returns no critical or warning findings.
  • Live reasoning stream. You watch the model think, see the tool calls as they happen, and watch findings materialize one at a time — not a ChatGPT-style wall of text after a long wait.
  • Two-mode design. The editorial landing and the instrument-grade tool are visually distinct but share the same typographic DNA.
  • 4.5 hours of build time, $0 in third-party APIs beyond Anthropic credits.

What we learned

Agentic tool-use makes the difference between "impressive demo" and "actually correct." The step that changed the quality of findings wasn't a better prompt — it was the trace_packet tool. Reasoning without verification is hallucination; reasoning with verification is analysis. This is the core insight behind why Mythos-class models matter.

The model is the easy part. The hard part is the tool library. A frontier LLM with three good deterministic tools beats the same model with zero tools by an enormous margin. Mythos's real secret isn't raw intelligence; it's the scaffolding around it.

Narrow scope is a feature, not a compromise. "~5% of Mythos" is a better story than "Mythos for everything." Judges and users trust a tool that knows exactly what it is.

What's next for ACLsmith

  • More vendor families. Juniper JunOS, Arista EOS, Palo Alto PAN-OS — each with their own ACL / security-policy syntax.
  • Live git integration. Point ACLsmith at a network-config repository; it reviews every PR the way a security engineer would.
  • Expanded tool library. BGP route-policy tracing, firewall-zone crossing analysis, IPSec peer validation.
  • Proxied API key. The demo uses dangerouslyAllowBrowser for speed of build — production would proxy through a server so keys never leave the backend.
  • Apply for Anthropic's Cyber Verification Program to unlock deeper offensive-direction analysis for red-team use cases.
  • The obvious next step is integrating Mythos itself if the Glasswing program ever opens to more partners.
