Inspiration
I built Canary because AI coding sessions fail in two directions, and most developer tooling still leaves both partially invisible.
The first is human-to-model leakage: it is too easy to paste an API key, a password, a .env snippet, or internal information into a prompt before realizing it is about to be handed to an agent.
The second is agent-to-codebase drift: once an agent can read, write, and run commands autonomously, it can move well beyond the original task before I notice. .gitignore does not stop an agent from reading secrets. Secret scanners usually run after the fact. Git diffs show what changed, but not whether the meaning of the code changed in a risky way.
I wanted a tool that lives directly at that boundary: between me and the model, and between the agent and my repository. Canary is that boundary layer.
What it does
Canary is now a shell-first terminal safety layer for AI coding sessions.
The default experience is the interactive canary shell: screening starts on by default, plain text is treated as a prompt, and slash commands like /agent, /audit, /watch, /checkpoint, /rollback, /docs, and /guard keep safety controls inside the same terminal flow.
Today, Canary supports protected launch paths for both claude and codex. The canary guard install command can place guarded shims in ~/.canary/bin for both tools, and Claude also gets deeper in-session coverage through hooks installed in ~/.claude/settings.json.
Before a prompt is forwarded, Canary scans it for:
- hardcoded secrets
- PII
- sensitive file references
- disclosure-restricted language
- suspicious high-entropy tokens
- semantically similar confidential content
If Canary finds something risky, it renders the findings, computes a bounded risk score, and asks before forwarding the prompt. In Claude sessions, higher-risk follow-up prompts can also be warned on or blocked through the installed prompt hook.
While the agent is running, Canary can:
- open a live audit stream for risky tool activity
- inspect Claude hook events and compatible local transcript JSONL files
- show pending and completed Bash activity
- audit risky Bash, Write, and Edit actions in Claude sessions
- scan Bash output for sensitive data exposure
- watch the repository in real time
- detect semantic drift in changed files
- warn on sensitive-file access
- create checkpoints, log session events, and roll the workspace back if needed
In short, Canary protects both directions of an AI coding session: prompt safety and repository safety.
How we built it
I built Canary as a Python CLI with click for the command surface and rich for the terminal UI, but the product has evolved from a set of commands into a persistent shell-first workflow.
The prompt firewall is layered.
First, Canary uses regex-based detection for known secret formats like sk-, ghp_, AKIA, Slack tokens, GitLab tokens, Stripe live keys, inline password= / token= assignments, and sensitive path references like .env, ~/.ssh, or private-key filenames.
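As a rough illustration of this regex layer, the sketch below matches a handful of well-known token prefixes and sensitive path references. The pattern table `SECRET_PATTERNS`, the rule names, and the helper `find_secrets` are illustrative assumptions, not Canary's actual rule set, and the token lengths are simplified.

```python
import re

# Hypothetical, simplified patterns. Real token formats vary, and Canary's
# own rules may differ from these.
SECRET_PATTERNS = {
    "sk_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "inline_assignment": re.compile(r"(?i)\b(?:password|token)\s*=\s*\S+"),
    "sensitive_path": re.compile(r"(?:^|[\s/])\.env\b|~/\.ssh\b"),
}


def find_secrets(prompt: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_text) pairs for every pattern hit."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(prompt):
            findings.append((name, match.group(0).strip()))
    return findings
```

A prompt such as "deploy with password=hunter2" would produce an `inline_assignment` finding, while ordinary task descriptions produce none.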
Then it adds structured PII detection, including Luhn-validated credit card checks so random long digit strings do not create unnecessary noise.
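The Luhn check mentioned above is a standard checksum, so it can be sketched independently of Canary's code. The candidate regex `CARD_CANDIDATE` and the helper names here are assumptions for illustration only.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated by
# single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Keep only digit runs that look like cards and pass the checksum."""
    return [
        match.group(0)
        for match in CARD_CANDIDATE.finditer(text)
        if luhn_valid(match.group(0))
    ]
```

With this filter, a widely used test number such as 4111 1111 1111 1111 is reported, while an arbitrary 16-digit order number that fails the checksum is ignored.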
After that, it runs an entropy sweep to catch suspicious high-entropy tokens while allowlisting hashes, UUIDs, and similar benign identifiers.
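One way to picture the entropy sweep is a Shannon-entropy score per long token, with an allowlist applied first. The token pattern, the 4.0 bits-per-character threshold, and the allowlist regexes below are assumed values for this sketch, not Canary's tuned settings.

```python
import math
import re
from collections import Counter

# Hypothetical token shape: long runs of base64-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")
# Allowlist for benign identifiers: UUIDs and MD5/SHA-1/SHA-256 hex digests.
UUID = re.compile(r"^[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
HEX_DIGEST = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")


def shannon_entropy(token: str) -> float:
    """Entropy in bits per character of the token's character distribution."""
    counts = Counter(token)
    length = len(token)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def suspicious_tokens(text: str, threshold: float = 4.0) -> list[str]:
    """Return long tokens whose entropy meets the (assumed) threshold."""
    hits = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if UUID.match(token) or HEX_DIGEST.match(token):
            continue  # benign identifier, skip before scoring
        if shannon_entropy(token) >= threshold:
            hits.append(token)
    return hits
```

Checking the allowlist before scoring keeps commit hashes and UUIDs out of the findings even when they sit close to the threshold.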
Finally, Canary runs a semantic scanner using local IBM Granite embeddings. It compares prompts against anchor texts representing credentials, personal data, financial data, medical data, and proprietary technical content.
Each finding i contributes a weight w_i to a risk score R, which is capped at 100:
$$ R = \min\left(\sum_i w_i,\ 100\right) $$
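The capped sum above is simple enough to show directly. The category names and weights in `WEIGHTS` are made-up values for illustration; the source does not state Canary's actual weights.

```python
# Hypothetical per-category weights; Canary's real values are not given
# in this write-up.
WEIGHTS = {
    "secret": 60,
    "pii": 35,
    "sensitive_path": 25,
    "entropy": 20,
    "semantic": 15,
}


def risk_score(categories: list[str], weights: dict[str, int] = WEIGHTS,
               cap: int = 100) -> int:
    """Sum the weight of each finding's category and clamp to the cap."""
    return min(sum(weights.get(category, 0) for category in categories), cap)
```

Under these assumed weights, one PII finding plus one entropy finding scores 55, while a secret, a PII hit, and a sensitive path together exceed the cap and score 100.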
For live agent integration, Canary now supports both claude and codex as guarded launch targets. The shims handle launch-time screening, and Claude gets additional hook coverage for UserPromptSubmit, PreToolUse, PermissionRequest, and PostToolUse. That lets Canary inspect in-session prompts, audit pending Bash, Write, and Edit actions, and scan Bash output after execution.
For repository monitoring, Canary uses watchdog plus a local embedding baseline. It indexes non-binary, non-sensitive, size-limited text files, skips sensitive files like .env and key material entirely, and measures semantic drift as one minus the cosine similarity between a file's baseline embedding u and its current embedding v:
$$ d = 1 - \frac{u \cdot v}{\|u\|\,\|v\|} $$
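The drift formula can be sketched in plain Python, assuming embeddings arrive as equal-length lists of floats. The 0.25 threshold and the helper `has_drifted` are illustrative assumptions, not Canary's configured values.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Compute d = 1 - (u . v) / (|u| |v|) for two non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def has_drifted(baseline: list[float], current: list[float],
                threshold: float = 0.25) -> bool:
    """Flag a file when its embedding moved further than the threshold."""
    return cosine_distance(baseline, current) > threshold
```

Identical directions give a distance of 0, orthogonal embeddings give 1, and opposite directions give 2, so a small threshold flags only substantial changes in meaning.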
The canary watch command now acts as both a protected launcher and a watcher: it can screen the prompt, create a checkpoint, arm repo surveillance, and then launch the selected agent.
One important implementation detail has changed from the earlier version: the current repo is local-first. The shipped shell UX and setup flow assume on-device Granite embeddings, while Bash auditing currently uses local pattern rules, not a Granite chat model.
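To make "local pattern rules" concrete, here is a minimal sketch of rule-based Bash auditing. The rule names, the regexes in `RISKY_BASH_RULES`, and the function `audit_bash` are hypothetical; the source does not list Canary's actual rules.

```python
import re

# Hypothetical rules for illustration; each pairs a label with a pattern
# that is searched for anywhere in the command string.
RISKY_BASH_RULES = [
    ("recursive_force_delete", re.compile(r"\brm\s+-[A-Za-z]*(?:rf|fr)")),
    ("pipe_to_shell",
     re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b")),
    ("reads_env_file", re.compile(r"\b(?:cat|less|head|tail)\s+\S*\.env\b")),
    ("force_push", re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)")),
]


def audit_bash(command: str) -> list[str]:
    """Return the names of every rule the command matches, in rule order."""
    return [name for name, pattern in RISKY_BASH_RULES
            if pattern.search(command)]
```

Rules like these run in microseconds and need no model download, which fits the speed and privacy goals described here, at the cost of missing risky commands that no pattern anticipates.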
Challenges we ran into
The hardest challenge has been making local semantic safety practical across real machines.
Local inference sounds simple until it meets Apple Silicon, CUDA-backed systems, CPU-only laptops, model-cache size, optional dependencies, and first-run download time. The same feature can feel smooth on one machine and heavy on another. That pushed Canary toward a more explicit local-first setup path instead of pretending all devices behave the same way.
Another challenge was timing and visibility. It is relatively easy to inspect a transcript after a session. It is much harder to combine live hook events, permission requests, transcript tailing, detached terminals, and tmux panes into something a developer can understand while the session is still running.
Precision is still a constant balancing act. A safety tool that over-fires becomes background noise, so Canary has to balance regex checks, entropy checks, semantic matches, output scanning, drift thresholds, and confirmation prompts without training users to ignore it.
Accomplishments that we're proud of
I am proud that Canary is now more than a prompt scanner or a demo wrapper. The current repo ships a cohesive terminal product: a shell-first interface, guarded claude and codex launch shims, Claude hook integration, live audit streaming, transcript-backed Bash visibility, repository drift monitoring, named checkpoints, reversible rollback, session logging, guided setup, and built-in docs.
I am especially proud of three things.
First, Canary protects both sides of an AI coding session: the prompt before it reaches the agent, and the repository while the agent is actively operating.
Second, the semantic layer is real. IBM Granite embeddings are on the critical path for prompt similarity checks and semantic drift monitoring, which makes Canary meaning-aware rather than purely rule-based.
Third, the privacy boundary is enforced in code. Sensitive files are surfaced and guarded, but excluded from the embedding baseline instead of being embedded for analysis.
What we learned
I learned that AI coding safety is a workflow problem, not a single feature.
If I only scan prompts, I can still lose visibility once the agent starts acting. If I only watch file changes, I can still leak secrets before the session begins. Canary is strongest when those protections work together inside the same terminal flow.
I also learned that “agent support” is not one thing. Launch-time screening, prompt hooks, permission hooks, tool hooks, transcript parsing, and audit UX all land at different layers. That is why Claude currently has the deepest coverage, while Codex support today is strongest at guarded launch plus transcript-backed audit visibility.
Most of all, I learned that trust is a UX problem. Developers will only keep a safety layer turned on if it is clear, fast, interruptible, and reversible. The move to a persistent shell, inline slash commands, live audit panes, and one-step rollback all came from that lesson.
What's next for Canary
The next step is deeper session coverage beyond the current Claude-specific hook path. Today, Claude can be screened and audited inside the session, while Codex is protected at launch and visible through transcript-backed audit. I want broader parity across agent runtimes, especially for follow-up prompts and permission-sensitive actions inside live sessions.
I also want to keep strengthening the local-only path. The repo already assumes local Granite embeddings as the primary runtime, but Bash auditing is still powered by local pattern rules rather than a stronger local model. Improving that without compromising speed or privacy is a clear next milestone.
On the product side, the direction is still the same:
- stronger team policies
- better audit trails
- more configurable enforcement
- broader agent support
- deeper local inference support
The long-term goal remains simple: if AI coding agents are going to become part of normal software development, they need a native safety layer in the developer workflow.
I want Canary to be that layer.