Inspiration
Traditional secret scanners rely on pattern matching. They flag every string that looks like a key, regardless of context. A DATABASE_URL in a server route handler is fine, but the same variable exposed through a NEXT_PUBLIC_ prefix or leaked into a client-side bundle is a critical vulnerability. I wanted a scanner that actually understands the difference, one that reasons about framework boundaries, deployment targets, and exploit paths before raising an alert.
What it does
WatchDog is a security scanning agent with 7 specialized scanners covering secrets, client-side exposure, env file hygiene, config flags, infrastructure-as-code, dependencies, and build artifacts. It auto-detects frameworks (Next.js, Vite, Django, Rails, etc.) and adjusts detection logic accordingly. Findings can optionally be sent to Claude for contextual exploitability analysis; Claude upgrades genuinely dangerous issues, downgrades false positives, and explains the exact attack path in plain language.
On GitLab, a Duo Flow triggers on MR events, scans changed files, and posts security findings as MR comments with exploit reasoning. A custom Duo Agent is also available through Duo Chat for on-demand security questions. In CI/CD, WatchDog blocks pipelines on critical findings and generates Code Quality reports for MR diffs.
How we built it
The core is a Python CLI built with Click. An orchestrator coordinates 7 scanner modules, each using regex-based pattern matching (40+ patterns) with placeholder filtering to reduce noise. When ANTHROPIC_API_KEY is set, findings are batched and sent to Claude for contextual reasoning. Claude evaluates each finding against the detected framework, file role (client vs. server), and deployment target. Output is handled by three reporters: Rich terminal tables, GitLab MR comment markdown, and Code Quality JSON for CI/CD integration.
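To make the scanner design concrete, here is a minimal sketch of the regex-plus-placeholder-filtering approach described above. All names (`SECRET_PATTERNS`, `PLACEHOLDERS`, `scan_line`) and the two example patterns are illustrative assumptions, not WatchDog's actual identifiers; the real tool ships 40+ patterns across 7 scanner modules.

```python
import re

# Two illustrative patterns; WatchDog uses 40+ across its scanners.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"]([^'\"]{16,})['\"]"),
}

# Placeholder filtering: obvious dummy values are dropped before reporting.
PLACEHOLDERS = {"changeme", "your-api-key", "xxxx", "example"}

def scan_line(path, lineno, line):
    """Return a list of finding dicts for a single line of source."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        m = pattern.search(line)
        if not m:
            continue
        # Use the captured secret if the pattern defines a group,
        # otherwise the whole match.
        value = m.group(m.lastindex or 0)
        if any(p in value.lower() for p in PLACEHOLDERS):
            continue  # skip dummy values to cut alert noise
        findings.append({"rule": name, "file": path, "line": lineno, "value": value})
    return findings
```

Keeping each scanner a pure function over file content like this is what lets the deterministic pass stay fast in CI, with Claude reasoning layered on top only when an API key is present.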
For the GitLab Duo integration, I built a custom Flow (YAML-defined multi-agent workflow) that reacts to MR triggers and a custom Agent accessible through Duo Chat. The project includes 68 tests covering all scanners, reasoning, reporters, and end-to-end CLI flows.
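The Code Quality JSON reporter mentioned above targets GitLab's documented Code Quality report schema (an array of objects with `description`, `check_name`, `fingerprint`, `severity`, and `location`). A reporter in that style might look like the sketch below; the finding dict shape and `SEVERITY_MAP` are assumptions for illustration, not WatchDog's actual code.

```python
import hashlib
import json

# Map scanner severities onto GitLab Code Quality levels
# (info, minor, major, critical, blocker).
SEVERITY_MAP = {"low": "minor", "medium": "major", "high": "critical", "critical": "blocker"}

def to_code_quality(findings):
    """Serialize findings as a GitLab Code Quality JSON report."""
    report = []
    for f in findings:
        # A stable fingerprint lets GitLab diff findings between pipelines.
        fingerprint = hashlib.sha256(
            f"{f['rule']}:{f['file']}:{f['line']}".encode()
        ).hexdigest()
        report.append({
            "description": f"{f['rule']} detected",
            "check_name": f["rule"],
            "fingerprint": fingerprint,
            "severity": SEVERITY_MAP.get(f.get("severity", "medium"), "major"),
            "location": {"path": f["file"], "lines": {"begin": f["line"]}},
        })
    return json.dumps(report, indent=2)
```

Emitting this artifact from a CI job is what lights up inline annotations on the MR diff without any extra GitLab configuration.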
Challenges we ran into
Balancing detection sensitivity with false-positive rates across frameworks was the biggest challenge. Next.js, Vite, Django, and Rails each have different conventions for client/server boundaries, environment variable prefixes, and build output structures. Getting Claude's reasoning prompts right to produce actionable, framework-specific exploit explanations rather than generic warnings took significant iteration. Another challenge was designing the scanner architecture to be modular enough that adding new scanners doesn't require touching the orchestrator.
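The framework-boundary logic can be sketched as a lookup from framework to its client-exposure convention. The table and function names below are illustrative assumptions about the approach, not WatchDog's internal data structures; the prefixes themselves are each framework's documented convention.

```python
# Each framework's convention for env vars that get bundled into client code.
PUBLIC_PREFIXES = {
    "nextjs": "NEXT_PUBLIC_",   # inlined into the browser bundle at build time
    "vite": "VITE_",            # exposed via import.meta.env
    "create-react-app": "REACT_APP_",
}

def adjust_severity(framework, var_name, base_severity="medium"):
    """Escalate env-var findings that cross the client/server boundary."""
    prefix = PUBLIC_PREFIXES.get(framework)
    if prefix and var_name.startswith(prefix):
        return "critical"  # the value ships to every browser, not just the server
    return base_severity
```

This is why the same `DATABASE_URL` finding is routine in a Django settings file but critical as `NEXT_PUBLIC_DATABASE_URL` in a Next.js project.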
Accomplishments that we're proud of
- 7 scanners with 40+ detection patterns and a false-positive rate low enough to be useful in CI without alert fatigue
- Claude reasoning that explains HOW an attacker would exploit a finding, not just WHAT was found
- Framework-aware severity adjustment, where the same finding gets different severity in Next.js vs. Django based on actual exposure risk
- Full test suite (68 tests) that runs without any API keys
- Clean dual interface: fast deterministic regex scanning in CI, intelligent Claude reasoning on demand
What we learned
Building on the GitLab Duo Agent Platform showed how flows, agents, and Duo tools provide a solid foundation for automating SDLC tasks that previously required manual intervention or complex webhook setups. On the AI side, using Claude for security reasoning rather than generation demonstrated that LLMs add the most value when they augment deterministic tools with contextual judgment, not when they replace them.
What's next for WatchDog
- More scanners: SAST-lite rules for common vulnerability patterns (SQL injection, XSS in templates), API key rotation detection
- Auto-fix suggestions: Claude-generated remediation code snippets alongside findings
- GitLab integration depth: native Code Quality widget integration, security dashboard reporting
- Model comparison: support for multiple LLM backends to compare reasoning quality and cost
- Community scanner plugins: allow users to define custom regex patterns and severity rules via YAML config
Built With
- Anthropic Claude API
- Click (CLI framework)
- Custom agents
- GitLab CI/CD
- GitLab Duo Agent Platform (custom flows)
- pytest
- Python
- Rich (terminal UI)
- YAML