Inspiration
Traditional secret scanners rely on pattern matching. They flag every string that looks like a key, regardless of context. A DATABASE_URL in a server route handler is fine, but the same variable exposed through a NEXT_PUBLIC_ prefix or leaked into a client-side bundle is a critical vulnerability. I wanted a scanner that actually understands the difference, one that reasons about framework boundaries, deployment targets, and exploit paths before raising an alert.
What it does
WatchDog is a security scanning agent with 7 specialized scanners covering secrets, client-side exposure, env file hygiene, config flags, infrastructure-as-code, dependencies, and build artifacts. It auto-detects frameworks (Next.js, Vite, Django, Rails, etc.) and adjusts detection logic accordingly. Findings can optionally be sent to Claude for contextual exploitability analysis; Claude upgrades genuinely dangerous issues, downgrades false positives, and explains the exact attack path in plain language.
On GitLab, a Duo Flow triggers on MR events, scans changed files, and posts security findings as MR comments with exploit reasoning. A custom Duo Agent is also available through Duo Chat for on-demand security questions. In CI/CD, WatchDog blocks pipelines on critical findings and generates Code Quality reports for MR diffs.
How we built it
The core is a Python CLI built with Click. An orchestrator coordinates 7 scanner modules, each using regex-based pattern matching (40+ patterns) with placeholder filtering to reduce noise. When ANTHROPIC_API_KEY is set, findings are batched and sent to Claude for contextual reasoning. Claude evaluates each finding against the detected framework, file role (client vs. server), and deployment target. Output is handled by three reporters: Rich terminal tables, GitLab MR comment markdown, and Code Quality JSON for CI/CD integration.
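To make the scanner design concrete, here is a minimal sketch of the regex-plus-placeholder-filtering approach described above. All names (`SECRET_PATTERNS`, `PLACEHOLDERS`, `scan_line`) and the two example patterns are illustrative assumptions, not WatchDog's actual identifiers; the real tool ships 40+ patterns across 7 scanner modules.

```python
import re

# Two illustrative patterns; WatchDog uses 40+ across its scanners.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"]([^'\"]{16,})['\"]"),
}

# Placeholder filtering: obvious dummy values are dropped before reporting.
PLACEHOLDERS = {"changeme", "your-api-key", "xxxx", "example"}

def scan_line(path, lineno, line):
    """Return a list of finding dicts for a single line of source."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        m = pattern.search(line)
        if not m:
            continue
        # Use the captured secret if the pattern defines a group,
        # otherwise the whole match.
        value = m.group(m.lastindex or 0)
        if any(p in value.lower() for p in PLACEHOLDERS):
            continue  # skip dummy values to cut alert noise
        findings.append({"rule": name, "file": path, "line": lineno, "value": value})
    return findings
```

Keeping each scanner a pure function over file content like this is what lets the deterministic pass stay fast in CI, with Claude reasoning layered on top only when an API key is present.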
For the GitLab Duo integration, I built a custom Flow (YAML-defined multi-agent workflow) that reacts to MR triggers and a custom Agent accessible through Duo Chat. The project includes 68 tests covering all scanners, reasoning, reporters, and end-to-end CLI flows.
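The Code Quality JSON reporter mentioned above targets GitLab's documented Code Quality report schema (an array of objects with `description`, `check_name`, `fingerprint`, `severity`, and `location`). A reporter in that style might look like the sketch below; the finding dict shape and `SEVERITY_MAP` are assumptions for illustration, not WatchDog's actual code.

```python
import hashlib
import json

# Map scanner severities onto GitLab Code Quality levels
# (info, minor, major, critical, blocker).
SEVERITY_MAP = {"low": "minor", "medium": "major", "high": "critical", "critical": "blocker"}

def to_code_quality(findings):
    """Serialize findings as a GitLab Code Quality JSON report."""
    report = []
    for f in findings:
        # A stable fingerprint lets GitLab diff findings between pipelines.
        fingerprint = hashlib.sha256(
            f"{f['rule']}:{f['file']}:{f['line']}".encode()
        ).hexdigest()
        report.append({
            "description": f"{f['rule']} detected",
            "check_name": f["rule"],
            "fingerprint": fingerprint,
            "severity": SEVERITY_MAP.get(f.get("severity", "medium"), "major"),
            "location": {"path": f["file"], "lines": {"begin": f["line"]}},
        })
    return json.dumps(report, indent=2)
```

Emitting this artifact from a CI job is what lights up inline annotations on the MR diff without any extra GitLab configuration.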
Challenges we ran into
Balancing detection sensitivity with false-positive rates across frameworks was the biggest challenge. Next.js, Vite, Django, and Rails each have different conventions for client/server boundaries, environment variable prefixes, and build output structures. Getting Claude's reasoning prompts right to produce actionable, framework-specific exploit explanations rather than generic warnings took significant iteration. Another challenge was designing the scanner architecture to be modular enough that adding new scanners doesn't require touching the orchestrator.
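The framework-boundary logic can be sketched as a lookup from framework to its client-exposure convention. The table and function names below are illustrative assumptions about the approach, not WatchDog's internal data structures; the prefixes themselves are each framework's documented convention.

```python
# Each framework's convention for env vars that get bundled into client code.
PUBLIC_PREFIXES = {
    "nextjs": "NEXT_PUBLIC_",   # inlined into the browser bundle at build time
    "vite": "VITE_",            # exposed via import.meta.env
    "create-react-app": "REACT_APP_",
}

def adjust_severity(framework, var_name, base_severity="medium"):
    """Escalate env-var findings that cross the client/server boundary."""
    prefix = PUBLIC_PREFIXES.get(framework)
    if prefix and var_name.startswith(prefix):
        return "critical"  # the value ships to every browser, not just the server
    return base_severity
```

This is why the same `DATABASE_URL` finding is routine in a Django settings file but critical as `NEXT_PUBLIC_DATABASE_URL` in a Next.js project.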
Accomplishments that we're proud of
- 7 scanners with 40+ detection patterns and a false-positive rate low enough to be useful in CI without alert fatigue
- Claude reasoning that explains HOW an attacker would exploit a finding, not just WHAT was found
- Framework-aware severity adjustment, where the same finding gets different severity in Next.js vs. Django based on actual exposure risk
- Full test suite (68 tests) that runs without any API keys
- Clean dual interface: fast deterministic regex scanning in CI, intelligent Claude reasoning on demand
What we learned
Building on the GitLab Duo Agent Platform showed how flows, agents, and Duo tools provide a solid foundation for automating SDLC tasks that previously required manual intervention or complex webhook setups. On the AI side, using Claude for security reasoning rather than generation demonstrated that LLMs add the most value when they augment deterministic tools with contextual judgment, not when they replace them.
What's next for WatchDog
- More scanners: SAST-lite rules for common vulnerability patterns (SQL injection, XSS in templates), API key rotation detection
- Auto-fix suggestions: Claude-generated remediation code snippets alongside findings
- GitLab integration depth: native Code Quality widget integration, security dashboard reporting
- Model comparison: support for multiple LLM backends to compare reasoning quality and cost
- Community scanner plugins: allow users to define custom regex patterns and severity rules via YAML config
Built With
- Anthropic Claude API
- Click (CLI framework)
- Custom agents
- GitLab CI/CD
- GitLab Duo Agent Platform (custom flows)
- pytest
- Python
- Rich (terminal UI)
- YAML