Inspiration
Security reviews are one of those things every team knows they should do and almost none actually do consistently. The bottleneck is not knowledge, it is friction. Running a scanner means remembering to run it, waiting for results, reading a report, and acting on it before the PR merges. In practice that chain breaks at step one.
We wanted to take the developer out of the triggering loop entirely. Not a better dashboard. Not a faster scanner. A system that watches your repositories, catches a new pull request the moment it opens, runs a full security audit, and posts the findings directly as a comment on the PR, where the developer already is. No new tools to learn. No dashboards to check. Just results, automatically, every time.
What it does
VigilAgent watches your GitHub repositories continuously. When a pull request opens, it automatically clones the PR branch, runs four categories of security analysis, and posts a structured report as a GitHub PR comment with no human trigger required.
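The comment-posting step maps onto a single GitHub REST API call (pull-request comments go through the issues endpoint). A minimal sketch, assuming a personal access token from the backend's config; the function names are ours, not VigilAgent's actual internals:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"

def comment_url(owner: str, repo: str, pr_number: int) -> str:
    # PR comments are created via the issues endpoint in the GitHub REST API.
    return f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"

def post_pr_comment(owner: str, repo: str, pr_number: int, body: str, token: str) -> dict:
    """Post a markdown report as a comment on the given pull request."""
    req = urllib.request.Request(
        comment_url(owner, repo, pr_number),
        data=json.dumps({"body": body}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```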
The dashboard lets you run manual scans, browse historical reports organized by project and timestamp, and chat with Nemotron directly about any specific finding in plain English. Every report includes a 0 to 100 risk score, severity-ranked findings table, CVE details, secrets exposure assessment, and prioritized remediation steps.
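The exact scoring model is internal to VigilAgent, but as an illustration, a 0 to 100 risk score can be produced by weighting findings by severity and clipping the sum. The weights below are hypothetical:

```python
# Hypothetical severity weights -- illustrative only, not VigilAgent's actual model.
SEVERITY_WEIGHTS = {"critical": 25, "high": 10, "medium": 4, "low": 1}

def risk_score(findings: list[dict]) -> int:
    """Aggregate severity-tagged findings into a 0-100 risk score (higher = riskier)."""
    raw = sum(SEVERITY_WEIGHTS.get(f.get("severity", ""), 0) for f in findings)
    return min(100, raw)  # clip so a flood of findings saturates at 100
```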
How we built it
VigilAgent runs as a five-stage pipeline on an ASUS Ascent GX10, using NVIDIA Nemotron 3 Super (120B) via Ollama for fully local inference.
Stage 1 (Clone): checks out the exact PR branch so we audit the code as it will actually merge, not the base branch.
Stage 2 (Static Analysis): semgrep and bandit scan for insecure code patterns, returning exact file paths and line numbers across all languages.
Stage 3 (Dependency Audit): pip-audit and npm audit cross-reference every dependency against live CVE databases, returning CVE IDs, affected version ranges, and patched versions.
Stage 4 (Secret Detection): gitleaks and trufflehog use regex and entropy analysis to surface hardcoded credentials and API keys.
Stage 5 (Report Synthesis): Nemotron receives the verified JSON output from all four stages and synthesizes a structured markdown report. The LLM cannot hallucinate findings because it only sees what the tools already proved.
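That constraint can be enforced in the prompt itself: the model receives only the verified tool JSON and is told to summarize, not scan. A sketch against Ollama's local `/api/chat` endpoint; the model tag and prompt wording are placeholders, not VigilAgent's actual values:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "nemotron"  # placeholder tag; use whatever tag the model was pulled under

def build_synthesis_prompt(tool_results: dict) -> str:
    """Constrain the model to verified tool output: it writes, it does not detect."""
    return (
        "You are a security report writer. Using ONLY the verified findings "
        "below, write a structured markdown report. Do not invent findings.\n\n"
        + json.dumps(tool_results, indent=2)
    )

def synthesize_report(tool_results: dict) -> str:
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": build_synthesis_prompt(tool_results)}],
        "stream": False,  # return one complete response instead of a token stream
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["message"]["content"]
```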
The pipeline is orchestrated with LangGraph, exposed via a FastAPI backend, and monitored by a React + Vite + Tailwind dashboard. A background asyncio loop polls the GitHub API for new PRs on watched repos and fires the pipeline automatically when one appears. Everything runs on-device with no cloud API and no data leaving the machine.
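The polling loop itself is small. In the sketch below, `fetch_open_prs` and `run_pipeline` are injected stand-ins for the GitHub API client and the LangGraph pipeline (the names and the poll interval are ours, for illustration):

```python
import asyncio

POLL_INTERVAL = 60  # seconds between GitHub API polls; illustrative value

def dispatch_new_prs(prs: list[dict], seen: set[int], fire) -> None:
    """Fire the pipeline exactly once per PR number not yet seen."""
    for pr in prs:
        if pr["number"] not in seen:
            seen.add(pr["number"])
            fire(pr["number"])

async def watch_repo(owner: str, repo: str, fetch_open_prs, run_pipeline) -> None:
    """Background loop: poll for open PRs and audit each new one."""
    seen: set[int] = set()
    while True:
        prs = await fetch_open_prs(owner, repo)
        dispatch_new_prs(
            prs,
            seen,
            # Fire-and-forget so a slow scan never blocks the next poll.
            lambda n: asyncio.create_task(run_pipeline(owner, repo, n)),
        )
        await asyncio.sleep(POLL_INTERVAL)
```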
Challenges we ran into
Getting the tools to behave. semgrep, bandit, gitleaks, trufflehog, pip-audit, and npm audit each have different exit codes, output formats, and failure modes. A Python repo with no requirements.txt, a Node project with nested package.json files, a repo with no detectable secrets: every edge case needed explicit handling to avoid silent failures that looked like clean scans.
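The core of that handling is refusing to conflate "tool found issues" with "tool crashed": both surface as nonzero exit codes, and the convention differs per scanner. A sketch of the normalization wrapper; the exit-code mapping shown is illustrative and varies by tool and version:

```python
import subprocess

# Exit codes that mean "scan succeeded, findings present" rather than "tool crashed".
# Conventions vary by tool and version, so treat this mapping as illustrative.
FINDINGS_EXIT_CODES = {
    "bandit": {1},     # bandit exits 1 when it reports issues
    "gitleaks": {1},   # gitleaks exits 1 when leaks are detected
    "pip-audit": {1},  # pip-audit exits 1 when vulnerabilities are found
}

def classify_exit(name: str, returncode: int) -> str:
    if returncode == 0:
        return "clean_or_findings"  # some tools exit 0 even with findings
    if returncode in FINDINGS_EXIT_CODES.get(name, set()):
        return "findings"
    return "error"  # never let a crash masquerade as a clean scan

def run_scanner(name: str, cmd: list[str], cwd: str) -> dict:
    """Run one scanner and report clean / findings / error explicitly."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return {
        "tool": name,
        "status": classify_exit(name, proc.returncode),
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```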
Deployment on real hardware. Sandbox paths were hardcoded for Docker root environments. Running as a non-root user on Linux broke the backend immediately on first launch. We had to build a launch script that detects the runtime environment and patches configuration automatically so a fresh clone truly works with a single command.
Making the LLM trustworthy, not just impressive. The temptation is to let Nemotron scan the code directly. We deliberately did not do that. Every finding traces back to a specific tool output with a file path, line number, or CVE ID that can be independently verified. The LLM's job is communication, not detection.
Accomplishments that we're proud of
Getting the single-command deployment to actually work. Git clone, run the script, and the system comes up fully with the model pulled, dependencies installed, and services started on a machine you have never touched before.
The PR comment workflow closes the loop in a way most security tools do not. Findings surface exactly where the developer is making decisions, before the code merges, without asking them to change their workflow at all.
The in-report chat turned out to be more useful than we expected. Being able to ask "what's the most urgent fix?" or "explain this CVE" against the specific scan data, rather than general knowledge, is genuinely different from a generic chatbot.
What we learned
The LLM should own the last mile, not the whole road. Using Nemotron only for synthesis after deterministic tools have already found and verified the findings produces a system that is both more reliable and more defensible than one where the model does the analysis itself. Every claim in the report is traceable. That matters.
We also learned that zero-touch deployment is a real engineering problem, not just a feature. Making a system work across different users, hardware, network environments, and operating system configurations required handling a long tail of edge cases that are invisible until they happen on someone else's machine.
What's next for VigilAgent
Per-agent LLM interpretation: give each pipeline stage its own Nemotron call so findings are analyzed and prioritized before they reach the synthesis step, not after.
Auto-remediation PRs: after identifying critical issues, have Nemotron generate a fix branch and open a PR with the patch for human review.
NemoClaw sandbox enforcement: the policy file is already written; wiring the runtime enforcement layer would fully sandbox every agent's filesystem and network access.
Persistent job store: swap the in-memory job store for a database so scan history survives server restarts.
Slack and email notifications: surface findings beyond just GitHub comments for teams that want broader visibility.
Built With
- asus-ascent-gx10
- bandit
- fastapi
- github-rest-api
- gitleaks
- langgraph
- npm-audit
- nvidia-nemotron-3-super-120b
- ollama
- openai-python-sdk
- pip-audit
- python-3.12
- react-18
- react-markdown
- react-router
- semgrep
- tailwind-css
- tanstack-query
- trufflehog
- uvicorn
- vite