Inspiration

Every few weeks, another headline. Another breach. Another company issuing a carefully worded apology for "an unfortunate security incident." The tools to prevent these attacks exist (vulnerability scanners, penetration testing frameworks, DAST pipelines), yet the breaches keep happening.

We dug into why. The answer wasn't a lack of tools. It was a lack of thinking.

Traditional scanners are rule-based. They match signatures. They flag what they've seen before and miss everything else. They generate thousands of false positives that security teams learn to ignore. And they require expensive, hard-to-find human experts to interpret their output and turn it into action.

Meanwhile, the attacker on the other side isn't running a rulebook: they're reasoning. They're chaining vulnerabilities. They're adapting in real time.

We asked: what if the defender could too?

That's the idea behind VEGA. Not a faster scanner. Not a shinier dashboard. A system that thinks like an attacker: autonomously, at machine speed, with zero human intervention required.

What it does

VEGA is an autonomous multi-agent web application vulnerability scanner. You give it a URL. It attacks the application like a real penetration tester would, reasons about what it finds, and delivers a structured, executive-grade security report in minutes.

Under the hood, five specialized AI agents work in a coordinated pipeline:

Crawler: Discovers every accessible endpoint using a Playwright headless browser, mapping the full attack surface of the target application.
Hypothesis Agent: For each endpoint, generates context-aware attack hypotheses using an LLM served through Groq.
Attacker Agent: Executes targeted payloads across 12+ vulnerability classes, including SQL injection, XSS, IDOR, SSRF, CSRF, broken authentication, security misconfigurations, and sensitive data exposure.
Analyzer Agent: Interprets HTTP responses, server behavior, and application reactions to determine which attacks succeeded and why.
Narrator Agent: Synthesizes all confirmed findings into a clean, severity-ranked, actionable penetration testing report, exportable as a PDF.

A dedicated False Positive Reducer sits between the Attacker and Analyzer, cross-validating every finding before it surfaces and keeping the false positive rate under 2%.

The entire pipeline is visualized in real time on a React + Vite live dashboard that shows the agent DAG in action as the scan runs.
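As a rough illustration of the "severity-ranked" ordering the Narrator Agent produces, the sketch below sorts confirmed findings most severe first. The `ReportItem` type, the `SEVERITY_ORDER` scale, and `rank_findings` are hypothetical names chosen for illustration; the write-up does not specify VEGA's actual severity levels or data model.

```python
from dataclasses import dataclass

# Hypothetical severity scale (lower number = more severe).
# VEGA's real levels are not described in the write-up.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


@dataclass(frozen=True)
class ReportItem:
    title: str
    endpoint: str
    severity: str  # must be one of the keys in SEVERITY_ORDER


def rank_findings(items: list[ReportItem]) -> list[ReportItem]:
    """Sort most severe first; ties are broken by endpoint for stable output."""
    return sorted(items, key=lambda i: (SEVERITY_ORDER[i.severity], i.endpoint))


if __name__ == "__main__":
    sample = [
        ReportItem("Reflected XSS", "/search", "medium"),
        ReportItem("SQL injection", "/login", "critical"),
        ReportItem("Missing security header", "/", "low"),
    ]
    for item in rank_findings(sample):
        print(f"[{item.severity.upper()}] {item.title} at {item.endpoint}")
```

Sorting on a fixed scale rather than on the raw string keeps the report order deterministic, which matters when two scans of the same target are compared.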

How we built it

Backend: Python + FastAPI
The core engine runs on Python, with FastAPI serving the REST API layer. All five agents are orchestrated using LangGraph, which gives us a proper Directed Acyclic Graph (DAG) for agent coordination: explicit state, inspectable transitions, and clean handoffs between agents.

AI Layer: Groq
We use Groq as the LLM inference layer powering the Hypothesis, Analyzer, and Narrator agents. Groq's low-latency inference was critical: security scans are already time-sensitive, and slow LLM calls would make the tool unusable in practice.

Browser Automation: Playwright
The Crawler agent uses Playwright to drive a real headless Chromium browser, which means VEGA can handle dynamic, JavaScript-heavy applications that traditional scanners completely miss.

Frontend: React + Vite
The live dashboard renders the agent DAG in real time, giving users a transparent window into exactly what VEGA is doing at every step: which endpoints are being tested, which hypotheses are being evaluated, and which findings are confirmed.

Reporting: Structured PDF Export
The Narrator agent generates human-readable reports structured like real penetration testing deliverables, with severity rankings, affected endpoints, evidence, and remediation guidance.

The entire stack is containerized and designed to run against any web application accessible over HTTP/HTTPS.
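To make the DAG idea concrete without depending on LangGraph, here is a minimal standard-library sketch of the same shape: stages declared as a dependency graph, run in topological order, with every transition logged. It is not VEGA's code and not LangGraph's API. The stage names follow the write-up, while `PIPELINE`, `run_pipeline`, and `make_stub` are hypothetical, and the stub handlers stand in for the real agents.

```python
from graphlib import TopologicalSorter
from typing import Callable

State = dict[str, object]
Handler = Callable[[State], State]

# Each stage maps to the set of stages that must finish before it runs.
# The False Positive Reducer sits between the Attacker and the Analyzer.
PIPELINE: dict[str, set[str]] = {
    "crawler": set(),
    "hypothesis": {"crawler"},
    "attacker": {"hypothesis"},
    "fp_reducer": {"attacker"},
    "analyzer": {"fp_reducer"},
    "narrator": {"analyzer"},
}


def run_pipeline(
    handlers: dict[str, Handler], state: State
) -> tuple[State, list[tuple[str, list[str]]]]:
    """Run every stage in dependency order and record each transition."""
    transitions: list[tuple[str, list[str]]] = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for stage in TopologicalSorter(PIPELINE).static_order():
        state = handlers[stage](dict(state))  # pass a copy, receive the new state
        transitions.append((stage, sorted(state)))  # log which keys exist now
    return state, transitions


def make_stub(stage: str) -> Handler:
    """Placeholder agent: writes a marker under its own name and returns the state."""
    def handler(state: State) -> State:
        state[stage] = f"{stage} output"
        return state
    return handler


if __name__ == "__main__":
    stubs = {name: make_stub(name) for name in PIPELINE}
    final, log = run_pipeline(stubs, {"target": "https://example.test"})
    for stage, keys in log:
        print(stage, keys)
```

The transition log is the part worth keeping in any real implementation: it shows which keys each stage added to the shared state, which is what makes a handoff inspectable after the fact.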

Challenges we ran into

Orchestrating agent handoffs without context loss
Getting five agents to pass rich, structured context through a LangGraph DAG without information degrading across transitions was harder than expected. A finding confirmed by the Attacker needs to carry full evidence (request, response, payload, endpoint) all the way through to the Narrator. We went through multiple iterations of the state schema before it held up reliably.

Handling dynamic, JavaScript-heavy applications
Headless browsers behave differently from real users. Authenticated flows, single-page applications, and dynamically rendered endpoints required careful Playwright configuration to crawl faithfully. Plenty of targets that "looked simple" turned out to have JS-gated content that naive crawlers miss entirely.

Balancing precision and recall in the FP Reducer
The false positive problem in security tooling is real and brutal. Suppress too aggressively and you miss real vulnerabilities. Suppress too loosely and the report is noise. Tuning the FP Reducer to stay under 2% false positives without sacrificing recall on genuine findings required extensive testing across diverse target applications.

Making the report actually useful
Raw vulnerability dumps are not security reports. Getting the Narrator agent to produce output that reads like a professional pentest deliverable, with context, severity rationale, and actionable remediation, was the most demanding prompt engineering challenge in the project. The difference between a report that gets filed and forgotten and one that drives immediate action is entirely in the writing.
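One way to keep evidence from degrading across handoffs is to make it immutable and have later stages add their verdicts around it rather than rewrite it. The sketch below illustrates that idea only; it is an assumption, not the state schema VEGA settled on. The `Evidence` fields mirror the four items named above, while `Finding`, `confirm`, and the remaining fields are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Raw proof captured at attack time; never modified afterwards."""
    endpoint: str
    payload: str
    request: str
    response: str


@dataclass(frozen=True)
class Finding:
    vuln_class: str
    evidence: Evidence
    confirmed: bool = False
    severity: Optional[str] = None


def confirm(finding: Finding, severity: str) -> Finding:
    """Return a new Finding with the verdict attached and the evidence untouched."""
    return replace(finding, confirmed=True, severity=severity)


if __name__ == "__main__":
    evidence = Evidence(
        endpoint="/search",
        payload="<test-payload>",
        request="GET /search?q=<test-payload>",
        response="HTTP/1.1 200 OK",
    )
    raw = Finding("xss", evidence)
    final = confirm(raw, "medium")
    print(final.evidence is evidence)
```

Because both dataclasses are frozen, a downstream stage cannot trim or overwrite the request and response by accident; it can only produce a new `Finding` that still points at the same `Evidence` object.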

Accomplishments that we're proud of

Built a fully autonomous, end-to-end penetration testing pipeline that requires zero human intervention between URL input and final report.
Achieved a false positive rate under 2%, competitive with enterprise security tools that cost tens of thousands of dollars annually.
Covered 12+ vulnerability classes, including the full critical tier of the OWASP Top 10.
Built a real-time agent visualization dashboard that makes the AI reasoning process fully transparent and inspectable, not a black box.
Designed a reporting layer that produces output a real security team can act on immediately, not just data they have to interpret themselves.
Delivered the entire system as a working, deployable product within the hackathon window.

What we learned

Agentic systems are a state management problem first. The LLM calls are the easy part. Getting agents to share context cleanly, recover from failures gracefully, and hand off structured information without degradation is where the real engineering lives.

Offensive security has zero tolerance for noise. In most domains, a high false positive rate is annoying. In security tooling, it's fatal: teams stop trusting the tool, and real vulnerabilities get buried in the noise. Building toward precision taught us to be rigorous about evidence before a finding is surfaced.

The output is the product. A vulnerability scanner that produces an unreadable dump is not a usable tool. We learned that the Narrator agent, the reporting layer, deserved as much engineering investment as the attack pipeline itself. Security is ultimately a communication problem as much as a technical one.

Speed matters for adoption. Thanks to Groq's inference speed, VEGA can complete a full scan and report in minutes. We learned early that if a tool takes hours to run, practitioners find workarounds. Speed isn't a feature; it's a prerequisite.
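Being "rigorous about evidence before a finding is surfaced" can take many forms. One common pattern, shown here as a sketch and not as a description of VEGA's False Positive Reducer, is to surface a finding only when the attack payload triggers the suspicious signal on every retry and a benign control value never does. The `Probe` signature and `is_reproducible` are hypothetical; a real probe would send the request and inspect the response.

```python
from typing import Callable

# probe(endpoint, value) returns True when the response shows the suspicious signal.
Probe = Callable[[str, str], bool]


def is_reproducible(
    probe: Probe, endpoint: str, payload: str, control: str, attempts: int = 3
) -> bool:
    """Keep a finding only if the payload always triggers the signal
    and the benign control never does."""
    payload_always_hits = all(probe(endpoint, payload) for _ in range(attempts))
    control_ever_hits = any(probe(endpoint, control) for _ in range(attempts))
    return payload_always_hits and not control_ever_hits


if __name__ == "__main__":
    # Fake probes standing in for real HTTP checks.
    def real_issue(endpoint: str, value: str) -> bool:
        return value == "PAYLOAD"

    def noisy_endpoint(endpoint: str, value: str) -> bool:
        return True  # signal appears for any input, so it proves nothing

    print(is_reproducible(real_issue, "/search", "PAYLOAD", "hello"))
    print(is_reproducible(noisy_endpoint, "/search", "PAYLOAD", "hello"))
```

The control request is what separates a real issue from an endpoint that always looks suspicious, at the cost of extra requests per candidate finding.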

What's next for VEGA

Authenticated scan support: OAuth 2.0, JWT, cookie-based, and session-aware scanning so VEGA can test the parts of an application that actually matter most.
CI/CD pipeline integration: GitHub Actions and GitLab CI plugins so teams can run VEGA automatically on every pull request, shifting security left.
Alerting integrations: Native Slack, Jira, and Microsoft Teams connectors so findings route directly into the workflows teams already use.
Expanded vulnerability coverage: Full OWASP Top 10 plus business logic flaw detection, which requires deeper application understanding that we're actively building toward.
Air-gapped enterprise deployment: A self-hosted, network-isolated version for organizations that can't route traffic through external services.
Continuous monitoring mode: Instead of one-shot scans, VEGA watches for new endpoints and behavioral changes over time, alerting when the attack surface grows.
