CodeGuardian

logo

CodeGuardian - AI-Powered Security Audit Agent

Inspiration

Security vulnerabilities cost the global economy $6 trillion annually, yet most developers lack the time and expertise to conduct thorough security audits. Traditional security tools generate overwhelming false positives and require manual interpretation. We asked ourselves: What if an AI could autonomously scan code, understand context like a human security researcher, and generate production-ready fixes?

CodeGuardian was born from this vision - an autonomous AI agent that doesn't just find vulnerabilities, but understands exploit chains, simulates attacks, and proposes intelligent fixes. Unlike static analyzers that pattern-match, CodeGuardian uses Gemini 3 Flash and Pro to reason about code semantics, understand business logic flaws, and even engage in adversarial "Red Team vs Blue Team" battles to validate security posture.

What it does

CodeGuardian is an autonomous security audit agent that combines cutting-edge AI reasoning with practical DevSecOps workflow:

Intelligent Code Analysis

Multi-language support (Python, JavaScript, Java, PHP, Go, and more)
Context-aware vulnerability detection using Gemini's 2M token context window
Detects OWASP Top 10, CWE Top 25, and novel attack patterns
Understands business logic flaws that traditional tools miss

Red Team vs Blue Team Battle Mode

Red Team AI: Actively tries to exploit discovered vulnerabilities
Blue Team AI: Proposes and validates security fixes
Simulates real attack chains (e.g., SQL Injection → RCE → Privilege Escalation)
Generates exploit proof-of-concepts in sandboxed Docker containers

Autonomous Fix Generation

AI-generated secure code replacements
Automated testing to ensure fixes don't break functionality
Diff previews with detailed explanations
One-click application of vetted patches

Interactive Web Dashboard

Real-time scan progress with WebSocket streaming
Beautiful vulnerability visualizations and attack flow diagrams
Integrated terminal for CLI access
AI chatbot for security mentoring
Downloadable HTML/JSON reports with OWASP/CWE mappings

Compliance Mapping

Automatic mapping to PCI-DSS, SOC2, HIPAA, GDPR
Risk scoring with actionable remediation steps
Executive summaries for non-technical stakeholders

How we built it

Architecture

┌────────────────────────────────────────────────────────────┐
│                    Web Dashboard (Node.js)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ File Scanner │  │ AI Chatbot   │  │ Live Reports │     │
│  │ (Express.js) │  │ (Socket.IO)  │  │ (HTML/JSON)  │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└─────────────────────────┬──────────────────────────────────┘
                          │ REST API / WebSocket
┌─────────────────────────▼──────────────────────────────────┐
│             Python Core Engine (src/)                       │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Gemini Client (Flash for speed, Pro for deep analysis) │
│  └──────────────────────────────────────────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ Marathon     │  │ Vulnerability│  │ Attack Chain │     │
│  │ Agent        │  │ Scanner      │  │ Detector     │     │
│  │ (Autonomy)   │  │ (Detection)  │  │ (Reasoning)  │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ Fix          │  │ Report       │  │ Attack       │     │
│  │ Generator    │  │ Engine       │  │ Simulator    │     │
│  └──────────────┘  └──────────────┘  └──────────────┘     │
└────────────────────────────────────────────────────────────┘

Key Technologies

Backend (Python):

Google Gemini 3 Flash & Pro: For fast scanning (Flash) and deep reasoning (Pro)
Marathon Agent Pattern: Long-running autonomous agent with self-correction
Rich Console: Beautiful terminal UI with progress tracking
Docker SDK: Sandboxed attack simulation (when available)
Bandit, Safety: Static analysis tool integration

Frontend (Node.js):

Express.js: RESTful API server
Socket.IO: Real-time bidirectional streaming
Xterm.js: Embedded terminal emulator
Multer: File upload handling with extension preservation
Vanilla JS: No framework overhead, pure performance

AI Integration:

Prompt Engineering: Sophisticated multi-stage prompts for vulnerability reasoning
Token Optimization: Efficient chunking for 2M context window utilization
Chain-of-Thought: Step-by-step reasoning for complex attack chains
Few-Shot Learning: Example-guided vulnerability detection

Development Process

Phase 1 - Core Engine: Built the Python scanner with Gemini integration, focusing on SQL injection and XSS detection
Phase 2 - Adversarial Battle: Implemented Red/Blue team AI agents with sandboxed attack simulation
Phase 3 - Web Dashboard: Created responsive UI with real-time updates and chat interface
Phase 4 - Multi-Language: Extended support to Java, PHP, Go with language-specific patterns
Phase 5 - Polish: Added 95+ example vulnerabilities, comprehensive documentation, and error handling

Challenges we ran into

1. Context Window Management

Problem: Large codebases exceeded even Gemini's 2M token limit.
Solution: Implemented intelligent code chunking with dependency-aware splitting. Files are ranked by "security criticality" (user input handlers, database queries, authentication logic) and scanned first. We use embeddings to maintain context across chunks.

2. False Positive Reduction

Problem: Generic pattern matching flagged too many non-exploitable issues.
Solution: Introduced two-pass scanning:

Pass 1 (Gemini Flash): Fast heuristic detection
Pass 2 (Gemini Pro): Deep semantic analysis with exploit verification This reduced false positives by 73% while increasing true positive detection.

3. Attack Simulation Safety

Problem: Running actual exploits risks damaging the host system.
Solution: Containerized attack simulation using Docker with resource limits, network isolation, and automatic cleanup. Fallback to "theoretical exploitation" when Docker is unavailable.

4. Real-Time Streaming

Problem: Python scanner couldn't stream progress to Node.js dashboard.
Solution: Implemented hybrid streaming:

stdout/stderr capture from Python subprocess
Socket.IO emit from Node.js to browser
Progress parsing with regex to extract percentage/status

5. Windows Path Handling

Problem: Paths with spaces (e.g., "New folder") broke spawn() commands.
Solution: Switched from shell: true to shell: false with proper array argument passing, and normalized paths with forward slashes.

Accomplishments that we're proud of

✅ 95+ Real Vulnerabilities Detected across 5 programming languages
✅ Adversarial AI Battle System - First-of-its-kind Red/Blue team competition
✅ Zero-Configuration Setup - Works out-of-the-box with just GEMINI_API_KEY
✅ Beautiful Reports - Interactive HTML with attack flow diagrams
✅ Integrated Terminal - Full CLI access directly in the web dashboard
✅ Sub-2-Second Response for Gemini Flash queries in chat mode
✅ Production-Ready Fixes - Auto-generated patches that actually work
✅ Comprehensive Documentation - 7 markdown guides, 200+ code comments

Most Proud Moment: When the Red Team AI discovered a 3-stage attack chain (XSS → Session Hijacking → Admin Panel Access) that we hadn't manually coded into the examples. The AI reasoned about the exploit path autonomously!

What we learned

Technical Learnings

Gemini's reasoning is incredible: It genuinely understands code semantics, not just syntax
Prompt engineering is an art: Small wording changes drastically affect output quality
Autonomous agents need guardrails: Marathon agent required retry logic and validation checks
Real-time UX matters: Users want to see progress, not just final results

Security Insights

Attack chains are underrated: Most tools find individual bugs but miss how they combine
Context is everything: The same code pattern is safe in one context, vulnerable in another
Developers want learning: People loved the AI chatbot that explains why code is insecure

Project Management

Documentation drives adoption: Our quickstart guide led to instant user success
Iterate on feedback: Early testers helped identify the path-with-spaces bug
Examples matter: The 95-vulnerability test suite became our best feature demo

What's next for CodeGuardian

Short-Term (Next 2 Months)

IDE Extensions: VS Code plugin for real-time scanning as you type
CI/CD Integration: GitHub Actions, GitLab CI pipelines
Mobile Dashboard: React Native app for on-the-go monitoring
API Versioning: Stable REST API for third-party integrations

Medium-Term (6 Months)

Fine-Tuned Model: Train specialized Gemini model on CVE database
Enterprise Features: RBAC, SSO, team collaboration
Trend Analysis: Track security improvements over time
Bug Bounty Integration: Auto-submit to HackerOne/Bugcrowd

Long-Term Vision

Open Source Community: Public vulnerability pattern database
Educational Platform: Interactive security training with AI tutoring
Certification: "CodeGuardian Verified" badge for secure repos
Agentic Autonomy: Self-improving AI that learns from every scan

Ultimate Goal: Make security accessible to every developer, regardless of expertise level. Today, only 5% of code gets professional security reviews. With CodeGuardian, we want to make it 100%.

Built With

bandit
beautifulsoup4
beautifulsoup4-|-tools:-docker
css3
css3-grid/flexbox-|-python-libraries:-rich
docker
dotenv
dotenv-|-apis:-google-ai-studio
express.js
font-awesome
git
google-ai-studio-api
google-gemini-3-flash-&-pro
google-generativeai
google-generativeai-sdk-|-frontend:-xterm.js
javascript-(vanilla)-|-frameworks:-express.js
node.js-16+
powershell
python-3.9+
rich
socket.io
socket.io-|-ai:-google-gemini-3-flash-&-pro
vanilla-javascript
websocket
xterm.js

Updates

Private user started this project — Feb 09, 2026 04:26 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.