CodeShield

Inspiration

We watched developers at companies struggle with a painful paradox: AI coding tools like ChatGPT and Claude are incredibly powerful, but security teams won't allow them. Why? To get AI help, developers have to give access to their entire codebase, complete with proprietary logic, API keys etc. The tipping point came when we learned that 73% of enterprises block AI coding tools entirely due to security concerns. Billions in productivity lost because there's no safe middle ground.

We asked ourselves: What if AI could help developers without ever seeing their actual code? That question led to CodeShield - an abstraction layer that gives developers unlimited AI coding capability while keeping proprietary code completely private.

What it does:

CodeShield is a privacy-preserving AI coding assistant for enterprises. Here's the complete workflow:

Developer provides a GitHub repository link Dual Analysis - We run two reports in parallel a. Code Analyzer: Our custom tool abstracts the entire codebase structure. Real function names become FUNC_A, variables become VAR_1, API keys get [REDACTED]. We map dependencies and data flow without exposing implementation details.

b. Semgrep Security Scan: Identifies existing vulnerabilities like SQL injection, XSS, exposed secrets, and code quality issues.

How we built itTech Stack:Backend (Python):

Technology Stack

Multi-Language Code Analyzer (Python)

Parses 9+ languages (Python, JS/TS, Java, C++, Go, React, Express, Next.js) Extracts structure metadata WITHOUT source code Generates AI-friendly JSON output

Semgrep Security Scanner

Proper API integration with Semgrep Cloud (not just CLI) Custom security rules for secrets, SQL injection, validation Smart filtering via .semgrepignore (skips READMEs, node_modules) Detailed reports with file + function + CWE/OWASP mapping

AWS Bedrock Integration

Claude 3.5 Sonnet for security analysis Analyzes structure + vulnerabilities Generates actionable remediation plans

Secure Configuration

.env based credential management Never commits secrets to git Production-grade security practices

Key Innovation Zero Source Code Exposure Model

Traditional tools require full source access ❌ CodeShield analyzes structure metadata only ✅ Safe for banks, healthcare, proprietary codebases

Output Four comprehensive reports:

Structure JSON - Commented code metadata Vulnerabilities JSON - Detailed security findings AI Analysis - Claude's security assessment Comprehensive Report - Executive-ready deliverable

Tech Details

Languages: Python 3.7+ APIs: AWS Bedrock, Semgrep Cloud, (Vanta - in progress) Dependencies: boto3, semgrep (minimal by design) Architecture: Modular, extensible, production-ready

Challenges we ran into

Code Abstraction Accuracy: Challenge: Maintaining enough context for AI to be useful while hiding all proprietary details Solution: We developed a hierarchical abstraction system that preserves function relationships and data flow without exposing implementation
Multi-Language Support: Challenge: Different languages have different syntax and structures Solution: Built language-specific parsers for Python, JavaScript, Java, C++, and Go, each handling their unique patterns

Accomplishments that we're proud of

What we learned

What's next for CodeShield

We want to turn this into something developers can actually use every day. That means IDE plugins for VS Code and JetBrains - you shouldn't have to leave your editor to get AI help. We're also planning to integrate directly with existing AI coding assistants, acting as a security layer that sits between them and your code. Real-time collaboration is next - teams should be able to share code maps, see what AI changes teammates are making, and have centralized approval dashboards. For companies with strict compliance needs, we'll build a self-hosted version that runs entirely on their infrastructure. The goal is simple: make CodeShield the standard way companies use AI for coding. This hackathon project is just the start.