Inspiration

The spark for CodeDiff AI came from a painful reality: FinTech security is broken. We watched a promising startup lose significant runway due to a single "medium severity" SQL injection that slipped past standard linters. It wasn't just a bug; it was a regulatory disaster. We realized that current tools like SonarQube or ESLint are context-blind. They treat a memory leak on a blog page the same as a race condition in a payment gateway. We asked ourselves: Why doesn't our code editor tell us the dollar cost of our mistakes?

We set out to build a tool that translates technical debt into financial risk—moving security from "abstract warnings" to "economic reality."

🔄 How CodeDiff AI Works

  1. A developer opens or updates a GitHub Pull Request
  2. CodeDiff AI is triggered via a GitHub webhook
  3. The PR code diff is analyzed using multiple AI models to detect logical and security flaws
  4. A deterministic math layer verifies sensitive data to eliminate false positives
  5. Vulnerabilities are simulated to show how an attacker could exploit them
  6. Each issue is assigned a financial risk score based on real-world compliance costs
  7. A summary report is posted directly on the GitHub PR
  8. A detailed analysis with charts and history is available in the dashboard

You can view previous PR scans and track whether your code security is improving over time.

How we built it

We architected the system as a Zero-Hallucination Pipeline:

Frontend: We built a cinematic Next.js 14 application with Tailwind CSS. The core challenge was the Attack Terminal, a custom-built React component that simulates a CLI environment with typing animations and ANSI color codes to visualize exploits in real-time.
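As a rough illustration of how a terminal component like this might prepare its output, the sketch below splits a string containing ANSI color escape sequences into styled segments (which a React component could map to colored spans) and computes what is visible after a given number of "typed" characters. The names `parseAnsi`, `revealChars`, and the small color table are assumptions made for this example, not the project's actual code.

```typescript
// Hypothetical sketch: function names and the color table are assumptions.

interface Segment {
  text: string;
  color: string | null; // CSS color name, or null for the default color
}

// A small subset of ANSI SGR foreground color codes.
const ANSI_COLORS: Record<number, string> = {
  31: "red",
  32: "green",
  33: "yellow",
  36: "cyan",
};

// Split a string containing SGR sequences (ESC [ <codes> m) into colored segments.
function parseAnsi(input: string): Segment[] {
  const segments: Segment[] = [];
  const sgr = /\x1b\[([0-9;]*)m/g;
  let color: string | null = null;
  let last = 0;
  let match: RegExpExecArray | null;

  while ((match = sgr.exec(input)) !== null) {
    if (match.index > last) {
      segments.push({ text: input.slice(last, match.index), color });
    }
    for (const part of (match[1] ?? "").split(";")) {
      const code = part === "" ? 0 : Number(part);
      if (code === 0) {
        color = null; // reset to the default color
      } else if (code in ANSI_COLORS) {
        color = ANSI_COLORS[code] ?? null;
      }
      // Other codes (bold, backgrounds, ...) are ignored in this sketch.
    }
    last = sgr.lastIndex;
  }
  if (last < input.length) {
    segments.push({ text: input.slice(last), color });
  }
  return segments;
}

// Typing animation helper: the segments visible after `count` characters are "typed".
function revealChars(segments: Segment[], count: number): Segment[] {
  const visible: Segment[] = [];
  let remaining = count;
  for (const seg of segments) {
    if (remaining <= 0) break;
    const text = seg.text.slice(0, remaining);
    visible.push({ text, color: seg.color });
    remaining -= text.length;
  }
  return visible;
}
```

A component could call `revealChars` on a timer with an increasing `count` to produce the typing effect while keeping the parsing logic free of any rendering concerns.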

The AI Orchestrator: We implemented a "Router" pattern that sends context-heavy tasks to Gemini (for its large context window) and logic-heavy tasks to DeepSeek (for code reasoning). We also call the Hugging Face API for "The Librarian," a zero-shot classifier that handles intent classification: it determines whether a PR is a "Database Migration," a "UI Tweak," or a "Critical Auth Change."
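A minimal sketch of what such a router could look like is shown here. The task kinds, the `ModelClient` interface, and the function names are assumptions for illustration. The real orchestrator presumably wraps each provider's SDK behind something similar, but the writeup does not show it.

```typescript
// Hypothetical sketch: task kinds, ModelClient, and function names are assumptions.

type ModelName = "gemini" | "deepseek";

interface AnalysisTask {
  kind: "context-heavy" | "logic-heavy";
  prompt: string;
}

// Each provider is wrapped behind the same signature so the router stays provider-agnostic.
type ModelClient = (prompt: string) => Promise<string>;

// Context-heavy work goes to the large-context model, code reasoning to the other one.
function pickModel(task: AnalysisTask): ModelName {
  return task.kind === "context-heavy" ? "gemini" : "deepseek";
}

// Dispatch a task to whichever client the router selects.
async function routeTask(
  task: AnalysisTask,
  clients: Record<ModelName, ModelClient>,
): Promise<{ model: ModelName; output: string }> {
  const model = pickModel(task);
  const output = await clients[model](task.prompt);
  return { model, output };
}
```

Keeping `pickModel` as a pure function makes the routing decision easy to unit test without calling any model.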

The Math Layer: We wrote custom TypeScript engines for the deterministic checks. For credit card validation, we implemented the Luhn Algorithm check.
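For reference, a Luhn check can be written in a few lines of TypeScript. This is a sketch of the standard algorithm, not the project's actual engine. The function name `luhnCheck`, the stripping of spaces and dashes, and the accepted length range of 12 to 19 digits are assumptions made for this example.

```typescript
// Sketch of the standard Luhn checksum; the name and input normalization are assumptions.

function luhnCheck(candidate: string): boolean {
  // Strip common separators, then require a plausible card-number length (assumed 12-19 digits).
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;

  let sum = 0;
  let doubleNext = false; // the rightmost digit is not doubled

  // Walk from right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

For example, the widely used test number `4111 1111 1111 1111` passes the check, while changing its last digit to `2` makes it fail.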

Backend: We used PostgreSQL and Prisma to store audit trails, ensuring every scan has a permanent chain-of-custody record.

Challenges we ran into

The "Timeout" Trap: Early versions of our app crashed when analyzing large repositories because the AI models took too long to respond. Traditional API routes couldn't handle the latency.

Solution: Implementing Inngest allowed us to move these heavy workloads to the background. We decoupled the "Scan Request" from the "Scan Processing," allowing the UI to remain snappy while Inngest managed the heavy lifting and model coordination asynchronously.
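The decoupling idea can be illustrated with a small in-memory stand-in. This is deliberately not Inngest's API: the `ScanQueue` class and its methods are hypothetical and only show the shape of the pattern, in which the request path records a job and returns immediately while a separate worker runs the slow analysis later.

```typescript
// Illustrative stand-in only: an in-memory queue, NOT Inngest's API. All names are hypothetical.

type ScanStatus = "queued" | "processing" | "done" | "failed";

interface ScanJob {
  id: string;
  prDiff: string;
  status: ScanStatus;
  result?: string;
}

class ScanQueue {
  private jobs = new Map<string, ScanJob>();
  private pending: string[] = [];
  private nextId = 1;

  // "Scan Request": called by the API route. Records the job and returns at once.
  enqueue(prDiff: string): string {
    const id = `scan-${this.nextId++}`;
    this.jobs.set(id, { id, prDiff, status: "queued" });
    this.pending.push(id);
    return id;
  }

  // Called by the UI to poll progress without blocking on the models.
  status(id: string): ScanJob | undefined {
    return this.jobs.get(id);
  }

  // "Scan Processing": called by a background worker with the slow analyzer.
  // Returns false when there is nothing left to process.
  async processNext(analyze: (diff: string) => Promise<string>): Promise<boolean> {
    const id = this.pending.shift();
    if (id === undefined) return false;
    const job = this.jobs.get(id);
    if (job === undefined) return false;

    job.status = "processing";
    try {
      job.result = await analyze(job.prDiff);
      job.status = "done";
    } catch {
      job.status = "failed";
    }
    return true;
  }
}
```

A durable background-job service adds what this toy version lacks, such as persistence, retries, and scheduling across processes.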

The "AI Hallucination" Problem: The AI initially flagged random 16-digit numbers as "Credit Cards."

Solution: We enforced a "Math-First" policy. The AI creates a candidate list, but the deterministic Luhn Algorithm acts as the final gatekeeper. If the math doesn't check out, the alert is suppressed.
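The gatekeeper step might look something like the sketch below. `Candidate`, `Validator`, and `gatekeep` are illustrative names, and letting candidates pass through when no deterministic validator exists for their kind is an assumption of this example rather than something the writeup specifies.

```typescript
// Hypothetical sketch: Candidate, Validator, and gatekeep are illustrative names.

interface Candidate {
  kind: string;  // label assigned by the AI stage, e.g. "credit-card"
  value: string; // the flagged text
  line: number;  // location in the diff
}

// A deterministic check for one kind of sensitive data (for cards, a Luhn check).
type Validator = (value: string) => boolean;

interface GateResult {
  confirmed: Candidate[];
  suppressed: Candidate[];
}

function gatekeep(
  candidates: Candidate[],
  validators: Partial<Record<string, Validator>>,
): GateResult {
  const result: GateResult = { confirmed: [], suppressed: [] };
  for (const candidate of candidates) {
    const validate = validators[candidate.kind];
    // Assumption: kinds without a deterministic validator pass through unchanged.
    if (validate === undefined || validate(candidate.value)) {
      result.confirmed.push(candidate);
    } else {
      // The math did not check out, so the alert is suppressed.
      result.suppressed.push(candidate);
    }
  }
  return result;
}
```

Returning the suppressed list alongside the confirmed one keeps a record of what the AI proposed and the math layer rejected, which is useful when tuning prompts.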

Quantifying "Risk": Translating a "buffer overflow" into "dollars" is subjective.

Solution: We researched actual GDPR and PCI-DSS fine structures to create a Financial Multiplier Matrix, so that each dollar amount is grounded in published penalty structures rather than chosen arbitrarily.
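To make the idea concrete, a multiplier matrix can be as simple as a base cost per severity scaled by a factor for the kind of data involved. The sketch below shows only that shape: every number in it is a placeholder invented for the example, not a real GDPR or PCI-DSS figure, and the type and function names are assumptions.

```typescript
// Hypothetical sketch: all figures are placeholders, not real GDPR or PCI-DSS fines.

type Severity = "low" | "medium" | "high" | "critical";
type DataClass = "none" | "pii" | "payment";

// Placeholder base cost per severity level, in US dollars.
const BASE_COST_USD: Record<Severity, number> = {
  low: 500,
  medium: 5000,
  high: 25000,
  critical: 100000,
};

// Placeholder multipliers for the class of data the flawed code touches.
const DATA_MULTIPLIER: Record<DataClass, number> = {
  none: 1,
  pii: 2,
  payment: 4,
};

// Estimated liability = base cost for the severity x multiplier for the data class.
function estimateLiabilityUsd(severity: Severity, dataClass: DataClass): number {
  return BASE_COST_USD[severity] * DATA_MULTIPLIER[dataClass];
}
```

With these placeholder values, a "medium" issue that touches payment data is scored at 5000 x 4 = 20000 dollars, while the same issue in code that handles no sensitive data stays at 5000.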

Accomplishments that we're proud of

Near-Zero False Positives on Payments: Thanks to our hybrid AI/Math approach, we achieved almost 100% accuracy in distinguishing valid from invalid credit card numbers in our test suite.

The "Neural Terminal": We are incredibly proud of the UI. It looks and feels like a movie-grade hacking terminal, making security work feel exciting rather than tedious.

The Economic Pivot: We built a logic engine that outputs Financial Liability ($) instead of just Severity Levels. It changes the conversation from "Code Quality" to "Business Survival."

What we learned

Math beats AI: For compliance (PCI, HIPAA), you cannot rely on probabilistic models (LLMs) alone. You also need deterministic algorithms to verify what the models propose. The best results come from combining the two.

Fear motivates fixes: We learned that developers are 40% more likely to fix a bug if they see "Potential Fine: $50,000" compared to just "Severity: High."

Human-in-the-Loop is vital: We decided not to auto-fix financial bugs. Regulatory compliance requires a human chain of custody. AI should be the Auditor, not the Author.

What's next for CodeDiff AI: The Economic Security Engine

IDE Extension: Bringing the "Financial Risk" score directly into VS Code so developers see the cost of their code as they type.

Smart Contract Auditing: Expanding our deterministic layer to support Solidity and to identify gas-optimization leaks (converting "Gas" to "Dollars").

Enterprise Integration: Adding SSO and team-based liability dashboards for CTOs to track technical debt reduction in dollar amounts.

Student Security Lab: We are building a dedicated dashboard for students to learn "Financial Coding." They can simulate errors and see exactly how a single bug could cause millions in losses, preparing them for the high-stakes reality of the industry.
