## Inspiration

As a Mechanical Engineer and Machinist, my career is built on the concept of "tolerances." In a CNC machine, if a measurement is off by a fraction of a millimetre, the part is scrapped. I applied this same "Zero Trust" mindset to AI. While building my Sovereign Archive—a decentralized knowledge mesh—I realized that as we give AI more agency over our data, we need a "digital calliper" to measure the safety of every input before it reaches the model.

## What it does

PromptGuard acts as a real-time security firewall for Large Language Models. It analyzes incoming text to detect Prompt Injections (like "Ignore previous instructions") or roleplay attacks.

  • Detection: Uses a custom-trained Logistic Regression model for instant classification.
  • Explanation: If a threat is detected, it utilizes Gemini 3.1 Flash-Lite to provide a concise, one-sentence explanation of the attack vector, helping users understand the risk.

## How we built it

The project is built on a high-performance stack designed for low latency and high accuracy:

  • Backend: FastAPI running on Python 3.14.
  • Machine Learning: A scikit-learn pipeline using TfidfVectorizer for feature extraction and Logistic Regression for probability-based classification.
  • AI Integration: The latest google-genai SDK to interface with Gemini 3.1 Flash-Lite for explainable security posture.
  • Math Foundation: The detection is based on the logistic function to calculate the probability $P$ of an injection: $$P(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1 + ... + \beta_nx_n)}}$$

## Challenges we ran into

  • Environment Parity: We hit significant InconsistentVersionWarning errors when trying to load a model trained in Python 3.12 into a 3.14 environment. This taught us the critical importance of exact dependency locking.
  • Strict Validation: The new google-genai SDK uses Pydantic for validation. Configuring the ThinkingConfig for the Gemini model required deep-diving into nested object types to resolve extra_forbidden errors.
  • Adversarial Nuance: Detecting "soft" injections like "Opposite Day" games proved harder than catching direct overrides, requiring careful calibration of our decision boundaries.

## Accomplishments that we're proud of

  • Low-Latency Performance: By using Flash-Lite and a lightweight ML model, we achieved near-instantaneous analysis.
  • Zero-Footprint Deployment: We successfully optimized our main.py and .gitignore to ensure the repository is clean, professional, and easy for other developers to clone and run.
  • Explainable AI: We didn't just stop at "Safe" or "Unsafe"—we built a system that actually teaches the user why a prompt was flagged.

## What we learned

  • The "Vibe Coding" Workflow: Using AI to orchestrate and repair the ASGI application middleware significantly accelerated our development speed.
  • Strict Schema Management: We learned that as AI SDKs evolve, understanding Pydantic and type-hinting in Python is no longer optional—it's a core security skill.

## What's next for PromptGuard

The next step is to integrate PromptGuard as a middleware plugin for popular tools like Tailscale or OpenWrt routers. This would allow for a "Security-at-the-Edge" approach, protecting a user’s entire local homelab or "Sovereign Archive" from malicious AI interactions at the network level.

Built With

Share this project:

Updates