GemmaShield

Inspiration

Modern AI systems are increasingly being deployed in high-stakes environments such as healthcare, finance, government services, and enterprise automation. Despite their rapid adoption, most LLM-based applications lack systematic adversarial security testing before deployment, leaving them vulnerable to prompt injection, jailbreaks, and sensitive data leakage.

GemmaShield was inspired by the urgent need for a standardized, privacy-preserving way to evaluate the security of LLM applications before they go live. The goal was to bridge the gap between AI capability and AI safety by simulating realistic adversarial attacks in a controlled, fully local environment.

What it does

GemmaShield is a pre-deployment AI security testing platform that automates red teaming for LLM applications.

It simulates adversarial attacks using a multi-agent system powered entirely by Gemma 4 via Ollama. The system includes:

Attacker Agent: Generates targeted adversarial prompts based on the system prompt of the target model
Target Agent: Simulates the behavior of the deployed LLM under attack
Defender Agent: Analyzes interactions and classifies vulnerabilities
Judge Agent: Assigns a CVSS-like security score and provides mitigation recommendations

Each interaction runs locally in real time, ensuring transparency, reproducibility, and privacy.
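
To make the Judge Agent's "CVSS-like" score concrete, here is a minimal sketch of how a 0.0 to 10.0 score could be mapped to a qualitative band. It borrows the CVSS v3.x severity rating scale as an assumption. The function name `severity_band` is hypothetical, and the write-up does not state which thresholds GemmaShield actually uses.

```python
def severity_band(score: float) -> str:
    """Map a 0.0-10.0 score to a qualitative band.

    Thresholds follow the CVSS v3.x qualitative rating scale; applying
    them to GemmaShield's score is an assumption made for illustration.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

A band like this could sit next to the numeric score in a report so that findings are easy to triage at a glance.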

How we built it

GemmaShield was built using a full-stack, real-time architecture:

Backend: FastAPI with Server-Sent Events (SSE) for streaming agent outputs
Frontend: React-based dashboard featuring a live simulation console and OWASP LLM Top 10 heatmap
Inference Layer: Ollama REST API running Gemma 4 locally for all agent reasoning
Database: SQLite + JSONL logs to store full attack-response traces
Evaluation Layer: Structured JSON outputs for consistent scoring and vulnerability classification
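
As an illustration of the SQLite + JSONL storage layer, the sketch below writes each agent step to both a database table and an append-only log file. The table name `traces`, its columns, and the helpers `open_store` and `record_trace` are hypothetical; the project's actual schema is not described here.

```python
import json
import sqlite3
from pathlib import Path


def open_store(db_path: str) -> sqlite3.Connection:
    """Open (or create) the trace database with a single hypothetical table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traces ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "run_id TEXT NOT NULL, agent TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def record_trace(conn: sqlite3.Connection, jsonl_path: str,
                 run_id: str, agent: str, payload: dict) -> None:
    """Store one agent step in SQLite and append the same step to a JSONL log."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO traces (run_id, agent, payload) VALUES (?, ?, ?)",
            (run_id, agent, json.dumps(payload, sort_keys=True)),
        )
    record = {"run_id": run_id, "agent": agent, "payload": payload}
    with Path(jsonl_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Keeping both stores is a trade-off: SQLite makes traces easy to query, while the JSONL file is easy to diff, tail, and replay.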

The system follows a sequential multi-agent pipeline where each agent operates with a dedicated system prompt and structured output schema.
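
The sequential pipeline can be sketched roughly as follows. Each agent is modeled as a plain callable, so the example runs without a model. The required keys per agent and the names `run_pipeline` and `sse_frame` are assumptions for illustration, not the project's real schema. In the real system each callable would wrap a Gemma 4 call with that agent's system prompt.

```python
import json
from typing import Callable, Dict, Iterator

# An agent takes the context built so far and returns a structured dict.
Agent = Callable[[Dict], Dict]

# Hypothetical minimal output schema per agent (assumed, not from the project).
REQUIRED_KEYS = {
    "attacker": {"prompt"},
    "target": {"response"},
    "defender": {"category", "vulnerable"},
    "judge": {"score", "mitigation"},
}


def sse_frame(event: str, data: Dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def run_pipeline(agents: Dict[str, Agent], system_prompt: str) -> Iterator[str]:
    """Run the four agents in order, validating and streaming each output."""
    context: Dict = {"system_prompt": system_prompt}
    for name in ("attacker", "target", "defender", "judge"):
        output = agents[name](context)
        missing = REQUIRED_KEYS[name] - set(output)
        if missing:
            raise ValueError(f"{name} output missing keys: {sorted(missing)}")
        context[name] = output  # later agents can read earlier outputs
        yield sse_frame(name, output)
```

Because `run_pipeline` is a generator, a streaming endpoint could forward each frame to the dashboard as soon as the corresponding agent finishes, rather than waiting for the whole run.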

Challenges we ran into

One of the main challenges was handling Gemma 4’s “thinking-mode” output, which often emitted intermediate reasoning before the structured JSON response and broke our JSON parsing.

We solved this by switching from subprocess-based inference to the Ollama REST API with streaming support, combined with regex-based filtering to reliably extract structured outputs.
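
A minimal sketch of this kind of filtering is shown below. It is not the project's actual code. It assumes the reasoning may be wrapped in `<think>...</think>` tags (a hypothetical delimiter; the real format may differ), uses a regex to locate candidate `{` positions, and swaps in `json.JSONDecoder.raw_decode` to check each candidate instead of relying on regex alone.

```python
import json
import re
from typing import Optional

# Assumed delimiter for reasoning text; adjust to the model's real format.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_json(raw: str) -> Optional[dict]:
    """Return the first valid JSON object found in raw model output, or None."""
    text = THINK_BLOCK.sub("", raw)  # drop tagged reasoning, if any
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it (e.g. a closing code fence).
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; try the next one
        if isinstance(obj, dict):
            return obj
    return None
```

For example, given output where untagged reasoning such as `{not json}` precedes a fenced JSON answer, the loop skips the invalid brace and returns the first object that actually parses.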

Another challenge was maintaining consistent evaluation across multiple adversarial scenarios while keeping the system fully local and dependency-light.

Accomplishments that we're proud of

We successfully built a fully local AI red-teaming platform capable of simulating real-world adversarial attacks on LLM systems.

Key achievements include:

A working multi-agent security testing pipeline (Attacker, Target, Defender, Judge)
Real-time streaming of AI interactions with full traceability
Integration of OWASP LLM Top 10 taxonomy into automated vulnerability classification
A CVSS-like scoring system for AI security evaluation
A privacy-preserving architecture that runs entirely offline via local inference

What we learned

This project significantly deepened our understanding of LLM security, adversarial machine learning, and multi-agent system design.

We learned how unpredictable LLM behavior can be in production-like environments and how important structured evaluation frameworks are for making AI systems safe and auditable.

Most importantly, we learned that AI security is not optional. It is a foundational requirement for deploying trustworthy systems at scale.

What's next for GemmaShield

We plan to evolve GemmaShield into a full enterprise-grade AI security platform by adding:

Continuous integration support for automated LLM security testing in CI/CD pipelines
Expanded adversarial libraries beyond prompt injection (e.g., data poisoning and tool misuse attacks)
Support for additional open-source and proprietary models beyond Gemma
Collaborative threat intelligence sharing between organizations
Advanced analytics dashboards for security benchmarking across models

Our long-term goal is to make AI security testing a standard step in every LLM deployment pipeline.

Built With

fastapi
gemma4
github
javascript
jsonl
node.js
ollama
python
react
sqlite
sse

Updates

Abrar Fahad started this project — May 22, 2026 10:23 AM EDT
