Inspiration
Modern AI systems are increasingly being deployed in high-stakes environments such as healthcare, finance, government services, and enterprise automation. Despite their rapid adoption, most LLM-based applications lack systematic adversarial security testing before deployment, leaving them vulnerable to prompt injection, jailbreaks, and sensitive data leakage.
GemmaShield was inspired by the urgent need for a standardized, privacy-preserving way to evaluate the security of LLM applications before they go live. The goal was to bridge the gap between AI capability and AI safety by simulating realistic adversarial attacks in a controlled, fully local environment.
What it does
GemmaShield is a pre-deployment AI security testing platform that automates red teaming for LLM applications.
It simulates adversarial attacks using a multi-agent system powered entirely by Gemma 4 via Ollama. The system includes:
- Attacker Agent: Generates targeted adversarial prompts based on the system prompt of the target model
- Target Agent: Simulates the behavior of the deployed LLM under attack
- Defender Agent: Analyzes interactions and classifies vulnerabilities
- Judge Agent: Assigns a CVSS-like security score and provides mitigation recommendations
Each interaction runs locally in real time, ensuring transparency, reproducibility, and privacy.
How we built it
GemmaShield was built using a full-stack, real-time architecture:
- Backend: FastAPI with Server-Sent Events (SSE) for streaming agent outputs
- Frontend: React-based dashboard featuring a live simulation console and OWASP LLM Top 10 heatmap
- Inference Layer: Ollama REST API running Gemma 4 locally for all agent reasoning
- Database: SQLite + JSONL logs to store full attack-response traces
- Evaluation Layer: Structured JSON outputs for consistent scoring and vulnerability classification
The system follows a sequential multi-agent pipeline where each agent operates with a dedicated system prompt and structured output schema.
Challenges we ran into
One of the main challenges was handling Gemma 4’s “thinking-mode” output, which often included intermediate reasoning before structured JSON responses, causing parsing issues.
We solved this by switching from subprocess-based inference to the Ollama REST API with streaming support, combined with regex-based filtering to reliably extract structured outputs.
Another challenge was maintaining consistent evaluation across multiple adversarial scenarios while keeping the system fully local and dependency-light.
Accomplishments that we're proud of
We successfully built a fully local AI red-teaming platform capable of simulating real-world adversarial attacks on LLM systems.
Key achievements include:
- A working multi-agent security testing pipeline (Attacker, Target, Defender, Judge)
- Real-time streaming of AI interactions with full traceability
- Integration of OWASP LLM Top 10 taxonomy into automated vulnerability classification
- A CVSS-like scoring system for AI security evaluation
- A privacy-preserving architecture that runs entirely offline via local inference
What we learned
This project significantly deepened our understanding of LLM security, adversarial machine learning, and multi-agent system design.
We learned how unpredictable LLM behavior can be in production-like environments and how important structured evaluation frameworks are for making AI systems safe and auditable.
Most importantly, we learned that AI security is not optional, it is a foundational requirement for deploying trustworthy systems at scale.
What's next for GemmaShield
We plan to evolve GemmaShield into a full enterprise-grade AI security platform by adding:
- Continuous integration support for automated LLM security testing in CI/CD pipelines
- Expanded adversarial libraries beyond prompt injection (e.g., data poisoning and tool misuse attacks)
- Support for additional open-source and proprietary models beyond Gemma
- Collaborative threat intelligence sharing between organizations
- Advanced analytics dashboards for security benchmarking across models
Our long-term goal is to make AI security testing a standard step in every LLM deployment pipeline.
Log in or sign up for Devpost to join the conversation.