Inspiration

As AI chatbots and large language models become integrated into everyday applications, they are increasingly exposed to adversarial prompts and prompt injection attacks. We were inspired by the growing realization that while organizations are rapidly deploying AI systems, many lack a security layer to monitor and protect these models from malicious input.

GuardianLM was born from the idea that AI systems should have the same level of protection as traditional software systems. Just as firewalls protect web applications, we wanted to explore how a security layer could analyze prompts, detect adversarial intent, and safeguard AI models before they generate responses.

What it does

We built the detection engine as a lightweight security layer that analyzes prompts before they reach the main AI model. Incoming prompts are first processed through a rule-based filter that checks for known adversarial patterns such as attempts to override system instructions, hidden or obfuscated text, and common jailbreak phrasing.

The prompt is then passed to a semantic analysis model that evaluates the intent and context of the input to detect more subtle adversarial behavior. By combining rule-based detection with AI-driven analysis, the engine can identify both obvious and nuanced attack attempts.

All detection results are logged and sent to a monitoring pipeline using Elasticsearch and visualized in Kibana, allowing us to track and analyze potential prompt attacks in real time.

How we built it

We built GuardianLM as a four-component system. The AI chatbot was built using a Python Flask backend connected to Groq's API running Llama 3.1 8B, with a custom HTML frontend. We built the detection engine as a lightweight security layer that analyzes prompts before they reach the main AI model. Prompts are first checked against a set of rules designed to detect adversarial patterns such as instruction overrides, obfuscated text, and jailbreak attempts. They are then analyzed by a semantic model that evaluates the intent and context of the input.

By combining rule-based filtering with AI analysis, the engine can detect both obvious and subtle adversarial prompts. The logging and dashboard system captures all traffic metadata per session.

Challenges we ran into

Balancing the IDS detection sensitivity was difficult. Rules that were too aggressive flagged legitimate prompts as threats, while rules that were too loose missed subtle injection attempts. Integrating our system with elastic + kibana turned out to be the most stressful part of our project surprisingly, but we are grateful for the experience to work with the technology.

Accomplishments that we're proud of

We successfully built a working end-to-end pipeline that demonstrates real-time prompt threat detection. The IDS correctly identifies and blocks prompt injection, jailbreak attempts, data extraction, and network intrusion queries while allowing legitimate conversations to pass through uninterrupted. The live IDS log panel provides a clear visual demonstration of the security layer in action.

What we learned

We learned that LLMs are surprisingly vulnerable to simple text-based attacks and that a lightweight rule-based security layer can catch a large proportion of common attack patterns before they ever reach the model. We also learned how important it is to treat AI systems with the same security mindset as traditional software infrastructure.

What's next for GuardianLM

We plan to replace the rule-based layer with a trained ML classifier for more nuanced threat detection, integrate with Elasticsearch and Kibana for advanced log analytics and visualization, add rate limiting and user authentication, and extend support to protect multiple AI models simultaneously behind a single security gateway.

Built With

Share this project:

Updates