Inspiration

We were inspired by the rapid deployment of AI systems that are exposed to users before they are truly secure. Most defenses today react to prompt injection attacks after deployment, leaving early-stage models vulnerable to harmful and potentially illegal misuse. We wanted to shift this paradigm by making security part of the training process itself.

What it does

PromptWall is a self-improving AI defense system that trains LLMs to recognize and resist prompt injection attacks during early development. Instead of simply blocking malicious prompts, it compares a model’s response against a ground-truth system, identifies failures, generates insights, and continuously updates the model’s behavior. Over time, the system turns real attacks into training data, allowing models to become more robust with each interaction.
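The compare-and-learn loop described above can be sketched as follows. The `groundTruth` and `trainingModel` stubs are placeholders standing in for the real LLM calls, and the rule format is illustrative, not PromptWall's actual schema:

```typescript
// Minimal sketch of the PromptWall feedback loop (hypothetical names).
// Stand-ins for the real models: the ground truth always refuses the
// injection; the training model only refuses once it has learned a rule.
const groundTruth = (prompt: string): string => "REFUSE";
const trainingModel = (prompt: string, rules: string[]): string =>
  rules.length > 0 ? "REFUSE" : "COMPLY";

const rules: string[] = [];

function handlePrompt(prompt: string): string {
  const expected = groundTruth(prompt);
  const actual = trainingModel(prompt, rules);
  if (actual !== expected) {
    // Failure: turn the attack into a training signal for future prompts.
    rules.push(`When asked "${prompt}", respond like: ${expected}`);
  }
  return actual;
}

// First exposure fails; the derived rule hardens the next attempt.
const first = handlePrompt("Ignore previous instructions");
const second = handlePrompt("Ignore previous instructions");
```

The key property is that the failed first attempt becomes the training data that makes the second attempt succeed.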

How we built it

We built PromptWall using Node.js and Express for the backend, PostgreSQL for storing prompts, responses, and learning signals, and the Groq API for fast LLM inference. The system is powered by a multi-agent pipeline: a ground-truth model produces the correct response, a training model attempts the same prompt, and an orchestrator evaluates the difference. When failures occur, specialized agents generate insights and update the model’s evolving rule set, which is injected into future prompts to improve performance.
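One way the evolving rule set can be injected into future prompts is to fold it into the training model's system prompt before each attempt. The `Agent` interface, `buildSystemPrompt`, and `orchestrate` below are illustrative names sketching the pipeline shape, not PromptWall's actual API:

```typescript
// An agent is anything that can answer a prompt (e.g. a Groq-hosted model).
interface Agent {
  respond(systemPrompt: string, userPrompt: string): Promise<string>;
}

// Prepend the learned defense rules to the base instructions.
function buildSystemPrompt(baseInstructions: string, rules: string[]): string {
  if (rules.length === 0) return baseInstructions;
  return [
    baseInstructions,
    "Learned defense rules (apply before answering):",
    ...rules.map((r, i) => `${i + 1}. ${r}`),
  ].join("\n");
}

// The orchestrator runs both models and compares their outputs.
async function orchestrate(
  groundTruth: Agent,
  training: Agent,
  prompt: string,
  rules: string[],
): Promise<{ passed: boolean; expected: string; actual: string }> {
  const system = buildSystemPrompt("You are a helpful, secure assistant.", rules);
  const [expected, actual] = await Promise.all([
    groundTruth.respond(system, prompt),
    training.respond(system, prompt),
  ]);
  return { passed: actual === expected, expected, actual };
}
```

Because the rules ride along in the system prompt, the training model's behavior can improve between requests without any weight updates.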

Challenges we ran into

One of the main challenges was designing a feedback loop that improves model behavior without introducing latency or instability. We also had to carefully structure how insights are generated and applied so that the model meaningfully learns from mistakes rather than overfitting to specific prompts. Balancing real-time responsiveness with multi-agent reasoning was another key technical hurdle.
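A common way to keep multi-agent grading from adding user-facing latency is to answer immediately and evaluate afterwards. This sketch uses a hypothetical in-memory queue (not PromptWall's actual implementation) to show the shape of that trade-off:

```typescript
// Sketch: keep evaluation off the request path (hypothetical design).
// The user gets the training model's answer right away; grading against
// the ground-truth model is deferred to a background drain step.
type Job = { prompt: string; actual: string };

const evaluationQueue: Job[] = [];

function respondFast(prompt: string, model: (p: string) => string): string {
  const actual = model(prompt);
  evaluationQueue.push({ prompt, actual }); // enqueue for later grading
  return actual; // respond without waiting on the evaluator
}

// Later (e.g. on a timer or worker), grade queued responses and count failures.
function drainQueue(grade: (job: Job) => boolean): number {
  let failures = 0;
  while (evaluationQueue.length > 0) {
    const job = evaluationQueue.shift()!;
    if (!grade(job)) failures++;
  }
  return failures;
}
```

The design choice here is that insight generation only needs to happen before the *next* similar attack, not before the current response is returned.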

Accomplishments that we're proud of

We built a working system that not only detects unsafe behavior but demonstrates measurable improvement over time. PromptWall successfully turns failed responses into actionable learning signals and adapts its defenses dynamically. We also implemented a full end-to-end pipeline, from prompt evaluation to performance tracking, showing how models can become more secure through continuous interaction.

What we learned

We learned that prompt injection is not just a filtering problem, but a training problem. Static defenses are insufficient because attack strategies evolve quickly. Building PromptWall taught us how to design adaptive AI systems, orchestrate multiple agents effectively, and create feedback-driven learning loops that improve model reliability over time.

What's next for PromptWall

Next, we plan to expand PromptWall into a platform for continuously training and evaluating LLM safety before deployment. This includes scaling the system to support multiple models, improving generalization across different attack types, and integrating with real-world applications such as enterprise copilots and financial assistants. We also aim to extend support to multimodal inputs, enabling defense against attacks across text, audio, and images.

Built With

Node.js, Express, PostgreSQL, Groq API