Meet Joe, an analyst at a financial firm. Joe starts his Tuesday with a green dashboard: 1,140 tickets cleared overnight by the company's AI support agent. Then Finance flags $41,800 in refunds that were approved overnight with no human review. Tracing it back reveals the cause: a product review contained hidden white-on-white text posing as a "fraud detection" system, which led to hundreds of order histories and other customer data being handed over.

This is just one example of prompt injection: the act of embedding hidden prompts that trick an AI into taking potentially malicious actions. Over the past few years, thousands of people have been affected and millions of dollars have been lost to this method, and those numbers are only rising. CrowdStrike tracks more than 150 prompt injection techniques, and 73% of production AI deployments are vulnerable to prompt injection. Even worse, prompt injection is easy for nearly anyone to perform, and it is surprisingly successful.
Motivated by these attacks, our team decided to build a security system to guard against AI prompt injection. We spent a lot of time settling on a solid idea so that we could execute it with minimal problems. Initially, we were choosing between a prompt injection firewall and a safe-word system, but we soon realized that we needed much more. We wanted an app that could dynamically update and train itself, so we also studied other prompt injection defense models to see what we could improve. This led us to create BuzzBuzz, a 7-step security system app that trains itself on prompt injection attacks.

We initially tried Lovable for the website design, but soon realized that Claude Code's output was more aligned with what we were looking for. We used Claude Code to design our model and website, and we also ran simulations against other models. We are proud that our model outperformed three open-source models and even one commercial model.

We learned about analyzing sources and data for our presentation, building models, organizing our workflow, and the different types of research that have been done on prompt injection. In the future, we hope to make the defense system even more robust, reduce its latency, and expand the hive.