Inspiration
I was inspired to create this project due to a personal newfound interest in cybersecurity, and how we can secure our software in the advent of AI/ML technology.
What it does
Honeypot - Fake digital assets or environments designed to attract cybercriminals. These assets could include software applications and data that act like a legitimate computer system, contain sensitive data, and aren't secure.
Honeypot the Bot! (HTB) casts the effective approach of honeypotting to continuously harden our AI chatbots against attackers. "Jailbreaking" chatbots is possible through malicious prompts, and we need to train our models to resist these attacks. Especially since every chatbot exploit is automatically a day-zero vulnerability.
HTB utilizes another AI model trained on a common set of malicious prompts AND prompts specifically targeted to your chatbot's specific business logic. HTB's AI model will monitor interactions between attackers and the AI chatbot running in a honeypot, and will identify prompts that are likely to be malicious. The security team will identify the malicious prompts, train their existing chatbots against those prompts, and then redeploy the chatbot in the honeypot and across other environments.
These steps are looped to allow for a continuous hardening of AI chatbots against the threat of jailbreaking. HTB can be easily integrated into any application through limited configuration and deployment as a container in your existing honeypot Kubernetes cluster. Through this solution, your company is...
- Continuously protected from the threat of jailbreaking
- Able to shift security left and can stop exploits before they're used in a production environment
- Safe from possible brand, reputation, and financial losses due to a jailbreak on your bot
How I built it
In it's current form--uses Python, ChatGPT's API, and LlamaIndex!
Challenges I ran into
Here are just some of the challenges I ran into...
- Prompt engineering to get ChatGPT to do what we need
- Obtaining data sets on malicious prompts
- Defining to ChatGPT the criteria with which to judge a prompt as malicious
Accomplishments that I'm proud of
Having a working demo of HTB's trained AI model!
What I learned
There is a lot more depth to how GPT can be used than I thought. I learned a lot about what indexing, embedding, and working with these models really means. And of course, I learned more about Docker!
What's next for Honeypot the Bot!
Here are the next steps!
- Gather more data sets of malicious prompts to train HTB's AI model
- Configure scraping of logs through an endpoint exposed on the target application
- Create a front-end monitoring page to view HTB's exported data, written in the React framework

Log in or sign up for Devpost to join the conversation.