Inspiration

One of the most notable exploits against ChatGPT in its early days was DAN (Do Anything Now). By prompting ChatGPT with DAN, users were exposed to an unfiltered AI, resulting in unpredictable responses, for better or worse. But what if you could stop the bleeding before it started? That is where Shick Shack shines.

What it does

Shick Shack detects prompts that can drastically alter the behavior of LLMs. These attacks are called jailbreak attacks: the LLM is tricked into disregarding its safety measures and generating harmful content. Shick Shack flags these prompts to prevent them from reaching an LLM. A prompt can be pasted into Shick Shack and, depending on its content, is flagged as either benign or malicious.

How we built it

Shick Shack is built with Python and uses a pipeline of machine-learning methods to analyze the intent of prompts. Several metrics of each prompt are used to evaluate whether it is a possible jailbreak attack. Python was used on both the front-end and the back-end, with MongoDB providing the data storage.

Challenges we ran into

One of the biggest challenges we ran into was pre-processing. With the machine-learning methods we assembled, finding the important features was crucial, since malicious prompts differ in many ways, such as tone, message length, and delivery.

Accomplishments that we're proud of

The combination of tools used for building this app was a first for all of us. Many of us viewed Python as a strong tool for data science and research, but never as an option for app development. Picking up these tools quickly and effectively was a fun challenge that pushed us out of our comfort zone. For some of us, this was our first hackathon, and the experience encouraged us to check out more hackathons in the future.

What we learned

We all learned the importance of patience. There were plenty of instances where one of our major components (the front-end, the back-end, or the ML model itself) would break and set us back by an hour or two. Sometimes these problems just led to discovering more problems, but persistence in finding a solution was the most important factor in finishing this app.

What's next for Shick Shack

A key next step for this project is a heuristics-based pre-filtering layer that reduces computational overhead before the more resource-intensive vector similarity checks run. Heuristics are computationally cheap and can quickly identify obvious patterns linked to jailbreak attempts by analyzing factors such as token count, character count, markdown usage, obfuscated content, and emotionally charged language. This initial layer would act as a cost-effective gatekeeper, allowing only complex or ambiguous prompts to proceed to the vector-based semantic analysis using FAISS and SBERT embeddings. Combining the two approaches aims for both speed and accuracy, so the system can handle large volumes of prompts while keeping detection robust.

In the longer term, the project can evolve into a comprehensive analysis framework that integrates deeper intent and context understanding alongside dynamic threat modeling, enabling the detection of more sophisticated jailbreak tactics and supporting scalable security solutions across diverse AI applications.
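
To make the pre-filtering idea above more concrete, here is a minimal Python sketch of how such a gate might look. It is not our implementation: every threshold, regular expression, word list, and function name (`extract_signals`, `prefilter`, `classify`) is a hypothetical choice for illustration, and `semantic_check` is a placeholder for the FAISS and SBERT stage, which is not shown.

```python
import re
from dataclasses import dataclass
from typing import Callable

# All thresholds, patterns, and word lists are illustrative assumptions.
MAX_TOKENS = 800
MAX_CHARS = 4000
SHORT_PROMPT_TOKENS = 30
MARKDOWN_RE = re.compile(r"(`{3}|^#{1,6}\s|\*\*|^\s*[-*]\s)", re.MULTILINE)
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")  # long encoded-looking run
CHARGED_WORDS = {"urgent", "immediately", "must", "desperate", "beg"}
OBVIOUS_PHRASES = ("ignore all previous instructions", "do anything now")


@dataclass
class Signals:
    token_count: int
    char_count: int
    markdown_hits: int
    obfuscated: bool
    charged_ratio: float


def extract_signals(prompt: str) -> Signals:
    """Compute cheap surface features of a prompt."""
    tokens = prompt.split()  # whitespace split as a stand-in for model tokens
    words = [t.strip(".,!?;:\"'").lower() for t in tokens]
    charged = sum(w in CHARGED_WORDS for w in words)
    return Signals(
        token_count=len(tokens),
        char_count=len(prompt),
        markdown_hits=len(MARKDOWN_RE.findall(prompt)),
        obfuscated=bool(BASE64_RE.search(prompt)),
        charged_ratio=charged / len(tokens) if tokens else 0.0,
    )


def prefilter(prompt: str) -> str:
    """Return 'malicious', 'benign', or 'uncertain' using heuristics only."""
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in OBVIOUS_PHRASES):
        return "malicious"
    s = extract_signals(prompt)
    # Count how many independent warning signs fire.
    score = sum([
        s.token_count > MAX_TOKENS or s.char_count > MAX_CHARS,
        s.markdown_hits >= 3,
        s.obfuscated,
        s.charged_ratio > 0.2,
    ])
    if score >= 2:
        return "malicious"
    if score == 0 and s.token_count <= SHORT_PROMPT_TOKENS:
        return "benign"
    return "uncertain"


def classify(prompt: str, semantic_check: Callable[[str], bool]) -> str:
    """Run the cheap gate first; call the expensive check only if needed."""
    verdict = prefilter(prompt)
    if verdict != "uncertain":
        return verdict
    return "malicious" if semantic_check(prompt) else "benign"
```

In this sketch, a prompt containing a known jailbreak phrase is flagged immediately, a short prompt with no warning signs is passed as benign, and everything else is handed to `semantic_check`. Treating short, clean prompts as benign is itself an assumption that trades some recall for speed; a stricter design could send every non-flagged prompt to the semantic stage.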
