Inspiration

What do phishing attacks and creativity have in common? Nowadays probably LLMs. This project explores the topic of LLM security (relevant to Nigerian prince track since nowadays much of spam detection is done via AI), as well as creative prompting (relevant to Weapons of Mass Destruction track).

What it does

LLM arena is a tower-defense like educational game where users get their own LLM to defend from prompt injection attacks (by editing system prompt, blocked phrases etc.). They also get to attack other user's LLMs by sending them malicious prompts.

How we built it

The project was made in Python as a web app using the Django web development framework. The LLM model used in the game is a small opensource model from huggingface (HuggingFaceTB/SmolLM2-1.7B-Instruct). Additionally, Loveable and ChatGPT were used to quickly generate code and UI for this very time constrained project.

Challenges we ran into

Attempting figuring out an LLM app on the guest wifi... downloading all the weights was NOT fast. Also no cloud access was provided so I had to figure out how this very ambitious idea of different users having different LLMs they can train would be realistic to do on my own machine tm.

Accomplishments that we're proud of

Probably my proudest accomplishment is the way I handled users having "different LLMs". The task of multiple models and live fine-tuning would be completely unrealistic given the resources and time frame, but I think I found a good work-around : there is only one model instance, which is the model HuggingFaceTB/SmolLM2-1.7B-Instruct, shared upon all users. What is stored separately for each user is their defense settings ie system prompt, examples, blocked phrases etc. When a user's LLM is attacked, a prompt defining all of this information is dynamically constructed and sent along the attacker's prompt.

What we learned

LLMs are big and the ITU guest wifi is very slow. Loveable can make some very cool UI. You can make a system prompt that forces your LLM to ask other users to solve a random integral if they ask for your flag. Weaponizing integrals for internet points was not on today's bucket list, but I will take it.

What's next for LLM Arena

Actually deploying so it works outside of my own machine. I think it would be very cool to have a bunch of different pretrained starting models with different strenghts and weaknesses that the users can choose to fine-tune. Additionally, it would be very nice to have some sort of visual representation of your LLM and have customization accessible by earning points playing the game to further gamify the process. Very long term, some sort of actual integrated IDE and the option to fine-tune your model in code and further train it rather than just giving it some context instructions.

Built With

Share this project:

Updates