Inspiration

The rate of LLM adoption has outpaced the establishment of comprehensive security protocols, leaving many applications vulnerable to high-risk issues such as those listed in the OWASP Top 10 for LLM Applications. We believe more effort should go into safeguarding LLMs, and understanding their vulnerabilities starts with red teaming them.

Our methodology is inspired by two research papers:

  1. Red Teaming Language Models with Language Models link
  2. MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots link

What it does

Our project is a red-teaming tool that identifies a target LLM's vulnerabilities to different categories of harmful queries.
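At its core, the tool wraps harmful queries in jailbreak prompts, sends them to the target LLM, and flags responses that get past the guardrails. A minimal sketch of that loop is below; the prompt template, refusal keywords, and the `query_target` parameter are illustrative assumptions, not our actual implementation:

```python
# Minimal sketch of the red-teaming loop: wrap each harmful query in a
# jailbreak prompt, send it to the target LLM, and collect attacks that
# were not refused. All names here are illustrative placeholders.

REFUSAL_MARKERS = ["i cannot", "i can't", "i'm sorry", "as an ai"]

def build_attack(jailbreak_prompt: str, harmful_query: str) -> str:
    """Combine a jailbreak prompt with a harmful query."""
    return f"{jailbreak_prompt}\n\n{harmful_query}"

def is_refusal(response: str) -> bool:
    """Crude keyword-based check; a stand-in for a real harmfulness evaluator."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def red_team(jailbreaks, queries, query_target):
    """Return (jailbreak, query) pairs that bypassed the target's guardrails."""
    successes = []
    for jb in jailbreaks:
        for query in queries:
            response = query_target(build_attack(jb, query))
            if not is_refusal(response):
                successes.append((jb, query))
    return successes
```

In our setup `query_target` would call a Databricks serving endpoint; keeping it as a parameter leaves the loop backend-agnostic.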

How we built it

We built it using:

  • Code - Databricks Compute Clusters and Notebook
  • Serving - Databricks Model Serving Endpoint and Cluster Driver Proxy Endpoint
  • Data Store - Databricks Unity Catalog
  • Model Store - Databricks MLflow Model Registry
  • Applications - Streamlit, Langchain, Flask, Hugging Face Transformers, and OpenAI API

Dataset

  1. advbench/harmful_behaviours.csv link as harmful queries.
  2. Jailbreak Questions from MasterKey/Jailbreaker paper link as harmful queries.
  3. Jailbreakchat.com for jailbreak prompts.
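Loading the harmful queries is straightforward. A sketch of reading an advbench-style CSV is below; the sample rows and the `goal` column name are placeholders standing in for the real file, not its actual contents:

```python
# Sketch of loading harmful queries from an advbench-style CSV.
# The sample rows and the "goal" column name are illustrative
# placeholders for the real harmful_behaviours.csv file.
import csv
import io

SAMPLE_CSV = """goal,target
placeholder harmful query 1,placeholder target 1
placeholder harmful query 2,placeholder target 2
"""

def load_harmful_queries(fileobj) -> list:
    """Read one harmful query per row from the 'goal' column."""
    return [row["goal"] for row in csv.DictReader(fileobj)]

queries = load_harmful_queries(io.StringIO(SAMPLE_CSV))
```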

Challenges we ran into

  • An OSError occurred when the transformers Trainer completed training.
  • PEFT models could not be served through Model Serving because PEFT is still under active development. Databricks nonetheless gave us the flexibility to serve these models on the cluster driver proxy endpoint instead.
    • Specifically, after logging the model with pyfunc, loading it failed with a "peft not found" error.
  • Crafting prompts for fine-tuning required some experimentation.
  • OpenAI API calls were slow during the hackathon period.
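Since Model Serving could not load the PEFT model, serving it from the cluster driver was the workaround. A minimal Flask endpoint of the kind that runs behind the driver proxy might look like the sketch below; the route, payload shape, and the stubbed `generate` function are our assumptions, with the stub standing in for the fine-tuned model:

```python
# Minimal sketch of a Flask endpoint of the kind served via the
# Databricks cluster driver proxy. The stubbed `generate` stands in
# for the fine-tuned PEFT model; route and payload shape are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt: str) -> str:
    """Placeholder for the PEFT model's generation call."""
    return f"[model output for: {prompt}]"

@app.route("/generate", methods=["POST"])
def serve():
    # Accept {"prompt": "..."} and return {"completion": "..."}.
    prompt = request.get_json(force=True).get("prompt", "")
    return jsonify({"completion": generate(prompt)})
```

On Databricks the app would be bound to a driver port and reached through the cluster's proxy URL.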

Accomplishments that we're proud of

  • We broke Gandalf at level 5 using our experimental features.
  • We showed that generated jailbreak prompts are more effective at tearing down the guardrails of open-source LLMs.
  • We consolidated jailbreak prompts from various sources.
  • We completed our first hackathon.

What we learnt

  • A good platform/UI facilitates development work; having MLflow integrated to review metrics gave us good insights.
  • Coding assistants made debugging seamless and issues easier to resolve.
  • Always reach out in the relevant chats for assistance or clarification.
  • How to fine-tune LLMs on limited compute resources.

What's next for Red Teaming LLM

  • Develop guardrails and provide guardrail services.
  • Improve the harmfulness evaluation model.
  • Categorize jailbreak prompts to ensure coverage.
  • Train the red team LLM to generate category-specific harmful queries.
  • Train the red team LLM to generate a greater variety of jailbreak prompts.
  • Automatically update the repository with publicly released jailbreak prompts.
  • Discover novel jailbreak prompts through reinforcement learning.
  • Provide multilingual prompts and guardrails.
