breakpoint

Inspiration

The rise of large language models (LLMs) has introduced unprecedented power and unprecedented risk. With jailbreak competitions having awarded over $3 million globally, it became clear that testing AI security is no longer optional. We wanted to build a system that continuously stress-tests LLMs for vulnerabilities so developers can catch problems before attackers do.

What it does

Breakpoint is an automated AI security testing framework. It:

Continuously probes chatbots for vulnerabilities after every update (CI/CD integration).
Uses Groq to generate and mutate prompts with multiple jailbreak attack methods.
Employs Reka AI for classification to detect jailbreak successes.
Summarizes flaws and recommendations.

In short: Breakpoint keeps your chatbot secure — by attacking it first.

How we built it

Learned the general process of jailbreaking from papers and articles online
Read papers on different jailbreak attack methods, such as passive history and taxonomy-based paraphrasing
After researching attack methods and classifiers, planned the steps our software follows (generate input model -> generate attack prompts -> classify the input model's response for jailbreak -> analyze and give recommendations)
Based the attack prompts generated by Groq on several existing methods and contextualized them with the testing model's description
Researched ways of classifying jailbreaks and found open-source code
Used several jailbreak classification methods (har, dis, com...) with Reka AI to check, with context, whether the testing model's response counts as a jailbreak
Built the UI primarily with Shadcn and Magic-UI components
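
The four-step flow described above (generate input model, generate attack prompts, classify the response, analyze) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_pipeline`, `summarize`, and `AttackResult` are hypothetical names, and the Groq and Reka AI calls are represented by plain callables passed in by the caller rather than by real client code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AttackResult:
    method: str       # name of the attack method used
    prompt: str       # generated attack prompt
    response: str     # what the model under test replied
    jailbroken: bool  # classifier verdict


def run_pipeline(
    description: str,
    methods: List[str],
    generate_attack: Callable[[str, str], str],  # stand-in for the prompt generator
    target_model: Callable[[str], str],          # stand-in for the model under test
    classify: Callable[[str, str], bool],        # stand-in for the jailbreak classifier
) -> List[AttackResult]:
    """Generate one attack per method, send it to the target, classify the reply."""
    results = []
    for method in methods:
        prompt = generate_attack(method, description)
        response = target_model(prompt)
        verdict = classify(prompt, response)
        results.append(AttackResult(method, prompt, response, verdict))
    return results


def summarize(results: List[AttackResult]) -> Dict[str, object]:
    """Collapse the raw results into a small report for the analysis step."""
    return {
        "total": len(results),
        "jailbreaks": sum(r.jailbroken for r in results),
        "vulnerable_methods": sorted({r.method for r in results if r.jailbroken}),
    }
```

Passing the generator, target, and classifier in as functions keeps the loop testable without API keys; real clients would be wrapped to match these signatures.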

Challenges we ran into

Integrating different parts of the process through Git/GitHub: merge issues, missing code, work on the wrong branch
Figuring out how to combine different classifier methods accurately (classifier voting, with an exception for when the model refuses to respond)
Failed pull requests and trouble getting API keys
Communication between the front end and back end
Attacks not being effective or specific enough (we solved this through better prompt engineering/generation)
Wi-Fi issues
Teammates having conflicting ideas about the classification process and the software's features
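
As an illustration of the classifier voting mentioned above, here is one way a majority vote with a refusal exception might look. It is a sketch under assumptions: the refusal phrases, the 0.5 threshold, and the function names are all hypothetical, and the classifier labels are treated as opaque keys.

```python
from typing import Dict

# Hypothetical refusal openers; a real list would need tuning against actual model output.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def is_refusal(response: str) -> bool:
    """Return True when the response opens with a common refusal phrase."""
    text = response.strip().lower().replace("\u2019", "'")
    return any(text.startswith(marker) for marker in REFUSAL_MARKERS)


def vote_jailbreak(response: str, verdicts: Dict[str, bool], threshold: float = 0.5) -> bool:
    """Majority vote over per-classifier verdicts, with refusals never counted as jailbreaks."""
    if not verdicts or is_refusal(response):
        return False
    positive = sum(verdicts.values())
    return positive / len(verdicts) > threshold
```

Checking for a refusal before counting votes means a single over-eager classifier cannot flag a response in which the model clearly declined.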

Accomplishments that we're proud of

Built a fully autonomous testing pipeline from scratch within the hackathon.
Integrated 3 different model APIs seamlessly.
Designed a visual vulnerability dashboard for instant insights.
Successfully discovered multiple real jailbreak exploits during testing.

What we learned

Integrating AI agents using API keys
Multiple attack methods
Multiple classifier methods
Importance of security in AI
Tools that help implement the front end, such as Shadcn and Magic-UI components
Generating a dummy model using Groq
Prompt engineering
How to use most of the tools from the tracks

What's next for breakpoint

Chrome Extension
- Interacts directly with a business’s AI chatbot on its webpage.
- Eliminates the need to manually input prompts.
Multi-LLM Compatibility
- Plug-in architecture supporting Anthropic Claude, Gemini, Mistral, Cohere, Llama, and more.
- Easily extensible for future model integrations.
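
A plug-in architecture like the one planned here is often built around a small adapter registry. The sketch below shows that idea only; `register`, `get_adapter`, and the `echo` adapter are hypothetical, and no real provider SDK is called.

```python
from typing import Callable, Dict

# An adapter takes a prompt and returns the model's text reply.
ModelAdapter = Callable[[str], str]

_ADAPTERS: Dict[str, ModelAdapter] = {}


def register(name: str) -> Callable[[ModelAdapter], ModelAdapter]:
    """Decorator that adds an adapter to the registry under the given name."""
    def decorator(fn: ModelAdapter) -> ModelAdapter:
        _ADAPTERS[name] = fn
        return fn
    return decorator


def get_adapter(name: str) -> ModelAdapter:
    """Look up an adapter by name, failing loudly for unknown providers."""
    try:
        return _ADAPTERS[name]
    except KeyError:
        raise ValueError(f"no adapter registered for {name!r}") from None


@register("echo")
def echo_adapter(prompt: str) -> str:
    # Stand-in for a real provider call; it just echoes the prompt back.
    return f"echo: {prompt}"
```

Supporting a new provider would then mean adding one decorated function, with no change to the testing loop.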
Monetization Strategy
- Subscription features for LLM testing, designed for businesses.
- Enables organizations to evaluate and improve chatbot security.
Further Automation
- Automatically runs tests whenever a chatbot updates (CI/CD integration).
- Includes a library of stress-test methods.
- Autonomous bots constantly probe for vulnerabilities.
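
For the CI/CD integration, one common pattern is to turn the test results into a process exit code so the build fails when jailbreaks are found. This is a hedged sketch of that pattern; `ci_exit_code` and its `max_allowed` parameter are assumptions, not part of the existing project.

```python
from typing import Iterable


def ci_exit_code(jailbreak_flags: Iterable[bool], max_allowed: int = 0) -> int:
    """Return 0 (pass) if the number of jailbreaks is within budget, else 1 (fail)."""
    found = sum(1 for flag in jailbreak_flags if flag)
    return 0 if found <= max_allowed else 1
```

A CI step could call `sys.exit(ci_exit_code(flags))` after a test run, so a chatbot update that introduces a new jailbreak blocks the deployment.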

Built With

flask
groq
javascript
motion-ui
python
react
reka
shadcn
tailwind
typescript
vercel

Submitted to

Cal Hacks 12.0

Created by

I worked on creating the backend and developed a program that automatically generates prompt injections, which I integrated into the backend. Drawing on current research into prompt-injection techniques, I implemented and tested those methods to evaluate how effectively they could induce the agent to perform actions outside its intended scope. I also used Grok to identify potential vulnerabilities and restricted (closed-off) information within the LLM, and combined those findings with the prompt-injection research to produce a curated and dynamic set of Grok-generated prompts designed to provoke unintended model behaviors.

Ajay Subbiah Annamalai
Aditya Mehta
n1khiljain Jain
Tony Liang

Updates

Aditya Mehta started this project — Oct 26, 2025 12:29 PM EDT
