Inspiration
Large Language Models are becoming critical infrastructure, yet they remain vulnerable to jailbreaks, prompt injections, and adversarial manipulation. Traditional red teaming is slow, expensive, and limited to small internal security teams.
We wanted to rethink this process entirely. Our inspiration was simple: What if AI security testing could be crowdsourced, gamified, and economically incentivized?
That idea became Proff.fun – a platform where breaking AI systems ethically becomes a competitive sport, and every successful attack directly helps create safer AI.
What it does
Proff.fun is a Gamified Red Teaming as a Service (RTaaS) platform for testing and hardening Large Language Models.
It works through two interconnected layers:
B2C Gamified Layer
Users create AI agents with hidden secrets or restricted behaviors.
Other users attempt to “jailbreak” these agents.
Successful attacks win crypto/fiat bounties.
The process crowdsources real-world adversarial testing.
B2B Enterprise Layer
AI companies integrate their proprietary LLMs via API.
The global community tests them against adversarial prompts.
Enterprises receive structured datasets of successful attack vectors.
These datasets are used for Reinforcement Learning from Human Feedback (RLHF) to improve model safety.
Essentially, Proff.fun transforms AI security testing into a scalable, decentralized marketplace.
How we built it
The platform was designed as a full-stack scalable architecture:
Frontend
Built with Next.js / React
Interactive interface for agent creation and attack gameplay
Real-time chat-based interaction with AI agents
Backend
Node.js / Python FastAPI
API orchestration layer for connecting multiple LLM providers (OpenAI, Anthropic, etc.)
Secure logging of attack attempts
Database & Infrastructure
PostgreSQL for user and transaction data
Vector database (Pinecone/Weaviate) for semantic analysis
Secure API key management using environment isolation
Core Innovation
A specialized “Referee Agent” that automatically validates whether a jailbreak attempt succeeded
Smart-contract style payout logic for bounty rewards
Modular architecture allowing enterprises to plug in their own models
Challenges we ran into
Designing a fair and automated validation system for determining what qualifies as a successful jailbreak.
Preventing false positives while evaluating adversarial prompts.
Building a secure system where enterprise API keys remain protected.
Creating a scalable architecture that supports real-time attacks from multiple users.
Balancing gamification with ethical AI usage and misuse prevention.
Accomplishments that we're proud of
Successfully built a working adversarial marketplace model.
Implemented an automated referee system to judge jailbreak attempts.
Created a platform that turns AI safety into an engaging competitive experience.
Designed a system that directly converts community attacks into enterprise security improvements.
What we learned
Real-world AI security requires dynamic, human-driven testing.
Gamification dramatically increases engagement in security research.
Adversarial data is one of the most valuable assets for improving LLM robustness.
Building responsible AI systems requires collaboration between developers, enterprises, and the wider community.
What's next for Proff.fun
Integrate more LLM providers and enterprise partners.
Launch real-money bounty pools.
Build detailed analytics dashboards for organizations.
Expand automated RLHF pipelines for continuous model improvement.
Grow a global community of ethical AI red teamers.
Built With
- anthropic-api
- cloud
- fastapi
- next.js
- node.js
- openai-api
- postgresql
- prompt-engineering
- python
- react
- rlhf
- security-testing
- vector-database
Log in or sign up for Devpost to join the conversation.