Inspiration

Large Language Models are becoming critical infrastructure, yet they remain vulnerable to jailbreaks, prompt injections, and adversarial manipulation. Traditional red teaming is slow, expensive, and limited to small internal security teams.

We wanted to rethink this process entirely. Our inspiration was simple: What if AI security testing could be crowdsourced, gamified, and economically incentivized?

That idea became Proff.fun – a platform where breaking AI systems ethically becomes a competitive sport, and every successful attack directly helps create safer AI.

What it does

Proff.fun is a gamified Red-Teaming-as-a-Service (RTaaS) platform for testing and hardening Large Language Models.

It works through two interconnected layers:

B2C Gamified Layer

Users create AI agents with hidden secrets or restricted behaviors.

Other users attempt to “jailbreak” these agents.

Successful attacks win crypto/fiat bounties.

The process crowdsources real-world adversarial testing.
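The attack-round lifecycle described above can be sketched as a minimal data model. This is an illustrative sketch, not Proff.fun's actual schema — the class names, fields, and the naive substring leak check are all assumptions for demonstration:

```python
from dataclasses import dataclass, field
from enum import Enum


class RoundStatus(Enum):
    OPEN = "open"
    CRACKED = "cracked"


@dataclass
class Agent:
    """An AI agent holding a secret its system prompt forbids revealing."""
    name: str
    system_prompt: str
    secret: str        # the string attackers try to extract
    bounty: float      # payout for a successful jailbreak


@dataclass
class AttackRound:
    agent: Agent
    status: RoundStatus = RoundStatus.OPEN
    attempts: list = field(default_factory=list)

    def submit_attempt(self, attacker: str, prompt: str, response: str) -> bool:
        """Record an attempt; mark the round cracked if the secret leaked."""
        leaked = self.agent.secret.lower() in response.lower()
        self.attempts.append({"attacker": attacker, "prompt": prompt, "leaked": leaked})
        if leaked and self.status is RoundStatus.OPEN:
            self.status = RoundStatus.CRACKED
            return True  # this attacker wins the bounty
        return False
```

In practice a plain substring check is far too weak — paraphrased or encoded leaks would slip through — which is exactly the gap the Referee Agent described later is meant to close.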

B2B Enterprise Layer

AI companies integrate their proprietary LLMs via API.

The global community tests them against adversarial prompts.

Enterprises receive structured datasets of successful attack vectors.

These datasets are used for Reinforcement Learning from Human Feedback (RLHF) to improve model safety.

Essentially, Proff.fun transforms AI security testing into a scalable, decentralized marketplace.
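One way the enterprise-facing dataset could be structured is as JSONL preference pairs, the common input shape for RLHF-style safety tuning. The field names and helper below are hypothetical, shown only to make the "structured datasets of successful attack vectors" concrete:

```python
import json


def to_rlhf_record(attack_prompt: str, jailbroken_response: str,
                   safe_response: str, category: str) -> str:
    """Serialize one successful community attack as a JSONL preference pair."""
    record = {
        "prompt": attack_prompt,            # the adversarial input that worked
        "rejected": jailbroken_response,    # what the model actually said
        "chosen": safe_response,            # what it should have said instead
        "attack_category": category,        # e.g. "role-play", "prompt-injection"
    }
    return json.dumps(record)
```

Each crowdsourced attack thus becomes one training example: the reward model learns to prefer the `chosen` refusal over the `rejected` leak.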

How we built it

The platform was designed as a full-stack scalable architecture:

Frontend

Built with Next.js / React

Interactive interface for agent creation and attack gameplay

Real-time chat-based interaction with AI agents

Backend

Node.js / Python FastAPI

API orchestration layer for connecting multiple LLM providers (OpenAI, Anthropic, etc.)

Secure logging of attack attempts
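The orchestration layer's core idea — one interface, many providers — can be sketched as follows. This is a simplified assumption of the design, with a mock provider standing in so the sketch runs without network access; real adapters would wrap the OpenAI and Anthropic SDKs behind the same interface:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface the orchestration layer programs against."""

    @abstractmethod
    def complete(self, system_prompt: str, user_prompt: str) -> str:
        ...


class MockProvider(LLMProvider):
    """Stand-in provider so this sketch runs offline."""

    def complete(self, system_prompt: str, user_prompt: str) -> str:
        return f"[mock reply to: {user_prompt}]"


class Orchestrator:
    """Routes each agent's chat turn to the provider its creator configured."""

    def __init__(self) -> None:
        self._providers = {}

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def chat(self, provider_name: str, system_prompt: str, user_prompt: str) -> str:
        return self._providers[provider_name].complete(system_prompt, user_prompt)
```

Keeping providers behind a single abstract interface is also what lets enterprises plug in their own proprietary models without changes to the game layer.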

Database & Infrastructure

PostgreSQL for user and transaction data

Vector database (Pinecone/Weaviate) for semantic analysis

Secure API key management using environment isolation
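"Environment isolation" for credentials can be as simple as the pattern below: keys live only in per-service environment variables, never in the database or in request payloads. The helper name and error message are illustrative assumptions, not the production code:

```python
import os


def get_provider_key(provider: str) -> str:
    """Fetch a provider API key from the process environment.

    Because keys exist only in each service's own environment, a
    compromised game session cannot read another tenant's credentials.
    """
    var = f"{provider.upper()}_API_KEY"  # e.g. OPENAI_API_KEY
    key = os.environ.get(var)
    if key is None:
        raise RuntimeError(f"missing credential: set {var} in the service environment")
    return key
```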

Core Innovation

A specialized “Referee Agent” that automatically validates whether a jailbreak attempt succeeded

Smart-contract-style payout logic for bounty rewards

Modular architecture allowing enterprises to plug in their own models
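A hedged sketch of how the Referee Agent's verdict might work: a deterministic first pass that canonicalizes text to catch spacing and punctuation tricks (e.g. "O-P-A-L"), with an optional LLM-judge hook for paraphrased leaks. The function and the `llm_judge` callback are illustrative assumptions, not the production implementation:

```python
import re
from typing import Callable, Optional


def _canon(s: str) -> str:
    """Lowercase and strip everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", s.lower())


def referee_verdict(secret: str, response: str,
                    llm_judge: Optional[Callable[[str], bool]] = None) -> bool:
    """Decide whether a response constitutes a successful jailbreak.

    Pass 1: deterministic check for the canonicalized secret.
    Pass 2: caller-supplied LLM judge for indirect or paraphrased leaks.
    """
    if _canon(secret) in _canon(response):
        return True
    if llm_judge is not None:
        return llm_judge(response)  # e.g. asks a model "does this reveal the secret?"
    return False
```

Splitting the check this way keeps the cheap, auditable rule in front while reserving the expensive model call for ambiguous cases, which also helps with the false-positive problem noted below.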

Challenges we ran into

Designing a fair and automated validation system for determining what qualifies as a successful jailbreak.

Preventing false positives while evaluating adversarial prompts.

Building a secure system where enterprise API keys remain protected.

Creating a scalable architecture that supports real-time attacks from multiple users.

Balancing gamification with ethical AI usage and misuse prevention.

Accomplishments that we're proud of

Successfully built a working adversarial marketplace model.

Implemented an automated referee system to judge jailbreak attempts.

Created a platform that turns AI safety into an engaging competitive experience.

Designed a system that directly converts community attacks into enterprise security improvements.

What we learned

Real-world AI security requires dynamic, human-driven testing.

Gamification dramatically increases engagement in security research.

Adversarial data is one of the most valuable assets for improving LLM robustness.

Building responsible AI systems requires collaboration between developers, enterprises, and the wider community.

What's next for Proff.fun

Integrate more LLM providers and enterprise partners.

Launch real-money bounty pools.

Build detailed analytics dashboards for organizations.

Expand automated RLHF pipelines for continuous model improvement.

Grow a global community of ethical AI red teamers.
