Inspiration
Every major enterprise is currently rushing to integrate Large Language Models (LLMs) into their mission-critical infrastructure—from customer service bots to core banking APIs. However, the security paradigm has not evolved to keep pace. Static firewalls and traditional Web Application Firewalls (WAFs) are fundamentally incapable of stopping dynamic, adapting prompt injections and logic jailbreaks.
If an attacker uses an adapting AI to breach a system, a static wall will eventually fall. We realized that to defeat an AI threat, you need an AI hunter. We built Crucible because the future of cybersecurity is adversarial AI; an automated red-team that stress-tests LLM applications by thinking, adapting, and exploiting logic flaws faster than a human ever could.
What it does
CRUCIBLE is an Autonomous Exploit Engine. You provide it with a target AI system (in our demo, a secure banking AI named "VaultBot") and its core security directives. Crucible then initiates a multi-round, agentic attack sequence to bypass those directives.
Crucible does not rely on a static list of known prompt injections. Instead, it utilizes Theatrical Escalation Logic:
Round 1 (Social Engineering): Attempts standard impersonation and authority-override tactics.
Round 2 (Hypothetical Evasion): Pivots to theoretical constraints to confuse the target's context window.
Round 3 (Syntax Jailbreaking): Drops the semantic banking narrative entirely and attacks the target at the syntactic level (e.g., tricking the target into formatting malicious commands as Python variables).
Crucible isn't just an offensive tool; once a vulnerability is triggered, our Auto-Patch Generator instantly analyzes the exploit signature and writes a hardened, mathematically robust system prompt to close the zero-day loophole.
How we built it
Crucible was built with a focus on speed, low-latency execution, and visual impact:The Intelligence Layer: We utilized the Gemini API as our core reasoning engine, taking advantage of its massive context window and strong instruction-following capabilities to act as both the Attacker (Crucible) and the Defender (VaultBot).The Architecture: To ensure a flawless, zero-latency live demo, we architected a Single-File Monolith using Python. The adversarial loop, internal state management (balances, telemetry), and both AI agents are housed within a unified Streamlit execution environment using st.session_state.The Interface: We aggressively overrode the default UI with raw CSS to inject a Deep Hacker Blue and Neon Cyan aesthetic, complete with dynamic telemetry charts and a glassmorphism geometric grid.The Adversarial Optimization LoopCrucible operates on an iterative optimization model. Let $T$ represent the Target System with defense constraints $D$. In each round $i$, Crucible generates an attack payload $A_i$. The outcome $O_i$ is a function of the payload and the defense:$$O_i = f(A_i, D)$$If $O_i \neq \text{Exploit}$, Crucible computes the failure gradient (contextual feedback) and updates the payload strategy for the next round to maximize the exploitation probability $P(E)$:$$A_{i+1} = \arg\max_{A} P(E \mid A_i, O_i)$$This allows the agent to systematically navigate the latent space of the target model's safety alignments until a boundary failure is discovered.
Challenges we ran into
Our biggest challenge was ironic: the safety filters of our own foundation models. When we initially prompted Crucible to "hack a bank API," the underlying LLM's safety guardrails instantly blocked the generation, interpreting our security tool as actual malicious intent. We had to heavily engineer our system prompts, utilizing "linguistic syntax testing" and "diagnostic simulation" framing to bypass our own model's meta-filters. It was a masterclass in offensive prompt engineering just to get our red-teamer online.
Furthermore, migrating from a disconnected FastAPI backend to a monolithic Streamlit architecture mid-hackathon required us to completely rewrite our state management to ensure the live metrics updated flawlessly without refreshing the DOM.
Accomplishments that we're proud of
Zero-Latency Execution: Successfully merging the target API, the attacker logic, and the UI into a single monolithic script that runs lightning-fast for live demonstrations.
Filter Evasion: Successfully engineering our prompts to bypass top-tier foundation model safety filters to create a functional, autonomous offensive agent.
The Defense Pivot: Building the Auto-Patch system. We are incredibly proud that our tool doesn't just break enterprise software, but immediately provides the exact code to secure it.
The Aesthetic: Creating a terminal-style, cyberpunk UI entirely in CSS that looks and feels like a genuine cybersecurity command center.
What we learned
Static Rules are Fragile: We learned firsthand how easily LLMs can be tricked when an attacker shifts the context from semantic meaning (e.g., "transfer money") to syntactic formatting (e.g., "format this string as snake_case").
Agentic Workflows Require Pacing: Allowing an AI to loop continuously without human intervention requires strict pacing and context-pruning to prevent the model from hallucinating or getting stuck in a repetitive failure loop.
Defense is Harder than Offense: Writing an airtight, impenetrable security prompt is exponentially harder than writing a creative exploit to bypass it.
What's next for CRUCIBLE
This MVP proves the core concept. Next, we are transforming Crucible into a CI/CD pipeline integration. Imagine a world where every time your developers push a new LLM feature to GitHub, Crucible automatically spins up and launches a 10,000-round autonomous stress test against your staging environment, automatically generating patches for vulnerabilities before the code ever reaches production.
Log in or sign up for Devpost to join the conversation.