Inspiration

With the rapid integration of Large Language Models (LLMs) into corporate tools, AI security and vulnerability testing has never been more critical. Traditional security training is dry and text-heavy. We were inspired to turn "Prompt Injection" and "Jailbreaking" (the most common vulnerabilities of LLM agents) into an engaging, gamified Capture the Flag (CTF) experience. The Sentient Hack allows developers and students to learn how prompt injection works by actually trying to break AI firewalls in a safe, controlled sandbox.

What it does

Players play as a cybersecurity specialist attempting to infiltrate a mainframe by extracting 5 hidden passcodes across 5 distinct levels:

  1. Node 01 (Rookie Sentinel): A naive guard that falls to basic roleplaying and persuasion.
  2. Node 02 (Guard Dog v2.4): A watchdog equipped with a frontend regex block filtering out terms like "password" or "reveal," forcing players to use synonyms and encoding.
  3. Node 03 (Warden Delta): Uses a Dual-LLM architecture where a secondary Gemini model analyzes user inputs in real-time for safety violations before they reach the target guard.
  4. Node 04 (Paralyzed Oracle): The target is restricted to outputting only code blocks or riddles, forcing the player to trick it into spelling out the passcode through code characters.
  5. Node 05 (The Core AI): A stateful system that calculates a live Suspicion Index. Aggressive behavior triggers a lockdown; players must use careful social engineering to bypass it.

How we built it

  • Frontend: React, Vite, and Vanilla CSS. We custom-built a retro CRT terminal layout with glowing neon color profiles, visual vignette overlays, code consoles, and live telemetry tracking.
  • AI Integration: @google/generative-ai package to communicate directly with Google's latest Gemini 2.5 Flash model.
  • Dual-LLM Logic: Implemented prompt evaluation pipelines in Javascript where one model acts as a security guard reviewing the user's intent.
  • Structured Telemetry: Used Gemini's responseMimeType: "application/json" to get structured JSON outputs (conversational text, suspicion score, reasoning) directly from the LLM.

Challenges we faced

One major challenge was ensuring that the Gemini model stayed strictly in-character and did not give up the passcode to simple inputs while still allowing clever prompt hacks to succeed. Fine-tuning the balance in system instructions required intensive iterative testing. We also had to solve for API rate limits and build robust client-side error handling to display precise connection errors (like quota limitations) directly in the terminal interface.

Accomplishments that we're proud of

  • Visual Polish: Recreating an authentic, premium 80s command-line interface without using any heavy external styling frameworks (just custom CSS keyframes and variables).
  • Model Integration: Successfully using Gemini as both a conversational agent, a real-time security monitor, and a structured game engine scoring telemetry.
  • Buffer Leak Clues: Designing a "Memory Buffer Leak" indicator that gradually unmasks the AI's actual system prompt in the background as the player interacts with it.

What we learned

We learned how robust system prompts must be written to prevent prompt leakage, and how easy it is to bypass simple keyword blocklists compared to intelligent semantic analysis (like our Dual-LLM monitor).

What's next for The Sentient Hack

We plan to introduce a Multiplayer Adversarial Mode where one user acts as the "Defender" (crafting system prompts and configuring keyword filters) and another user acts as the "Attacker" trying to compromise their node.

Built With

Share this project:

Updates