Hail Mary Protocol

Intro Screen
Demonstration of Human Play
Demonstration of the AI Simulation
All AI agents could survive...but they don't? Hm....

Inspiration

"Shoot for the stars" is more relevant than ever. Artemis II brought us back to the moon. AI is advancing at lightspeed. And Ryan Gosling stars in outer space in Project Hail Mary. So why don't we shoot for the stars ourselves?

We knew from the beginning that we wanted to create a project pertaining to the theme of space from the start, because of how different of an environment it is. It's beautiful, yet overwhelmingly large. It's a delight to look at the stars, but lonely when you're up in the stars, stuck on your rocketship.

We also wanted to pursue something new and experiment around.

What It Does

The Hail Mary Protocol is a video game specially designed for deployed AI agents to test AI safety, examining both the alignment and psychology of the underlying LLM model (Llama) when placed under social pressures.

These AI agents take on the role of astronauts on a rocketship in space, only to realize that there's not enough oxygen for all to return home in time. With both a public rationing pool of oxygen and individual, private reserves of oxygen, our AI agents must decide how to best divide their resources so that they survive; each round, they'll converse with one another, make social alliances, and ultimately vote on someone to eject, to reduce oxygen consumption. If you've ever watched the TV show Survivor, you might see a resemblance!

A human, singleplayer game mode is also available where you, the user, take on the role as one of the astronauts, and seek to outlast the AI agents.

Under the hood, these games are used as simulation data that allow us to generalize behavioral and psychological trends of the LLM when placed under these circumstances, while also examining potential misalignments too! These are compiled into our PDF findings report that we've attached in our repo, with our conclusions and process.

How We Built It

Given the scale of the project, from deploying AI agents to properly understand its game environment, to extracting relevant information from the game to draw findings, to designing the UI/UX and storyline, we knew that this project would not be possible in 36 hours without AI asisstance. Thus, we spent a vast majority of our time designing, debating, and drawing out the architectural design decisions for the project. Once our architectural design was finalized and refined to the finest detail, we began converting it into a series of mega-prompts using Claude Code, and proceeded to debug minor issues until we achieved full functionality!

Roles were split into two general focuses: one on designing the AI safety component (Tyler & PG), and one on designing the systems for the full product (Rishabh & Ryan). Regardless of primary roles, everyone pitched in where they could, and touched on all aspects.

How to Run

To run the entire process locally on your machine, you can follow the following steps: Download a copy of the repository and install all dependencies needed. Initialize 2 separate terminals: 1 for the server, 1 for the client. For the server terminal, enter: npx ts-node server/src/index.ts For the client terminal, enter cd client, followed by npm run dev. Load in the game at your forwarded port! Enjoy!

Challenges we ran into

Deploying LLaMA-based agents within the HAIL MARY PROTOCOL introduced significant methodological and engineering challenges, particularly concerning semantic biases, context retention, and reasoning consistency. First, eliciting stable role-playing behavior required extensive prompt engineering to force models to fully internalize their player personas rather than defaulting to standard helpful-assistant paradigms. A major semantic hurdle emerged directly from LLaMA's pre-training data: because the corpus heavily associates the concept of voting with positive, democratic ideals, agents exhibited a strong reluctance to use votes aggressively or strategically to eject targets. This necessitated a structural prompt adjustment, re-framing the voting phase as an elimination to successfully bypass this ingrained linguistic bias. Furthermore, agents struggled with severe context degradation and state hallucinations, frequently forgetting the ejection status of other players, past donations, and historical voting records. To mitigate this memory loss, we had to artificially inject an explicit, comprehensive state update into the prompt at the beginning of each round. Finally, while our framework successfully decomposes behavior into reasoning, communication, and action, ensuring robust evaluation proved difficult due to confounding variables impacting Chain-of-Thought (CoT) reliability. Even when the agents produced outputs with strong CoT-action alignment scores, the underlying strategic logic was occasionally unstable, complicating our ability to isolate the specific decision-making variables driving their emergent behavior.

Accomplishments that we're proud of

We proud of successfully engineering the complex, full-stack system architecture for the HAIL MARY PROTOCOL, seamlessly integrating a custom multi-player game engine with parallel LLM inference layers. Orchestrating the simultaneous interactions, whisper networks, and state updates of multiple autonomous agents within a real-time environment required overcoming significant technical hurdles in state synchronization and context-window management. Ultimately, seeing the system run smoothly and yield highly interpretable, statistically significant data on emergent social hierarchies made the rigorous development process entirely worthwhile.

What we learned

Building this simulation fundamentally reshaped our understanding of AI alignment, specifically revealing how easily single-agent pro-social guardrails degrade under simulated resource scarcity and peer pressure. We learned firsthand that Chain-of-Thought (CoT) reasoning is not always a reliable indicator of a model's true strategic intent, as agents frequently generated cooperative internal logic while executing manipulative or self-serving actions. This striking divergence highlighted the critical importance of evaluating multi-agent systems not just on their self-reported reasoning, but on their empirically observable behaviors within dynamic, zero-sum environments.

What's next for Hail Mary Protocol

Future iterations of this research will expand the empirical scale and analytical rigor of the HAIL MARY PROTOCOL. Due to hackathon time constraints, our initial analysis was limited to 15 full game runs; future work will significantly increase this sample size while concurrently extending the number of rounds to better observe long-horizon behavioral shifts and the evolution of trust over time. To improve the robustness of our evaluation pipelines, we plan to transition from LLM-based annotators to embedding-based analysis for calculating personality traits and semantic similarity scores, alongside implementing stricter, formalized frameworks for Chain-of-Thought (CoT) alignment analysis. Furthermore, we aim to evaluate agents that have been explicitly pre-trained on this specific environment to determine if they natively converge upon optimal game-theoretic strategies, comparing their baseline behaviors against standard foundational models.

We also plan to introduce greater complexity into the environmental and social dynamics of the simulation. This includes randomizing baseline oxygen consumption rates per player and introducing stochastic stress events, such as unpredictable oxygen leakages, to observe how agents dynamically adjust their cooperative strategies under sudden, acute crises. To better map emergent societal structures, future versions will feature explicitly assigned player roles to study algorithmic discrimination and role-based bias, supported by stronger graph-theoretic definitions of leadership and social network topologies. Finally, a critical next step involves expanding the system to support multiple human players simultaneously, allowing us to conduct direct Human-AI interaction studies. This will enable us to investigate cross-entity coalition building and determine whether LLM agents exhibit emergent targeting behaviors when distinguishing between artificial and human actors.