Arrr-guMentor

Here, users can input context and the stances of each side.
Here, users can view a detailed analysis of a debate.
Here, users can call into an AI debate via phone.

Inspiration

Our team members LOVE the show Suits (between us, we've watched around 400 episodes!) and have grown fascinated with the courtroom as a result. Maybe we should have studied pre-law?

All jokes aside, when researching this topic, we discovered that debate and debate tournaments are a relatively untouched domain in AI. We also noticed a large disparity in debate resources and coaching between different groups, which leaves a lot of room to improve equity in the debate world.

Machines have long surpassed humans in games such as chess (Deep Blue, Stockfish, AlphaZero) and Go (AlphaGo). So when DeepSeek R1 was released and showed off its human-like chain-of-thought (CoT) reasoning, we thought, "Why are we trying to make AI think like humans, when it has the ability to think like a computer?"

So we created this engine to simulate decision trees and calculate the best course of action in any given situation.

And what did we get? A tool that will train you to debate better than the greats like Lincoln, Aristotle, and Harvey Specter.

What it does

Our tool has two modes: Debate and Debate Analysis.

In the former, the user can have a live debate with our AI agent. The debate can take place either over text or over the phone; the phone mode also takes into account intonation and clarity in addition to the words said. Once you're done debating, you can view an evaluation score from -1 to 1 that indicates which side is favored to win, and by how much.

In Debate Analysis mode, the user inputs the context and both stances on the matter. Arrr-guMentor then generates hundreds of potential future debate paths, runs Monte Carlo simulations over the tree of possibilities, and analyzes each point so that you know what your opponent is likely to say and how to squash their argument, just like a computer (or Mike Ross) would.

How it works

We built a novel multi-agent consequence reasoning engine that uses two Llama 3.3 70B agents to model the debate: one acts as 'you' (stance 1) while the second acts as the 'other' (stance 2), with both choosing the best possible arguments to make.
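The alternating expansion described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not our actual code: `Node`, `build_tree`, and the `propose` callback are hypothetical names, and `propose` stands in for a call to one of the Llama agents (here replaced by a fake that returns placeholder strings).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    stance: int  # 1 = 'you', 2 = 'other', 0 = root (the debate context)
    text: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical signature: (stance, messages so far, n) -> n candidate arguments
Propose = Callable[[int, List[str], int], List[str]]


def build_tree(context: str, propose: Propose, depth: int, breadth: int) -> Node:
    """Expand a debate tree in which the two stances alternate turns."""
    root = Node(stance=0, text=context)

    def expand(node: Node, history: List[str], stance: int, remaining: int) -> None:
        if remaining == 0:
            return
        for text in propose(stance, history, breadth):
            child = Node(stance=stance, text=text)
            node.children.append(child)
            # 3 - stance flips 1 <-> 2, so the other side answers next.
            expand(child, history + [text], 3 - stance, remaining - 1)

    expand(root, [context], 1, depth)
    return root


def count_nodes(node: Node) -> int:
    """Count the nodes in a subtree, including the node itself."""
    return 1 + sum(count_nodes(child) for child in node.children)


def fake_propose(stance: int, history: List[str], n: int) -> List[str]:
    """Stand-in for an LLM call; returns n placeholder arguments."""
    return [f"stance {stance}, option {i}, turn {len(history)}" for i in range(n)]
```

Calling `build_tree("context", fake_propose, depth=2, breadth=3)` produces a root with three stance-1 arguments, each followed by three stance-2 replies.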

Then a third model judges all the debate trees, assigning a score to each message based on how well it supports its side's stance and whether it makes valid arguments. These scores are used in a Monte Carlo simulation to generate the expected value for each message.
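One way to picture the Monte Carlo step is the sketch below. It assumes, for illustration only, that each judge score lies in [-1, 1] from stance 1's point of view, that a rollout follows a uniformly random path to a leaf, and that a path is valued by the mean score along it. `ScoredNode`, `rollout`, and `expected_value` are hypothetical names, and the real engine may weight or select paths differently.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScoredNode:
    score: float  # judge score, assumed to lie in [-1, 1]
    children: List["ScoredNode"] = field(default_factory=list)


def rollout(node: ScoredNode, rng: random.Random) -> float:
    """Follow one random path down to a leaf; return the mean score along it."""
    total, steps = node.score, 1
    while node.children:
        node = rng.choice(node.children)
        total += node.score
        steps += 1
    return total / steps


def expected_value(node: ScoredNode, n_rollouts: int = 1000, seed: int = 0) -> float:
    """Estimate a message's value by averaging many random rollouts below it."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    return sum(rollout(node, rng) for _ in range(n_rollouts)) / n_rollouts
```

Because every rollout is an average of scores in [-1, 1], the estimate stays in that range as well, which matches the scale of the evaluation score shown to the user.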

This tree is then passed to the frontend, where in Debate Analysis the user is able to study the tree to prepare for future debates, while in Debate, the tree is used by the agent to form its arguments.

How we built it

Arrr-guMentor was built with a wide variety of technologies, listed below:

Frontend

Tailwind CSS (Styling)
Node.js (Frontend web server)
React (Web components)

Backend

Flask (Backend web server for Python)
OpenAI Whisper (Speech-To-Text)
Retell AI (AI Speech Agent)
Meta Llama (AI for argument generation; tuned for this application)
Together AI (AI Inference Provider)

Challenges we ran into

The main challenge we ran into was the long time involved in making a lot of API calls. Initially, it took around 3 minutes to generate a tree with depth 5 and breadth 3 (that is, a tree with 364 nodes of different potential arguments and counterarguments). By the end, we were able to get that time down to below 15 seconds by further tuning our AI model, batching API calls, and cycling through API keys.
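To make the numbers above concrete, the sketch below counts the nodes in a full tree and shows one possible way to issue all requests for a tree level concurrently. It is an illustration under assumptions: `tree_size` and `expand_level` are hypothetical names, `call_model` stands in for whatever function sends a single request to the inference provider, and a thread pool is only one interpretation of "batching".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def tree_size(breadth: int, depth: int) -> int:
    """Nodes in a full tree, root included: sum of breadth**level for levels 0..depth."""
    return sum(breadth ** level for level in range(depth + 1))


def expand_level(
    prompts: List[str],
    call_model: Callable[[str], str],
    max_workers: int = 16,
) -> List[str]:
    """Send every prompt for one tree level concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_model, prompts))


print(tree_size(3, 5))  # 1 + 3 + 9 + 27 + 81 + 243 = 364
```

With a layout like this, the number of sequential waits grows with the depth of the tree rather than with its total node count, since all nodes on a level are requested together.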

We also ran into challenges with the frontend, since no one in our group has extensive experience building one. However, with two members dedicated to it, we successfully built a user interface from the ground up.

Accomplishments that we're proud of

Our user interface is accessible and has extensive functionality, allowing all users to use our tool with fluency. Not to mention, we literally have functionality for users to debate an AI agent via voice!

What we learned

Though Together AI (which we use to host our AI model online) is a great service, we lost a few hours trying to reduce the time needed to get a response. Next time, we should learn when to drop a technology and opt for a different solution.

Additionally, we learned the effectiveness of git branching. Through our coordination using the git version control system, we managed to minimize merge conflicts, leading to a seamless developer experience.

What's next for Arrr-guMentor

So far, we have designed a cohesive and accessible user interface, deployed multiple AI agents online, and thought deeply about our relationship with diversity, equity, and inclusion. Still, we have a few points to improve on:

Getting our application ready for deployment

Our app is not deployable as is, because it is neither cost-efficient (we prioritized iterative development over our wallets, spending around $15 per contributor) nor scalable.

For example, for a single service, we alternate between different API keys in a rudimentary way to bypass a rate limit, while the proper solution would be to upgrade to a higher tier (please sponsor us @inference providers).

Low effort, high impact further steps

As a debate tournament preparation platform, Arrr-guMentor should consider elements of a debate tournament apart from the discussions. Thus, the platform will be expanded to provide guidance on:

talking speed,
note-taking,
sportsmanship, and so on.

Ambitious further steps

In the future, we may allow the user to converse or debate with a (real or imaginary) human figure rendered with a high level of photorealism and non-verbal cues such as lip movements, body language, and facial expressions. And although it is not possible with current hardware, we plan to add smart-glasses integration so that Arrr-guMentor can help you in real time at debate competitions with an AR HUD and in-ear guidance.