reAgent

LeetCode taught a generation how to reverse a linked list. We are teaching the next generation how to orchestrate AI.


If you are looking for our agent communication interpretability work, please see the links to our research writeup below.

experiment website html: https://mehek-niwas.github.io/paper-viewer.html
experiment paper link: https://drive.google.com/file/d/1VUAu0G2b3-uN2JC-unfZETI8ppfUSYLB/view
experiment code link: https://colab.research.google.com/drive/1O-zzEfulfzTH7HjTTdw2lbSn0u3gCQJm

The Crisis: The AI Money Pit

Every enterprise in the world is trying to deploy AI, but they're treating it like a magic wand rather than what it actually is: a workforce that charges by the second.

The industry is rapidly shifting from simple chatbots to agentic AI: networks of AI programs that communicate with each other to solve complex problems. Imagine hiring a team of brilliant consultants who charge by the word, then giving them no manager, no budget limit, and no communication rules.

That is how the world is currently building AI, and the financial hemorrhage is staggering.

The Devastating Reality

| Problem | Statistic | Source |
|---|---|---|
| The 95% Wall | 95% of enterprise GenAI projects fail to reach production | MIT NANDA, 2025 |
| The $47,000 Loop | Two agents with no context gate burned $47K in 11 days | Teja Kusireddy, Towards AI |
| The 43% Bleed | 43% of LLM API costs stem from suboptimal model routing | RouteLLM, arxiv:2406.18665 |
| The Talent Crisis | 73% of engineering leaders say orchestration devs are worth 3× their pay | State of AI Infrastructure Survey, 2024 |

We don't need smarter AI. We need developers who know how to manage it.


The Solution: A Visual Proving Ground

We built reAgent — an interactive, browser-based training platform that translates the invisible, expensive chaos of cloud compute into a visual, measurable game of economics and logic.

Users drag and drop AI components onto a digital canvas to solve real-world problems. Unlike a static whiteboard, our interface actively coaches you on the business impact of your architecture before a single line of code is written.


How We Built It: The Engine Room

You cannot grade an AI architecture on vibes. To make this simulator work, we built a proprietary Dual-Pass Evaluation System — the technical backbone ensuring absolute mathematical and logical rigor.

When a user hits Evaluate, the architecture passes through two distinct checkpoints:

Pass 1: The Deterministic Engine (Math & Money)

Our real-time backend uses classical graph algorithms to calculate exact dollar cost and latency:

  • Loop Prevention (Cycle Detection): We use Depth-First Search (DFS) to detect whether a user has placed an expensive AI model inside a loop with no exit, and immediately flag it with a cost penalty.
  • Speed Calculation (Topological Sort): We use Kahn's Algorithm to order the graph topologically, then trace the critical (longest) path through it to calculate exact p95 latency and reward parallel designs.
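The loop check described above can be illustrated with a minimal sketch. This is not the project's backend code: the adjacency-list graph shape, the function names, and the per-call cost threshold are all assumptions made for illustration. It uses the standard three-color DFS, where reaching a node that is still on the current path means the graph has a cycle.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # node id -> ids of the nodes it sends output to

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one cycle as a list of node ids, or None if the graph is acyclic."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    color = {n: WHITE for n in nodes}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:
                # Back edge: nxt is already on the current path, so this is a loop.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(nodes):  # sorted only to make the result deterministic
        if color[start] == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


def expensive_nodes_in_loop(graph: Graph,
                            cost_per_call: Dict[str, float],
                            threshold: float) -> List[str]:
    """Hypothetical penalty rule: list looped nodes whose per-call cost meets the threshold."""
    cycle = find_cycle(graph)
    if cycle is None:
        return []
    # The cycle repeats its first node at the end, so drop the duplicate.
    return [n for n in cycle[:-1] if cost_per_call.get(n, 0.0) >= threshold]
```

A canvas where a planner and a critic feed each other would be reported as a cycle, and only the nodes priced at or above the threshold would be flagged for the penalty.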

Pass 2: The LLM-as-Judge (Logic & Intent)

Once the math checks out, a Supabase Edge Function sends the blueprint to Claude 4.5 to evaluate the human's intent:

  • Did you route the data efficiently?
  • Did you use a cheap structural rule instead of an expensive AI check?

Both passes must clear. You cannot game the math with a nonsensical graph, and you cannot charm the LLM with a graph that costs $50 to run.
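The "both passes must clear" rule amounts to a logical AND with the cheap check run first. The sketch below shows one way that gate could look. The dataclass fields, the budget parameter, and the judge callable are hypothetical; the judge is a stand-in for the LLM call, so no real API is invoked here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeterministicResult:  # hypothetical shape of the Pass 1 output
    has_cycle: bool
    cost_usd: float
    latency_ms: float


@dataclass
class Verdict:
    passed: bool
    reason: str


def evaluate(det: DeterministicResult,
             judge: Callable[[], bool],
             budget_usd: float) -> Verdict:
    """Pass only if the math (Pass 1) and the judge (Pass 2) both clear."""
    if det.has_cycle:
        return Verdict(False, "pass 1 failed: graph contains a loop")
    if det.cost_usd > budget_usd:
        return Verdict(False, "pass 1 failed: cost exceeds budget")
    # The judge is only consulted once the math checks out.
    if not judge():
        return Verdict(False, "pass 2 failed: judge rejected the design")
    return Verdict(True, "both passes cleared")
```

Running the deterministic check first means a graph that fails on cost never reaches the judge at all, which also avoids paying for a judge call on a design that cannot pass.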


Feature Walkthrough

Feature 1: The Interactive Proving Ground

  • [x] Fully interactive drag-and-drop orchestration canvas
  • [x] 14 distinct, configurable agent node types
  • [x] Built on React Flow v12 + Zustand for real-time state management
  • [ ] Export to production-ready boilerplate (coming soon)

Business Impact: We replace "deploy and pray" with visual risk-management. Teams can map and debate exact routing logic before committing to the codebase.


Feature 2: Deterministic Telemetry (Live ROI Calculation)

Make a bad design choice — the node physically glows red. A live HUD shows:

  • Exact p95 latency
  • Calculated dollar cost

Powered by our deterministic DFS and Topological Sort backend, this feature actively prevents billing disasters before they happen.
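As a rough illustration of how the HUD numbers could be derived, the sketch below runs Kahn's algorithm to get a topological order and then propagates finish times along it, so that parallel branches cost only as much time as the slowest one. The graph shape, function names, and per-node latency and cost values are assumptions for illustration, not the project's actual data model. Note that adding per-node p95 values along the critical path is an approximation rather than a true end-to-end p95.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node id -> ids of the nodes it sends output to


def topological_order(graph: Graph, nodes: List[str]) -> List[str]:
    """Kahn's algorithm: repeatedly emit nodes that have no unprocessed inputs."""
    indegree = {n: 0 for n in nodes}
    for src in nodes:
        for dst in graph.get(src, []):
            indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dst in graph.get(node, []):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; no topological order exists")
    return order


def estimate(graph: Graph,
             latency_ms: Dict[str, float],
             cost_usd: Dict[str, float]) -> Tuple[float, float]:
    """Return (critical-path latency, total cost) for one run of the graph.

    Assumes every node in the graph has an entry in latency_ms.
    """
    nodes = list(latency_ms)
    start = {n: 0.0 for n in nodes}
    finish: Dict[str, float] = {}
    for node in topological_order(graph, nodes):
        finish[node] = start[node] + latency_ms[node]
        for dst in graph.get(node, []):
            # A node can only start once its slowest input has finished.
            start[dst] = max(start[dst], finish[node])
    total_cost = sum(cost_usd.get(n, 0.0) for n in nodes)
    return max(finish.values(), default=0.0), total_cost
```

With two branches running side by side, the estimate follows the slower branch instead of summing both, which is how a parallel design ends up with a lower latency reading than the same nodes chained in sequence.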


Feature 3: The Context Thermometer (Token-Bloat Prevention)

A dynamic UI element that shakes and emits steam 🔥 when an architecture is overloaded with context.

The system traverses the graph, tracking executor chains and counting downstream tools without intervening Context Gates.
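One way to picture this traversal is a breadth-first walk from each executor that stops at Context Gates and counts the tools it reaches. The sketch below is an assumption-laden illustration, not the project's implementation: the node kind labels ("executor", "tool", "context_gate"), the function names, and the warning threshold are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node id -> ids of the nodes it sends output to


def ungated_tool_count(graph: Graph, kind: Dict[str, str], start: str) -> int:
    """Count tool nodes reachable from `start` without crossing a context gate."""
    seen = {start}
    queue = deque([start])
    tools = 0
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if kind.get(nxt) == "context_gate":
                continue  # the gate trims context, so stop exploring this branch
            if kind.get(nxt) == "tool":
                tools += 1
            queue.append(nxt)
    return tools


def thermometer(graph: Graph,
                kind: Dict[str, str],
                warn_at: int = 3) -> Dict[str, Tuple[int, bool]]:
    """Map each executor to (ungated tool count, overloaded flag)."""
    readings = {}
    for node, node_kind in kind.items():
        if node_kind == "executor":
            count = ungated_tool_count(graph, kind, node)
            readings[node] = (count, count >= warn_at)
    return readings
```

Placing a gate between two tools in a chain cuts the count for the upstream executor, which is the behavior the thermometer is meant to reward.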

The industry treats the context window like an open pipe. We treat it as a finite, expensive resource.


Feature 4: Sandbox Mode (The Enterprise Autopilot)

Input a raw business objective:

"Monitor SEC filings and flag risk."

The system instantly generates a fully optimized, routed, and gated multi-agent architecture on the canvas — turning weeks of highly-paid engineering trial-and-error into a 5-second automated generation.


The Tech Stack

{
  "frontend": "React + React Flow v12 + Zustand",
  "backend": "Supabase Edge Functions",
  "evaluation": "Claude 4.5 (LLM-as-Judge)",
  "algorithms": ["DFS Cycle Detection", "Kahn's Topological Sort"]
}

The Movement

reAgent isn't just teaching people how to code AI. We are building the visual IDE that will deploy the workforce of the future.

  1. For Education: Democratizing the most lucrative engineering skills of the next decade. Elite AI systems are about logic and resource management, not magic.
  2. For Business & Finance: A translation layer where executives and developers can finally look at the same screen, understand where the cloud budget is going, and calculate true ROI before launch.
  3. For the Planet: Unnecessary API calls don't just burn dollars; they burn massive amounts of server compute. Lean architectures = sustainable, decarbonized AI. 🌱


The Colab notebook we used for our results: https://colab.research.google.com/drive/1vPlaUgaQ_1GrtVWm3kKt2LW1h59dq-NC?usp=sharing

Built at HackPrinceton 🏆
