🚢 chAIn — A multi-agent resilience engine for global logistics
You can try it out here: https://chain-production-413f.up.railway.app/analytics.html
💡 Inspiration
We were inspired by Andrej Karpathy's LLM council and by the ambition to build a world model capable of handling chaos events for AGI.
⚙️ What it does
The 2021 Suez Canal blockage cost global logistics $540M in 6 days. chAIn is a multi-agent “council” that debates user queries in real time to produce more reliable answers. It is designed to anticipate disruptions like this one using NVIDIA Cosmos physics simulation and to deploy mitigations within minutes, which could save millions in costs.
🏗️ How we built it
- LangGraph orchestrates a cyclical workflow of four specialized agents (Observer, Architect, Skeptical, Strategist). Each agent is implemented as a node in the LangGraph, and the graph cycles until consensus is reached.
- CouncilState is a TypedDict that acts as the single source of truth, passed between agents. It stores the full debate trace, real-time world data, simulation results, and the final intervention plan.
- The Sensor agent uses MCP Clients to fetch real-time data from external APIs (Weather, Maritime, Traffic), which is then injected into the shared state.
- NVIDIA Cosmos is called by the Forecaster agent to generate physics-based predictions of future supply chain states. This is done via an AWS SageMaker endpoint, and the results are stored in the shared state for downstream agents.
- Action Tools are exposed as internal APIs that the Strategist agent can call to trigger real-world actions (e.g., rerouting shipments, hedging, or escalating issues).
- Arize is used for LLM observability. Every agent decision, message, and tool call is traced using OpenTelemetry, providing a full audit trail for compliance and debugging.
- TRAE helped us rapidly build the LangGraph agent workflow, the CouncilState schema, and the integration logic for MCP clients, NVIDIA Cosmos, and Arize. TRAE also assisted in creating the FastAPI endpoints and React/Next.js components, which accelerated development.
- FastAPI exposes endpoints for the frontend to interact with the multi-agent council, including a streaming API for real-time updates and serving the frontend static files.
- Figma was used for wireframes, and the UI was built with Next.js/React.
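
To make the shared-state and "cycle until consensus" ideas above concrete, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph API; it only imitates the control flow with a loop. The field names in `CouncilState`, the toy agent logic, and the `run_council` helper are all assumptions for illustration, not the project's actual code.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class CouncilState(TypedDict):
    """Single shared state passed between agents (field names are hypothetical)."""
    debate_trace: List[str]        # one entry per agent turn
    world_data: Dict[str, str]     # real-time feeds (weather, maritime, traffic)
    simulation: Dict[str, float]   # forecast results
    plan: Optional[str]            # intervention plan, None until proposed
    round: int                     # current debate round
    objection: bool                # True while the skeptic still objects
    consensus: bool                # True once the council agrees


Agent = Callable[[CouncilState], CouncilState]


def observer(state: CouncilState) -> CouncilState:
    state["debate_trace"].append(f"Observer: feeds report {state['world_data']}")
    return state


def architect(state: CouncilState) -> CouncilState:
    # Revise the plan if the previous round ended with an objection.
    base = "reroute via alternate port"
    state["plan"] = base + " with a delay buffer" if state["objection"] else base
    state["debate_trace"].append(f"Architect: propose '{state['plan']}'")
    return state


def skeptical(state: CouncilState) -> CouncilState:
    # Toy rule: object in the first round, accept the revised plan afterwards.
    state["objection"] = state["round"] < 1
    verdict = "objects" if state["objection"] else "accepts"
    state["debate_trace"].append(f"Skeptical: {verdict}")
    return state


def strategist(state: CouncilState) -> CouncilState:
    state["consensus"] = not state["objection"]
    state["debate_trace"].append(f"Strategist: consensus={state['consensus']}")
    return state


COUNCIL: List[Agent] = [observer, architect, skeptical, strategist]


def new_state() -> CouncilState:
    return CouncilState(
        debate_trace=[], world_data={"weather": "storm"}, simulation={},
        plan=None, round=0, objection=False, consensus=False,
    )


def run_council(state: CouncilState, agents: List[Agent],
                max_rounds: int = 5) -> CouncilState:
    """Run every agent in order, repeating until consensus or max_rounds."""
    for round_no in range(max_rounds):
        state["round"] = round_no
        for agent in agents:
            state = agent(state)
        if state["consensus"]:
            break
    return state
```

Calling `run_council(new_state(), COUNCIL)` runs the four agents in order and repeats the cycle until the `consensus` flag is set. The `max_rounds` cap is an added safeguard so a council that never agrees cannot loop forever.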
🧩 Challenges we ran into
- Many NVIDIA models are large (≈7B parameters) and costly for a 24-hour hackathon. The models are optimized for physics + multimodal (images/video) reasoning, not lightweight text-to-text interactions. We chose NVIDIA’s smallest text-to-text model, nvidia-reason-1-7b, and deployed it on AWS.
- With 4 agents cycling through a graph, it was hard to explain why a decision happened. Standard logs turned into walls of text, making it difficult to trace how one agent’s output affected another or where the reasoning chain broke. Arize gave us end-to-end observability: full agent traces showing which node ran, the prompts sent, the LLM outputs, and how the state changed step by step.
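
The kind of per-node trace described above (which node ran, and how the state changed) can be approximated with a small decorator. This is a stdlib-only sketch and not the Arize or OpenTelemetry API; the `traced` decorator, the `TRACE` list, and the example node bodies are hypothetical.

```python
import copy
import functools
import time
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
TRACE: List[Dict[str, Any]] = []  # in-memory stand-in for an exported trace


def traced(node_name: str) -> Callable[[Callable[[State], State]], Callable[[State], State]]:
    """Record one span per node call: node name, duration, and changed keys."""
    def decorator(fn: Callable[[State], State]) -> Callable[[State], State]:
        @functools.wraps(fn)
        def wrapper(state: State) -> State:
            before = copy.deepcopy(state)  # snapshot, since nodes may mutate in place
            start = time.perf_counter()
            after = fn(state)
            changed = {
                key: copy.deepcopy(value)
                for key, value in after.items()
                if key not in before or before[key] != value
            }
            TRACE.append({
                "node": node_name,
                "duration_s": time.perf_counter() - start,
                "changed": changed,
            })
            return after
        return wrapper
    return decorator


@traced("Sensor")
def sensor(state: State) -> State:
    state["world_data"] = {"weather": "storm"}  # placeholder feed data
    return state


@traced("Forecaster")
def forecaster(state: State) -> State:
    state["forecast_delay_hours"] = 36.0  # placeholder forecast value
    return state
```

After `forecaster(sensor({}))`, `TRACE` holds one entry per node with only the keys that node changed, which is easier to scan than a full state dump at every step.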
🏆 Accomplishments that we're proud of
- Deployed NVIDIA Cosmos World Foundation Model on AWS SageMaker and integrated real physics-aware forecasts up to 72 hours ahead into our pipeline (not a mock).
- Built a 4-agent LangGraph council (Sensor, Strategist, Executor, Auditor) with a cyclical state graph so agents debate and iterate until consensus.
- Cut supply-chain Time to Action from ~48 hours to ~5 minutes (~300 seconds): ingest signal → simulate → debate → dispatch actions.
- Implemented production-grade MCP: clients ingest live weather/maritime/traffic feeds, and an MCP server exposes enterprise tools (e.g., trigger_reroute, create_purchase_order, hedge_position).
- Added risk-based human-in-the-loop controls: high-stakes decisions require UI approval, while low-risk optimizations run autonomously.
- Delivered an end-to-end system in one hackathon: streaming FastAPI backend, real-time frontend, Dockerized deployment, and Railway hosting.
- Many tools were new to the team, but we still delivered a coherent, real-world solution.
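
The risk-based human-in-the-loop control listed above could look roughly like the following sketch. The tool names come from the list, but their signatures, the `risk` score, the `RISK_THRESHOLD` value, and the `approve` callback are assumptions made for illustration; the real tools are exposed through an MCP server rather than called as local functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

RISK_THRESHOLD = 0.5  # hypothetical cutoff between autonomous and gated actions


@dataclass
class Action:
    tool: str
    args: Dict[str, Any]
    risk: float  # assumed score in [0.0, 1.0]


# Stand-in implementations; signatures are invented for this example.
def trigger_reroute(shipment_id: str, via: str) -> str:
    return f"rerouted {shipment_id} via {via}"


def create_purchase_order(sku: str, quantity: int) -> str:
    return f"ordered {quantity} x {sku}"


def hedge_position(commodity: str, amount: float) -> str:
    return f"hedged {amount} of {commodity}"


TOOLS: Dict[str, Callable[..., str]] = {
    "trigger_reroute": trigger_reroute,
    "create_purchase_order": create_purchase_order,
    "hedge_position": hedge_position,
}


def dispatch(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run low-risk actions directly; ask a human before high-risk ones."""
    if action.tool not in TOOLS:
        raise KeyError(f"unknown tool: {action.tool}")
    if action.risk >= RISK_THRESHOLD and not approve(action):
        return f"held: {action.tool} was not approved"
    return TOOLS[action.tool](**action.args)
```

In a setup like this, `approve` would be backed by the UI approval step, while a low-risk `Action` such as a reroute with `risk=0.2` never reaches the approver at all.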
📚 What we learned
Beyond the technical knowledge and the many new tech stacks, we learned that teamwork and picking up new tools quickly matter most. We constantly jumped in to help each other debug, fill gaps, and unblock problems so we could ship as quickly as possible.
🔮 What's next for chAIn — A multi-agent resilience engine for global logistics
- Add more real-world feeds (AIS vessel data, port congestion, carrier schedules) to improve detection and context.
- Turn our demo actions into real integrations (TMS/ERP) via MCP so chAIn can trigger workflows in production.
- Add reliability layers: automated evals + guardrails + role-based human approval for high-stakes decisions.