Project Story

What Inspired Us

We wanted a realistic multi-agent negotiation simulator that mirrors real marketplaces. Most simulations rely on handcrafted scoring functions, which miss how humans actually negotiate. We built Arbitrage (Multi-Agent Marketplace Simulator) to explore whether LLMs can handle complex negotiations under true information asymmetry, as in real markets: buyers can't see seller margins, and sellers can't see each other's offers. Every decision comes from LLM reasoning rather than formulas.

What We Learned

Multi-Agent Orchestration: Coordinating 1 buyer against up to 10 sellers required careful state management. We used LangGraph to model the negotiation flow. Parallel seller responses were handled with asyncio.gather(), and semaphores kept us from overwhelming the LLM provider.
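A minimal sketch of this fan-out pattern, assuming each seller agent is an async callable that returns its reply as a string. The `SellerAgent` type, the `gather_seller_replies` name, and the 30-second timeout are illustrative assumptions. The semaphore caps how many LLM calls are in flight at once, and `asyncio.wait_for` gives each seller its own timeout.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical agent interface: a seller takes the buyer's message and
# returns its reply text (in practice this would wrap an LLM call).
SellerAgent = Callable[[str], Awaitable[str]]


async def gather_seller_replies(
    buyer_message: str,
    sellers: dict[str, SellerAgent],
    max_concurrent: int = 10,   # mirrors the asyncio.Semaphore(10) in the write-up
    timeout_s: float = 30.0,    # assumed per-seller timeout
) -> dict[str, str | None]:
    """Ask every seller in parallel while capping in-flight LLM calls."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def ask(seller_id: str, agent: SellerAgent) -> tuple[str, str | None]:
        async with semaphore:  # at most max_concurrent calls run at once
            try:
                reply = await asyncio.wait_for(agent(buyer_message), timeout=timeout_s)
            except asyncio.TimeoutError:
                reply = None  # a slow seller simply sits out this round
        return seller_id, reply

    # gather preserves input order, so each result stays paired with its seller id.
    pairs = await asyncio.gather(*(ask(sid, agent) for sid, agent in sellers.items()))
    return dict(pairs)
```

Returning `None` for a timed-out seller, instead of raising, lets one slow model call drop out of a round without cancelling the other sellers' replies.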

Information Asymmetry Enforcement: Keeping visibility rules strict was challenging. We built a visibility filter that routes messages based on @mentions and ensures sellers never see each other's responses. The orchestrator acts as a trusted intermediary.
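The routing rules could look roughly like the sketch below. This is a simplified illustration under assumed rules, not the project's actual filter: the buyer sees everything, sellers see their own messages, and a buyer message reaches only the sellers it @mentions (or every seller if it mentions none). The `Message` type, the `"buyer"` id, and the mention regex are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

BUYER_ID = "buyer"  # hypothetical id for the single buyer agent
MENTION_RE = re.compile(r"@([A-Za-z0-9_-]+)")


@dataclass(frozen=True)
class Message:
    sender: str  # "buyer" or a seller id such as "seller_3"
    text: str

    @property
    def mentions(self) -> set[str]:
        return set(MENTION_RE.findall(self.text))


def visible_to(viewer: str, msg: Message) -> bool:
    """Return True if `viewer` may see `msg` under the assumed rules."""
    if viewer == BUYER_ID:
        return True  # the buyer sees every message in the negotiation
    if msg.sender == viewer:
        return True  # agents always see their own messages
    if msg.sender != BUYER_ID:
        return False  # sellers never see other sellers' messages
    # A buyer message with @mentions goes only to the mentioned sellers;
    # one with no mentions is broadcast to all sellers.
    return not msg.mentions or viewer in msg.mentions


def filter_history(viewer: str, history: list[Message]) -> list[Message]:
    """Build the transcript the orchestrator would pass into `viewer`'s prompt."""
    return [m for m in history if visible_to(viewer, m)]
```

Because the orchestrator builds each agent's prompt from `filter_history`, a seller's context never contains a competitor's offer in the first place, so no prompt instruction is needed to hide it.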

Real-Time Streaming: Implementing Server-Sent Events (SSE) for live updates required careful async generator handling, connection management, and heartbeat mechanisms.
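One way to combine an async generator with heartbeats is sketched below, assuming negotiation updates arrive on an `asyncio.Queue` and a sentinel marks the end of the negotiation. The queue-based producer, the event names, and the 15-second interval are assumptions. Lines starting with `:` are SSE comments, which clients ignore, so they work as keep-alives.

```python
from __future__ import annotations

import asyncio
import json
from typing import AsyncIterator

HEARTBEAT_INTERVAL_S = 15.0  # assumed; keeps proxies from closing idle connections
_DONE = object()  # hypothetical sentinel the producer enqueues when the negotiation ends


def format_sse(data: dict, event: str | None = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on a single data: line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


async def sse_stream(
    queue: asyncio.Queue, heartbeat_s: float = HEARTBEAT_INTERVAL_S
) -> AsyncIterator[str]:
    """Yield negotiation updates from `queue`, emitting heartbeats while idle."""
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=heartbeat_s)
        except asyncio.TimeoutError:
            yield ": heartbeat\n\n"  # SSE comment line; clients ignore it
            continue
        if item is _DONE:
            yield format_sse({"status": "complete"}, event="done")
            return
        yield format_sse(item, event="message")
```

In FastAPI, a generator like this can be returned as `StreamingResponse(sse_stream(queue), media_type="text/event-stream")`.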

How We Built It

Tech Stack:

  • Backend: FastAPI with async/await for concurrent agent interactions
  • Frontend: Next.js with React for real-time visualization
  • Database: SQLite with SQLAlchemy ORM
  • Streaming: SSE for real-time message updates

Development Phases:

  1. LLM Provider Infrastructure: Built an abstraction layer with streaming, retry logic, and error handling
  2. Agent Logic & Negotiation Graph: Implemented buyer/seller agents with distinct personalities and priority weights, orchestrated via LangGraph
  3. Database & Session Management: Designed schema for sessions, negotiations, messages, offers, and outcomes
  4. API & Real-Time Streaming: Built REST endpoints and SSE streaming for live negotiation updates
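The retry logic from phase 1 could take a form like the following sketch: exponential backoff with full jitter around any async provider call. The `TransientLLMError` type, the attempt count, and the delay values are hypothetical. A real provider layer would map rate-limit, 5xx, and timeout responses onto such an error type.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientLLMError(Exception):
    """Hypothetical error type for retryable failures (rate limits, 5xx, timeouts)."""


async def with_retries(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except TransientLLMError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt (0.5 s, 1 s, 2 s, ...) up to
            # max_delay_s; "full jitter" sleeps a random time in [0, ceiling].
            ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Other exception types propagate immediately, so a malformed request fails fast instead of being retried. The jitter spreads out retries when several agents hit a rate limit at the same moment.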

Key Decisions:

  • No handcrafted scoring: decisions emerge from LLM reasoning
  • Database-backed configs: configure once, then run any number of negotiations
  • Provider toggle for easy switching between local and cloud LLMs
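A provider toggle like this can be as small as a lookup keyed by an environment variable. The sketch below is illustrative only. It assumes the cloud option is OpenRouter and the local option is an Ollama server, both reached through OpenAI-compatible endpoints. The `LLM_PROVIDER`, `LLM_MODEL`, and `OPENROUTER_API_KEY` variable names and the model ids are placeholders.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    base_url: str
    model: str
    api_key_env: str | None  # env var holding the API key; None if none is needed


# Illustrative presets. Both endpoints expose an OpenAI-compatible chat API,
# so agent code can stay the same whichever one is selected.
PROVIDERS: dict[str, ProviderConfig] = {
    "cloud": ProviderConfig(
        name="openrouter",
        base_url="https://openrouter.ai/api/v1",
        model="anthropic/claude-3.5-sonnet",  # example model id; substitute your own
        api_key_env="OPENROUTER_API_KEY",
    ),
    "local": ProviderConfig(
        name="ollama",  # assumed local server
        base_url="http://localhost:11434/v1",
        model="llama3",  # example model id
        api_key_env=None,
    ),
}


def get_provider(env: dict[str, str] | None = None) -> ProviderConfig:
    """Select a provider from the (hypothetical) LLM_PROVIDER variable."""
    env = dict(os.environ) if env is None else env
    choice = env.get("LLM_PROVIDER", "cloud").lower()
    if choice not in PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(PROVIDERS)}, got {choice!r}")
    # Optional LLM_MODEL override, so trying another model needs no code change.
    return replace(PROVIDERS[choice], model=env.get("LLM_MODEL", PROVIDERS[choice].model))
```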

Core Integrations:

Daytona - Secure Agent Execution Sandbox: We used Daytona as the sandbox environment in which all agents execute. Each agent (the buyer and every seller) runs independently in its own safe, controlled, isolated environment. Daytona's sandboxing ensures that agents can't interfere with each other or access unauthorized resources, mirroring the isolation of a production deployment. This was critical for running multiple agents concurrently without security risks or resource conflicts.

Tigris - Conversation Storage & Transcript Management: We integrated Tigris to store all conversation transcripts and negotiation logs. Every negotiation session is saved to Tigris automatically, creating a complete audit trail of agent interactions, messages, offers, and outcomes. This persistent storage lets us replay negotiations, analyze patterns across sessions, and keep a comprehensive record of all marketplace activity. Because Tigris is an object store, it is well suited to holding one structured JSON log per negotiation episode.
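Since Tigris exposes an S3-compatible API, saving an episode can reduce to a single `put_object` call. The sketch below accepts any client with that method, such as a boto3 S3 client configured for Tigris. The bucket name, object-key layout, and episode fields are hypothetical.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from typing import Any, Protocol


class ObjectStore(Protocol):
    """The slice of an S3-style client this sketch needs (e.g. a boto3 S3 client)."""

    def put_object(self, **kwargs: Any) -> Any: ...


def transcript_key(session_id: str, negotiation_id: str) -> str:
    # Hypothetical layout: one JSON object per negotiation, grouped by session.
    return f"sessions/{session_id}/negotiations/{negotiation_id}.json"


def save_transcript(
    store: ObjectStore,
    bucket: str,
    session_id: str,
    negotiation_id: str,
    messages: list[dict],
    outcome: dict,
) -> str:
    """Serialize one negotiation episode, upload it, and return the object key."""
    episode = {
        "session_id": session_id,
        "negotiation_id": negotiation_id,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "messages": messages,  # full, unfiltered transcript for auditing and replay
        "outcome": outcome,    # e.g. winning seller and final price
    }
    key = transcript_key(session_id, negotiation_id)
    store.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(episode).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

With boto3, `store` could be `boto3.client("s3", endpoint_url=...)` pointed at the Tigris endpoint and configured with Tigris access keys. A session-based key prefix makes it straightforward to list and replay every negotiation from one session.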

Galileo - AI Quality Monitoring & Price Feedback Logging: We used Galileo to log and monitor every negotiation. Galileo tracks every LLM call and provides real-time feedback on negotiation quality, monitoring coherence scores, hallucination rates, and decision-quality metrics. Most importantly, it logs the final negotiated prices and reports whether each deal was successful, which helped us understand agent performance and negotiation effectiveness. This observability layer was essential for understanding how agents behave and for improving their strategies.

CodeRabbit - Code Review: We used CodeRabbit to run automated code reviews on every major change, catching bugs, style issues, and potential performance bottlenecks before they reached production. CodeRabbit's inline suggestions on pull requests helped us keep a consistent code style across the backend, frontend, and agent orchestration layers. It also flagged risky patterns around async usage and database access, which mattered given our heavy use of concurrency and streaming. This let us iterate quickly in a 24-hour build while keeping the codebase clean and maintainable.

Additional Integrations:

  • Anthropic Claude: Primary LLM for agent reasoning via OpenRouter
  • CodeRabbit: Automated code reviews

Challenges We Faced

  1. Information Asymmetry Enforcement: Stopping information from leaking between agents required strict routing rules, plus prompt engineering so that sellers could not infer hidden information.

  2. Concurrent Agent Responses: Handling parallel seller responses without overwhelming the provider required asyncio.Semaphore(10) and timeout mechanisms.

  3. Secure Agent Isolation: Agents had to run in isolated environments without interfering with each other. Daytona's sandboxing solved this by giving each agent instance its own secure, controlled execution environment.

  4. Persistent Storage for Negotiations: We needed to store and retrieve large volumes of negotiation transcripts efficiently. Integrating Tigris gave us scalable object storage that holds all conversation logs and makes replay and analysis easy.

  5. Monitoring Negotiation Quality: We wanted to track negotiation outcomes and price feedback in real time. Galileo's logging capabilities let us monitor every negotiation, log final prices, and get feedback on deal quality automatically.

  6. SSE Connection Management: Managing connections, reconnections, and message delivery required robust error handling and heartbeat mechanisms.

  7. Database Lock Contention: SQLite on Windows suffered from lock contention under concurrent access. We enabled WAL mode and kept transactions short.

  8. Prompt Engineering: Getting agents to negotiate realistically without revealing internal constraints required extensive iteration.

  9. Time Constraints: Building in 6 hours required strict prioritization.
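The SQLite fix from challenge 7 can be sketched with the standard-library `sqlite3` module. The 5-second busy timeout, the `synchronous=NORMAL` setting, and the `offers` table columns are assumptions for illustration. The key points are enabling WAL once per connection and wrapping each write in its own short transaction.

```python
from __future__ import annotations

import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open SQLite with settings that reduce lock contention (values are assumptions)."""
    # Wait up to 5 s for a lock held by another connection instead of failing at once.
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL lets readers proceed while a single writer appends to the log.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL (got {mode!r})")
    conn.execute("PRAGMA synchronous=NORMAL")  # a common pairing with WAL
    return conn


def record_offer(conn: sqlite3.Connection, negotiation_id: str, seller_id: str, price: float) -> None:
    """Write one offer in its own short transaction (hypothetical `offers` table)."""
    # `with conn` commits on success and rolls back on error, so the write
    # transaction covers only this INSERT, never a slow LLM call.
    with conn:
        conn.execute(
            "INSERT INTO offers (negotiation_id, seller_id, price) VALUES (?, ?, ?)",
            (negotiation_id, seller_id, price),
        )
```

WAL mode is stored in the database file, so it persists across connections. Keeping LLM calls outside any open transaction keeps write locks held for milliseconds rather than seconds.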

Results & Impact

We delivered a functional prototype that:

  • Orchestrates 1 buyer vs. 10 sellers simultaneously in secure Daytona sandboxes
  • Achieves 94% average coherence score (monitored via Galileo)
  • Maintains <3% hallucination rate (tracked by Galileo)
  • Processes negotiations in under 3 minutes
  • Supports real-time streaming with <2s API response times
  • Stores every negotiation in Tigris for replay and analysis
  • Logs all negotiated prices to Galileo, which provides feedback on deal quality

The system demonstrates that LLMs can handle complex multi-agent negotiations without handcrafted scoring, opening possibilities for AI-powered marketplace simulations, academic research, and e-commerce applications. The combination of Daytona's secure execution, Tigris's persistent storage, and Galileo's comprehensive monitoring creates a production-ready platform for studying and deploying AI negotiation agents.
