Inspiration

Managing multiple LLM providers is a nightmare. Each has different pricing, rate limits, model capabilities, and quotas. We wanted a single API endpoint that intelligently routes requests to the best available provider: automatically handling failures, optimizing for cost or speed, and even falling back to a local Ollama instance when cloud quotas run out.

What it does

LLM Router is a quota-aware routing system that provides a unified OpenAI-compatible API across 13+ LLM providers. It intelligently distributes requests based on strategy (cost-optimized, quality-first, latency-first, or auto), handles automatic failover when providers go down, and includes two-tier caching (exact + semantic) for speed and cost savings. As a last resort, it falls back to a local Ollama instance.

How we built it

Built with Python, using uv for dependency management. The core components:

  • Router: Implements multiple routing strategies with circuit breakers
  • Quota Manager: Token-bucket rate limiting per provider
  • Discovery: Hourly dynamic model capability fetching
  • Cache: DiskCache for exact matches, cosine similarity over embeddings for semantic matches
  • Server: FastAPI-based OpenAI-compatible API on port 7544
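
Because the server speaks the OpenAI chat-completions protocol, any OpenAI-compatible client can be pointed at it. A minimal usage sketch, assuming the router runs locally on the port above; the `auto` model alias and the API key value are illustrative placeholders, not the project's documented defaults:

```python
# Minimal sketch: pointing the official OpenAI Python client at the router.
# The base URL uses the port mentioned above; the model alias and API key
# value are placeholders, not the project's actual defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:7544/v1",  # LLM Router's OpenAI-compatible endpoint
    api_key="unused-locally",             # placeholder; a real deployment may require a key
)

response = client.chat.completions.create(
    model="auto",  # hypothetical alias letting the router pick the provider
    messages=[{"role": "user", "content": "Why is quota-aware routing useful?"}],
)
print(response.choices[0].message.content)
```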

Challenges we ran into

  • Normalizing model capabilities across wildly different provider APIs
  • Building resilient failover logic that doesn't get stuck
  • Implementing semantic caching without sacrificing speed
  • Handling rate limits gracefully across 13+ providers with different quotas
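
Per-provider rate limiting of the kind the Quota Manager does is essentially a token bucket: each provider gets a bucket that refills at its allowed rate, and a request is only routed there if the bucket can pay for it. A minimal sketch of that idea; the class and the refill numbers are illustrative, not the Quota Manager's actual implementation:

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Illustrative per-provider bucket: refills `rate` tokens/second up to `capacity`."""
    rate: float       # refill rate, e.g. requests (or tokens) per second
    capacity: float   # burst size
    tokens: float = 0.0
    updated: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per provider, sized to that provider's quota (numbers are made up).
buckets = {"openai": TokenBucket(rate=3.0, capacity=60), "groq": TokenBucket(rate=0.5, capacity=30)}


def pick_provider(candidates: list[str]) -> str | None:
    """Return the first candidate whose bucket can pay for this request."""
    return next((p for p in candidates if buckets[p].try_acquire()), None)
```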

Accomplishments that we're proud of

  • Unified API that works with any OpenAI-compatible client
  • Automatic provider failover with zero downtime
  • Semantic caching that actually reduces costs on repeated queries
  • Local Ollama fallback ensures availability even when all cloud quotas are exhausted
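
The failover behind that zero-downtime claim follows the classic circuit-breaker pattern: after a few consecutive failures a provider is taken out of rotation for a cool-down window, then probed again instead of staying stuck. A rough sketch; the thresholds and timings are made up, not the router's real settings:

```python
import time


class CircuitBreaker:
    """Illustrative per-provider breaker: trips open after repeated failures,
    lets a probe through again once the cool-down elapses."""

    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                                   # closed: traffic flows
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True                                   # half-open: allow a probe
        return False                                      # open: skip this provider

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                             # close the breaker

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()             # trip: stop routing here
```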

What we learned

  • Provider APIs vary enormously in their model metadata and capabilities
  • Circuit breakers are essential for resilient distributed systems
  • Two-tier caching dramatically improves both latency and cost efficiency
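
The two-tier lookup that last point refers to is conceptually simple: check an exact-match key first, then fall back to a cosine-similarity search over prompt embeddings. A rough sketch using diskcache for the exact tier; the `embed` stub, the similarity threshold, and the in-memory index are placeholders for whatever the real system uses:

```python
import hashlib

import numpy as np
from diskcache import Cache

exact_cache = Cache("/tmp/llm-router-exact")                # tier 1: exact prompt -> response
semantic_index: list[tuple[np.ndarray, str]] = []           # tier 2: (embedding, response) pairs


def embed(text: str) -> np.ndarray:
    """Placeholder embedding; the real system would call an embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(64)
    return vec / np.linalg.norm(vec)


def cache_lookup(prompt: str, threshold: float = 0.92) -> str | None:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in exact_cache:                                  # tier 1: exact match
        return exact_cache[key]
    query = embed(prompt)                                   # tier 2: semantic match
    for vec, response in semantic_index:
        if float(np.dot(query, vec)) >= threshold:          # cosine similarity (unit vectors)
            return response
    return None


def cache_store(prompt: str, response: str) -> None:
    exact_cache[hashlib.sha256(prompt.encode()).hexdigest()] = response
    semantic_index.append((embed(prompt), response))
```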

What's next for LLM Router

  • Add more routing strategies (e.g., context-length aware, vision-capable routing); a sketch of how strategies could plug in follows this list
  • Implement user-specific quotas and billing
  • Expand discovery to more providers
  • Build a dashboard for monitoring and analytics
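
One way to make new strategies cheap to add is to treat each strategy as a scoring function over provider metadata, so context-length-aware or vision-capable routing becomes one more filter or score. A sketch of that idea; the `Provider` fields and the numbers are assumptions, not the router's real schema:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    """Illustrative provider metadata; field names and values are assumptions."""
    name: str
    cost_per_1k_tokens: float
    avg_latency_ms: float
    quality_score: float
    max_context: int


# Each strategy is just a scoring function; a new strategy is one more entry here.
STRATEGIES: dict[str, Callable[[Provider], float]] = {
    "cost-optimized": lambda p: -p.cost_per_1k_tokens,
    "latency-first": lambda p: -p.avg_latency_ms,
    "quality-first": lambda p: p.quality_score,
}


def route(providers: list[Provider], strategy: str, min_context: int = 0) -> Provider:
    """Pick the best eligible provider under the chosen strategy."""
    eligible = [p for p in providers if p.max_context >= min_context]
    return max(eligible, key=STRATEGIES[strategy])


providers = [
    Provider("cheap-cloud", 0.2, 900, 0.70, 32_000),
    Provider("fast-cloud", 1.5, 250, 0.85, 128_000),
]
print(route(providers, "latency-first").name)  # -> fast-cloud
```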

Built With

  • python
  • uv