Inspiration
Managing multiple LLM providers is a nightmare. Each has different pricing, rate limits, model capabilities, and quotas. We wanted a single API endpoint that intelligently routes requests to the best available provider, automatically handling failures, optimizing for cost or speed, and even falling back to a local Ollama instance when cloud quotas run out.
What it does
LLM Router is a quota-aware routing system that provides a unified OpenAI-compatible API across 13+ LLM providers. It intelligently distributes requests based on strategy (cost-optimized, quality-first, latency-first, or auto), handles automatic failover when providers go down, and includes two-tier caching (exact + semantic) for speed and cost savings. As a last resort, it falls back to a local Ollama instance.
How we built it
Built with Python using uv for dependency management. The core components:
- Router: Implements multiple routing strategies with circuit breakers
- Quota Manager: Token-bucket rate limiting per provider
- Discovery: Hourly dynamic model capability fetching
- Cache: DiskCache for exact matches, cosine similarity for semantic
- Server: FastAPI-based OpenAI-compatible API on port 7544
Challenges we ran into
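Per-provider token-bucket rate limiting, as used by the Quota Manager, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `TokenBucket` class, its parameters, and the provider names and limits are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""
    capacity: float
    refill_rate: float
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so a fresh provider can absorb an initial burst.
        self.tokens = self.capacity
        self.updated_at = time.monotonic()

    def try_acquire(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical limits; real quotas differ per provider and plan.
buckets = {
    "cloud_provider": TokenBucket(capacity=2, refill_rate=1.0),
    "ollama": TokenBucket(capacity=100, refill_rate=50.0),
}
```

A router built this way would call `buckets[name].try_acquire()` before dispatching and skip any provider whose bucket is empty. The optional `now` argument exists only to make the behaviour easy to test.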
- Normalizing model capabilities across wildly different provider APIs
- Building resilient failover logic that doesn't get stuck
- Implementing semantic caching without sacrificing speed
- Handling rate limits gracefully across 13+ providers with different quotas
Accomplishments that we're proud of
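Failover that does not get stuck on a dead provider is commonly built on a circuit breaker. The sketch below shows one way this could look, assuming a simple consecutive-failure threshold and a fixed cooldown. `CircuitBreaker`, `call_with_failover`, and the provider callables are hypothetical names for illustration, not the project's API.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, retries after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self, now: Optional[float] = None) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        now = time.monotonic() if now is None else now
        # Half-open: once the cooldown has passed, let a trial request through.
        return now - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: Optional[float] = None) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic() if now is None else now


Provider = Tuple[str, Callable[[str], str]]


def call_with_failover(
    providers: Sequence[Provider],
    breakers: Dict[str, CircuitBreaker],
    prompt: str,
) -> Tuple[str, str]:
    """Try providers in priority order, skipping any whose breaker is open."""
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allow():
            continue
        try:
            result = call(prompt)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers unavailable")
```

Placing the local fallback last in `providers` gives the "last resort" behaviour: it is only reached when every earlier provider fails or has an open breaker.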
- Unified API that works with any OpenAI-compatible client
- Automatic provider failover with zero downtime
- Semantic caching that actually reduces costs on repeated queries
- Local Ollama fallback ensures availability even when all cloud quotas are exhausted
What we learned
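The two-tier cache idea (an exact-match lookup first, then a cosine-similarity lookup) can be illustrated with a small in-memory sketch. It makes several simplifying assumptions: a plain dict stands in for the DiskCache-backed exact tier, `toy_embed` is a bag-of-words stand-in for a real embedding model, and the `TwoTierCache` class and its 0.8 threshold are hypothetical.

```python
import hashlib
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoTierCache:
    def __init__(self, embed: Callable[[str], Vector] = toy_embed,
                 threshold: float = 0.8) -> None:
        self.embed = embed
        self.threshold = threshold
        self.exact: Dict[str, str] = {}                 # tier 1: hash of the prompt
        self.semantic: List[Tuple[Vector, str]] = []    # tier 2: (embedding, response)

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> Optional[str]:
        # Tier 1: cheap exact lookup.
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return hit
        # Tier 2: linear scan for the most similar cached prompt.
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.semantic:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.semantic.append((self.embed(prompt), response))
```

The linear scan in the semantic tier is fine for a sketch but grows with cache size; keeping that tier fast at scale typically needs a vector index, which is out of scope here.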
- Provider APIs vary enormously in their model metadata and capabilities
- Circuit breakers are essential for resilient distributed systems
- Two-tier caching dramatically improves both latency and cost efficiency
What's next for LLM Router
- Add more routing strategies (e.g., context-length aware, vision-capable routing)
- Implement user-specific quotas and billing
- Expand discovery to more providers
- Build a dashboard for monitoring and analytics
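As a rough idea of what context-length-aware and vision-capable routing might look like, the sketch below filters candidate models by capability before applying a cost or latency strategy. Everything in it is an assumption for illustration: the `ModelInfo` fields, the `pick_model` function, and the catalog values in the usage note are made up and do not describe real providers or the project's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModelInfo:
    provider: str
    model: str
    context_window: int          # maximum prompt size in tokens
    supports_vision: bool
    cost_per_1k_tokens: float    # hypothetical pricing unit
    avg_latency_ms: float


def pick_model(
    candidates: Sequence[ModelInfo],
    prompt_tokens: int,
    needs_vision: bool = False,
    strategy: str = "cost",
) -> ModelInfo:
    """Filter by hard capability requirements, then rank by the chosen strategy."""
    eligible = [
        m for m in candidates
        if m.context_window >= prompt_tokens
        and (m.supports_vision or not needs_vision)
    ]
    if not eligible:
        raise LookupError("no model satisfies the request's requirements")
    if strategy == "cost":
        return min(eligible, key=lambda m: m.cost_per_1k_tokens)
    if strategy == "latency":
        return min(eligible, key=lambda m: m.avg_latency_ms)
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, given a catalog containing a cheap 8k-context text model and a pricier 128k-context vision model, `pick_model(catalog, prompt_tokens=50_000)` would skip the cheap model because the prompt does not fit, even under the cost strategy.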
Built With
- python
- uv