LLM Router

Inspiration

Managing multiple LLM providers is a nightmare. Each has different pricing, rate limits, model capabilities, and quotas. We wanted a single API endpoint that intelligently routes requests to the best available provider: automatically handling failures, optimizing for cost or speed, and even falling back to a local Ollama instance when cloud quotas run out.

What it does

LLM Router is a quota-aware routing system that provides a unified OpenAI-compatible API across 13+ LLM providers. It intelligently distributes requests based on strategy (cost-optimized, quality-first, latency-first, or auto), handles automatic failover when providers go down, and includes two-tier caching (exact + semantic) for speed and cost savings. As a last resort, it falls back to a local Ollama instance.

How we built it

Built with Python using uv for dependency management. The core components:

Router: Implements multiple routing strategies with circuit breakers
Quota Manager: Token-bucket rate limiting per provider
Discovery: Hourly dynamic model capability fetching
Cache: DiskCache for exact matches, cosine similarity for semantic
Server: FastAPI-based OpenAI-compatible API on port 7544

Challenges we ran into

Normalizing model capabilities across wildly different provider APIs
Building resilient failover logic that doesn't get stuck
Implementing semantic caching without sacrificing speed
Handling rate limits gracefully across 13+ providers with different quotas

Accomplishments that we're proud of

Unified API that works with any OpenAI-compatible client
Automatic provider failover with zero downtime
Semantic caching that actually reduces costs on repeated queries
Local Ollama fallback ensures availability even when all cloud quotas are exhausted

What we learned

Provider APIs vary enormously in their model metadata and capabilities
Circuit breakers are essential for resilient distributed systems
Two-tier caching dramatically improves both latency and cost efficiency

What's next for LLM Router

Add more routing strategies (e.g., context-length aware, vision-capable routing)
Implement user-specific quotas and billing
Expand discovery to more providers
Build a dashboard for monitoring and analytics
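
The context-length-aware and vision-capable routing mentioned in the first item above could plausibly be layered on as a capability filter that runs before the strategy ranking. The following is only a minimal sketch under assumptions: the `Provider` fields, the `pick_provider` helper, the provider names, and all numbers are hypothetical and are not taken from the project's code. The `healthy` flag stands in for a circuit breaker's state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    # Hypothetical fields and values, for illustration only.
    name: str
    cost_per_1k_tokens: float  # made-up USD figures
    avg_latency_ms: float
    max_context_tokens: int
    supports_vision: bool
    healthy: bool = True  # stand-in for "circuit breaker is closed"


def pick_provider(providers, strategy, prompt_tokens, needs_vision=False):
    """Filter by health and capability, then rank by the chosen strategy."""
    eligible = [
        p for p in providers
        if p.healthy
        and p.max_context_tokens >= prompt_tokens
        and (p.supports_vision or not needs_vision)
    ]
    if not eligible:
        # The caller would fall back here, e.g. to a local Ollama instance.
        return None
    ranking_keys = {
        "cost-optimized": lambda p: p.cost_per_1k_tokens,
        "latency-first": lambda p: p.avg_latency_ms,
    }
    return min(eligible, key=ranking_keys[strategy])


providers = [
    Provider("cloud-a", 0.50, 900.0, 8_000, False),
    Provider("cloud-b", 2.00, 400.0, 128_000, True),
    Provider("cloud-c", 0.10, 1500.0, 32_000, False, healthy=False),
]

# A short text prompt goes to the cheapest healthy provider.
print(pick_provider(providers, "cost-optimized", prompt_tokens=2_000).name)
# A long prompt only fits providers with a large enough context window.
print(pick_provider(providers, "cost-optimized", prompt_tokens=50_000).name)
```

Filtering before ranking keeps each strategy simple: a strategy only has to order providers that can actually serve the request, and an empty result is the signal to use the local fallback.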

Built With

python
uv

Updates

remix onwin started this project — Feb 20, 2026 12:53 PM EST
