OASIS: Open AI Selection & Integration System
Inspiration
I was building a side project that needed multiple AI capabilities—document summarization, code generation, and creative writing. I started with OpenAI's API, but problems emerged immediately: Cost became unsustainable when using GPT-4 for every request, even simple ones. Rate limits killed productivity during peak hours. Vendor lock-in created risk—what happens when pricing changes or services go down? I tried manually switching between providers (OpenAI, Anthropic, local models), but the codebase became unmaintainable. Each provider had different API formats, inconsistent error handling, and zero intelligence about which model suited which task. I was spending more time managing AI infrastructure than building features. The gap became clear: no simple tool existed to route AI requests intelligently across providers based on task requirements, cost, and availability. Every developer was solving this problem alone, poorly, with brittle custom code. What it does OASIS is a lightweight routing layer that sits between your application and multiple AI providers. You make one call to OASIS, and it handles the complexity: Intelligent Request Analysis: Examines task type, token count, and latency requirements to understand what you actually need. Optimal Provider Selection: Uses a scoring algorithm that balances quality, cost, and speed: ( score = \frac{quality}{cost \times latency} ). A simple explanation doesn't need GPT-4 when Gemini Flash delivers the same result at 1/10th the cost. Automatic Failover: If a provider is down or rate-limited, OASIS reroutes to the next best option without your application knowing anything went wrong. Response Normalization: Regardless of which provider handled the request, you get consistent, predictable output in a unified format. Example usage: pythonfrom oasis import Router
router = Router() response = router.complete( prompt="Explain quantum computing in simple terms", task_type="explanation", max_cost=0.01 )
Behind the scenes, OASIS might route this to Gemini Flash (cheap, fast) instead of GPT-4 Turbo (expensive) because explanatory text doesn't require the most powerful model. Your application stays simple while OASIS handles the orchestration.
## How we built it
**Architecture:**
We designed OASIS as a three-layer system:
React Frontend (Vite) → FastAPI Backend → Router Engine → AI Providers ↓ ↓ Redis Cache OpenRouter ↓ Gemini PostgreSQL DB Groq Perplexity Frontend Stack:
React with Vite for fast builds and hot reload Tailwind CSS for utility-first styling Framer Motion for smooth animations shadcn/ui (Radix primitives) for accessible components
Backend Stack:
Python 3.11 with FastAPI for async request handling SQLAlchemy ORM with Alembic migrations for schema management PostgreSQL for production, SQLite for local development Redis for caching frequently requested prompts
AI Integration Strategy: Rather than building individual integrations for dozens of models, we use OpenRouter as our primary gateway (access to 100+ models through one API) with direct integrations for providers where native access offers advantages:
OpenRouter: Primary gateway for model diversity Google Gemini: Native integration via google-generativeai for Gemini-specific features Groq: Direct integration for high-speed inference when latency matters Perplexity: Native integration for research-oriented queries
Router Logic: The core routing engine scores each provider for incoming requests: pythondef score_provider(provider, request): quality = model_benchmarks[provider.model]['quality'] cost = estimate_cost(provider, request.tokens) latency = provider.avg_latency_ms
return (quality * request.quality_weight) / (cost * latency)
Weights are configurable per request, allowing users to optimize for cost (batch processing), speed (real-time chat), or quality (critical analysis). Deployment:
Containerized with Docker for consistent environments Backend deployed on cloud infrastructure Frontend on Vercel for edge optimization
Challenges we ran into
Challenge 1: Response Format Hell Each AI provider returns data in completely different structures. OpenRouter uses choices[0].message.content, Google Gemini uses candidates[0].content.parts[0].text, Groq has its own format. Handling errors was even worse—some return HTTP 429, others 503, some embed errors in JSON responses. Solution: Built a comprehensive response normalizer that maps every provider format to a unified schema: python{ "text": "response content", "model": "gemini-1.5-flash", "tokens_used": 342, "cost_usd": 0.0012, "latency_ms": 890, "provider": "gemini" } This required reverse-engineering each provider's API behavior, including edge cases like streaming responses and multi-turn conversations. Challenge 2: Real-time Cost Calculation Accurate cost calculation required knowing token counts before making requests (for provider selection) and after completion (for billing). But tokenization differs across models—GPT uses tiktoken, Gemini has its own tokenizer, Llama models use SentencePiece. Solution: Implemented a two-phase system:
Pre-request estimation using tiktoken (good enough for most models) Post-request validation using provider-specific token counts when available Historical cost tracking in Redis to improve future estimates
This approach gave us 95%+ accuracy in cost predictions while keeping routing decisions fast. Challenge 3: Failover Without Waste When OpenRouter goes down mid-request, we can't just retry—that wastes money and time. We needed intelligent failover that understood provider health and avoided cascading failures. Solution: Implemented a circuit breaker pattern with exponential backoff: pythonif provider.consecutive_failures > threshold: provider.status = "degraded" exclude_from_routing(provider, timeout=300) # 5 min cooldown notify_monitoring_system(provider) The system tracks failure rates per provider and automatically excludes unhealthy ones from routing decisions. Providers re-enter rotation gradually after cooldown periods. Challenge 4: Database Schema Evolution As we added features during the hackathon, database schema changes became frequent. Manual migrations were error-prone and blocked development. Solution: Used Alembic for automatic migration generation and version control. When schema changed, we could confidently migrate production data without downtime: bashalembic revision --autogenerate -m "add provider metrics" alembic upgrade head
Accomplishments that we're proud of
Sub-500ms routing decisions despite analyzing requests, scoring multiple providers, and checking cache. We achieved this through aggressive optimization: scoring happens in parallel, Redis lookups are batched, and the hot path avoids database calls entirely. 40% cost reduction in testing through intelligent caching. Repeated prompts (common in development workflows) hit Redis cache with 0.08s latency instead of 1.2s API calls. The cost savings compound quickly at scale. Production-ready API design from day one. We didn't build a demo—we built something deployable. Proper error handling, rate limiting, authentication, request validation, and structured logging. FastAPI's automatic OpenAPI documentation means the API is self-documenting. Seamless provider switching that users never notice. When OASIS fails over from one provider to another, response quality remains consistent. We spent significant time ensuring model outputs were comparable across providers for similar capabilities. Modern, responsive UI built with React and shadcn components. The interface is clean, fast, and accessible. Framer Motion animations provide feedback without feeling gimmicky. Actually finished in 72 hours. Not a prototype, not a proof-of-concept—a working system with frontend, backend, database, caching, multiple provider integrations, error handling, and deployment configuration.
What we learned
Simplicity ships, complexity stalls. I wasted 8 hours building an ML-based routing system that used historical data to predict optimal models. It was elegant but slow and over-engineered. Switched to a simple weighted scoring algorithm and shipped it in 2 hours. It works better. Caching transforms economics. Adding Redis wasn't just about speed—it fundamentally changed cost structure. Repeated queries (testing, common use cases) went from dollars to cents. The 40% savings in our testing understates real-world impact. Developer experience is the product. We spent the final 12 hours on documentation, error messages, example code, and setup scripts. A tool nobody understands is worthless regardless of technical sophistication. Clear docs and helpful errors matter more than additional features. The open-source ecosystem desperately needs this. Every conversation at CloudFest confirmed the same problem: developers want to use multiple AI providers but get trapped by one vendor due to integration complexity. OASIS solves a real, widespread problem. FastAPI is production-ready. We bet on FastAPI for async handling and automatic validation. It delivered. The auto-generated API docs, dependency injection, and request validation saved hours of debugging. Integration testing catches real issues. Unit tests passed but integration tests revealed timing issues, race conditions in concurrent requests, and edge cases in provider failover. Building comprehensive integration tests early saved us from shipping broken failover logic.
What's next for OASIS
Short-term improvements: Streaming support for real-time responses. Currently OASIS waits for complete responses. Users want to see tokens as they generate, especially for chat interfaces. Web dashboard for monitoring costs, usage patterns, and provider performance. Developers need visibility into where money goes and which providers perform best for their workloads. Additional provider adapters. The community wants Cohere, Hugging Face Inference API, Replicate, and Together AI. OpenRouter covers many models, but native integrations offer advantages. Rate limit coordination. Smart backoff that shares rate limit information across instances to prevent thundering herd problems when limits reset. Long-term vision: Auto-optimization from usage patterns. Learn which providers work best for each user's specific workload and automatically adjust routing weights. A developer doing mostly code generation should get different routing than someone doing creative writing. Multi-modal routing. Extend beyond text to intelligently route image generation (DALL-E vs Stable Diffusion vs Midjourney), audio processing, and video generation based on quality requirements and cost constraints. Cost forecasting and budgets. Predict monthly spend based on usage trends and allow users to set hard budget limits with automatic throttling or fallback to cheaper models when approaching limits. Collaborative filtering for provider selection. Use aggregated data from all OASIS users to improve routing decisions. If similar prompts consistently perform better with Gemini than GPT-4, the system should learn that pattern. Most importantly: stay simple. OASIS should be boring infrastructure that just works. The goal isn't to be the most feature-rich or technically impressive tool—it's to be the reliable layer that developers forget about because it never causes problems. The best infrastructure is invisible.Claude is AI and can make mistakes. Please double-check responses. Sonnet 4.5
Built With
- alembic
- docker
- fastapi
- framer-motion
- google-gemini
- groq
- javascript
- openrouter
- perplexity
- postgresql
- python
- radix-ui
- react
- redis
- sqlalchemy
- sqlite
- tailwind-css
- vercel
- vite
Log in or sign up for Devpost to join the conversation.