As companies like OpenAI, Google, and Anthropic expand their API offerings, users face a crowded landscape of model choices, each with its own performance, cost, and latency trade-offs. Selecting the optimal model for a specific task means navigating frequent updates, newly released models, and complex APIs.
Public benchmarks like GSM8K for math and MMLU for reasoning offer only partial insight into a specific use case and don't cover every model. This pushes users to consult technical papers and experiment with multiple APIs, a time-consuming and costly process.
Multi-LLM routing systems have started to simplify model selection by considering prompt, price, and latency. Recent work like RouterBench (by Martian) has set a benchmark for these systems, evaluating 400,000 inferences to measure routing efficiency across models. However, Martian's platform still requires users to perform manual testing, which demands significant technical effort.
Our approach advances this by treating latency and performance as dynamic selection criteria, creating an adaptive system that autonomously optimizes model selection based on relevant benchmarks and user preferences for cost, speed, and task-specific performance. Additionally, we extend this routing framework to handle multimodal inputs (video, image, and text), allowing seamless cross-modal analysis. This approach is enabled by a new dataset we are constructing, pairing 40 tasks with user preferences.
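To make the routing idea concrete, here is a minimal sketch of preference-weighted model selection over OpenRouter's OpenAI-compatible API. The per-model quality/cost/latency scores, model IDs, and weights below are illustrative placeholders, not our actual benchmark data or final scoring function:

```python
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Illustrative per-model stats, normalized to [0, 1]. In the real system these
# would come from benchmark results and live latency measurements.
CANDIDATES = {
    "openai/gpt-4o":               {"quality": 0.95, "cost": 0.80, "latency": 0.60},
    "anthropic/claude-3.5-sonnet": {"quality": 0.93, "cost": 0.70, "latency": 0.55},
    "google/gemini-flash-1.5":     {"quality": 0.80, "cost": 0.15, "latency": 0.20},
}

def route(prompt: str, w_quality: float, w_cost: float, w_latency: float) -> str:
    """Pick the model with the best preference-weighted score, then call it.

    Higher quality is better; higher cost and latency are worse, so they are
    subtracted. The weights encode the user's cost/speed/performance preference.
    """
    best_model = max(
        CANDIDATES,
        key=lambda m: (
            w_quality * CANDIDATES[m]["quality"]
            - w_cost * CANDIDATES[m]["cost"]
            - w_latency * CANDIDATES[m]["latency"]
        ),
    )
    response = client.chat.completions.create(
        model=best_model,
        messages=[{"role": "user", "content": prompt}],
    )
    return f"[{best_model}] {response.choices[0].message.content}"

# A cost-sensitive user weights cost heavily, steering the router to a cheap model.
print(route("Summarize the plot of Hamlet in two sentences.",
            w_quality=0.3, w_cost=0.6, w_latency=0.1))
```

The static table is the part the full system replaces with dynamic signals: live latency probes, task-specific benchmark scores matched to the prompt, and the preferences learned from our task/user-preference dataset.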
Built With
- openrouter
- python