In my experience with LLMs, it is always hard to find the right, efficient model for a given agent, system, or domain. You have to evaluate LLMs against your actual business workload to figure out which one fits. In one of my projects, a chatbot serving about 200k users, this became painful: we spent about $10k in 10 days on GPT-4o. It was expensive, and we were routing even simple questions like "hello" or "what is the time" to GPT-4o. That raised a question: is it possible to route simple prompts to simpler models like GPT-4o mini?

I implemented a rule-based routing system, and costs improved. Later, in my master's program, I talked to my supervisor, and he told me it is also a good research topic. I started a literature review, and after that, I registered for the highline hackathon to implement an MVP.

The hard part is choosing the best model for a given prompt online, with low latency. Matching prompts to LLMs can be very challenging: we need to understand the prompt before generation, be aware of the different models, and connect these two domains. It is like a recommendation system. It also must be fast; the model should be chosen in under 100ms. So implementing the core routing system, a dashboard for it, and the backend side was very challenging.
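To illustrate the rule-based routing idea, here is a minimal sketch in Python. The patterns, length threshold, and model names are my own illustrative assumptions, not the project's actual rules; a real router would use richer signals and cover many more cases.

```python
import re

# Hypothetical heuristics for "simple" prompts (illustrative only).
SIMPLE_PATTERNS = [
    r"^(hi|hello|hey)\b",
    r"\bwhat (time|day) is it\b",
    r"^(thanks|thank you)\b",
]

def route(prompt: str, max_simple_len: int = 60) -> str:
    """Pick a model name for a prompt using cheap, fast string rules."""
    text = prompt.strip().lower()
    # Short small-talk prompts go to the cheaper model.
    if len(text) <= max_simple_len and any(
        re.search(p, text) for p in SIMPLE_PATTERNS
    ):
        return "gpt-4o-mini"
    # Everything else falls through to the larger model.
    return "gpt-4o"

print(route("hello"))  # cheap model
print(route("Summarize this contract and flag risky clauses."))  # larger model
```

Because the rules are plain string checks, selection takes microseconds, comfortably inside a 100ms budget; the trade-off is that hand-written rules miss prompts a learned router would catch.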
