Researchers at Katanemo Labs present Arch-Router, a new routing model and framework that intelligently maps user queries to the most suitable large language model (LLM).
For companies building products that depend on multiple LLMs, Arch-Router aims to solve an important challenge: how to direct queries to the best model for the job without relying on rigid routing logic or costly retraining every time something changes.
The challenges of LLM routing
As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the distinct strengths of each model for specific tasks (e.g., code generation, text summarization, or image processing).
LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most suitable model.
Existing routing methods generally fall into two categories: "task-based routing," where queries are dispatched according to predefined tasks, and "performance-based routing," which seeks an optimal balance between cost and performance.
Task-based routing struggles with unclear or shifting user intentions, especially in multi-turn conversations. Performance-based routing, on the other hand, rigidly prioritizes benchmark scores, often neglects real-world user preferences, and adapts poorly to new models unless it undergoes costly fine-tuning.
More fundamentally, as the Katanemo Labs researchers note in their paper, "existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria."
The researchers emphasize the need for routing systems that "align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve."
A new framework for preference-aligned routing
To address these limitations, the researchers propose "preference-aligned routing," which matches queries to routing policies defined by user preferences.
In this framework, users define their routing policies in natural language using a "Domain-Action Taxonomy." This is a two-level hierarchy that reflects how people naturally describe tasks, starting with a general topic (the domain, such as "legal" or "finance") and narrowing to a specific task (the action, such as "summarization" or "code generation").
Each of these policies is then linked to a preferred model, so developers can make routing decisions based on real-world needs rather than benchmark scores alone. As the paper puts it, "This taxonomy serves as a mental model to help users define clear and structured routing policies."
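As a rough sketch (the policy names, descriptions, and model identifiers below are hypothetical, not the paper's actual configuration format), policies following the domain-action taxonomy could be expressed as a simple mapping from policy names to natural-language descriptions and preferred models:

```python
# Hypothetical sketch of routing policies following the domain-action taxonomy.
# Names, descriptions, and model identifiers are illustrative only.
ROUTING_POLICIES = {
    # Domain-level policy: a general topic.
    "finance": {
        "description": "Questions about markets, investments, or accounting.",
        "model": "gpt-4o",
    },
    # Action-level policies: a domain narrowed to a specific task.
    "legal/summarization": {
        "description": "Summarize legal documents such as contracts.",
        "model": "claude-3-7-sonnet",
    },
    "coding/code_generation": {
        "description": "Write new code from a natural-language specification.",
        "model": "gemini-2.5-pro",
    },
}

def preferred_model(policy_name: str) -> str:
    """Mapping function: policy identifier -> the model linked to it."""
    return ROUTING_POLICIES[policy_name]["model"]
```

Because each policy carries its own plain-language description, adding a new route is just adding a new entry to this mapping.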
The routing process happens in two stages. First, a preference-aligned router model takes the user query and the full set of policies and selects the most appropriate policy. Second, a mapping function connects the selected policy to its designated LLM.
Because the model-selection logic is separated from the policies themselves, models can be added, removed, or swapped simply by editing the routing policies, without retraining or modifying the router. This decoupling provides the flexibility required for practical deployments, where models and use cases are constantly evolving.
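A minimal sketch of this two-stage flow, with a trivial keyword stub standing in for the router model call (all names here are illustrative assumptions, not the paper's implementation):

```python
# Stage 2's mapping lives in a plain dict; swapping a model only means
# editing this table, never touching the router.
POLICY_TO_MODEL = {
    "code_generation": "gemini-2.5-pro",
    "document_creation": "claude-3-7-sonnet",
    "general": "gpt-4o",
}

def select_policy(query: str) -> str:
    """Stage 1: pick the best-matching policy.

    A keyword stub stands in for the actual router model so the
    example stays self-contained.
    """
    q = query.lower()
    if "write a function" in q:
        return "code_generation"
    if "draft" in q:
        return "document_creation"
    return "general"

def route(query: str) -> str:
    """Stage 2: map the selected policy to its designated LLM."""
    return POLICY_TO_MODEL[select_policy(query)]
```

For example, `route("Please write a function that sorts a list")` selects the `code_generation` policy and returns its assigned model.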
Policy selection is powered by Arch-Router, a compact 1.5B-parameter language model purpose-built for preference-aligned routing. Arch-Router receives the user query and the complete set of policy descriptions within its prompt, then generates the identifier of the best-matching policy.
Because the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning, without retraining. This generative approach lets Arch-Router use its pre-trained knowledge to understand the semantics of both the query and the policies, and to process the entire conversation history at once.
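One way to picture this (the prompt template below is an assumption for illustration, not Arch-Router's actual format) is a prompt builder that embeds the policy descriptions alongside the conversation, so that adding a route only changes the input text:

```python
# Hypothetical prompt construction for a policy-selecting router.
# The template wording is an assumption, not Arch-Router's real format.
def build_router_prompt(policies: dict[str, str], conversation: list[str]) -> str:
    policy_block = "\n".join(f"- {name}: {desc}" for name, desc in policies.items())
    history = "\n".join(conversation)
    return (
        "You are a query router. Given the conversation below, respond with "
        "the single policy name that best matches the user's request.\n\n"
        f"Policies:\n{policy_block}\n\n"
        f"Conversation:\n{history}\n\n"
        "Policy:"
    )

prompt = build_router_prompt(
    {
        "image_editing": "Edit or modify an existing image.",
        "document_creation": "Draft a new document from scratch.",
    },
    ["User: Please brighten this photo."],
)
```

The router's output is then just the policy name, which keeps generation short regardless of how many policies the prompt contains.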
A common concern with including extensive policies in a prompt is added latency, but the researchers designed Arch-Router to be highly efficient. "While the length of routing policies can get long, we can easily increase the context window of Arch-Router with minimal impact on latency," explains Salman Paracha, co-author of the paper and founder/CEO of Katanemo Labs. He notes that latency is primarily driven by the length of the output, and for Arch-Router the output is simply the short name of a routing policy, such as "image_editing" or "document_creation."
Arch-Router in action
To build Arch-Router, the researchers fine-tuned a 1.5B-parameter version of Qwen 2.5 on a curated dataset of 43,000 examples. They then tested its performance against state-of-the-art proprietary models from OpenAI, Anthropic, and Google on four public datasets designed to evaluate conversational AI systems.
The results show that Arch-Router achieves the highest overall routing score of 93.17%, surpassing all other models, including the top proprietary ones, by an average of 7.71%. The model's advantage grew with longer conversations, demonstrating its strong ability to track context across multiple turns.

In practice, according to Paracha, this approach is already being applied in several scenarios. In open-source coding tools, for example, developers use Arch-Router to direct different stages of their workflow, such as "code design," "code understanding," and "code generation," to the LLMs best suited for each task. Similarly, enterprises can route document-creation requests to a model like Claude 3.7 Sonnet while sending image-editing tasks to Gemini 2.5 Pro.
"The system is also well suited for personal assistants in various domains, where users have a diversity of tasks, from text summarization to factoid queries," said Paracha, adding, "In those cases, Arch-Router can help developers unify and improve the overall user experience."
This framework is integrated with Arch, Katanemo Labs' AI-native proxy server for agents, which lets developers implement sophisticated traffic-shaping rules. When integrating a new LLM, for instance, a team can send a small portion of traffic for a specific routing policy to the new model, verify its performance with internal metrics, and then shift traffic over fully with confidence. The company is also working to integrate its tools with evaluation platforms to further streamline this process for enterprise developers.
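Such a gradual rollout could be sketched as deterministic, hash-based traffic splitting; the percentage, model names, and helper below are assumptions for illustration, not Arch's actual mechanism:

```python
# Illustrative canary rollout for one routing policy: divert a small,
# deterministic fraction of requests to a new model while the rest stay
# on the incumbent. All names and the 5% default are assumptions.
import zlib

def pick_model(request_id: str, incumbent: str, candidate: str,
               canary_pct: int = 5) -> str:
    """Hash the request id into a 0-99 bucket; buckets below
    `canary_pct` go to the candidate model."""
    bucket = zlib.crc32(request_id.encode()) % 100
    return candidate if bucket < canary_pct else incumbent
```

Hashing on the request id (rather than random sampling) makes the split reproducible: the same request always lands on the same model, and raising `canary_pct` to 100 completes the cutover once internal metrics look good.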
Ultimately, the goal is to move beyond siloed AI implementations. "Arch-Router, and Arch more broadly, help developers and enterprises move from fragmented LLM implementations to a unified, policy-driven system," says Paracha. "In scenarios where user tasks are diverse, our framework helps turn that task and LLM fragmentation into a unified experience, making the end product feel seamless to the end user."

