2025 was supposed to be the year of "AI agents," according to Nvidia CEO Jensen Huang and others in the AI industry. And in some ways it has been, with a number of AI model leaders such as OpenAI, Google, and even Chinese competitors such as Alibaba releasing fine-tuned AI models or applications that focus on a limited set of tasks, such as web searching and report writing.
However, a major hurdle to a future of high-performance, reliable AI agents remains: getting them to stay on task when the task spans multiple steps. Third-party benchmark testing shows that even the most powerful AI models have higher failure rates the more steps they take to complete a task and the longer they spend doing it (more than a few hours).
A new academic framework called EAGLET proposes a practical and efficient way to improve the long-horizon task performance of LLM-based agents – without the need for manual data labeling or retraining.
Developed by researchers at Tsinghua University, Peking University, DeepLang AI, and the University of Illinois Urbana-Champaign, EAGLET provides a "global planner" that can be integrated into existing agent workflows to reduce planning hallucinations and improve task efficiency.
EAGLET is a fine-tuned language model that interprets task instructions – typically provided as prompts by the user or the agent's operating environment – and generates a high-level plan for the executor agent (which runs on its own LLM). It does not intervene during execution, but its up-front plan helps reduce planning errors and improve task completion rates.
Solving the planning problem for long-horizon agents
Many LLM-based agents struggle with tasks that unfold over extended horizons because they rely on reactive, step-by-step reasoning. This approach often leads to trial-and-error behavior, planning hallucinations, and inefficient trajectories.
EAGLET addresses this limitation by introducing a global planning module that works alongside the executor agent.
Instead of merging planning and action generation into a single model, EAGLET separates them, enabling more coherent task-level strategies.
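To make that separation concrete, here is a minimal sketch of how a pre-execution planner could sit in front of an existing executor loop. The function names, step budget, and environment API below are illustrative assumptions, not code from the EAGLET paper.

```python
# Minimal sketch of the planner/executor split (hypothetical API, not the authors' code).
# call_planner and call_executor stand in for any chat-completion client.

def call_planner(prompt: str) -> str:
    """Placeholder: send the prompt to a fine-tuned global-planner model."""
    raise NotImplementedError

def call_executor(prompt: str) -> str:
    """Placeholder: send the prompt to the executor agent's own LLM."""
    raise NotImplementedError

def run_task(task_instruction: str, env, max_steps: int = 30) -> bool:
    # 1. One planning call before execution: the planner never acts in the environment.
    global_plan = call_planner(
        f"Task: {task_instruction}\nWrite a concise high-level plan as numbered subgoals."
    )

    # 2. The executor keeps its usual step-by-step loop; the plan is prepended as guidance.
    observation = env.reset(task_instruction)
    for _ in range(max_steps):
        action = call_executor(
            f"Global plan:\n{global_plan}\n\nTask: {task_instruction}\n"
            f"Observation: {observation}\nNext action:"
        )
        observation, done, success = env.step(action)
        if done:
            return success
    return False
```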
A two-stage training pipeline without human annotations
EAGLET's planner is trained using a two-stage process that does not require human-written plans or annotations.
In the first stage, synthetic plans are generated using high-performing LLMs such as GPT-5 and DeepSeek-V3.1-Think.
These plans are then filtered using a novel strategy called homologous consensus filtering, which retains only those that improve task performance for both experienced and novice executor agents.
In the second stage, a rule-based reinforcement learning process further refines the planner, using a tailored reward function to assess how much each plan helps multiple executor agents succeed.
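The exact filtering criterion is not spelled out in this article, but the consensus idea can be sketched roughly as follows: a synthetic plan survives only if it lifts task outcomes for both a stronger ("expert") and a weaker ("novice") executor. Everything below – the rollout function, the scoring, and the strict "both must improve" rule – is an assumption for illustration.

```python
# Hedged sketch of a consensus-style plan filter (not the paper's exact criterion).

def rollout_success(executor, task, plan=None) -> float:
    """Placeholder: run the executor on the task (optionally with the plan prepended)
    and return a success/progress score in [0, 1]."""
    raise NotImplementedError

def filter_plans(candidate_plans, task, expert_executor, novice_executor):
    kept = []
    for plan in candidate_plans:
        gains = []
        for executor in (expert_executor, novice_executor):
            baseline = rollout_success(executor, task)              # no plan
            with_plan = rollout_success(executor, task, plan=plan)  # plan prepended
            gains.append(with_plan - baseline)
        # Consensus: both executor tiers must benefit for the plan to survive.
        if all(gain > 0 for gain in gains):
            kept.append(plan)
    return kept
```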
Introducing the Executor Capability Gain Reward (ECGR)
One of EAGLET's most significant innovations is the Executor Capability Gain Reward (ECGR).
This reward measures the value of a generated plan by checking whether it helps both high- and low-performing agents complete tasks more successfully and in fewer steps.
It also includes a decay factor to encourage shorter, more efficient task trajectories. This approach avoids over-rewarding plans that are only useful to already-competent agents and promotes more generally helpful planning guidance.
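The paper's exact formula is not reproduced here, but the ingredients described – a gain over each executor's no-plan baseline, averaged across executors, with a length-based decay – can be illustrated with a rough sketch. The decay value and the success/step bookkeeping below are assumptions, not the published ECGR definition.

```python
# Illustrative reward in the spirit of ECGR (the exact formula in the paper may differ):
# score a plan by how much it lifts each executor's outcome over its no-plan baseline,
# discounted by trajectory length so shorter completions score higher.

def ecgr_like_reward(plan_rollouts, baseline_rollouts, decay: float = 0.95) -> float:
    """plan_rollouts / baseline_rollouts: lists of (success: bool, steps: int),
    one entry per executor, with and without the candidate plan."""
    gains = []
    for (succ_p, steps_p), (succ_b, steps_b) in zip(plan_rollouts, baseline_rollouts):
        # Length-discounted value of each rollout: success decays with step count.
        value_with_plan = (decay ** steps_p) if succ_p else 0.0
        value_baseline = (decay ** steps_b) if succ_b else 0.0
        gains.append(value_with_plan - value_baseline)  # capability gain for this executor
    return sum(gains) / len(gains)  # average gain across executors
```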
Compatible with existing agents and models
The EAGLET planner is designed to be modular and plug-and-play, meaning it can be inserted into existing agent pipelines without requiring retraining of the executor.
In evaluations, the planner boosted performance across a variety of base models, including GPT-4.1, GPT-5, Llama-3.1, and Qwen2.5.
It also proved effective regardless of prompting strategy, working well with standard ReAct-style prompts as well as approaches such as reflection.
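In practice, "plug-and-play" could be as simple as wrapping an existing agent entry point so that the only change is a plan prepended to the task prompt. The wrapper below is an assumption about how such an integration might look, not an official API.

```python
# A possible plug-and-play integration (illustrative, not an official EAGLET API):
# wrap an existing agent function so the plan rides along as extra context.

def with_global_plan(agent_fn, plan_fn):
    """agent_fn(task_prompt) -> result; plan_fn(task_prompt) -> plan text."""
    def planned_agent(task_prompt):
        plan = plan_fn(task_prompt)
        # The executor and its prompting strategy (ReAct, reflection, etc.) stay untouched.
        return agent_fn(f"Global plan:\n{plan}\n\n{task_prompt}")
    return planned_agent
```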
Highest performance on all benchmarks
EAGLET was tested on three widely used benchmarks for long-horizon agent tasks: ScienceWorld, which simulates scientific experiments in a text-based laboratory environment; ALFWorld, which tasks agents with completing household activities using natural language in a simulated home; and WebShop, which evaluates goal-directed behavior in a realistic online shopping interface.
In all three cases, the EAGLET-equipped executor agents outperformed their non-planning counterparts and other planning baselines, including MPO and KnowAgent.
In experiments with the open-source model Llama-3.1-8B-Instruct, EAGLET lifted average performance from 39.5 to 59.4, a gain of +19.9 points across all tasks.
On unseen ScienceWorld scenarios, it raised performance from 42.2 to 61.6.
On seen ALFWorld scenarios, EAGLET improved results from 22.9 to 54.3, more than a 2.3x increase in performance.
Gains were also recorded for more powerful models.
For example, GPT-4.1 with EAGLET improved from a 75.5 to an 82.2 average score, and GPT-5 rose from 84.5 to 88.1 despite already performing strongly.
On some benchmarks, the performance gains reached as much as +11.8 points, for example when combining EAGLET with the ETO executor method on unseen ALFWorld tasks.
Compared to other planning frameworks such as MPO, EAGLET consistently delivered higher task completion rates. For example, on ALFWorld's unseen tasks with GPT-4.1, MPO scored 79.1 while EAGLET scored 83.6 – a lead of +4.5 points.
Additionally, the paper reports that agents using EAGLET complete tasks in fewer steps on average. With GPT-4.1 as the executor, the average step count dropped from 13.0 (no planner) to 11.1 (with EAGLET). With GPT-5 it dropped from 11.4 to 9.4, supporting the claim of improved execution efficiency.
Efficiency gains in training and execution
Compared to RL-based methods such as GiGPO, which can require hundreds of training iterations, EAGLET achieved better or comparable results with about one-eighth the training effort.
This efficiency also carries over to execution: agents using EAGLET typically required fewer steps to complete tasks, reducing inference time and computational cost in production scenarios.
No public code yet
As of the version submitted to arXiv, the authors have not published an open-source implementation of EAGLET. It is unclear if and when the code will be released, under what license, or how it will be maintained, which could limit the framework's near-term usefulness for enterprise use.
VentureBeat has reached out to the authors to clarify these points and will update this article once we hear back.
Questions still remain about enterprise deployment
Although the planner is described as plug-and-play, it remains unclear whether EAGLET can easily integrate with popular enterprise agent frameworks such as LangChain or AutoGen, or whether a custom stack is required to support the separation of planning and execution.
Similarly, the training setup relies on multiple executor agents, which may be difficult to replicate in enterprise environments with limited model access. VentureBeat asked the researchers whether the homologous consensus filtering method could be adapted for teams that only have access to a single executor model or limited computing resources.
The authors of EAGLET report success across model types and sizes, but it is not yet known what the minimum feasible model scale is for practical use. For example, can enterprise teams effectively use the planner with open models under 10 billion parameters in latency-sensitive environments? The framework may also provide industry-specific value in areas such as customer support or IT automation, but it remains to be seen how easily the planner can be fine-tuned or customized for such domains.
Real-time vs. pre-generated planning
Another open question is how best to use EAGLET in practice. Should the planner work in real time alongside the executor inside a loop, or is it better used offline to pre-generate global plans for known task types? Each approach has implications for latency, cost, and operational complexity. VentureBeat asked the authors this question and will report on any findings.
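The trade-off can be summarized in a few lines of pseudocode-style Python. Both patterns below are assumptions about deployment, not guidance from the paper: real-time planning adds one model call per task, while offline planning reuses cached plans keyed by task type.

```python
# Hedged sketch of the two deployment patterns discussed above (both are assumptions).

plan_cache: dict[str, str] = {}

def get_plan(task_type: str, task_prompt: str, plan_fn, online: bool = True) -> str:
    if online:
        # Real-time planning: one extra planner call per task, always task-specific.
        return plan_fn(task_prompt)
    # Offline planning: generate a plan once per task type, then reuse it.
    if task_type not in plan_cache:
        plan_cache[task_type] = plan_fn(task_prompt)
    return plan_cache[task_type]
```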
Strategic trade-offs for enterprise teams
For technical leaders at mid-size to large enterprises, EAGLET represents a compelling proof of concept for improving the reliability and efficiency of LLM agents. But without public tooling or implementation guidelines, the framework still presents a build-versus-wait decision. Organizations must weigh the potential gains in task completion and efficiency against the cost of reproducing or adapting the training process in their own company.
Possible use cases in corporate environments
For companies developing agentic AI systems – especially in environments that require step-by-step planning, such as IT automation, customer support, or online interactions – EAGLET offers a template for integrating planning without retraining. Its ability to guide both open- and closed-source models, together with its efficient training method, could make it an attractive starting point for teams seeking to improve agent performance with minimal overhead.

