Fields ranging from robotics to medicine to political science are attempting to train AI systems to make meaningful decisions of all kinds. For example, using an AI system to intelligently control traffic in a congested city could help drivers reach their destinations faster, while improving safety and sustainability.
Unfortunately, teaching an AI system to make good decisions is no easy task.
Reinforcement learning models, which underlie these AI decision-making systems, still often fail when faced with even small variations in the tasks they are trained to perform. In the case of traffic, a model might struggle to control a set of intersections with different speed limits, numbers of lanes, or traffic patterns.
To increase the reliability of reinforcement learning models for complex tasks with variability, MIT researchers have introduced a more efficient algorithm for training them.
The algorithm strategically selects the best tasks for training an AI agent so it can effectively perform all tasks in a collection of related tasks. In the case of traffic signal control, each task could be one intersection in a task space that includes all intersections in the city.
By focusing on a smaller number of intersections that contribute the most to the algorithm's overall effectiveness, this method maximizes performance while keeping the training cost low.
The researchers found that their technique was between five and 50 times more efficient than standard approaches on an array of simulated tasks. This efficiency gain helps the algorithm learn a better solution faster, ultimately improving the performance of the AI agent.
“We were able to see incredible performance improvements, with a very simple algorithm, by thinking outside the box. An algorithm that is not very complicated stands a better chance of being adopted by the community because it is easier to implement and easier for others to understand,” says senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).
She is joined on the paper by lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a doctoral student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS doctoral student. The research will be presented at the Conference on Neural Information Processing Systems.
Finding a middle ground
To train an algorithm to control traffic lights at many intersections in a city, an engineer would typically choose between two main approaches. She can train one algorithm for each intersection independently, using only that intersection's data, or train a larger algorithm using data from all intersections and then apply it to each one.
But each approach comes with its share of downsides. Training a separate algorithm for each task (such as a given intersection) is a time-consuming process that requires an enormous amount of data and computation, while training one algorithm for all tasks often leads to subpar performance.
Wu and her colleagues sought a compromise between these two approaches.
For their technique, they choose a subset of tasks and train one algorithm for each task independently. Importantly, they strategically select individual tasks that are most likely to improve the algorithm's overall performance on all tasks.
They leverage a common trick from the reinforcement learning field called zero-shot transfer learning, in which an already-trained model is applied to a new task without further training. With transfer learning, the model often performs remarkably well on the new, neighboring task.
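To make the idea concrete, here is a minimal sketch of zero-shot transfer on a toy problem (this is our illustration, not the paper's traffic setup; names like `train_q_table` are hypothetical): a tabular Q-learning agent is trained on one variant of a simple corridor task, and its greedy policy is then evaluated, unchanged, on a neighboring variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_q_table(goal, n_states=10, episodes=500):
    """Tabular Q-learning on a toy corridor: start at state 0,
    actions move left/right, reward 1 for reaching `goal`."""
    q = np.zeros((n_states, 2))
    for _ in range(episodes):
        s = 0
        for _ in range(50):
            # Epsilon-greedy action selection (0 = left, 1 = right).
            a = int(rng.integers(2)) if rng.random() < 0.1 else int(q[s].argmax())
            s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == goal else 0.0
            q[s, a] += 0.5 * (r + 0.9 * q[s2].max() - q[s, a])
            s = s2
            if r > 0:
                break
    return q

def zero_shot_score(q, goal, n_states=10):
    """Run the greedy policy from `q` on a task with a different goal,
    with no further training; higher means the goal was reached sooner."""
    s = 0
    for step in range(1, 51):
        s = max(0, min(n_states - 1, s + (1 if int(q[s].argmax()) == 1 else -1)))
        if s == goal:
            return 1.0 / step
    return 0.0

q = train_q_table(goal=7)          # train on one task variant...
print(zero_shot_score(q, goal=6))  # ...then evaluate zero-shot on a neighbor
```

Because the two task variants are so similar, the policy trained for one goal still reaches the nearby goal without any additional training, which is the behavior the researchers' method exploits.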
“We know it would be ideal to train on all tasks, but we wondered if we could get away with training on a subset of those tasks, applying the result to all tasks, and still seeing a performance increase,” Wu says.
To identify which tasks to select in order to maximize expected performance, the researchers developed an algorithm called Model-Based Transfer Learning (MBTL).
The MBTL algorithm has two pieces. First, it models how well each algorithm would perform if it were trained independently on one task. Then it models how much each algorithm's performance would degrade if it were transferred to a different task, a concept known as generalization performance.
Explicitly modeling generalization performance allows MBTL to estimate the value of training on a new task.
MBTL does this sequentially, first choosing the task that leads to the highest performance gain, then selecting additional tasks that provide the largest subsequent marginal improvements to overall performance.
Since MBTL focuses only on the most promising tasks, it can dramatically improve the efficiency of the training process.
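This selection loop can be sketched in a few lines. The code below is a schematic illustration under simplifying assumptions (a one-dimensional task space and generalization performance that decays linearly with distance between tasks, which is one simple way to instantiate such a model; function and variable names are ours, not from the paper): it greedily picks the training task with the largest estimated marginal gain, assuming each task is ultimately handled by whichever trained model transfers to it best.

```python
import numpy as np

def mbtl_greedy_selection(train_perf, decay, positions, k):
    """Greedily choose k training tasks.

    train_perf[i]: modeled performance if an agent is trained on task i.
    decay: assumed drop in performance per unit of task-space distance
        when a trained model is transferred zero-shot (generalization gap).
    positions: 1-D coordinates of the tasks (e.g., an intersection's
        speed limit); k: number of tasks to train on.
    """
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    covered = np.zeros(n)  # best estimated performance on each task so far
    selected = []
    for _ in range(k):
        best_gain, best_i, best_cover = -np.inf, None, None
        for i in range(n):
            if i in selected:
                continue
            # Zero-shot transfer model: performance decays linearly
            # with distance from the training task.
            transfer = train_perf[i] - decay * np.abs(positions - positions[i])
            cover = np.maximum(covered, transfer)
            gain = cover.sum() - covered.sum()
            if gain > best_gain:
                best_gain, best_i, best_cover = gain, i, cover
        selected.append(best_i)
        covered = best_cover
    return selected

# 100 hypothetical intersections along one axis of variation:
positions = np.arange(100)
train_perf = np.ones(100)
print(mbtl_greedy_selection(train_perf, decay=0.02, positions=positions, k=2))
```

With this toy decay model, the first pick lands near the middle of the task space, and each later pick fills the largest remaining coverage gap, mirroring how MBTL sequentially chooses tasks by marginal improvement.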
Reducing training costs
When the researchers tested this technique on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and executing several classic control tasks, it was five to 50 times more efficient than other approaches.
This means they could arrive at the same solution by training on far less data. For instance, with a 50x efficiency boost, the MBTL algorithm could train on just two tasks and achieve the same performance as a standard method that uses data from 100 tasks.
“From the perspective of the two main approaches, this means the data from the other 98 tasks were not necessary, or that training on all 100 tasks confuses the algorithm, so its performance ends up worse than ours,” Wu says.
With MBTL, even a small amount of additional training time could lead to significantly higher performance.
In the future, the researchers plan to design MBTL algorithms that can extend to more complex problems, such as high-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially next-generation mobility systems.
This research is funded, in part, by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Scholarship Program, and an Amazon Robotics PhD Fellowship.