延迟强化 meaning in English
delayed reinforcement
Examples
- We propose a novel ACO algorithm, Ant( ), based on eligibility traces. The algorithm mathematically unifies the TD method and the MC method, and allows the delayed reinforcement signal received by the ants to be propagated back along their travel paths in a timely manner. Several novel ACO algorithms are also presented for the flowshop scheduling problem.
本文还在蚁群算法中引入强化学习的资格迹理论并提出了一个新颖的基于资格迹的蚁群优化算法ant ( ) ,该算法实现了蒙特卡洛方法与瞬时差分方法的数学意义上结合,并能使蚂蚁获得的延迟强化信号及时地在其旅行路径上反向传播。