reinforcement learning algorithm meaning in English

强化式学习算法
强化学习算法

Examples

In this paper, by introducing joint actions into traditional reinforcement learning, a new multi-agent reinforcement learning algorithm based on behavior prediction is presented, and several methods for predicting other agents' behaviors are discussed.
在传统强化学习方式中引入组合动作的基础上，本文提出了一种基于行为预测的多智能体强化学习方法，研究了对其他智能体行为进行预测的几种可行方法。
The reinforcement learning algorithm was also introduced, since it is related to the ant colony algorithm and can be used in scheduling problems. 4. Some new concepts and scheduling algorithms for batch chemical processes were proposed in our studies.
由于蚁群算法与人工智能中的强化学习算法之间有着某种联系，同时强化学习近年来也应用于求解调度问题，因此本文也涉及到了一些强化学习的主要算法。
Reinforcement learning algorithms that use the cerebellar model articulation controller (CMAC) are studied to estimate the optimal value function of Markov decision processes (MDPs) with continuous states and discrete actions. The state discretization for MDPs using Sarsa-learning algorithms based on CMAC networks and direct gradient rules is analyzed. Two new coding methods for CMAC neural networks are proposed so that the learning efficiency of CMAC-based direct gradient learning algorithms can be improved.
在求解离散行为空间markov决策过程( mdp )最优策略的增强学习算法研究方面，研究了小脑模型关节控制器( cmac )在mdp行为值函数逼近中的应用，分析了基于cmac的直接梯度算法对mdp状态空间离散化的特点，研究了两种改进的cmac编码结构，即：非邻接重叠编码和变尺度编码，以提高直接梯度学习算法的收敛速度和泛化性能。
By means of the proposed reinforcement learning algorithm and a modified genetic algorithm, a neural network controller whose weights are optimized can generate time-series small perturbation signals to convert the chaotic oscillations of chaotic systems into desired regular ones. Computer simulations on controlling the Hénon map and the logistic chaotic system have demonstrated the capacity of the presented strategy by stabilizing lower periodic orbits such as period-1 and period-2. Meanwhile, when the periodic control methodology is utilized, higher periods such as period-4 can also be successfully directed to the expected periodic orbits.
该控制方法无需了解系统的动态特性和精确的数学模型,也不需监督学习所要求的训练数据,通过增强学习训练方式,采用改进遗传算法优化神经网络权系数,使之成为混沌控制器,便可产生控制混沌系统的时间序列小扰动信号,仿真实验结果表明它不仅能有效镇定混沌周期1 、 2等低周期轨道,而且在周期控制技术基础上,也可成功将高周期混沌轨道(如周期4轨道)变成期望周期行为。
Based on the organization rules of Internet data, the distribution laws of hyperlinks, and the naming rules of URLs, an algorithm for TVM rebuilding is established, and satisfactory experimental results are obtained by applying this algorithm. Furthermore, efforts are made to apply TVM to browsing navigation, web page classification, and reinforcement learning algorithms.
结合互联网资源的构建规则、链接分布规律和url命名规则，论文提出了树藤共生数据模型的重建算法，实验结果验证了树藤共生模型的有效性与合理性，在此基础上初步讨论了树藤共生模型在浏览导航、网页分类和reinforcement learning算法中的应用。
