Reinforcement learning algorithms that use the cerebellar model articulation controller (CMAC) are studied for estimating the optimal value function of Markov decision processes (MDPs) with continuous states and discrete actions. The application of CMAC to approximating the action-value function of an MDP is investigated, and the state-space discretization produced by Sarsa-learning algorithms based on CMAC networks and direct gradient rules is analyzed. Two improved coding structures for CMAC neural networks are proposed, namely non-adjacent overlapping coding and variable-scale coding, to improve the convergence speed and generalization performance of CMAC-based direct gradient learning algorithms.
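To make the setting concrete, the following is a minimal sketch of a conventional CMAC (uniformly offset tilings) used as an action-value approximator with a Sarsa update. It is not the non-adjacent overlapping or variable-scale coding studied in the paper. It assumes a one-dimensional state in [low, high], and it reads "direct gradient" as the usual update that treats the bootstrapped target as a constant. The class name `CMAC`, its methods, and all parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np


class CMAC:
    """Illustrative CMAC (tile-coding) approximator for Q(s, a), 1-D state."""

    def __init__(self, n_tilings, n_tiles, n_actions, low, high):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.low = low
        self.width = (high - low) / n_tiles
        # Each tiling is shifted by a fraction of one tile width.
        self.offsets = np.arange(n_tilings) * self.width / n_tilings
        # One extra tile per tiling covers the shifted upper boundary.
        self.w = np.zeros((n_actions, n_tilings, n_tiles + 1))

    def active_tiles(self, s):
        """Index of the single active tile in each tiling for state s."""
        idx = np.floor((s - self.low + self.offsets) / self.width).astype(int)
        return np.clip(idx, 0, self.n_tiles)

    def q(self, s, a):
        """Q(s, a) is the sum of the weights of the active tiles."""
        return self.w[a, np.arange(self.n_tilings), self.active_tiles(s)].sum()

    def sarsa_update(self, s, a, r, s_next, a_next, alpha, gamma, done=False):
        """One direct gradient Sarsa step; the target is held constant."""
        target = r if done else r + gamma * self.q(s_next, a_next)
        delta = target - self.q(s, a)
        # The gradient of Q w.r.t. each active weight is 1; the step size is
        # split evenly across tilings.
        tiles = self.active_tiles(s)
        self.w[a, np.arange(self.n_tilings), tiles] += alpha / self.n_tilings * delta
        return delta


cmac = CMAC(n_tilings=4, n_tiles=8, n_actions=2, low=0.0, high=1.0)
cmac.sarsa_update(s=0.5, a=0, r=1.0, s_next=0.5, a_next=0,
                  alpha=1.0, gamma=0.9, done=True)
```

After this single update, nearby states that share tiles with s = 0.5 also change their estimate for action 0, while distant states and the other action are unaffected. This locality is the form of state discretization and generalization that the choice of coding structure controls.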