Reinforcement learning function approximation algorithm based on linear average

(Original title: 基于线性平均的强化学习函数估计算法)

【Author】 TAO Jun-yuan (1), SUN Jin-wei (1), LI De-sheng (2)

【Affiliations】 (1) School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin 150001, China; (2) School of Mechanical Engineering & Applied Electronic Technology, Beijing University of Technology, Beijing 100022, China

【Abstract】 A reinforcement learning algorithm based on the minimum linear average is proposed to address the non-convergence of function approximation in reinforcement learning over continuous state spaces. The algorithm builds on the gradient descent method and, following the contraction mapping principle, adopts the linear average as the performance measure for value function approximation, so that the iterative estimation of the value function becomes a process that converges to a fixed point. The algorithm is validated on the Mountain Car problem, a standard reinforcement learning benchmark. Simulation results show that the algorithm is effective and feasible and that it converges quickly to a stable value.
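For readers unfamiliar with the benchmark named in the abstract, the following is a minimal sketch of the Mountain Car dynamics together with a gradient-descent update of a linear value function estimate. It does not reproduce the paper's linear-average criterion, which this record does not specify; ordinary semi-gradient TD(0) is swapped in as a stand-in. The dynamics constants follow the common textbook formulation, while the feature set, the fixed start state, the step size, and the policy `push_with_velocity` are assumptions made purely for illustration.

```python
import math

# Mountain Car bounds in the common textbook formulation.
X_MIN, X_MAX = -1.2, 0.5
V_MIN, V_MAX = -0.07, 0.07

def step(x, v, action):
    """Advance the car one time step; action is -1, 0, or +1."""
    v = v + 0.001 * action - 0.0025 * math.cos(3 * x)
    v = min(max(v, V_MIN), V_MAX)
    x = x + v
    if x < X_MIN:                 # inelastic stop at the left wall
        x, v = X_MIN, 0.0
    done = x >= X_MAX             # goal is the right edge
    return min(x, X_MAX), v, -1.0, done   # reward is -1 per step

def features(x, v):
    """Hypothetical hand-picked features, scaled to about [-1, 1]."""
    xs = (x - X_MIN) / (X_MAX - X_MIN) * 2 - 1
    vs = v / V_MAX
    return [1.0, xs, vs, xs * vs, xs * xs, vs * vs]

def value(w, phi):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def td0_evaluate(policy, episodes=50, alpha=0.01, gamma=0.99, max_steps=500):
    """Semi-gradient TD(0) evaluation of a fixed policy.

    Stand-in update rule, NOT the paper's linear-average method.
    """
    w = [0.0] * 6
    for _ in range(episodes):
        x, v = -0.5, 0.0          # assumed fixed start near the valley floor
        for _ in range(max_steps):
            phi = features(x, v)
            x2, v2, r, done = step(x, v, policy(x, v))
            target = r if done else r + gamma * value(w, features(x2, v2))
            delta = target - value(w, phi)
            # Gradient of a linear estimate w.r.t. w is the feature vector.
            w = [wi + alpha * delta * pi for wi, pi in zip(w, phi)]
            x, v = x2, v2
            if done:
                break
    return w

def push_with_velocity(x, v):
    """Hypothetical policy: accelerate in the current direction of motion."""
    return 1 if v >= 0 else -1
```

Calling `w = td0_evaluate(push_with_velocity)` returns the six learned weights, and `value(w, features(x, v))` then gives the estimated return at any state. Whether and how fast such an update converges is exactly the question the abstract raises for function approximation in continuous spaces.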

【Key words】 automatic control technology; reinforcement learning; linear average; function approximation; gradient descent method

【Funding】 National High Technology Research and Development Program of China ("863" Program) (2003AA404140)

【Source】 吉林大学学报(工学版), Journal of Jilin University (Engineering and Technology Edition), 2008, Issue 06

【CLC number】 TP301.6
【Times cited】 3
【Downloads】 174
