
Policy Optimization for Explainable Recommendation Based on Reinforcement Learning (基于强化学习的可解释性推荐策略优化)

【Author】 Zhang Wei (张伟)

【Advisor】 Lin Fan (林凡)

【Author information】 Xiamen University, Software Engineering, 2021, Master's thesis

【Abstract】 Reinforcement learning has been widely applied to improving recommendation accuracy. In explainable recommendation algorithms based on reinforcement learning, the quality of the path search policy and the convergence of the path reasoning policy are two important open problems. This thesis proposes a solution to each. 1. To improve the quality of the path search policy in the explainable recommendation model, this thesis proposes two methods. First, it fuses the Monte Carlo increment at the current state with the Monte Carlo increments at future states, and uses the fused increment to optimize the path search policy at the current state. Second, it adds a constraint based on a fused personalized loss function to strengthen the training of the reinforcement learning model, yielding a path search policy with personalized recognition ability; this improves both the quality of the path search policy and the accuracy of the recommendations. 2. To address policy convergence in the explainable recommendation model, this thesis proposes a method that, as its name suggests, partitions the action space into an advantage action space and a mixed action space. During agent training, the policy trained on the advantage action space guides the fine-tuning of the policy trained on the mixed action space, which balances exploitation and exploration while also improving the convergence of the model's policy. The proposed methods are validated on the Amazon shopping dataset. Experimental results show that each of the improved algorithms achieves some gain in recommendation accuracy.
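The abstract does not give the fusion rule for the Monte Carlo increments, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes the per-step increment is the standard incremental Monte Carlo error `G_t - V(s_t)` and that future increments are folded into the current one with a geometric weight `lam`. The function names and the parameters `lam` and `alpha` are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fused_increments(values: List[float], returns: List[float], lam: float) -> List[float]:
    """Fuse each step's Monte Carlo increment with those of later steps.

    ASSUMPTION: the fusion is a geometric sum,
    fused_t = delta_t + lam * fused_{t+1}, with delta_t = G_t - V(s_t).
    The source does not specify the actual weighting.
    """
    increments = [g - v for g, v in zip(returns, values)]
    fused = [0.0] * len(increments)
    running = 0.0
    for t in reversed(range(len(increments))):
        running = increments[t] + lam * running
        fused[t] = running
    return fused


def update_values(values: List[float], rewards: List[float],
                  gamma: float = 0.9, lam: float = 0.5,
                  alpha: float = 0.1) -> List[float]:
    """Apply one incremental update V(s_t) <- V(s_t) + alpha * fused_t."""
    returns = discounted_returns(rewards, gamma)
    fused = fused_increments(values, returns, lam)
    return [v + alpha * d for v, d in zip(values, fused)]
```

Setting `lam = 0` reduces the sketch to the ordinary incremental Monte Carlo update, which makes the effect of the assumed fusion term easy to isolate in an experiment.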

  • 【Online publication contributor】 Xiamen University
  • 【Online publication year/issue】 2025, Issue 01