Double-Step Decision Reinforcement Learning Spectrum Management Using ε-greedy Exploration

【Authors】 Yin Zhijie; Wang Yiming; Wu Cheng

【Affiliation】 School of Railway Transportation, Soochow University

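【Abstract (translated from Chinese)】 In cognitive radio networks, the cognitive base station must perform spectrum management to improve the quality of service of unlicensed users. When searching for spectrum holes to allocate to unlicensed users, the base station needs to make the best choice, but that choice is very likely only a local optimum, which causes frequent spectrum handoffs and reduced throughput for unlicensed users. To address this problem, this paper proposes a centralized reinforcement learning spectrum allocation algorithm based on two-step decision and exploration. By designing a novel state-action set, the cognitive base station makes a two-step decision for channel allocation and applies an exploration mode, which resolves the balance between exploring the environment and exploiting experience during reinforcement learning, prevents locally optimal decisions, and improves spectrum management performance. Simulation results show that the algorithm clearly outperforms some existing spectrum allocation strategies in raising unlicensed-user throughput and reducing spectrum handoffs.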

【Abstract】 In a cognitive radio network, the base station needs an effective spectrum management policy that guarantees the licensed users' communication while improving the quality of service of the cognitive radio users. When allocating spectrum holes to cognitive radio users, the base station faces massive passive channel switching caused by the unpredictability of the licensed users, which degrades the throughput of the cognitive radio users. To solve this problem, this paper proposes a novel base station, the cognitive base station, which incorporates a reinforcement learning model with novel state and action sets. The cognitive base station makes a two-step decision for channel allocation: first, whether to switch the channel for a cognitive radio user, and second, which channel to select if it decides to switch. This avoids excessive channel switching and improves the throughput of the cognitive radio users. The performance of a reinforcement learning spectrum management policy also depends heavily on how the environment is explored. In this paper, the epsilon-greedy exploration method is used to balance the cognitive base station's exploration of the unknown environment against its exploitation of existing knowledge. Simulation results show that applying epsilon-greedy in each decision step has a remarkable effect on system performance. The paper also determines the best-performing combination of epsilon values for the two steps, with which the proposed method outperforms the traditional reinforcement learning spectrum allocation scheme in improving cognitive radio users' throughput and reducing channel switching.
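
The two-step decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the state and action sets, the reward, or the update rule, so the value lists `q_switch` and `q_channel`, the parameters `eps1` and `eps2`, and the running-average update in `q_update` are all hypothetical placeholders chosen only to show where a separate ε could enter each step.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Return a random index with probability epsilon, else the argmax index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

def two_step_decision(q_switch, q_channel, current, eps1, eps2, rng):
    """Hypothetical two-step choice; requires at least two channels.

    Step 1: choose stay (index 0) or switch (index 1) using eps1.
    Step 2: if switching, choose among the other channels using eps2.
    """
    if epsilon_greedy(q_switch, eps1, rng) == 0:
        return current  # stay on the current channel
    candidates = [c for c in range(len(q_channel)) if c != current]
    pick = epsilon_greedy([q_channel[c] for c in candidates], eps2, rng)
    return candidates[pick]

def q_update(q, action, reward, alpha):
    """Assumed stateless running-average update, not the paper's rule."""
    q[action] += alpha * (reward - q[action])

rng = random.Random(0)
# With both epsilons at 0 the choice is purely greedy.
print(two_step_decision([0.0, 1.0], [0.2, 0.9, 0.5], 0, 0.0, 0.0, rng))
```

Keeping `eps1` and `eps2` separate mirrors the abstract's remark that the combination of epsilon values across the two steps affects performance; which values work best is a result of the paper's simulations and is not reproduced here.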

【Funding】 Supported by the National Natural Science Foundation of China (61471252)
  • 【Source】 Journal of Data Acquisition and Processing (数据采集与处理), 2018, Issue 06
  • 【Classification No.】 TN925
  • 【Times cited】 6
  • 【Downloads】 149