基于多智能体强化学习的无人机群协同搜索研究
Research on Collaborative Search of UAV Group Based on Multi-agent Reinforcement Learning
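【Author】 Liu Kai (刘凯);
【Supervisor】 Peng Bei (彭倍);
【Author Information】 University of Electronic Science and Technology of China, Mechanical Engineering, 2022, Master's thesis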
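【Summary】 Swarm search and area coverage have broad applications in military reconnaissance, swarm strikes, smart plant protection, and environmental exploration, and have been widely studied in both military and civilian fields. Paths planned by existing area search and coverage algorithms are highly regular and do not fully account for dynamic changes in the swarm or the evolving environmental situation, so they cannot yield optimal solutions under multiple constraints or in the presence of unexpected events. This thesis takes the UAV swarm search problem as its research topic and focuses on UAV swarm search algorithms based on deep reinforcement learning. The main research contents are as follows. The thesis reviews the development from Markov processes to the value decomposition problem in multi-agent reinforcement learning, and points out that existing reinforcement-learning-based swarm search algorithms are mostly evolved from single-agent algorithms and lack the latest methods based on value decomposition, so none of them can be applied in a distributed decision-making environment. To transition smoothly from single-agent to multi-agent algorithms, the thesis first proposes a swarm search scheme based on sequential decision-making. The design of the reward function and of environment exploration is considered in depth: the ambiguity of the state space in the time domain is exploited to share samples among different agents, and the reward design breaks with the habitual assumption that the reward must be positively or negatively correlated with the task objective. Simulation comparisons verify that the improved scheme converges more stably and performs better. Building on the sequential decision-making scheme, the thesis further proposes a UAV swarm search scheme based on distributed decision-making and details how the two schemes differ in environment modeling. At the algorithm level, it uses figures and text to show in detail why existing non-monotonic value decomposition algorithms learn slowly, and proposes a fast value decomposition algorithm. Simulation experiments confirm that this swarm search problem is a non-monotonic value decomposition problem and that the proposed algorithm learns faster than existing non-monotonic value decomposition algorithms. Finally, the thesis designs a physical verification platform for UAV swarms based on the Robot Operating System and wireless routing communication. Software-in-the-loop verification and physical flight verification of cooperative UAV swarm search are carried out on this platform, further confirming the feasibility and algorithmic superiority of the UAV swarm search scheme based on sequential decision-making.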
【Abstract】 Swarm search and area coverage have a wide range of applications in military reconnaissance, swarm strikes, intelligent plant protection, environmental exploration, and similar tasks, and have been widely studied in both military and civilian fields. The paths planned by existing area search and coverage algorithms do not fully take into consideration the dynamic changes of the swarm and the development of the environmental situation, so the optimal solution under multiple constraints and emergencies cannot be obtained. This thesis takes the unmanned aerial vehicle (UAV) swarm search problem as its research topic and focuses on UAV swarm search algorithms based on deep reinforcement learning. The main research contents are as follows. The development from the Markov process to the value decomposition problem in multi-agent deep reinforcement learning is reviewed, and it is pointed out that existing reinforcement-learning-based swarm search algorithms are mainly evolutions of single-agent methods and lack the latest methods based on value decomposition. Therefore, none of the existing algorithms can be used in a distributed decision-making environment. To transition smoothly from a single-agent system to a multi-agent system, a swarm search scheme based on sequential decision-making is first proposed. The design of the reward function and of environmental exploration is considered in depth: the ambiguity of the state space in the time domain is exploited to share samples between different agents, and the reward design breaks with the assumption that the reward must be positively or negatively correlated with the task objective. Simulation comparisons verify that the improved scheme has better convergence stability. On the basis of the swarm search scheme based on sequential decision-making, this thesis further proposes a swarm search scheme based on distributed decision-making and elaborates on the differences between the two in environmental modeling. The reasons for the slow learning speed of existing non-monotonic value decomposition algorithms are demonstrated in detail at the algorithm level through a combination of figures and text, and a fast value decomposition algorithm is proposed. Simulation tests show that the swarm search environment is a non-monotonic value decomposition problem and that the new algorithm learns faster than the original algorithm. Finally, this thesis designs a multi-UAV swarm physical verification platform based on the Robot Operating System and wireless routing communication. Based on this system, software-in-the-loop verification and physical flight verification of UAV swarm cooperative search are carried out, which further confirm that the swarm search scheme based on sequential decision-making is superior to and more robust than the existing optimization algorithm.
【Key words】 Unmanned Aerial Vehicle Swarm; Deep Reinforcement Learning; Multi-agent System; Swarm Search; Area-coverage Path Planning;