Cooperation Models for the Multi-agent System and Application to the RoboCup Soccer Simulator
【Author】 Peng Jun;
【Supervisor】 Wu Min;
【Author Information】 Central South University, Control Theory and Control Engineering, 2005, PhD
【Abstract】 Over the last few years, cooperation in multi-agent systems (MAS) has been a research focus in the field of distributed artificial intelligence. The RoboCup soccer simulation league serves as a standard platform for testing various MAS theories. Robot soccer is an extremely complicated multi-agent environment, and in such surroundings the agents must cooperate to achieve their common goal: scoring as many goals as possible and winning the match.

This dissertation contributes to the design of a RoboCup simulation team. Cooperation strategies and models for MAS are built by integrating planning, learning, and prediction techniques. The main achievements are as follows.

To address the main cooperation problems faced by the RoboCup soccer simulation system, a two-layer MAS cooperation model framework is proposed, composed of a cooperation-strategy layer and an action-decision layer. This architecture not only strengthens the intelligence of the whole system but also supports dynamic real-time cooperation among the agents.

A state-based planning cooperation model is adopted to obtain a fast real-time response from the cooperating agents; it increases the reaction speed of individual agents and improves the cooperation efficiency of the whole MAS. From the attacking point of view, a pass-planning cooperation strategy based on a cooperation-desire matrix is proposed, which realizes explicit multi-agent cooperation that does not depend on communication.

A formation links multiple agents into a team with a common goal, and roles are introduced to realize pre-assigned task allocation and positional coordination. From the defensive point of view, dynamic defensive cooperation based on formation switching is realized. Case-based learning is applied to formation design, overcoming the limitation of setting formations solely from direct experience and enabling dynamic switching between an active defensive formation and a passive one. The defensive cooperation requirements at different stages of a match are thereby satisfied, and the overall defensive performance of the team is notably improved.

A dynamic defensive cooperation method based on an affinity-degree model is proposed. Under a man-marking tactic, an agent computes the affinity degree to decide whether to assist a teammate in completing the marking task; under a zone-defense tactic, the formation determines the primary defender of each zone and the affinity degree determines the secondary defender, and they defend the zone together. The affinity-degree model yields better cooperation among the agents, solves the problems of opponents left unmarked after a marking failure and of zone boundaries left uncovered, and unifies the division of labor with cooperation.

To improve the overall cooperation capability of the simulation team, a behavior-based prediction method is proposed, which makes the cooperation model simple to design, fast to react, adaptable, and comparatively intelligent. The behavior-prediction-based cooperation model is implemented in the CSU_YunLu team, where cooperative decisions such as passing and crossing from the baseline are carried out successfully.

A statistics-based Q-learning algorithm for multiple agents is proposed by combining statistical learning with Q-learning. An agent learns the action policies of the other agents by observing their joint actions, and the use of a total-probability policy-distribution matrix ensures that the learning agent chooses an optimal action while theoretically guaranteeing the convergence of the algorithm. The algorithm reduces the multi-agent learning space from the conventional exponential size to a polynomial one and greatly improves learning efficiency; it has been applied successfully to the off-line training of cooperation policies in RoboCup.

The main features that distinguish reinforcement learning from other learning methods are delayed reward and trial-and-error search. These two characteristics may cause a temporal credit-assignment problem and an infinite state-visiting problem, which in a MAS can lead to slow convergence of the learning or even to divergence. To address these problems, a prediction-based Q-learning model for multiple agents is proposed; it has a two-level structure that integrates planning, prediction, and Q-learning, and its efficient on-line learning capability is demonstrated in RoboCup.

The validity of the proposed MAS cooperation strategies and models has been demonstrated with the CSU_YunLu simulation team.
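As an illustration of the cooperation-desire idea summarized above, the following sketch shows one assumed way such a matrix could be formed and used for pass selection. It is not taken from the dissertation: the weighting of pass distance, receiver pressure, and progress toward the goal, as well as the names `desire`, `build_desire_matrix`, and `choose_pass_target`, are purely illustrative assumptions.

```python
# Illustrative sketch (not from the dissertation): a cooperation-desire matrix
# for pass planning. Entry W[i][j] estimates how much player i "wants" to pass
# to player j; the ball holder picks the teammate with the highest desire.
# The weighting of distance, opponent pressure, and goal progress is assumed.

import math

def desire(passer, receiver, opponents, goal=(52.5, 0.0)):
    """Cooperation desire of `passer` toward `receiver` (higher is better)."""
    if passer is receiver:
        return float("-inf")
    pass_dist = math.dist(passer, receiver)
    # Pressure: distance from the nearest opponent to the receiver.
    pressure = min(math.dist(receiver, opp) for opp in opponents)
    # Progress: how much closer the ball gets to the opponent goal.
    progress = math.dist(passer, goal) - math.dist(receiver, goal)
    return 0.5 * progress + 0.4 * pressure - 0.1 * pass_dist

def build_desire_matrix(teammates, opponents):
    """W[i][j]: desire of teammate i to pass to teammate j."""
    return [[desire(p, q, opponents) for q in teammates] for p in teammates]

def choose_pass_target(holder_idx, teammates, opponents):
    """The ball holder selects the teammate that maximizes its row of the matrix."""
    row = build_desire_matrix(teammates, opponents)[holder_idx]
    return max(range(len(teammates)), key=lambda j: row[j])

if __name__ == "__main__":
    teammates = [(0.0, 0.0), (10.0, 5.0), (20.0, -8.0)]   # (x, y) positions
    opponents = [(12.0, 4.0), (25.0, -7.0)]
    print("pass to teammate", choose_pass_target(0, teammates, opponents))
```

A real simulation-league client would also weigh interception lines and the receiver's body direction; the three-term score here only fixes the shape of the matrix-based decision, not its actual contents.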
【Key words】 multi-agent system (MAS); RoboCup; cooperation model; learning; planning
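To make the statistics-based Q-learning idea concrete, the sketch below follows a joint-action-learner style that matches the abstract's description only in outline: the agent keeps Q-values over joint actions plus an empirical frequency model of the other agent's behavior, and acts greedily with respect to the expected Q-value under that model. The class name, parameters, and update rule are assumptions, not the dissertation's algorithm.

```python
# Illustrative sketch (assumptions, not the dissertation's code): a two-agent
# "statistics-based" Q-learning loop in the joint-action-learner style.
# The learner keeps Q-values over joint actions and an empirical frequency
# model of the other agent's behavior; it acts greedily with respect to the
# expected Q-value under that model.

import random
from collections import defaultdict

class StatQLearner:
    def __init__(self, my_actions, other_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.my_actions = my_actions
        self.other_actions = other_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q = defaultdict(float)          # (state, my_a, other_a) -> value
        self.counts = defaultdict(int)       # (state, other_a) -> observations

    def other_policy(self, state):
        """Empirical distribution of the other agent's actions in `state`."""
        total = sum(self.counts[(state, b)] for b in self.other_actions)
        if total == 0:
            return {b: 1.0 / len(self.other_actions) for b in self.other_actions}
        return {b: self.counts[(state, b)] / total for b in self.other_actions}

    def expected_q(self, state, my_a):
        """Q-value of `my_a` averaged over the modelled other-agent policy."""
        dist = self.other_policy(state)
        return sum(dist[b] * self.q[(state, my_a, b)] for b in self.other_actions)

    def act(self, state):
        """Epsilon-greedy choice against the expected joint-action value."""
        if random.random() < self.eps:
            return random.choice(self.my_actions)
        return max(self.my_actions, key=lambda a: self.expected_q(state, a))

    def update(self, state, my_a, other_a, reward, next_state):
        """Observe the joint action, then do a standard Q-learning backup."""
        self.counts[(state, other_a)] += 1
        best_next = max(self.expected_q(next_state, a) for a in self.my_actions)
        key = (state, my_a, other_a)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

Before any observations the opponent model falls back to a uniform distribution, so the expected Q-value is a plain average; each update is a standard Q-learning backup on the observed joint action, with the opponent statistics refined alongside it.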