基于强化学习的多交叉路口交通信号控制方法研究
Traffic Signal Control Methods at Multi-Intersection Based on Reinforcement Learning
【Author】 Wang Min (王敏);
【Supervisor】 Wu Libing (吴黎兵);
【Author information】 Wuhan University, Computer System Architecture, 2022, PhD
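【Summary】 With rapid economic development, the number of urban vehicles keeps growing and the pressure on urban traffic keeps rising, which seriously affects daily travel and raises the incidence of traffic accidents. The knock-on problems are equally serious, including environmental pollution, economic losses, and a decline in overall quality of life. Because there is little space or resource left to improve infrastructure, improving traffic flow and Traffic Signal Control (TSC) within the existing infrastructure has become an urgent problem. Traditional traffic signals are most commonly controlled by fixed-time, actuated, or adaptive methods. Although these methods respond to traffic conditions, they cannot fully cope with dynamically changing traffic demand, and they are not the best choice when traffic flow is highly saturated. Because traffic phenomena are highly dynamic and complex, Reinforcement Learning (RL) is commonly used for this kind of adaptive control problem.

This thesis studies signal control at multiple intersections in urban road networks. First, traditional signal control methods target a single intersection, yet traffic flows at connected intersections usually affect one another. This thesis proposes an RL-based cooperative control method that accounts for the influence of traffic flows at multiple connected intersections. Second, because traffic states exhibit symmetry and rotation, it designs a traffic demand modeling scheme that is independent of intersection structure. During training, RL models that interact with the environment often suffer from weak exploration and poor convergence, so this thesis proposes an RL algorithm based on demonstration learning, which pre-trains the RL model on demonstration data to achieve fast convergence and better performance. Then, to fuse temporal and spatial features more effectively, it proposes a spatial-temporal graph attention network for traffic signal control that fully exploits the latent spatial-temporal relations among multiple intersections. Finally, it uses meta-learning to design a meta spatial-temporal graph attention network that handles dynamically changing traffic flows at multiple intersections, improving signal control efficiency and reducing vehicle waiting time. The specific work and contributions are summarized as follows.

(1) Multi-intersection traffic signal control based on a region-cooperative strategy
In urban road scenarios, consecutive intersections are usually connected, so congestion at one intersection may affect the others. Typical RL methods assign one agent to each intersection and often cannot control multiple intersections cooperatively. Another class of methods uses a single model trained on samples from all intersections, which places higher demands on model size and generalization ability. Neither class achieves cooperative control of multiple intersections, because in essence the influence of traffic flows at connected intersections is not taken into account. To address the inability of single-agent signal control to adapt to multiple intersections, this thesis builds on a policy-based RL algorithm and proposes RACS, a cooperative signal control method for multiple intersections. The method treats the states of neighboring intersections as part of an intersection's own state and also considers the influence of the neighbors' signal control policies. Experiments show that in multi-intersection scenarios RACS reduces waiting time by 48.9% and 31.0% on the synthetic and real datasets, respectively, compared with the existing method IA2C.

(2) Multi-intersection traffic signal control independent of intersection structure
Traffic states at intersections often have symmetric or rotational properties. If the model can recognize them, repeated training can be reduced. On this basis, this thesis designs a traffic demand modeling scheme that is independent of intersection structure. In addition, RL learns weakly when it first interacts with the environment early in training, and adjusting the learning rate may push training into overfitting or underfitting. Almost every RL algorithm has to face this problem. This thesis proposes Ape-X DQfD, a method based on demonstration learning. It first trains a model with the traditional Self-Organizing Traffic Light (SOTL) control method and collects the SOTL training data. These data then serve as demonstration data for pre-training the RL model. After pre-training, the model adapts well to the environment and converges quickly during formal training, which shortens training time and also improves model performance. Experiments on three city datasets confirm that Ape-X DQfD converges faster than existing methods and yields shorter travel times, with average reductions of 23.9%, 23.8%, and 11.6%.

(3) Multi-intersection traffic signal control based on spatial-temporal feature fusion
Unlike convolutional and recurrent neural networks, graph neural networks are better suited to problems with graph or graph-like structure and to data generated in non-Euclidean spaces. Existing signal control methods based on graph neural networks do not consider the traffic flow state over a past period. Real traffic flow is continuous, however, and the state in the preceding period directly affects subsequent signal switching, so these temporal features should be considered. Moreover, among models based on spatial-temporal features, some consider only one kind of feature and others simply join temporal and spatial features without fully exploiting their intrinsic correlation. This thesis proposes DynSTGAT, a dynamic spatial-temporal graph attention network framework for multiple intersections. It uses a Temporal Convolutional Network (TCN) and a Graph Attention Network (GAT) to obtain spatial-temporal features over a past period, uses an LSTM and a GAT to obtain the spatial-temporal features of the current moment, and then fuses the two. Finally, a DQN predicts the state of the traffic signal at the next time step. Experiments show that travel time on the synthetic dataset is 13.8% lower than with CoLight, and that travel time on the real datasets is 7.2% and 3.6% lower than with CoLight, respectively.

(4) Dynamic multi-intersection traffic signal control based on meta-learning
Graph neural networks are usually applied to tasks with graph structure, and the node attributes in these graphs are usually fixed. In reality, node attributes more often change continuously, as in social networks, and handling such dynamically changing features is the key problem. In traffic signal control, if an intersection is treated as a node and its connected intersections as neighboring nodes, the node attributes clearly change all the time because traffic flow is dynamic. For this setting, this thesis proposes MetaSTGAT, a spatial-temporal graph attention network based on meta-learning. Its meta-knowledge learning module dynamically learns the weights between nodes from the changing node features, and updating these weights optimizes the graph attention network so that the whole model performs better. Experiments show that the meta-learning-based network outperforms the spatial-temporal graph attention network (STGAT) alone. On four synthetic datasets and two real-world datasets, MetaSTGAT reduces travel time by 12.23%, 19.30%, 13.84%, 10.91%, 8.24%, and 8.74%, respectively, compared with the graph network method CoLight.

In summary, this thesis examines traffic signal control from several angles, moving from simply connected multi-intersection scenarios, to scenarios with heterogeneous intersection structures, to dynamically changing and complex multi-intersection scenarios. It proposes four RL-based multi-intersection traffic signal control methods that effectively improve traffic efficiency, maximize intersection throughput, and reduce vehicle waiting time.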
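The region-aware idea in contribution (1), where each intersection treats its neighbors' states as part of its own state, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the three-intersection network `NEIGHBORS`, the two-value queue-length state, the zero-padding scheme, and the helper name `augmented_state`. The sketch covers only the state augmentation, not the part of RACS that accounts for neighbors' control policies.

```python
from typing import Dict, List

# Hypothetical road network: intersection id -> directly connected intersections.
NEIGHBORS: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
}


def augmented_state(
    local: Dict[str, List[float]], node: str, max_neighbors: int = 2
) -> List[float]:
    """Concatenate a node's own state with its neighbors' states.

    The result is zero-padded so every agent sees a vector of the same length,
    regardless of how many neighbors it has (an illustrative choice).
    """
    own = local[node]
    parts = list(own)
    neighbors = NEIGHBORS[node][:max_neighbors]
    for neighbor in neighbors:
        parts.extend(local[neighbor])
    missing = max_neighbors - len(neighbors)
    parts.extend([0.0] * (missing * len(own)))
    return parts


# Assumed local state: queue lengths on two approaches per intersection.
queues = {"A": [3.0, 1.0], "B": [0.0, 5.0], "C": [2.0, 2.0]}
state_a = augmented_state(queues, "A")
state_b = augmented_state(queues, "B")
print(state_a)
print(state_b)
```

With this layout, the agent at intersection B observes the queues at A and C alongside its own, so congestion building at a neighbor is visible before it spills over.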
【Abstract】 With rapid economic development, the number of urban vehicles is increasing and urban traffic is becoming more and more stressed, seriously affecting people’s daily travel and increasing the incidence of traffic accidents. Other problems are equally serious, including environmental pollution, economic losses, and a decline in overall quality of life. The lack of space and resources to improve the infrastructure makes it a pressing issue to improve traffic flow and Traffic Signal Control (TSC) within the existing infrastructure. Traditional traffic signals are most commonly controlled by fixed-time, actuated, or adaptive methods, which respond to traffic conditions but do not fully cope with fluctuating traffic demand. In particular, they are not the best choice when traffic flow is highly saturated. Because traffic flow is highly dynamic and complex, Reinforcement Learning (RL) is commonly employed to deal with these adaptive control problems.

This thesis investigates signal control at multiple intersections in urban road networks. First, traditional traffic signal methods target individual intersections, but traffic flows at connected intersections usually affect each other. This thesis proposes a cooperative control method based on reinforcement learning that incorporates the impact of traffic flow from multiple adjacent intersections. Second, although reinforcement learning solves sequential decision-making tasks well, its models usually suffer from weak exploration ability and poor convergence when interacting with the environment during training. This thesis proposes using demonstration learning to pre-train the reinforcement learning model on demonstration data for fast convergence and improved performance. Then, to better integrate spatial and temporal features, this thesis proposes a dynamic spatial-temporal graph attention network for traffic signal control that fully exploits the potential spatial-temporal joint relations. Finally, based on meta-learning, this thesis proposes a meta spatial-temporal graph attention network that handles dynamically changing traffic flows, improving the efficiency of traffic signals and reducing vehicles’ waiting time. The specific research work and contributions are summarized as follows.

(1) Traffic Signal Control With Reinforcement Learning Based on Region-Aware Cooperative Strategy
In urban road scenarios, consecutive intersections are usually connected, so if congestion occurs at one intersection, other intersections may also be affected. Classic reinforcement learning methods usually assign one agent to a single intersection, which often does not allow cooperative control of multiple intersections. Another approach is to set up a single model and train it on samples from all intersections, which has the disadvantage of demanding a larger model and stronger generalization ability. None of these methods achieves cooperative control of multiple intersections, because in essence the traffic flow effects of connected intersections are not considered. This thesis addresses the problem that single-agent-based traffic signal control methods cannot adapt to multiple intersections and proposes a cooperative control method for multiple intersection signals (RACS) built on a policy-based reinforcement learning algorithm. The method takes the states of neighboring intersections as part of an intersection's own state while also considering the influence of neighboring signal control policies. In multi-intersection scenarios, RACS reduces the waiting time by 48.9% and 31.0% on the synthetic and real datasets, respectively, compared to the existing method IA2C.

(2) Intersection-Structure-Independent Traffic Signal Control Method at Multi-intersection
Traffic states at intersections often have symmetric or rotational characteristics. If the model can recognize such characteristics, repeated training can be reduced. On this basis, this thesis designs a traffic demand modeling scheme that is independent of the intersection structure. In addition, reinforcement learning learns weakly in the early stage of training when interacting with the environment, and adjusting the learning rate may push training into overfitting or underfitting. Almost all reinforcement learning algorithms have to face this problem. The thesis proposes a demonstration learning-based approach, Ape-X DQfD, which first trains a model using a traditional Self-Organizing Traffic Light (SOTL) method and then collects training data from SOTL. These training data are used as demonstration data to pre-train the reinforcement learning model. After pre-training, the model adapts well to the environment, so it reaches convergence quickly during formal training and shortens the training time. Pre-training also improves the model’s performance. Experiments on three urban datasets confirm that Ape-X DQfD converges faster than existing methods and achieves shorter travel times, with average reductions of 23.9%, 23.8%, and 11.6%.

(3) Multi-intersection Traffic Signal Control Method Based on Spatial-temporal Feature Fusion
Unlike convolutional and recurrent neural networks, graph neural networks are better suited to tasks with graph or graph-like structures and to data generated in non-Euclidean spaces. Existing graph neural network based signal control methods do not consider the traffic flow state over a past period. Actual traffic flow is continuous, however, and the previous period’s state directly affects the switching of subsequent signals, so these time-dimensional features should be considered. Moreover, among models based on spatial-temporal features, some consider only one kind of feature and others directly combine temporal and spatial features without fully exploiting their intrinsic correlation. This thesis proposes a Dynamic Spatial-Temporal Graph Attention Network (DynSTGAT) for traffic signal control. It uses a Temporal Convolutional Network (TCN) and a Graph Attention Network (GAT) to obtain the spatial-temporal features over a past period, uses an LSTM and a GAT to obtain the spatial-temporal features of the current moment, and then fuses the two. A DQN is used to predict the state of the signal light at the next time step. The experimental results show that the travel time is 13.8% less than that of the CoLight method on the synthetic dataset (with configuration 2). Moreover, on the real datasets, the travel time is 7.2% and 3.6% less than that of CoLight, respectively.

(4) A Meta-learning Based Method for Dynamic Multi-intersection Traffic Signal Control
Graph neural networks are often used in tasks with graph structures, but the properties of the nodes in these graphs are usually fixed. In reality, node properties more often change constantly, as in social networks, and how to deal with such dynamically changing features is a critical issue. In traffic signal control, if an intersection is considered a node and the connected intersections its neighboring nodes, the properties of these nodes obviously change constantly because the traffic flow is dynamic. To address this situation, this thesis proposes a meta-learning based spatial-temporal graph attention network named MetaSTGAT. The meta-knowledge learning module in MetaSTGAT, which consists of a two-layer fully-connected network, dynamically learns the weights among nodes based on the changing node features. Updating these weights optimizes the graph attention network so that the whole model obtains better results. The experimental results show that MetaSTGAT performs better than the Spatial-Temporal Graph Attention Network (STGAT) alone. On four synthetic and two real-world datasets, MetaSTGAT reduces the travel time by 12.23%, 19.30%, 13.84%, 10.91%, 8.24%, and 8.74%, respectively, over the graph network method CoLight.

In summary, this thesis explores the traffic signal control problem from a simply connected multi-intersection scenario, to a scenario with heterogeneous intersection structures, to a dynamically changing and complex multi-intersection scenario. It proposes four reinforcement learning-based multi-intersection signal control methods, which effectively improve traffic efficiency, increase the traffic flow through intersections, shorten vehicle travel time, and reduce vehicle waiting time.
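Contributions (3) and (4) both rest on graph attention, where an intersection weights its neighbors' features before aggregating them. The sketch below shows that weighting step in plain Python under loud assumptions: it scores neighbors with an unlearned dot product rather than the learned scoring function a GAT would use, the feature vectors are invented, and the names `softmax` and `attend` are hypothetical helpers, not the thesis's implementation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    target: List[float], neighbors: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted sum of neighbor features.

    Scores are plain dot products (an assumption made for brevity; a GAT
    learns its scoring function instead).
    """
    scores = [sum(t * n for t, n in zip(target, nb)) for nb in neighbors]
    weights = softmax(scores)
    dim = len(target)
    aggregated = [
        sum(w * nb[d] for w, nb in zip(weights, neighbors)) for d in range(dim)
    ]
    return weights, aggregated


# Invented features; the first "neighbor" is the target itself (a self-loop).
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
weights, aggregated = attend(target, neighbors)
print(weights)
print(aggregated)
```

Because the weights are recomputed from the current features, they shift as traffic changes, which is the property the dynamic and meta-learning variants described above build on.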
【Key words】 Traffic Signal Control; Reinforcement Learning; Graph Attention Network; Meta-learning; Traffic Efficiency;
- 【Online publication contributor】 Wuhan University 【Online publication year/issue】 2024, Issue 03
- 【Classification codes】 TP273; U491.54