Research on a Visual Semantic Simultaneous Localization and Mapping Algorithm for Urban Road Scenes

【Author】 Zhao Hailong (赵海龙)

【Advisor】 Liu Tao (刘涛)

【Author Information】 Harbin Institute of Technology, Vehicle Engineering, 2020, Master's thesis

【Abstract】 Simultaneous localization and mapping (SLAM) is the most basic problem that intelligent driving must solve: it provides path planning with the vehicle's position and information about the surrounding environment. Most current SLAM algorithms rely on the strong assumption that the environment is static. Visual odometry is therefore easily disturbed by moving objects, which causes pose drift or even tracking loss, and the resulting maps show ghosting and deformation. In recent years, with the rapid development of deep learning, its practicality has grown and it has been widely applied in many fields, including visual SLAM. This thesis uses a stereo camera as the information source and combines it with deep learning to study urban road environments, aiming to reduce the influence of moving objects on pose estimation and to build a semantic octree map. The thesis modifies ORB-SLAM2 by adding a semantic segmentation thread and a dense semantic mapping thread, and by adding a motion detection module to the tracking thread. The resulting algorithm has five main threads: tracking, semantic segmentation, dense semantic mapping, local mapping, and loop closing. First, the training data are processed by simplifying the class set of the KITTI semantic segmentation dataset, and the processed dataset is used to train an ENet deep neural network. Once the network converges, training is stopped and the model is integrated into the semantic segmentation thread, where it segments the images acquired by the main thread and produces a semantic label image and a semantic color image. In addition, based on prior knowledge, semantic label values are divided into three categories: labels of dynamic-attribute objects, labels of static-attribute objects, and labels of no-attribute objects. In the tracking thread, ORB feature points are then classified by the semantic label values at their locations, and the pose is computed from the selected static-attribute feature points. Once the pose is obtained, the reprojection error of each dynamic-attribute feature point is calculated, and its magnitude determines whether the point is retained. The pose is then optimized using the static-attribute feature points together with the retained dynamic-attribute feature points. In the dense semantic mapping part, the SGM algorithm first computes the stereo disparity map, which is then refined by a consistency check and median filtering. The three-dimensional coordinates of points are computed from the refined disparity map and the camera intrinsics, and an initial semantic point cloud map is built by pass-through filtering on the coordinate and semantic information. Because the initial semantic point cloud map contains erroneous and highly redundant information, voxel filtering and statistical filtering are applied to obtain a refined point cloud. The semantic point cloud map is then converted into a semantic octree map, which is updated in real time. Finally, experiments on the KITTI dataset show that the proposed algorithm improves the accuracy and robustness of the original algorithm in large-scale urban road scenes containing many dynamic objects, and builds a semantic octree map that is free of moving objects and carries semantic information.
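
The motion check described in the abstract (keep static-attribute points, then keep a dynamic-attribute point only if its reprojection error under the estimated pose is small) can be illustrated with a minimal sketch. This is not the thesis's implementation: the label sets, the 2-pixel threshold, the function names, and the choice to drop no-attribute points are all assumptions made for illustration, and the pose (R, t) is assumed to have been estimated already from the static-attribute points.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label groups; the abstract does not list the actual label values.
DYNAMIC_LABELS = {"car", "person", "bicycle"}
STATIC_LABELS = {"road", "building", "pole"}

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]
Feature = Tuple[str, Point3, Pixel]  # (semantic label, 3D world point, observed pixel)


def transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Point3) -> Point3:
    """Apply the rigid transform p' = R p + t (world frame to camera frame)."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p_cam: Point3, fx: float, fy: float, cx: float, cy: float) -> Pixel:
    """Pinhole projection of a camera-frame point onto the image plane."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(p_world: Point3, observed: Pixel, R, t,
                       fx: float, fy: float, cx: float, cy: float) -> float:
    """Pixel distance between the observed feature and the projected map point."""
    u, v = project(transform(R, t, p_world), fx, fy, cx, cy)
    return math.hypot(u - observed[0], v - observed[1])


def select_points(features: List[Feature], R, t, intrinsics,
                  threshold_px: float = 2.0) -> List[int]:
    """Return indices of features kept for pose optimization."""
    kept = []
    for i, (label, p_world, observed) in enumerate(features):
        if label in STATIC_LABELS:
            kept.append(i)  # static-attribute points are always kept
        elif label in DYNAMIC_LABELS:
            # A dynamic-attribute object that is not moving (e.g. a parked car)
            # still agrees with the estimated pose, so its error stays small.
            err = reprojection_error(p_world, observed, R, t, *intrinsics)
            if err < threshold_px:
                kept.append(i)
        # Assumption: no-attribute points are dropped in this sketch.
    return kept
```

With an identity pose and assumed intrinsics (fx = fy = 700, cx = 600, cy = 180), a point labeled "car" whose observation lies half a pixel from its projection would be kept, while one observed 20 pixels away would be rejected as moving.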
