Video Object Detection Based on Deep Learning
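【Author】 Liu Rong (刘荣);
【Advisor】 Yu Weiyu (余卫宇);
【Author information】 South China University of Technology, Circuits and Systems, 2019, Master's thesis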
【Abstract】 Object detection is a branch of computer vision research that aims to give computers human-like visual ability. It originated in the image domain, where the task is not only to identify which object categories an image contains but also to localize each object. Traditional object detection methods mainly use sliding windows or regular blocks to extract candidate regions and combine HOG features with an SVM classifier to detect objects. With the rapid development of deep learning, more and more researchers have applied it to object detection, proposing classical algorithms represented by Faster-RCNN and YOLO. Deep learning has achieved great success in image processing, semantic segmentation, pattern recognition, and other fields, establishing its dominance in object detection. However, as the accuracy of image object detection has risen year by year, further progress has reached a bottleneck, and a growing number of researchers have shifted their attention from images to video. A video contains a large number of frames with substantial pixel redundancy between them, so how to exploit the contextual information across frames to improve the speed and accuracy of detection has become a focus of researchers worldwide. This thesis studies video object detection methods based on deep learning. First, it describes neural networks, deep learning, and related techniques in detail, with emphasis on the basic principles of the R-FCN object detection algorithm. Second, it examines in depth the optical-flow-based detection method proposed by Zhu et al. and proposes three improvements on that basis: 1. Fusing temporal context: when the features of the current frame are extracted, they are fused with the features of the preceding and following frames so that the fused features carry temporal context information. 2. Using a continuous non-maximum suppression algorithm to remove candidate regions extracted by the RPN, exploiting the relationship between candidate regions in preceding and following frames to improve the quality of the selected candidates. 3. Optimizing the structure of the feature extraction network so that the network has a normalizing effect on its input data. Finally, sample cases are used to analyze the detection results before and after the improvements. Experiments on the ImagenetVID dataset show that the improved method scores higher on commonly used evaluation metrics than both the original optical flow method and traditional image object detection algorithms, and achieves results comparable to current mainstream methods.
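For readers unfamiliar with the suppression step mentioned in the second improvement, the following is a minimal sketch of standard single-frame greedy non-maximum suppression. It is not the continuous variant proposed in the thesis, whose details the abstract does not give; it only illustrates the baseline that such a variant would extend across frames. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Standard greedy NMS on one frame (baseline, not the thesis method).

    Returns the indices of the kept boxes, highest score first.
    """
    # Visit candidates from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(candidates, confidences))
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the third box does not overlap and is kept. A video-level variant would additionally have to link candidates across neighboring frames before suppressing them, which this sketch does not attempt.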
【Keywords】 Object detection; deep learning; fused features; continuous non-maximum suppression; network optimization;