
Research on Object Detection Technology Based on Deep Learning

Object Detection Based on Deep Learning

【Author】 刘博 (Liu Bo)

【Advisor】 董远 (Dong Yuan)

【Author Information】 Beijing University of Posts and Telecommunications, Electronics and Communication Engineering, 2016, Master's thesis

【Abstract (Chinese)】 Today, as image information expands rapidly, how to quickly and effectively annotate static images and detect and localize objects of target classes within them is one of the most fundamental challenges in machine learning and computer vision. Object detection refers to detecting and localizing general object classes in a static image. The problem is technically hard to solve effectively for several reasons; one is that many object classes can vary greatly in appearance, and these variations arise not only from differences in illumination and viewpoint but also from non-rigid deformation: cars, for example, can take on many different shapes. In recent years, progress in object detection performance has slowed and largely stagnated. The best-performing detection systems are complex ensembles that combine multiple low-level image features extracted by object detectors with high-level context obtained from scene classifiers. Because these systems are complex and rely only on hand-crafted low-level image features such as SIFT and HOG, they cannot detect and localize object classes accurately and quickly. In this thesis we analyze in depth the application of deep convolutional neural networks to object detection in static images. Combining the ideas of region proposal extraction, model fine-tuning, and feature extraction, we address the problem of training and optimizing deep convolutional neural network models on datasets that differ from the classification task: we propose a fine-tuning method, design three convolutional neural networks of different depths and sizes, first train pre-trained models, then fine-tune them, and finally use the fine-tuned deep models for object detection. The detection algorithm in this thesis can accurately detect and localize general object classes in images, which also indirectly demonstrates the strong generalization ability of deep models. In the detection pipeline, image segmentation algorithms such as selective search are introduced at an early stage to cut the image into many sub-regions, referred to in this thesis as region proposals, which may contain the object classes to be detected; the recognized proposals are then passed through a trained region regressor to obtain regions closer to the true object locations. To address the problem that the internal features of deep models are opaque and the networks are too abstract, which makes them hard for researchers to train and optimize, this thesis designs a deconvolution-like network that reconstructs high-level features back into the RGB color space, providing a visualization technique for deep convolutional neural networks. From this we learn that different layers learn different kinds of features, so the key to improving deep models is to effectively analyze and exploit the features extracted by the deep convolutional neural network, identify what needs to be optimized, and then optimize the network accordingly. Based on this research on deep-learning-based object detection, we entered the 2015 ImageNet Large Scale Visual Recognition Challenge; on the ILSVRC2014 object detection dataset, a single model achieved a mean AP of 42.3% on the detection test set. The results are still being submitted.
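As a concrete illustration of the fine-tuning step described above (replacing the classifier of a pre-trained network with a new one while reusing the stacked convolutional and pooling layers as the feature extractor), the following is a minimal sketch. It assumes PyTorch/torchvision and an AlexNet-style model, neither of which is specified in the thesis; the class count and the layer-freezing policy are illustrative choices only.

```python
# Hedged sketch: fine-tune an ImageNet-pretrained CNN for a new label set by
# swapping the classifier head and keeping the convolutional feature extractor.
# PyTorch is used purely for illustration; the thesis does not name a framework.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 200 + 1  # hypothetical: 200 detection classes plus a background class

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Replace only the final fully connected layer so the classifier matches the new task;
# model.features (conv + pooling stack) is kept as the pre-trained feature extractor.
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, NUM_CLASSES)

# One common policy: freeze the feature extractor and train only the new classifier
# (alternatively, fine-tune everything with a smaller learning rate for pretrained layers).
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9,
)
criterion = nn.CrossEntropyLoss()
```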

【Abstract (English)】 Along with the rapid expansion of image and video information, how to quickly and effectively analyze and annotate static images, and detect and localize object classes within them, is one of the most basic challenges in machine learning and computer vision. Object detection is the task of detecting and localizing objects of given classes. It is difficult to implement effectively for two main reasons: first, many objects can look very different in appearance; second, these differences arise not only from changes in illumination but also from non-rigid deformations. Object detection performance has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. These systems are very complex and rely heavily on hand-crafted low-level image features such as SIFT and HOG, so they cannot detect and localize objects efficiently and quickly. Deep learning has achieved great performance in computer vision, and deep models have been applied to many tasks such as handwritten character recognition, image classification, and object detection. This thesis analyzes the application of deep convolutional neural networks to object detection in static images. We aim to optimize a large convolutional neural network model on a small-scale dataset with the help of a large auxiliary dataset. Deep architectures perform well on large datasets but may overfit on small ones, so transferring a large deep model trained on a large-scale dataset to a new small-scale dataset is meaningful. We propose an algorithm, called fine-tuning, to implement this transfer: a convolutional neural network model pre-trained on a large-scale dataset is adapted by replacing its classifier with a new one while keeping its feature extractor. The stacked convolutional and pooling layers are treated as the feature extractor, and the fully connected and softmax layers are treated as the classifier. In addition, we compare the performance of features extracted by the convolutional neural network with hand-crafted features such as HOG and SIFT, and we compare the performance of our deep model with an SVM classifier. In the detection pipeline, we introduce image segmentation algorithms such as selective search and edge boxes to generate category-independent region proposals; these proposals define the set of candidate detections available to our detector. Large convolutional network models have recently demonstrated impressive classification performance on the ImageNet benchmark, yet there is no clear understanding of why they perform so well or how they might be improved. In this thesis we introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier, and we perform an ablation study to discover the performance contribution of different model layers. Based on the algorithms above, we participated in the 2015 ImageNet Large Scale Visual Recognition Challenge. On the detection validation set, we achieved a mean average precision (mAP) of 42.3% with a single model, compared with other methods that fuse multiple models.
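To make the region-proposal stage more concrete, here is a small sketch of generating category-independent proposals with selective search and pruning scored boxes with greedy non-maximum suppression. It uses OpenCV's contrib implementation of selective search (opencv-contrib-python) purely as an illustration; the thesis does not name a specific implementation, and the learned region regressor and CNN scoring step are omitted for brevity.

```python
# Hedged sketch of an R-CNN-style proposal pipeline: category-independent
# candidate boxes from selective search, later reduced by non-maximum suppression
# after a detector has assigned per-class scores.
import cv2
import numpy as np

def propose_regions(image_bgr, max_proposals=2000):
    """Generate candidate regions as (x, y, w, h) rectangles via selective search."""
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    return ss.process()[:max_proposals]

def to_corners(rects):
    """Convert (x, y, w, h) proposals to (x1, y1, x2, y2) boxes for NMS."""
    rects = np.asarray(rects, dtype=np.float32)
    return np.stack([rects[:, 0], rects[:, 1],
                     rects[:, 0] + rects[:, 2],
                     rects[:, 1] + rects[:, 3]], axis=1)

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression; returns indices of boxes to keep."""
    boxes = np.asarray(boxes, dtype=np.float32)
    order = np.argsort(np.asarray(scores))[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current best box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

In a full pipeline each proposal would be warped to the network's input size, scored by the fine-tuned CNN, refined by the region regressor, and only then passed through per-class NMS; the functions above show only the proposal and suppression pieces.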

  • 【CLC Number】 TP391.41; TP18
  • 【Cited By】 4
  • 【Downloads】 383