Research on Text Location and Recognition in Natural Images with Deep Learning

【Author】 Zhang Ping (张平)

【Advisor】 Gao Haichang (高海昌)

【Author Information】 Xidian University, Computer Software and Theory, 2018, Master's degree

【Abstract】 With the rapid development of multimedia and Internet technology, there are more and more ways to obtain images of natural scenes, and extracting the needed information from this abundance of images is becoming increasingly important. With the development of computer vision and deep learning, the demand for efficiently reading text in natural scenes has also risen sharply, and text localization and recognition in natural scenes has a growing number of applications, such as real-time translation of text in pictures, automatic indexing of video or images, intelligent transportation systems, navigation for the blind, robot navigation, automatic location information services, and industrial automation.

This thesis studies text localization and recognition in natural scenes, with the goal of implementing an end-to-end system that locates and recognizes the text in natural scene images. The text types recognized are English letters and digits; other scripts, including Chinese, are outside the scope of this thesis. Text detection and recognition in natural scenes consists of two main steps: text region localization and text recognition. Building on a review of existing algorithms from China and abroad, this thesis studies both parts in depth and implements an end-to-end framework that connects localization and recognition. The main contents are as follows:

(1) Methods for general object detection and localization are applied to natural scene text localization in order to extract text regions from complex scenes. The background of natural scene images is extremely complex: some images contain many objects other than text, in some the text blends closely into the background, and in others the text is distributed arbitrarily. All of these make localization much harder. One research goal of this thesis is therefore to find a general algorithm that extracts text regions from complex scenes. To this end, Faster RCNN and Mask RCNN, which were originally applied to general object detection and localization, are modified and retrained for natural scene text localization, achieving good results in both localization accuracy and localization time. This part is also a novel contribution of this thesis.

(2) For the complex and diverse text extracted from natural scenes, general methods are sought that recognize it with as little preprocessing as possible. Some characters in natural scene text are heavily connected to one another, some use very complex fonts, and some are very noisy because of occlusion, overexposure, or other causes, so no single general method can be found for preprocessing steps such as segmentation and denoising. The other research goal of this thesis is therefore to find general methods that recognize natural scene text effectively with as little preprocessing as possible, where generality means that no image-specific preprocessing is applied. To reach this goal, this thesis studies several neural networks together with several mechanisms and algorithms used alongside them. After only simple processing of the text images, such as grayscale conversion and size normalization, two methods are designed: one based on CNN, RNN and CTC, and one based on CNN, RNN and the Attention mechanism. Both recognize the text in an image as a whole, as opposed to segmenting and recognizing individual characters. Compared with the open-source methods of Jaderberg et al., the proposed recognition methods perform well in both recognition rate and recognition time.
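
As an illustration of how a CTC-based recognizer can turn per-frame network outputs into a string without segmenting individual characters, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from the thesis: the abstract does not say which decoding strategy was used, and the label set `ALPHABET`, the blank index `BLANK`, and the helper `one_hot` are hypothetical names chosen for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (not specified in the thesis)

# Hypothetical label set: index 0 stands for the blank, followed by digits
# and lowercase English letters.
ALPHABET = "-0123456789abcdefghijklmnopqrstuvwxyz"


def ctc_greedy_decode(frame_scores, alphabet=ALPHABET, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores holds one list of per-label scores for each time step.
    The decoder takes the highest-scoring label per frame, merges runs of
    identical consecutive labels, and then removes the blank label.
    """
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


def one_hot(index, size=len(ALPHABET)):
    """Build a toy score vector that puts all of its mass on one label."""
    return [1.0 if i == index else 0.0 for i in range(size)]


if __name__ == "__main__":
    # Toy frame sequence "hh-ell-lo": the blank between the two runs of "l"
    # is what lets the doubled letter survive the merge step.
    frames = [one_hot(ALPHABET.index(c)) for c in "hh-ell-lo"]
    print(ctc_greedy_decode(frames))
```

In a full recognizer the score vectors would come from the RNN output at each horizontal position of the feature map produced by the CNN; here they are hand-built so that the decoding rule can be checked in isolation.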

  • 【Classification Code】 TP391.41; TP18
  • 【Citation Count】 9
  • 【Download Count】 535