Research on Nature Scene Text Detection and Recognition Algorithms Based on Deep Learning

【Author】 Xia Yong (夏勇)

【Advisor】 Tian Chunna (田春娜)

【Author information】 Xidian University, Signal and Information Processing, 2017, Master's thesis

【Abstract】 Text, one of the most influential inventions of humanity, has played an indispensable role in human life since ancient times. Because text carries rich and precise semantic information that is useful in a wide range of vision-based applications, text detection and recognition in natural scenes has grown in importance and become an active research topic in computer vision and document analysis. Research in this field has advanced substantially in recent years, yet extracting and recognizing text in natural scene images still faces many challenges (e.g., noise, blur, distortion, occlusion, and variation). This thesis studies these problems and makes the following contributions.

For scene text detection, we propose a text localization method that combines Multi-Channel and Multi-Resolution Maximally Stable Extremal Regions (MC-MR MSER) candidate extraction with coarse-to-fine cascaded filtering. First, we extract MSERs from suitably chosen color channels and scales as character candidates. Then, we design a coarse-to-fine cascaded filter to discard false positives: the coarse filter relies on simple morphological features and stroke width, and the fine filter is a binary classifier trained as a convolutional neural network (CNN). Finally, the remaining character candidates are merged into horizontal or multi-oriented text strings with a graph model. The method is evaluated under the standard protocols on Challenge 2 of the ICDAR 2013 Robust Reading Competition benchmark and on the multi-orientation scene text database USTB-SV1K. Experimental results show that it is both fast and effective: it achieves an F-score of 83.84% on ICDAR 2013 and 51.15% on the more challenging USTB-SV1K, outperforming several state-of-the-art text detection methods.

For scene text recognition, building on advances in deep learning, we cast recognition as a sequence labeling task and propose a context-based recognition method with latent segmentation. First, the input image is pre-processed to fit the network structure. Then, a CNN extracts a feature sequence from the whole word image. Next, a bi-directional long short-term memory (Bi-LSTM) recurrent network processes the feature sequence and outputs predictions. Finally, connectionist temporal classification (CTC) transcribes the predictions into the final recognition result. We evaluate the method on ICDAR 2013 Challenges 1, 2, and 4; the results show good recognition accuracy and fast recognition speed.

Combining the proposed detection and recognition algorithms yields an end-to-end scene text detection and recognition system. In addition, for multi-oriented text, we obtain the text orientation from the detection stage and then correct the tilt, which effectively improves the recognition rate. Since a word carries more semantic information than a single character, we also combine the recognition algorithm with the output of the text localization method to improve localization accuracy.
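The CTC transcription step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which decoding strategy the thesis uses, so the greedy (best-path) decoder below is an assumption chosen for simplicity; the function name `ctc_greedy_decode`, the toy alphabet, the blank index, and the score values are all hypothetical. It shows why no explicit character segmentation is needed: each frame gets a label, repeated labels are collapsed, and blanks are dropped.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank_index=0):
    """Greedy (best-path) CTC decoding; an illustrative assumption,
    not necessarily the decoder used in the thesis.

    frame_scores: one list of per-class scores for each time step.
    alphabet:     class index -> character; alphabet[blank_index] is the blank.
    """
    # 1. Best path: the highest-scoring class index at every time step.
    best_path = [max(range(len(scores)), key=lambda k: scores[k])
                 for scores in frame_scores]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop the blank label; what remains is the transcription.
    return "".join(alphabet[i] for i in collapsed if i != blank_index)


# Hypothetical alphabet: index 0 is the CTC blank.
ALPHABET = ["-", "a", "c", "t"]

# Hypothetical per-frame scores, as a Bi-LSTM output layer might produce.
frame_scores = [
    [0.10, 0.10, 0.70, 0.10],  # c
    [0.20, 0.10, 0.60, 0.10],  # c (repeat, collapsed)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.70, 0.10, 0.10],  # a
    [0.60, 0.20, 0.10, 0.10],  # blank
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.20, 0.60],  # t (repeat, collapsed)
]

print(ctc_greedy_decode(frame_scores, ALPHABET))  # cat
```

A blank between two identical labels keeps them distinct, so a doubled letter survives the collapse step, while a run with no blank in between yields a single character.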

  • 【Classification code】 TP391.41; TP181
  • 【Citation count】 14
  • 【Download count】 522