Research on Text Detection and Multi-script Identification in Natural Images Based on Machine Learning

【Author】 Zhang Peng (张鹏)

【Supervisor】 Cui Rongyi (崔荣一)

【Author Information】 Yanbian University, Computer Application Technology, 2017, Master's

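【Abstract (Chinese version)】 Text is a vital symbolic tool for conveying human thoughts and emotions and for passing on culture, and its importance and irreplaceability show in every aspect of social production and daily life. Text is ubiquitous in modern cities: posters, road signs, shop signs and light boxes all carry a great deal of textual information. In natural images, the semantic information expressed by text is an important cue for understanding image content. Script identification in natural images is an important direction in content-based image retrieval and in the development of multilingual systems. Detecting text in natural scene images and identifying its script is considerably difficult: text in different scenes has different characteristics, such as color, quantity, size and spacing; moreover, the background of text in natural images is often complex, and problems such as noise, skew and perspective distortion are also present. All of these make text detection and script identification in natural images extremely difficult, and how to effectively process natural images containing text in multiple languages has become a pressing problem in natural scene analysis and understanding. This thesis proposes a text region detection method based on visual saliency and edge density, and a script identification method based on image features and machine learning. First, the text region detection method detects visually salient regions with a multi-scale spectral residual method, detects edges within those regions using the Sobel operator, computes the edge density of the image, preprocesses the edges with mathematical morphology, and finally detects text regions using prior knowledge of how text is arranged in natural images. Second, the script identification method, based on basic image features and machine learning, builds character sample images for Arabic numerals, English, Russian, Japanese kana, simplified Chinese and Korean and extracts their skeletons; basic image features of these skeletons form a feature set for each script. Based on the structural characteristics of the scripts and the properties of the classifiers, identification is divided into two stages: coarse classification and fine classification. In the coarse classification stage, a support vector machine divides characters into two groups: the first contains Arabic numerals, English, Russian and Japanese kana, and the second contains simplified Chinese and Korean. In the fine classification (identification) stage, a support vector machine identifies the script of characters in the first group, and a BP neural network identifies the script of characters in the second group. Experimental results show that the text detection method based on visual saliency and text edge density achieves a detection rate of 73%, and the script identification method based on basic image features and machine learning achieves an identification rate of 73.33%. These results address text detection and script identification in natural images and verify the correctness and feasibility of the proposed methods.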

【Abstract】 Text is an important carrier of human emotion and cultural heritage, and it plays a very important role in many aspects of human production and life. Text is a common element of the modern urban environment: posters, signs, billboards and similar objects contain a large amount of text. Text in natural scenes can convey rich and accurate high-level semantic information and is a key element in understanding the content of a scene. Script identification of text in natural images is an important direction in content-based image retrieval and multilingual system development. Detecting and identifying text in natural images is very challenging. On the one hand, text in natural scenes is highly diverse: it may vary in color, number, size and spacing, and may belong to different languages. On the other hand, natural scene backgrounds are complex and suffer from problems such as noise, occlusion and perspective distortion. These factors make the detection and identification of multilingual text very difficult. Nowadays, how to effectively process natural images containing several kinds of scripts is an urgent problem. To solve it, a text localization algorithm based on visual saliency and edge density and a script identification algorithm based on basic image features and machine learning were proposed in this dissertation. First, a text detection algorithm combining visual saliency and edge density was proposed. In this algorithm, a multi-scale spectral residual method was used to detect visually salient regions, the Sobel gradient operator was employed to detect image edges within the salient regions, and the edge density was then computed. After the image edges were preprocessed with morphological operations, text areas were detected using prior hypotheses about text arrangement. Second, a multi-script identification method based on basic image features and machine learning was put forward. This method built character sample sets of Arabic numerals, English, Russian, Japanese kana, simplified Chinese and Korean, and extracted the skeletons of the characters. According to the structural characteristics of the different scripts, and taking the properties of the classifiers into account, script identification was divided into two stages: a coarse classification stage and a fine classification stage. In the coarse classification stage, a support vector machine divided text into two categories: the first includes Arabic numerals, English, Russian and Japanese kana, and the second includes Chinese and Korean. In the fine classification stage, a support vector machine was used to identify scripts in the first category, and a BP neural network was used to identify scripts in the second category. The experimental results show that the text detection algorithm achieves 73% precision and the multi-script identification algorithm achieves 73.33% precision, which shows that the methods proposed in this dissertation are effective and feasible.
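
The detection method described above starts from a multi-scale spectral residual saliency map. The following is a minimal sketch in standard-library Python, assuming the usual spectral residual formulation: the log-amplitude spectrum minus its 3x3 local average, recombined with the original phase and transformed back. The down-sampling factors 1 and 2 are hypothetical, and a 3x3 box blur stands in for the Gaussian smoothing usually applied to the result. The naive DFT is only suitable for tiny images; a practical implementation would use an FFT library.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive separable 2-D DFT, O(N^3); only meant for tiny illustrative images."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1

    def dft1(seq):
        n = len(seq)
        return [sum(seq[k] * cmath.exp(sign * 2j * math.pi * k * u / n) for k in range(n))
                for u in range(n)]

    by_rows = [dft1(row) for row in img]  # transform every row
    by_cols = [dft1([by_rows[r][c] for r in range(rows)]) for c in range(cols)]  # then every column
    out = [[by_cols[c][r] for c in range(cols)] for r in range(rows)]
    if inverse:
        out = [[v / (rows * cols) for v in row] for row in out]
    return out


def box3(mat):
    """3x3 mean filter with edge replication."""
    h, w = len(mat), len(mat[0])
    return [[sum(mat[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
             for c in range(w)] for r in range(h)]


def normalize(mat):
    """Min-max scale to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in mat)
    hi = max(max(row) for row in mat)
    if hi == lo:
        return [[0.0] * len(row) for row in mat]
    return [[(v - lo) / (hi - lo) for v in row] for row in mat]


def spectral_residual(img):
    spectrum = dft2(img)
    log_amp = [[math.log(abs(v) + 1e-9) for v in row] for row in spectrum]
    phase = [[cmath.phase(v) for v in row] for row in spectrum]
    smooth = box3(log_amp)  # local average of the log spectrum
    # Residual log amplitude recombined with the original phase.
    residual = [[cmath.exp(complex(la - sm, ph)) for la, sm, ph in zip(r1, r2, r3)]
                for r1, r2, r3 in zip(log_amp, smooth, phase)]
    back = dft2(residual, inverse=True)
    # Squared magnitude, then a box blur as a stand-in for Gaussian smoothing.
    return normalize(box3([[abs(v) ** 2 for v in row] for row in back]))


def downsample(img, f):
    """Block-average down-sampling by an integer factor f."""
    h, w = len(img) // f, len(img[0]) // f
    return [[sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / (f * f)
             for c in range(w)] for r in range(h)]


def multiscale_saliency(img, factors=(1, 2)):
    """Average single-scale maps after nearest-neighbour up-sampling (factors are hypothetical)."""
    h, w = len(img), len(img[0])
    total = [[0.0] * w for _ in range(h)]
    for f in factors:
        sal = spectral_residual(downsample(img, f))
        for r in range(h):
            for c in range(w):
                total[r][c] += sal[min(r // f, len(sal) - 1)][min(c // f, len(sal[0]) - 1)]
    return normalize(total)


# Usage: a dark 16x16 image with one small bright patch.
image = [[1.0 if 5 <= r < 8 and 9 <= c < 12 else 0.0 for c in range(16)] for r in range(16)]
saliency = multiscale_saliency(image)
```

Averaging maps from several scales is one simple way to make the saliency response less sensitive to text size; the thesis abstract does not say how its scales are combined.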
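
Within the salient regions, the method applies the Sobel operator and measures edge density. A minimal sketch, assuming "edge density" means the fraction of edge pixels in a sliding window. The thresholds `sal_thresh` and `edge_thresh` and the window size `win` are hypothetical parameters, not values taken from the thesis.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img):
    """Gradient magnitude; border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            mag[r][c] = math.hypot(gx, gy)
    return mag


def edge_density(img, saliency, sal_thresh=0.5, edge_thresh=1.0, win=3):
    """Binary edge map restricted to salient pixels, plus its local density.

    sal_thresh, edge_thresh and win are hypothetical parameters.
    Density = fraction of edge pixels in a win x win window (clipped at the border).
    """
    h, w = len(img), len(img[0])
    mag = sobel_magnitude(img)
    edges = [[1 if saliency[r][c] >= sal_thresh and mag[r][c] >= edge_thresh else 0
              for c in range(w)] for r in range(h)]
    half = win // 2
    density = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            r0, r1 = max(0, r - half), min(h, r + half + 1)
            c0, c1 = max(0, c - half), min(w, c + half + 1)
            hits = sum(edges[y][x] for y in range(r0, r1) for x in range(c0, c1))
            density[r][c] = hits / ((r1 - r0) * (c1 - c0))
    return edges, density
```

Text regions tend to have dense, closely spaced strokes, so high local edge density inside a salient region is a reasonable (though not sufficient) text cue.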
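
The morphological preprocessing and the text-arrangement prior could look roughly like the sketch below. It assumes a binary dilation with a wider horizontal than vertical reach (to merge the characters of a horizontal text line), 4-connected component labeling, and hypothetical thresholds on area, aspect ratio and fill ratio standing in for the prior knowledge about text arrangement, which the abstract does not spell out.

```python
from collections import deque


def dilate(mask, reach_r=1, reach_c=2):
    """Binary dilation with a (2*reach_r+1) x (2*reach_c+1) rectangle.

    The wider horizontal reach (a hypothetical choice) merges neighbouring
    characters of a horizontal text line into one blob.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for y in range(max(0, r - reach_r), min(h, r + reach_r + 1)):
                    for x in range(max(0, c - reach_c), min(w, c + reach_c + 1)):
                        out[y][x] = 1
    return out


def components(mask):
    """4-connected components as (top, left, bottom, right, pixel_count)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right, count))
    return boxes


def text_candidates(edge_mask, min_aspect=1.5, min_fill=0.3, min_area=6):
    """Keep blobs that look like horizontal text lines (hypothetical thresholds)."""
    boxes = []
    for top, left, bottom, right, count in components(dilate(edge_mask)):
        height, width = bottom - top + 1, right - left + 1
        area = height * width
        if area >= min_area and width / height >= min_aspect and count / area >= min_fill:
            boxes.append((top, left, bottom, right))
    return boxes
```

This filter only accepts roughly horizontal lines; vertically set text (common in Chinese, Japanese and Korean signage) would need a second, rotated prior.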
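
The identification method extracts character skeletons and builds features from them. The abstract does not name the thinning algorithm or list the features, so this sketch swaps in the classic Zhang-Suen thinning algorithm and counts skeleton pixels, endpoints and junction pixels as illustrative, hypothetical features.

```python
def zhang_suen(img):
    """Zhang-Suen thinning of a binary image (lists of 0/1, 1 = ink).

    Returns a new image; pixels are only ever removed, never added.
    """
    h, w = len(img), len(img[0])
    # Pad with a one-pixel background border so every pixel has 8 neighbours.
    g = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]

    def neighbours(r, c):
        # P2..P9: north, then clockwise around the pixel.
        return [g[r - 1][c], g[r - 1][c + 1], g[r][c + 1], g[r + 1][c + 1],
                g[r + 1][c], g[r + 1][c - 1], g[r][c - 1], g[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if g[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0->1 transitions
                    p2, _, p4, _, p6, _, p8, _ = p
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_delete.append((r, c))
            for r, c in to_delete:  # delete in parallel after the full scan
                g[r][c] = 0
            changed = changed or bool(to_delete)
    return [row[1:w + 1] for row in g[1:h + 1]]


def skeleton_features(skel):
    """Hypothetical 'basic' features: (ink pixels, endpoints, junction pixels)."""
    h, w = len(skel), len(skel[0])
    pixels = endpoints = junctions = 0
    for r in range(h):
        for c in range(w):
            if not skel[r][c]:
                continue
            pixels += 1
            n = sum(skel[y][x] for y in range(max(0, r - 1), min(h, r + 2))
                    for x in range(max(0, c - 1), min(w, c + 2))) - 1
            if n == 1:
                endpoints += 1
            elif n >= 3:
                junctions += 1
    return pixels, endpoints, junctions
```

Counting junction pixels by neighbour count is crude (one branch point can yield several such pixels), but it illustrates why skeletons help: stroke topology differs sharply between, for example, Latin letters and Hangul syllable blocks.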
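
The two-stage scheme routes each character through a coarse classifier before a group-specific one. The sketch below illustrates only that routing, under several assumptions: the coarse stage is a linear SVM trained with the Pegasos stochastic sub-gradient method (the thesis does not state its SVM formulation or kernel), the fine stage uses nearest-centroid stand-ins rather than the SVM and BP neural network the thesis uses, and the 3-D feature vectors and script labels are hypothetical toy data.

```python
import random

GROUP_1 = {"digit", "english", "russian", "kana"}  # coarse group 1 as in the thesis
# Every other label ("chinese", "korean") belongs to coarse group 2.


def train_linear_svm(xs, ys, lam=0.1, epochs=200, seed=0):
    """Pegasos sub-gradient training of a linear SVM; labels are +1 / -1.

    A constant 1.0 is appended to every vector so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the L2 regulariser
            if margin < 1.0:  # hinge loss active: move toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, x):
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_nearest_centroid(samples):
    """Stand-in fine classifier (NOT the thesis's SVM or BP network)."""
    sums, counts = {}, {}
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def classify(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return classify


def train_two_stage(samples):
    """Coarse SVM picks group 1 or 2; a per-group model then picks the script."""
    xs = [x for x, _ in samples]
    ys = [1 if label in GROUP_1 else -1 for _, label in samples]
    coarse = train_linear_svm(xs, ys)
    fine_1 = train_nearest_centroid([s for s in samples if s[1] in GROUP_1])
    fine_2 = train_nearest_centroid([s for s in samples if s[1] not in GROUP_1])

    def predict(x):
        return fine_1(x) if svm_score(coarse, x) >= 0 else fine_2(x)
    return predict


# Toy 3-D feature vectors (hypothetical; not data from the thesis).
SAMPLES = [
    ((-3.5, 0.8, 0.8), "digit"), ((-3.2, 0.9, 0.7), "digit"),
    ((-3.4, 0.8, -0.8), "english"), ((-3.6, 0.7, -0.9), "english"),
    ((-3.3, -0.8, 0.8), "russian"), ((-3.8, -0.9, 0.9), "russian"),
    ((-3.5, -0.8, -0.8), "kana"), ((-3.1, -0.7, -0.9), "kana"),
    ((3.5, 0.9, 0.0), "chinese"), ((3.2, 0.8, 0.1), "chinese"),
    ((3.4, -0.9, 0.0), "korean"), ((3.7, -0.8, -0.1), "korean"),
]
predict = train_two_stage(SAMPLES)
```

Splitting the problem this way lets each fine classifier specialise: the block-structured Chinese and Korean characters are never confused with alphabetic scripts at the fine stage, at the cost that any coarse-stage error cannot be recovered later.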

  • 【Online publication submitter】 Yanbian University
  • 【Online publication year/issue】 2018, issue 01