Automatic generation of German scene text recognition dataset based on scene text editing
【Author】 Haodong Guo; Zengfu Wang
【Affiliations】 Institutes of Physical Science and Information Technology, Anhui University; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences; Department of Automation, University of Science and Technology of China
【Abstract】 In scene text recognition (STR), widely used global languages such as English have a wealth of annotated real-world datasets. For languages such as German, which are rarely used outside German-speaking countries, datasets are hard to obtain, and preparing them by hand takes considerable time and effort. To address these problems, this study adopts a method based on scene text editing (STE), that is, replacing the text within an image, so that existing English datasets can be converted into German ones. This yields a large amount of simulated German data in a short time and reduces the cost of manual data preparation. Compared with the German text images produced by earlier synthetic text image generators, the simulated images are closer to real scene text images, which brings the training set closer to actual application conditions and helps improve the recognition ability of the model.
【Key words】 Scene Text Recognition; Scene Text Editing; Synthetic Text Image Generator; Dataset Generation;
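- 【Proceedings】 Proceedings of the 2023 China Automation Congress
- 【Conference】 2023 China Automation Congress
- 【Conference date】 2023-11-17
- 【Conference location】 Chongqing, China
- 【Classification code】 TP391.41
- 【Organizer】 Chinese Association of Automation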
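
The conversion described in the abstract (take a labeled English scene text image, replace its text with German, and keep the new text as the label) can be sketched as a simple loop. This is only an illustrative sketch, not the authors' implementation: `edit_text` is a hypothetical placeholder for whatever STE model performs the replacement, `Sample` is an assumed container type, and choosing the German word closest in length to the English label is an assumed heuristic that the record does not describe.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Sequence


@dataclass
class Sample:
    """One scene text sample: an image plus its text label (assumed structure)."""
    image: Any
    label: str


def pick_german_word(source: str, lexicon: Sequence[str], rng: random.Random) -> str:
    """Pick a German word whose length is closest to the source label.

    Length matching is an assumption made for this sketch only.
    """
    best = min(abs(len(word) - len(source)) for word in lexicon)
    candidates = [w for w in lexicon if abs(len(w) - len(source)) == best]
    return rng.choice(candidates)


def convert_dataset(
    samples: Iterable[Sample],
    lexicon: Sequence[str],
    edit_text: Callable[[Any, str], Any],
    seed: int = 0,
) -> Iterator[Sample]:
    """Turn English samples into simulated German samples.

    `edit_text(image, new_text)` stands in for an STE model (hypothetical);
    it must return an image in which the text has been replaced.
    """
    rng = random.Random(seed)
    for sample in samples:
        target = pick_german_word(sample.label, lexicon, rng)
        # The edited image is labeled with the German word that was written into it.
        yield Sample(image=edit_text(sample.image, target), label=target)
```

Because the label of each output sample is the word passed to the editing model, the simulated German data needs no manual annotation, which is the cost saving the abstract points to.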