Research on Newspaper Layout Segmentation Method Based on Transformer

【Author】 Zhu Yifan (朱一凡); Gao Hua (高华); Ye Ning (业宁); College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University

【Corresponding author】 Ye Ning (业宁)

【Institution】 College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University

【Abstract】 In the era of big data, information retrieval and research call for digitizing vast archives of traditional print media, which remains a challenge. Thanks to continuing advances in computer vision and artificial intelligence, the DETR model can be applied to newspaper layout segmentation. To address the slow detection speed, large parameter count, and inaccurate classification of the original model in layout segmentation, this paper proposes an improved model built on the lightweight ShuffleNet V2 backbone network, which improves computational efficiency and reduces the number of model parameters, thereby easing the computational load of the Transformer structure. In addition, a feature pyramid structure lets the model fuse global and detail information, significantly enhancing its ability to recognize multi-scale targets. The model also introduces an Efficient Channel Attention (ECA) module to extract key target features and suppress irrelevant background information, achieving a lightweight design while maintaining segmentation performance. Experimental results show that, on the newspaper layout segmentation task, the improved model has 38.5 M parameters, reaches a frame rate (FPS) of 47.5 img/s, and attains an mAP0.5 of 0.806. Compared with the original DETR model, the improved model has 2.8 M fewer parameters, a frame rate higher by 28.3 img/s, and an mAP0.5 higher by 3.2%. The proposed model can also provide preliminary technical support for OCR of newspaper layouts.
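
The abstract names the ECA module but does not describe its internals. As a rough illustration only, the following pure-Python sketch shows the general ECA idea as commonly described: pool each channel to a single value, mix neighboring channels with a small shared 1D convolution, and use a sigmoid gate to rescale the channels. The function names `eca_kernel_size` and `eca_forward` are hypothetical, the kernel-size heuristic (with `gamma=2`, `b=1`) is assumed from the usual ECA formulation, and the convolution weights are passed in by hand here, whereas in a trained network they would be learned. This is not the authors' implementation.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Pick an odd 1D kernel size that grows with the channel count.

    Assumption: the usual ECA heuristic k = |(log2(C) + b) / gamma|,
    rounded up to the nearest odd integer.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_forward(feature_map, weights):
    """Apply channel attention to a feature map.

    feature_map: list of C channels, each an H x W list of lists of floats.
    weights: odd-length list of 1D convolution weights shared by all channels
             (hand-supplied here; learned in a real model).
    Returns (rescaled feature map, per-channel gates).
    """
    k = len(weights)
    assert k % 2 == 1, "kernel size must be odd"
    pad = k // 2
    num_channels = len(feature_map)

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2) 1D convolution across the channel axis with zero padding,
    #    so each channel interacts only with its k nearest neighbors.
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(weights[j] * padded[i + j] for j in range(k))
        for i in range(num_channels)
    ]

    # 3) Sigmoid turns the mixed descriptors into gates in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4) Rescale every value of each channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in ch]
        for ch, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

A call such as `eca_forward(fmap, [0.2, 0.6, 0.2])` returns the reweighted feature map together with the per-channel gates. Because the only trainable parameters are the `k` convolution weights, the module adds very few parameters, which is consistent with the lightweight design goal stated in the abstract.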

【Fund】 National Key Research and Development Program of China (2016YFD0600101)
  • 【Source】 Journal of Nanjing Normal University (Natural Science Edition), 2025, Issue 01
  • 【Classification code】 TP391.41; TS892