Lipreading Method Based on Multi-Scale Spatiotemporal Convolution

【Authors】 YE Hong; WEI Jinsong; JIA Zhaohong; ZHENG Hui; LIANG Dong; TANG Jun

【Corresponding author】 ZHENG Hui

【Affiliations】 School of Internet, Anhui University; School of Electronic and Information Engineering, Anhui University

【Abstract】 Most existing lipreading models combine a single-layer 3D convolution with a 2D convolutional neural network to extract joint spatio-temporal features from lip video sequences. However, a single 3D convolutional layer captures temporal information poorly, and 2D convolutional neural networks have limited ability to mine fine-grained lipreading features. To address both limitations, this paper proposes a Multi-Scale Lipreading Network (MS-LipNet). Within the Res2Net network, 3D spatio-temporal convolutions replace the traditional 2D convolutions to better extract joint spatio-temporal features, and a spatio-temporal coordinate attention module is proposed so that the network focuses on task-relevant regions. Experiments on the LRW and LRW-1000 datasets verify the effectiveness of the proposed method.
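
The multi-scale effect of placing 3D convolutions inside a Res2Net-style block can be illustrated with a small receptive-field calculation. The sketch below is not the authors' implementation: it assumes the standard Res2Net layout (channels split into `scale` groups, the first group passed through unchanged, each later group convolved after adding the previous group's output), stride 1, and no dilation. The function name `branch_receptive_fields`, the default `scale=4`, and the kernel sizes are illustrative assumptions, since the abstract does not state them.

```python
def branch_receptive_fields(scale=4, kernel=(3, 3, 3)):
    """Return the (T, H, W) receptive field of each branch output in one
    Res2Net-style block, assuming stride 1 and no dilation.

    Branch 1 is an identity path; branch i (i >= 2) is convolved after the
    previous branch's output is added, so its longest path stacks i - 1
    convolutions.
    """
    fields = []
    for convs_on_path in range(scale):
        # n stacked stride-1 convolutions of size k cover 1 + n * (k - 1) positions.
        fields.append(tuple(1 + convs_on_path * (k - 1) for k in kernel))
    return fields


if __name__ == "__main__":
    # Compare a 2D kernel (no temporal extent) with a 3D spatio-temporal kernel.
    for name, kernel in [("2D (1x3x3)", (1, 3, 3)), ("3D (3x3x3)", (3, 3, 3))]:
        print(name, branch_receptive_fields(scale=4, kernel=kernel))
```

Under these assumed settings, the 2D variant never sees more than one frame along the time axis, whereas the 3D variant covers 1, 3, 5, and 7 frames across its four branches, which is one way to read the abstract's claim that 3D convolutions in Res2Net extract joint spatio-temporal features at multiple scales.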

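【Funding】 National Natural Science Foundation of China (71971002, 62273001); Natural Science Foundation of Anhui Province (2108085QA35); Key Research and Development Program of Anhui Province (202004a07020050); Major Science and Technology Project of Anhui Province (202003A06020016); Excellent Scientific Research and Innovation Team of Universities in Anhui Province (2022AH010005)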
  • 【Source】 Journal of Electronics & Information Technology (电子与信息学报), 2024, Issue 11
  • 【Classification codes】 TP391.41; TN912.34