Weakly Supervised Temporal Action Localization with Video Semantic Structure Information

【Authors】 Dehui Kong; Mengwen Xu; Jinghua Li; Shaofan Wang; Baocai Yin

【Affiliation】 Beijing Institute of Artificial Intelligence, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing University of Technology

【Abstract】 Weakly supervised temporal action localization aims to classify and temporally localize actions in untrimmed videos when only video-level labels are available. Most current neural-network-based methods train a classifier to predict segment-level class scores and then fuse them into video-level class scores. These methods attend only to the visual features of the video and ignore its semantic structure information. To further improve localization quality, this paper proposes a weakly supervised temporal action localization method assisted by video semantic structure information. The method takes a video segment classification module as its base model. It then designs a smooth attention module, based on auxiliary information such as the sparsity and semantic continuity of the video's temporal structure, to correct the classification results. In addition, a segment-level semantic label prediction module is added to alleviate the insufficiency of weakly supervised label information. Finally, the three modules are trained jointly to improve the accuracy of temporal action localization. Experiments on the THUMOS14 and ActivityNet datasets show that the proposed approach outperforms current state-of-the-art methods.
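
The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: segment-level class scores are fused into video-level scores through per-segment attention, and the attention is regularized for sparsity and temporal continuity. The function names, the attention-weighted average, and the L1-style penalties are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def video_level_scores(segment_scores, attention):
    """Fuse segment-level class scores (T x C) into video-level scores (C,).

    Assumption: fusion is an attention-weighted average over segments.
    """
    weights = attention / (attention.sum() + 1e-8)  # normalize over time
    return weights @ segment_scores


def sparsity_penalty(attention):
    """Hypothetical sparsity term: mean absolute attention value."""
    return float(np.mean(np.abs(attention)))


def smoothness_penalty(attention):
    """Hypothetical continuity term: mean absolute change between neighbors."""
    return float(np.mean(np.abs(np.diff(attention))))


# Toy input: 4 segments, 2 action classes (values invented for illustration).
segment_scores = np.array([
    [0.1, 0.9],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.8, 0.2],
])
attention = np.array([0.0, 0.0, 1.0, 1.0])  # only the last two segments count

scores = video_level_scores(segment_scores, attention)
print(scores, sparsity_penalty(attention), smoothness_penalty(attention))
```

In a trained model the attention would be predicted by a network and the two penalties would be added to the video-level classification loss; the weighting between those terms is left open here because the abstract does not state it.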

【Funding】 Dehui Kong: National Natural Science Foundation of China (61772049); Beijing Natural Science Foundation (4202003)
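  • 【Proceedings】 Proceedings of the 2021 China Automation Congress
  • 【Conference】 2021 China Automation Congress: 60th Anniversary of the Chinese Association of Automation and Commemoration of the 110th Anniversary of Qian Xuesen's Birth
  • 【Conference date】 2021-10-22
  • 【Location】 Beijing, China
  • 【Classification code】 TP391.41
  • 【Organizer】 Chinese Association of Automation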