A Survey on Knowledge-Driven Multimodal Semantic Understanding

【Authors】 ZHENG Yihao; GUO Yijun; WU Lifang; HUANG Yan

【Corresponding author】 HUANG Yan

【Affiliations】 Faculty of Information Technology, Beijing University of Technology; Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences; State Key Laboratory for Multi-modal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences

【Abstract】 Multimodal learning methods based on deep learning models achieve strong semantic understanding performance in simple scenarios that are static and controllable, but their generalization to complex scenarios that are dynamic and open remains poor. A number of recent studies have introduced human-like knowledge into multimodal semantic understanding methods, with good results. To provide a deeper view of current progress in knowledge-driven multimodal semantic understanding, this paper systematically surveys and analyzes the relevant methods and summarizes two main types of multimodal knowledge representation frameworks: relational and aligned. Several representative applications are then described in detail, including image-text matching, object detection, semantic segmentation, and vision-and-language navigation. Finally, the advantages and disadvantages of current methods are summarized and possible future development trends are discussed.

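【Funding】 Supported by the Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (No. 2018AAA0100400) and the National Natural Science Foundation of China (No. 62236010, 62276261)
  • 【Source】 Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2023, Issue 12
  • 【Classification codes】 TP391.41; TP18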