Joint extraction of agricultural disease entities and relations by combining RoBERTa-WWM and a global pointer network

【Authors】 WANG Tong; ZHANG Lijie; WANG Ming; WU Huarui; ZHU Huaji; YANG Yingru; WANG Chunshan

【Corresponding author】 WANG Chunshan

【Affiliations】 College of Information Science and Technology, Hebei Agricultural University; National Engineering Research Center for Information Technology in Agriculture; College of Mechanical and Electrical Engineering, Hebei Agricultural University; Hebei Education Examinations Authority; Shijiazhuang Academy of Agriculture and Forestry Sciences

【Abstract】 To address polysemy, entity nesting, and triple overlap in entity and relation extraction, this paper proposes RBGPL, a joint extraction model that combines RoBERTa-WWM with a global pointer network. First, the RoBERTa-WWM pre-trained model is introduced so that contextual information resolves words whose meaning changes across contexts. Second, the Global Pointer annotation scheme is used to handle nested entities. Finally, a global pointer joint decoding model converts triple extraction into quintuple extraction, which handles overlapping triples. On a self-built agricultural disease dataset, RBGPL reached a precision of 76.23%, a recall of 91.18%, and an F1 score of 83.04%, the best F1 among the joint extraction models compared, and it effectively overcame the polysemy and triple overlap problems. In addition, the F1 scores for pathogen (Pathogeny) and crop name (Crop), two entity types prone to nesting, improved by 3% and 18%, a substantial reduction in the entity nesting problem. The method improves entity and relation extraction in the Chinese agricultural disease domain and can provide technical support for building knowledge graphs in that domain.
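
The abstract does not describe how the Global Pointer step is implemented. As a rough illustration only, the sketch below follows the commonly described Global Pointer idea: each entity type gets a score for every (start, end) token span, and every span scoring above a threshold is kept, so a nested span and the span enclosing it can both be returned. The function names, the `Disease` label, the example tokens, and all score values are hypothetical and are not taken from the paper; only the `Crop` label appears in the abstract.

```python
from typing import Dict, List, Tuple

# (entity type, start index, end index inclusive, surface text)
Span = Tuple[str, int, int, str]


def empty_matrix(n: int, fill: float = -10.0) -> List[List[float]]:
    """Build an n x n score matrix where every span starts out as 'not an entity'."""
    return [[fill] * n for _ in range(n)]


def decode_spans(
    tokens: List[str],
    scores: Dict[str, List[List[float]]],
    threshold: float = 0.0,
) -> List[Span]:
    """Return every span whose score for some entity type exceeds the threshold.

    Each (start, end) pair is judged independently, so overlapping or
    nested spans are all kept.
    """
    n = len(tokens)
    found: List[Span] = []
    for entity_type, matrix in scores.items():
        for start in range(n):
            # Only the upper triangle is meaningful: start <= end.
            for end in range(start, n):
                if matrix[start][end] > threshold:
                    text = " ".join(tokens[start:end + 1])
                    found.append((entity_type, start, end, text))
    return found


# Hypothetical example: a crop name nested inside a disease name.
tokens = ["tomato", "gray", "mold"]
scores = {
    "Crop": empty_matrix(len(tokens)),
    "Disease": empty_matrix(len(tokens)),  # hypothetical label
}
scores["Crop"][0][0] = 3.2      # made-up score for the span "tomato"
scores["Disease"][0][2] = 4.1   # made-up score for the span "tomato gray mold"

for span in decode_spans(tokens, scores):
    print(span)
```

In a trained model the score matrices would come from the encoder output rather than being filled in by hand; the decoding loop is the part that lets nested entities coexist.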

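【Funding】 Natural Science Foundation of Hebei Province (F2022204004); National Bulk Vegetable Industry Technology System project (CARS-23-D07); National Key Research and Development Program of China (2020YFD1100204)
  • 【Source】 Journal of Hebei Agricultural University, 2024, Issue 03
  • 【Classification codes】 TP391.1; S432
  • 【Downloads】 56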