
Multi-factor person entity relation extraction model based on distant supervision


【Authors】 HUANG Yangchen; JIA Yan; GAN Liang; XU Jing; HUANG Jiuming; HE Zhonghe

【Corresponding author】 HUANG Yangchen

【Affiliations】 College of Computer, National University of Defense Technology; KB R&D Department, Hunan Singhand Intelligent Data Technology Co., Ltd.

【Abstract】 The basic assumption of distant supervision is too strong and tends to introduce noisy data. To address this, a person entity relation extraction model is proposed that denoises the training data automatically generated by distant supervision. In the training data generation stage, the data produced by distant supervision are denoised using the idea of multiple-instance learning together with TF-IDF-based relation indicator word discovery, with the aim of bringing the training data up to manual-annotation quality. In the classifier, multi-factor features that combine lexical and syntactic features are used as the relation feature vector for learning. Experimental results on large-scale real-world datasets show that the proposed model outperforms comparable relation extraction methods based on distant supervision.
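The abstract does not give the exact scoring formula, so the following is only a minimal sketch of what TF-IDF-based relation indicator word discovery could look like. It assumes that all sentences distantly labelled with one relation are pooled into a single "document", that entity mentions have been replaced by the placeholders `E1` and `E2`, and that a sentence is kept only if it contains a top-ranked indicator word. The function names, the toy data, and the `top_k` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def relation_indicator_words(sentences_by_relation, top_k=1):
    """Rank words per relation by TF-IDF, pooling each relation's sentences into one document."""
    counts = {
        rel: Counter(word for sent in sents for word in sent.split())
        for rel, sents in sentences_by_relation.items()
    }
    n_relations = len(counts)

    # Document frequency: in how many relations does each word occur?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    indicators = {}
    for rel, word_counts in counts.items():
        total = sum(word_counts.values())
        scores = {
            word: (n / total) * math.log(n_relations / doc_freq[word])
            for word, n in word_counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Words shared by every relation score 0 and are never indicators.
        indicators[rel] = [w for w in ranked[:top_k] if scores[w] > 0]
    return indicators


def filter_noisy(sentences_by_relation, indicators):
    """Keep only sentences that contain at least one indicator word of their relation."""
    return {
        rel: [s for s in sents if any(w in s.split() for w in indicators[rel])]
        for rel, sents in sentences_by_relation.items()
    }


# Hypothetical toy data: distant supervision labels every sentence that
# mentions both entities, including ones that do not express the relation.
toy = {
    "spouse": [
        "E1 married E2 in 1990",
        "E1 and E2 attended a meeting",
        "E1 is married to E2",
    ],
    "parent": [
        "E1 is father to E2",
        "E1 and E2 attended a meeting",
        "E2 thanked father E1",
    ],
}

indicators = relation_indicator_words(toy, top_k=1)
cleaned = filter_noisy(toy, indicators)
```

In this toy setting the sentence about attending a meeting appears under both relations, so its words receive a zero IDF weight and it is dropped from both training sets. A real pipeline would also need word segmentation and stop-word handling, which this sketch omits.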

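【Funding】 National Key Research and Development Program of China (No. 2016QY03D0601, No. 2016QY03D0603); National Natural Science Foundation of China (No. 61502517); Key Research and Development Program of Hunan Province (No. 2018GK2056)
  • 【Source】 Journal on Communications (通信学报), 2018, Issue 07
  • 【Classification No.】 TP391.1
  • 【Citations】 24
  • 【Downloads】 387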