Multi-factor person entity relation extraction model based on distant supervision
【Abstract】 The basic assumption of distant supervision is too strong and easily introduces noisy data. To address this, a person entity relation extraction model is proposed that denoises the training data generated automatically by distant supervision. In the training-data generation stage, the data produced by distant supervision is filtered using multiple instance learning together with a TF-IDF-based method for discovering relation indicator words, so that the training data reaches the quality of manual annotation. For the classifier, multi-factor features that combine lexical and syntactic features serve as the relation feature vector, so that relation evidence is captured at both the word level and the semantic level. Experimental results on large-scale real-world datasets show that the proposed model outperforms comparable relation extraction methods based on distant supervision.
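The abstract names TF-IDF-based relation indicator word discovery as one of its denoising steps but gives no formula, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. Everything in it is hypothetical and chosen for illustration: the toy knowledge base `KB`, the example sentences, whitespace tokenization, pooling each relation's sentences into one "document", the idf form log(N / df), the `top_k` cut-off, and the function names. The multiple instance learning step and the multi-factor classifier are not shown.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: (person, person) -> relation label.
KB = {("Alice", "Bob"): "spouse", ("Carol", "Dave"): "parent"}


def distant_label(sentences, kb):
    """Distant supervision: label every sentence that mentions both entities of a
    KB pair with that pair's relation. This over-strong assumption is what lets
    noisy sentences into the training data."""
    labeled = []
    for sent in sentences:
        tokens = sent.split()  # assumption: whitespace tokenization
        for (e1, e2), rel in kb.items():
            if e1 in tokens and e2 in tokens:
                labeled.append((sent, tokens, e1, e2, rel))
    return labeled


def relation_tfidf(labeled):
    """Pool the context words of each relation into one 'document' and score
    every word by TF-IDF across relations, with idf = log(N / df)."""
    docs = {}
    for _, tokens, e1, e2, rel in labeled:
        docs.setdefault(rel, Counter()).update(t for t in tokens if t not in (e1, e2))
    n_docs = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())  # each word counted once per relation document
    scores = {}
    for rel, counts in docs.items():
        total = sum(counts.values())
        scores[rel] = {w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()}
    return scores


def indicator_words(scores, top_k=1):
    """Take the top_k positively scored words of each relation as its indicators."""
    result = {}
    for rel, word_scores in scores.items():
        ranked = sorted(word_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[rel] = {w for w, s in ranked[:top_k] if s > 0}
    return result


def denoise(labeled, indicators):
    """Keep a distantly labeled sentence only if it contains at least one
    indicator word of its assigned relation."""
    return [(sent, rel) for sent, tokens, _, _, rel in labeled
            if indicators[rel] & set(tokens)]


sentences = [
    "Alice married Bob in Paris",
    "Alice and Bob visited Paris",   # noise: no evidence of the spouse relation
    "Bob married Alice last year",
    "Carol is the mother of Dave",
    "Dave thanked his mother Carol",
    "Carol and Dave visited Paris",  # noise: no evidence of the parent relation
]

labeled = distant_label(sentences, KB)
indicators = indicator_words(relation_tfidf(labeled))
print(indicators)  # {'spouse': {'married'}, 'parent': {'mother'}}
for sent, rel in denoise(labeled, indicators):
    print(rel, "|", sent)
```

Words that occur under every relation (here "Paris", "visited", "and") get an idf of zero, so only relation-specific context words such as "married" or "mother" can become indicators, and the two noisy sentences are dropped. With a single relation every score is zero and nothing survives, so a practical system would need a different weighting or more relations in the pool.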
【Key words】 relation extraction; person entity relation; distant supervision; machine learning; natural language processing;
- 【Source】 Journal on Communications (通信学报), 2018, No. 07
- 【CLC number】 TP391.1
- 【Times cited】 24
- 【Downloads】 387