Chinese Entity Mention Detection Based on Multi-level Feature Integration
【Abstract】 Entity Mention Detection (EMD) is the task of identifying mentions of entities in text, including named, nominal, and pronominal mentions. This paper proposes a Chinese EMD method based on multi-level feature integration. It exploits the ability of Conditional Random Fields (CRFs) to combine heterogeneous features, drawing on characters, pinyin (phonetic symbols), words and their part-of-speech tags, lists of proper names of various types, and frequency statistics to improve detection performance. The EMD subtasks are organized as a three-stage pipeline in which three different CRF classifiers label the attributes of each mention sequentially in a predefined order. A system based on this method was submitted to the NIST ACE07 (Automatic Content Extraction 2007) Chinese EMD evaluation, where it ranked second by ACE Value.
【Key words】 computer application; Chinese information processing; entity mention detection; multi-task labeling; conditional random fields; ACE evaluation
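The multi-level feature integration described in the abstract can be illustrated with a minimal sketch that builds one feature dictionary per character, of the kind a CRF sequence labeler typically consumes. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `char_features`, the toy `PINYIN` and `SURNAMES` resources, the feature names, the B/M/E/S word-position scheme, and the frequency cap of 3. The input is assumed to be already segmented and POS-tagged.

```python
from collections import Counter

# Hypothetical toy resources; a real system would load full lexicons.
PINYIN = {"张": "zhang", "三": "san", "他": "ta"}
SURNAMES = {"张", "王", "李"}


def char_features(words, pos_tags, char_freq):
    """Build one feature dict per character from (word, POS) pairs."""
    feats = []
    for word, pos in zip(words, pos_tags):
        for k, ch in enumerate(word):
            # Position of the character inside its word: B, M, E, or S.
            if len(word) == 1:
                seg = "S"
            elif k == 0:
                seg = "B"
            elif k == len(word) - 1:
                seg = "E"
            else:
                seg = "M"
            feats.append({
                "char": ch,                            # character level
                "pinyin": PINYIN.get(ch, "UNK"),       # phonetic level
                "word": word,                          # word level
                "pos": pos,                            # part of speech
                "seg": seg,                            # position in word
                "in_surname_list": ch in SURNAMES,     # name-list lookup
                "freq_bucket": min(char_freq[ch], 3),  # capped frequency
            })
    # Context features: the neighboring characters on each side.
    for i, f in enumerate(feats):
        f["prev_char"] = feats[i - 1]["char"] if i > 0 else "<BOS>"
        f["next_char"] = feats[i + 1]["char"] if i < len(feats) - 1 else "<EOS>"
    return feats


words = ["张三", "说", "他", "来"]
pos_tags = ["NR", "VV", "PN", "VV"]
char_freq = Counter("".join(words))
features = char_features(words, pos_tags, char_freq)
```

In a pipeline like the one the abstract outlines, the labels predicted by one stage could be added to these dictionaries as extra features for the next stage. That wiring is likewise an assumption here, not a description of the authors' system.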
- 【Source】 Journal of Chinese Information Processing (中文信息学报), 2007, Issue 05
- 【Classification No.】 TP391.1
- 【Citations】 9
- 【Downloads】 224