English Base Noun Phrase Identification Based on Hybrid Strategy: The Strategy of Combination of Boundary Statistic and the Amendment of the String of Part of Speech
【Abstract】 Base noun phrase identification is an important subtask in natural language processing. This paper surveys representative methods of base noun phrase identification and compares the results of several typical approaches for English. It then proposes and implements a method for identifying English base noun phrases that combines boundary statistics with correction by part-of-speech string rules. The method splits the task into a primary part and a secondary part. Boundary statistics, the primary part, correctly identifies most base noun phrases. Part-of-speech string rules, the secondary part, check and correct the base noun phrases found by the primary part and also recover base noun phrases that boundary statistics missed. The rules compensate for the inability of boundary statistics to account for the internal composition of base noun phrases, which improves both precision and recall. The method reaches a precision of 96.22% and a recall of 97.59% on English base noun phrase identification, for an F-score (β=1) of 96.90%, which the authors report as higher than the best result published at the time.
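The reported F-score can be checked against the reported precision and recall. The sketch below assumes the standard F-measure, the weighted harmonic mean of precision and recall; the abstract does not spell out its formula, and the function name `f_beta` is illustrative only.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (standard F-measure).

    Assumption: the paper uses this usual definition; the abstract does not state it.
    """
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
precision, recall = 96.22, 97.59
print(f"{f_beta(precision, recall):.2f}")  # prints 96.90
```

With β=1 the formula reduces to 2PR/(P+R), and the value it yields agrees with the 96.90% stated in the abstract.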
【Key words】 base noun phrase; chunk; boundary statistics; part-of-speech strings
- 【Source】 Computer Engineering and Applications (计算机工程与应用), 2004, Issue 35
- 【Classification code】 TP391.1
- 【Times cited】 11
- 【Downloads】 202