
English Base Noun Phrase Identification Based on a Hybrid Strategy: Combining Boundary Statistics with Part-of-Speech String Rule Correction


【Authors】 Liang Yinghong (1, 2); Zhao Tiejun (2); Yao Jianmin (2); Yu Hao (2); Xu Bing (2)

【Affiliations】 (1) Information and Computer Engineering Department, North East Forestry University, Harbin 150001; (2) Computer Science and Technology Department, Harbin Institute of Technology, Harbin 150001

【Abstract】 Base noun phrase (baseNP) identification is an important subtask in natural language processing. This paper surveys representative baseNP identification methods and compares the results of several typical English baseNP identifiers. It then proposes and implements a method that combines boundary statistics with correction by part-of-speech (POS) string rules. The method splits baseNP identification into a primary part and a secondary part. The primary part, boundary statistics, correctly identifies most base noun phrases. The secondary part, a set of rules written as strings of POS tags, checks and corrects the base noun phrases found by the primary part and also recovers base noun phrases that the primary part missed. Because the POS string rules capture the internal composition of base noun phrases, which boundary statistics cannot take into account, they improve both precision and recall. With this method, English baseNP identification reaches a precision of 96.22%, a recall of 97.59%, and an F(β=1) of 96.90%, an F score higher than the best results reported to date.
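The abstract describes the two-stage design only at a high level. The Python sketch below illustrates one way such a pipeline could be structured; it is not the authors' implementation. Everything in it is a hypothetical assumption for illustration: the toy training sentences, the choice of IOB labels conditioned on the previous and current POS tag as the "boundary statistic", the one-letter tag classes, and the single rule pattern `D?[AC]*N+|P`. The helper `f_beta` also checks that the reported F(β=1) follows from the reported precision and recall.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy training data: one (POS tag, IOB label) pair per token,
# where B opens a base NP, I continues it and O is outside any base NP.
TRAIN = [
    [("DT", "B"), ("JJ", "I"), ("NN", "I"), ("VBZ", "O"), ("NNS", "B"), (".", "O")],
    [("PRP", "B"), ("VBD", "O"), ("DT", "B"), ("NN", "I"), ("IN", "O"),
     ("NNP", "B"), ("NNP", "I")],
    [("DT", "B"), ("VBG", "I"), ("NN", "I"), ("VBZ", "O"), (".", "O")],
]


def train_boundary_stats(corpus):
    """Count IOB labels given (previous tag, current tag), plus a tag-only backoff."""
    pair_counts, tag_counts = defaultdict(Counter), defaultdict(Counter)
    for sentence in corpus:
        prev = "<S>"
        for tag, label in sentence:
            pair_counts[(prev, tag)][label] += 1
            tag_counts[tag][label] += 1
            prev = tag
    return pair_counts, tag_counts


def primary_stage(tags, stats):
    """Assumed boundary-statistic stage: most frequent label per token -> spans."""
    pair_counts, tag_counts = stats
    labels, prev = [], "<S>"
    for tag in tags:
        # .get() does not insert keys into a defaultdict; back off to tag-only counts.
        counts = pair_counts.get((prev, tag)) or tag_counts.get(tag)
        labels.append(counts.most_common(1)[0][0] if counts else "O")
        prev = tag
    spans, start = [], None
    for i, label in enumerate(labels + ["O"]):  # sentinel closes a final span
        if label != "I" and start is not None:  # B or O closes the open span
            spans.append((start, i))
            start = None
        if label in ("B", "I") and start is None:  # B, or a stray I, opens one
            start = i
    return spans


# Hypothetical POS-string rules: tags map to one-letter classes, and a base NP
# must read D? [AC]* N+ (e.g. "DT JJ NN") or be a lone personal pronoun.
TAG_CLASS = {"DT": "D", "PRP$": "D", "JJ": "A", "JJR": "A", "JJS": "A", "CD": "C",
             "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N", "PRP": "P"}
RULE = re.compile(r"D?[AC]*N+|P")


def secondary_stage(tags, spans):
    """Check primary spans against RULE, then recover NPs from uncovered tokens."""
    classes = "".join(TAG_CLASS.get(t, "x") for t in tags)
    kept = [(s, e) for s, e in spans if RULE.fullmatch(classes, s, e)]
    covered = {i for s, e in kept for i in range(s, e)}
    # Mask kept tokens with '#'; rejected spans become uncovered, so a valid
    # sub-span inside them can be recovered here, which acts as a correction.
    masked = "".join("#" if i in covered else c for i, c in enumerate(classes))
    recovered = [m.span() for m in RULE.finditer(masked)]
    return sorted(kept + recovered)


def prf(gold, predicted):
    """Precision, recall and F1 over exactly matching spans."""
    correct = len(set(gold) & set(predicted))
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)


def f_beta(p, r, beta=1.0):
    """F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # "the dog running across fields": gold base NPs are "the dog" and "fields".
    tags = ["DT", "NN", "VBG", "IN", "NNS"]
    gold = [(0, 2), (4, 5)]
    first = primary_stage(tags, train_boundary_stats(TRAIN))
    final = secondary_stage(tags, first)
    print(first, prf(gold, first))  # [(0, 3), (4, 5)] (0.5, 0.5, 0.5)
    print(final, prf(gold, final))  # [(0, 2), (4, 5)] (1.0, 1.0, 1.0)
    print(round(f_beta(96.22, 97.59), 2))  # 96.9, consistent with the reported 96.90%
```

In this toy run the primary stage wrongly absorbs the gerund *running* into the first phrase. The rule check rejects `DT NN VBG`, and the recovery pass then re-finds `DT NN` among the freed tokens. Rejecting a span and letting recovery rescan its tokens is only one simple reading of "check and correct"; the paper's actual statistics, rule set, and correction procedure may differ.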

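【Funding】 Supported by the National Natural Science Foundation of China (Nos. 60302021 and 60375019), the National High Technology Research and Development Program of China (863 Program) (sub-project No. 2002AA117010-09), and the Intergovernmental International Cooperation Project of the Ministry of Science and Technology (No. CI-2003-03)
  • 【Source】 Computer Engineering and Applications (计算机工程与应用), 2004, Issue 35
  • 【CLC number】 TP391.1
  • 【Times cited】 11
  • 【Downloads】 202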