A Keyword Selection Method Based on Lexical Chains (一种基于词汇链的关键词抽取方法)

【Authors】 SUO Hong-guang (索红光) 1,2; LIU Yu-shu (刘玉树) 1; CAO Shu-ying (曹淑英) 2

【Affiliations】 1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 2. School of Computer and Communication Engineering, China University of Petroleum, Dongying, Shandong 257061, China

【Abstract】 Keywords play an important role in information retrieval, automatic summarization, text clustering and classification, and related tasks. A lexical chain is a sequence of semantically related words and was originally used to analyze text structure. This paper proposes a lexical-chain-based method for automatic keyword indexing of Chinese texts and gives an algorithm for constructing lexical chains with HowNet as the knowledge base. Lexical chains are first constructed by computing the semantic similarity between terms; keywords are then selected by combining term frequency with area features. Because the method takes the semantic relationships between terms into account, it improves keyword indexing performance. Experimental results show that, compared with a method that uses term frequency and area features alone, recall improves by 7.78% and precision improves by 9.33%.
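
The two-stage idea in the abstract (build lexical chains from word similarity, then pick keywords using frequency and area) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: `TOY_CONCEPTS`, `toy_similarity`, the greedy chaining rule, the `threshold` value, and the `title_weight` area weight are all hypothetical stand-ins, and a real system would replace the toy similarity with a HowNet-based measure.

```python
from collections import Counter

# Hypothetical stand-in for a HowNet-based similarity: words that share a
# hand-assigned concept label count as similar. Not the paper's measure.
TOY_CONCEPTS = {
    "keyword": "term", "term": "term", "word": "term",
    "retrieval": "search", "search": "search", "query": "search",
}


def toy_similarity(a, b):
    """Return 1.0 for identical words or words sharing a concept, else 0.0."""
    if a == b:
        return 1.0
    ca, cb = TOY_CONCEPTS.get(a), TOY_CONCEPTS.get(b)
    return 1.0 if ca is not None and ca == cb else 0.0


def build_chains(words, similarity=toy_similarity, threshold=0.8):
    """Greedily add each distinct word to the first chain holding a similar word."""
    chains = []
    for w in dict.fromkeys(words):  # distinct words, first-seen order
        for chain in chains:
            if any(similarity(w, m) >= threshold for m in chain):
                chain.append(w)
                break
        else:
            chains.append([w])  # no similar chain found: start a new one
    return chains


def select_keywords(title_words, body_words, top_k=2, title_weight=3.0):
    """Rank chains by summed word weight; return the heaviest word of each top chain."""
    weight = Counter()
    for w in body_words:
        weight[w] += 1.0           # term-frequency contribution
    for w in title_words:
        weight[w] += title_weight  # assumed area weight for title occurrences
    chains = build_chains(title_words + body_words)
    ranked = sorted(chains, key=lambda c: sum(weight[w] for w in c), reverse=True)
    return [max(c, key=lambda w: weight[w]) for c in ranked[:top_k]]
```

Ranking whole chains before choosing a representative word lets several weakly frequent but related words jointly promote a topic, which is the semantic effect the abstract attributes to lexical chains.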

【Funding】 National Natural Science Foundation of China (Grant No. 60503050)
  • 【Source】 Journal of Chinese Information Processing (中文信息学报), 2006, Issue 06
  • 【Classification No.】 TP391.1; TP18
  • 【Citations】 214
  • 【Downloads】 1560