Chinese Automatic Summarization System Based on HowNet Concept Acquisition
Automatic Chinese summarization system based on conceptual vector space model
【Author】 Wang Meng, He Ting-ting, Wang Xiao-rong (Department of Computer Science, Central China Normal University, Wuhan 430079)
【Affiliation】 Department of Computer Science, Central China Normal University
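【Chinese Abstract】 This paper first applies HowNet-based unsupervised word sense disambiguation to polysemous words, and then uses a knowledge base built from HowNet to obtain the concepts of the words in a text. Out-of-vocabulary words are also annotated with concepts. Concept statistics replace the traditional word-form frequency statistics. A number of topic-related concepts are selected to build a thematic concept vector space model; paragraph importance is computed with a paragraph clustering algorithm; sentence importance is computed from paragraph importance and the thematic concept vector space model, and summary sentences are extracted. Summary precision is improved by computing the similarity between the extracted sentences, and a Chinese automatic summarization system is designed and implemented.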
【Abstract】 The paper presents a novel approach to Chinese summarization. Its novelty lies in first disambiguating polysemous words with an unsupervised, HowNet-based word sense disambiguation method, and then capturing the concepts of words using a knowledge base built from HowNet. Words not found in the lexicon are tagged with a concept at the same time. Word-form (morphological) frequency statistics are replaced by concept statistics, and a thematic conceptual vector space model is built from a selected set of topic-related concepts. Paragraph weights are computed by a paragraph clustering algorithm. Sentence weights are then computed from the paragraph weights and the thematic conceptual vector space model; once all sentence weights have been computed, the sentences are ranked by weight, and those with the highest weights are selected as summary sentences. Finally, the similarity between the selected sentences is computed to improve the precision of the summary, yielding an effective automatic Chinese summarization system.
【Key words】 HowNet; automatic summarization; conceptual vector space model; concept acquisition
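- 【Proceedings】 Proceedings of the 2nd National Student Workshop on Computational Linguistics
- 【Conference】 The 2nd National Student Workshop on Computational Linguistics
- 【Conference date】 2004-08
- 【Classification No.】 TP391.1
- 【Organizer】 Chinese Information Processing Society of China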
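The extraction pipeline described in the abstract (concept counting, a thematic concept vector, paragraph-weighted sentence scoring, ranking, and a sentence-similarity filter) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. `CONCEPT_MAP` is a hypothetical toy dictionary standing in for HowNet concept lookup: no HowNet API is used, and word sense disambiguation is skipped. Sentences are assumed to be pre-segmented word lists, and cosine similarity is assumed as the similarity measure. Each paragraph is weighted by its cosine similarity to the thematic vector instead of by the paper's paragraph clustering algorithm. The function names and the parameters `k`, `n` and `max_sim` are likewise hypothetical.

```python
from collections import Counter
from math import sqrt

# HYPOTHETICAL toy dictionary standing in for HowNet: word -> concept label.
# Synonyms map to one concept, so their occurrences are counted together.
CONCEPT_MAP = {
    "计算机": "computer", "电脑": "computer",
    "文摘": "summary", "摘要": "summary",
    "系统": "system",
}


def to_concepts(words):
    """Map pre-segmented words to concepts; unknown words keep their surface
    form (a crude stand-in for the paper's concept tagging of OOV words)."""
    return [CONCEPT_MAP.get(w, w) for w in words]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def thematic_vector(paragraphs, k):
    """Concept frequencies over the whole text; keep the k most frequent."""
    counts = Counter(c for para in paragraphs for sent in para for c in to_concepts(sent))
    return dict(counts.most_common(k))


def summarize(paragraphs, k=5, n=2, max_sim=0.8):
    """Return (paragraph index, sentence index) pairs of the selected
    sentences in document order. paragraphs: a list of paragraphs, each a
    list of sentences, each a list of segmented words."""
    theme = thematic_vector(paragraphs, k)
    scored = []
    for p_idx, para in enumerate(paragraphs):
        para_vec = Counter(c for sent in para for c in to_concepts(sent))
        # Stand-in for the paper's paragraph-clustering weight (ASSUMPTION).
        p_weight = cosine(para_vec, theme)
        for s_idx, sent in enumerate(para):
            s_vec = Counter(to_concepts(sent))
            scored.append((p_weight * cosine(s_vec, theme), p_idx, s_idx, s_vec))

    # Rank by score, then greedily skip sentences too similar to chosen ones.
    scored.sort(key=lambda item: item[0], reverse=True)
    chosen = []
    for score, p_idx, s_idx, s_vec in scored:
        if len(chosen) == n or score <= 0:
            break
        if all(cosine(s_vec, prev[3]) < max_sim for prev in chosen):
            chosen.append((score, p_idx, s_idx, s_vec))
    return sorted((p_idx, s_idx) for _, p_idx, s_idx, _ in chosen)
```

Scoring a sentence as the product of its paragraph weight and its similarity to the thematic vector is one simple way to combine the two signals the abstract names. The greedy pass over the ranked list then skips any candidate that is too similar to an already chosen sentence, which is one way a sentence-similarity step can reduce redundancy in the extracted summary.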