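Research and Implementation of a Knowledge Acquisition System Based on an Encyclopedic Dictionary (基于百科词典的知识获取系统的研究与实现)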

An Experimental System of Encyclopedia Based Text Knowledge Acquisition

【Author】 Xu Yong (许勇)

【Advisor】 Song Rou (宋柔)

【Author information】 Beijing University of Technology, Computer Applications, 2001, Master's thesis

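【Chinese abstract】 Acquiring knowledge from natural language text is an important application of natural language processing: it can effectively help people search for and obtain knowledge, and it has considerable application prospects. Acquiring knowledge from text requires restricting the range of text. Dictionary text is knowledge-dense and relatively regular, so restricting machine knowledge acquisition to dictionary text is a natural and effective approach. Research in this direction has been carried out both in China and abroad, but on the whole it remains at an exploratory stage. This thesis describes exploratory research on acquiring knowledge from an encyclopedic dictionary. It first surveys research on text information extraction, an active field closely related to text knowledge acquisition, and the relationship between the two. Approaching the knowledge acquisition task from the perspective of information extraction, the thesis implements an experimental, restricted-scope knowledge acquisition system based on the Encyclopedia of China (《中国大百科全书》). The specific work includes: performing a preliminary classification of entries with a word segmentation tool; on the basis of that classification, examining the entry texts within the processing scope and manually deriving semantic-feature-based pattern rules for the target knowledge; and using the YACC tool to perform syntactic analysis based on the pattern rules and thereby extract the target knowledge. At present, the entries covered are administrative place-name entries in the Chinese Geography (《中国地理》) volume and Western artist entries in the Fine Arts (《美术》) volume. Experimental results and their analysis are given. The experiments show that when the processing scope is small and the target knowledge items are few and not too complex, this method achieves fairly good performance. However, text knowledge acquisition is on the whole a difficult research problem, and the system implemented in this study still needs further improvement.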

【Abstract】 The acquisition of knowledge from natural language text is an important application of NLP. The technique has good prospects because it can help people search for and acquire knowledge efficiently. Given the limitations of current technology, text must be restricted in some way, such as by content domain or style, before a computer can acquire knowledge from it. Encyclopedia text is well suited as a base corpus for knowledge acquisition because it is knowledge-dense and relatively regular in how it expresses content. This paper describes an experimental system for acquiring knowledge from encyclopedia text. The encyclopedia used is the "Encyclopedia of China", and the system currently processes two kinds of entries: Chinese district entries from the "China Geography" volume and foreign artist entries from the "Art" volume. The paper first reviews research on Information Extraction, an area closely related to text knowledge acquisition, and treats the task of acquiring knowledge from text from the viewpoint of Information Extraction. It then describes the experimental encyclopedia-based knowledge acquisition system in detail. The system consists of three main modules: an encyclopedia-entry classification module (which uses a Chinese word segmentation tool), an entry-text analysis and knowledge-extraction module, and a query module. The domain extraction rules are based on semantic features and were acquired by hand. The YACC tool was adopted to analyze the entry text. Finally, the test results and their analysis are presented. The results show that the system achieved high performance in a closed test. However, building a high-performance text knowledge acquisition system remains a difficult task and needs more elaborate study.

  • 【Classification code】TP391.1
  • 【Times cited】4
  • 【Downloads】224