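Using HowNet for (Computer) Natural Language Processing
【Author】 Li Li;
【Advisor】 Yang Guowei;
【Author Information】 University of Electronic Science and Technology of China, Computer Software and Theory, 2004, Master's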
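【Abstract (translated from the Chinese)】 Natural language understanding, and Chinese information processing in particular, has long been a hard problem in computing. Getting a computer to understand and process the richness of natural language is highly challenging. Computer processing of natural language is an interdisciplinary research field, and researchers from computer science, linguistics, mathematics, and other disciplines currently make up its main research force. Natural language processing studies how to make computers understand and generate the languages people use every day (such as Chinese and English), so that a computer grasps the meaning of natural language and can answer, in natural language and through dialogue, the questions people put to it. Over the past decade or more, with the rapid development of computers themselves and of the information superhighway, Chinese information processing has begun to place greater emphasis on semantic research and on building large-scale semantic dictionaries or large-scale knowledge bases. Building on many years of research in this area, Mr. Dong Zhen-dong, the founder of HowNet, opened a new path in Chinese information processing by proposing the concept of HowNet. HowNet is a common-sense knowledge base that takes the concepts represented by Chinese and English words as its objects of description, and whose basic content is to reveal the relations between concepts and between the attributes that concepts possess. It provides rich knowledge resources for research and development in language information processing. Based on the HowNet theory proposed by Mr. Dong and on HowNet's data files, this thesis designs and builds a fairly systematic knowledge base and, at the upper layer, carries out preliminary research on using the HowNet knowledge base for Chinese language information processing. For the design of the knowledge base, that is, the back-end language knowledge representation system, we combine program-based representation with a database: object-oriented programming is used to represent the concepts of the knowledge, while the database records the correspondence between words and programs. As a result, when analyzing natural language, what we deal with is no longer a combination of character codes but a collection of objects that can describe the meanings of words. On top of the knowledge base we also tentatively designed a knowledge base API, which gives upper-layer natural language processing tasks such as disambiguation and semantic similarity computation an interface to the knowledge base. Finally, the thesis points out shortcomings of the design and possible improvements.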
【Abstract】 Natural language processing (NLP), especially Chinese language processing, has long been a difficult problem in computer science. Making a computer understand and process natural language is a serious challenge. NLP is an interdisciplinary field; researchers from computer science, linguistics, mathematics, and many other fields take part in it. NLP studies how to make a computer understand and generate the natural languages people use, e.g. Chinese and English, so that it grasps their meaning and can communicate with people in natural language, for example by answering their questions. Over the past ten years or so, with the rapid development of computing and the information highway, natural language processing has paid more attention to semantics and to building large-scale semantic dictionaries and large-scale knowledge bases. The founder of HowNet, Mr. Dong Zhen-dong, who has worked in this field for many years, opened a new path in Chinese information processing by introducing a novel concept: HowNet. HowNet is a bilingual common-sense knowledge base that describes the relations between concepts and the relations between the attributes of concepts. It provides a rich resource for language processing. This dissertation mainly discusses how to design and build a knowledge base using HowNet theory and the data that HowNet provides. It also discusses how the knowledge base can support Chinese language processing. The knowledge base serves as the natural language representation system. We design it by combining object-oriented programming (OOP) with a database: OOP describes the concept classification of natural language, and the database stores the correspondence between each word and its program. As a result, the computer parses a collection of objects that describe the meaning of a word rather than the meaningless character codes of natural language words. We also attempt to build a knowledge base API (Application Programming Interface) to serve Chinese semantic disambiguation and Chinese semantic similarity algorithms. Finally, the dissertation points out the drawbacks of our program and proposes several possible solutions.
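The design summarized in the abstract (concept objects, a database recording which objects each word maps to, and an access API on top) could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the thesis's implementation: the names `Concept`, `build_lexicon`, `lookup`, `similarity`, and `word_similarity` are hypothetical, the sememe labels are toy values rather than entries from the HowNet data files, and Jaccard overlap is a simple stand-in for whatever similarity measure the thesis uses.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One sense of a word, described by a set of sememe labels."""
    name: str
    sememes: frozenset


# Hypothetical concept objects; the sememe labels are illustrative toy values.
CONCEPTS = {
    "doctor_person": Concept(
        "doctor_person", frozenset({"human", "occupation", "cure", "medical"})
    ),
    "nurse_person": Concept(
        "nurse_person", frozenset({"human", "occupation", "tend", "medical"})
    ),
    "bank_institution": Concept(
        "bank_institution", frozenset({"institution", "money"})
    ),
    "bank_riverside": Concept(
        "bank_riverside", frozenset({"land", "water", "edge"})
    ),
}


def build_lexicon():
    """Create an in-memory table recording which concept objects a word maps to."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lexicon (word TEXT, concept TEXT)")
    conn.executemany(
        "INSERT INTO lexicon VALUES (?, ?)",
        [
            ("doctor", "doctor_person"),
            ("nurse", "nurse_person"),
            ("bank", "bank_institution"),
            ("bank", "bank_riverside"),
        ],
    )
    return conn


def lookup(conn, word):
    """Return the concept objects recorded for a word (several if ambiguous)."""
    rows = conn.execute(
        "SELECT concept FROM lexicon WHERE word = ? ORDER BY concept", (word,)
    ).fetchall()
    return [CONCEPTS[name] for (name,) in rows]


def similarity(a, b):
    """Jaccard overlap of two sememe sets (a stand-in measure, an assumption)."""
    union = a.sememes | b.sememes
    return len(a.sememes & b.sememes) / len(union) if union else 0.0


def word_similarity(conn, w1, w2):
    """Best similarity over all sense pairs of two words; 0.0 for unknown words."""
    return max(
        (similarity(a, b) for a in lookup(conn, w1) for b in lookup(conn, w2)),
        default=0.0,
    )
```

With this toy lexicon, `lookup(conn, "bank")` returns two concept objects, which is the situation a disambiguation step would have to resolve, and `word_similarity` compares words through their concept objects rather than through their character strings.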
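- 【Online Publication Contributor】 University of Electronic Science and Technology of China 【Online Publication Year/Issue】 2005, Issue 01
- 【Classification Number】 TP391.1
- 【Citation Count】 5
- 【Download Count】 610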