Deterministic Chinese Dependency Analysis Based on Support Vector Machine (基于SVM的确定性中文依存关系解析)

【Author】 Yang Yang (杨洋)

【Advisor】 Huang Degen (黄德根)

【Author information】 Dalian University of Technology, Software Engineering, 2006, Master's thesis

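【Abstract (translated from the Chinese)】 Syntactic parsing is one of the key technologies in Chinese natural language processing. Its task is to analyze the grammatical structure and grammatical relations of a sentence automatically, converting a linear word sequence into a structured syntax tree. The form of the result depends on the grammar formalism; this thesis adopts dependency grammar.

Chinese dependency analysis determines the dependency relations between the words of a sentence on the basis of Chinese dependency grammar. Words are the smallest elements of sentence structure, and the dependency relations between them can express their deep connections, so this thesis performs dependency analysis at the word level. Building on earlier work and following the dependency axioms, the Natural Language Processing Laboratory of Dalian University of Technology has defined a scheme of Chinese word-level dependency relations with thirty-eight relation types, which provides a standard for building dependency corpora.

This thesis applies a deterministic parsing algorithm based on support vector machines (SVM) to Chinese dependency analysis and, drawing on the characteristics of Chinese grammar, proposes an improved deterministic method. The Nivre algorithm has been applied successfully to English dependency parsing, and English and Chinese share certain syntactic similarities, so the deterministic Nivre algorithm is adopted here. A deterministic parser analyzes a whole sentence by deciding the dependency relation between each word and the words immediately before and after it. In Chinese, however, some words that stand in a dependency relation are far apart, and the deterministic Nivre algorithm handles such cases poorly. The thesis therefore proposes a deterministic Nivre algorithm that takes long-distance dependencies into account without increasing parsing time, with SVMs used to identify the dependency relations.

The experiments use the dependency corpus of Harbin Institute of Technology. The results show that parsing Chinese with the long-distance variant of the deterministic Nivre algorithm raises parsing accuracy by 5.32%, to 78.30%. In the closed test the training corpus is parsed almost entirely correctly, with an accuracy of 97.64%. The long-distance variant reflects the characteristics of Chinese dependency relations better than the original algorithm and benefits dependency analysis.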
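To make the idea of deterministic parsing concrete, the following is a minimal sketch of a Nivre-style transition parser. It assumes the arc-eager variant of Nivre's algorithm, which the abstract does not specify, and it does not include the thesis's long-distance extension, whose details the abstract does not give. The names `parse`, `make_oracle`, and `choose` are hypothetical. The `choose` callback is where a trained SVM classifier would sit; here it is replaced by a gold-tree oracle, and the four-word example tree is invented for illustration.

```python
from typing import Callable, Dict, List

SHIFT, LEFT_ARC, RIGHT_ARC, REDUCE = "SHIFT", "LEFT-ARC", "RIGHT-ARC", "REDUCE"

# Signature of the decision function: (stack, buffer, heads so far) -> action.
Chooser = Callable[[List[int], List[int], Dict[int, int]], str]


def parse(n_words: int, choose: Chooser) -> Dict[int, int]:
    """Parse words 1..n_words in one left-to-right pass; 0 is an artificial root.

    Returns a mapping {dependent: head}. Illegal actions fall back to SHIFT.
    """
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_words + 1))
    heads: Dict[int, int] = {}
    while buffer:
        action = choose(stack, buffer, heads)
        top, front = stack[-1], buffer[0]
        if action == LEFT_ARC and top != 0 and top not in heads:
            heads[top] = front          # stack top depends on the next word
            stack.pop()
        elif action == RIGHT_ARC:
            heads[front] = top          # next word depends on the stack top
            stack.append(buffer.pop(0))
        elif action == REDUCE and top in heads:
            stack.pop()                 # stack top already has its head
        else:
            stack.append(buffer.pop(0))  # SHIFT, or fallback for illegal actions
    for word in range(1, n_words + 1):
        heads.setdefault(word, 0)       # unattached words hang off the root
    return heads


def make_oracle(gold: Dict[int, int]) -> Chooser:
    """Stand-in for the SVM: picks actions by consulting a gold tree."""
    def choose(stack: List[int], buffer: List[int], heads: Dict[int, int]) -> str:
        top, front = stack[-1], buffer[0]
        if top != 0 and gold.get(top) == front:
            return LEFT_ARC
        if gold.get(front) == top:
            return RIGHT_ARC
        deeper = stack[:-1]
        if top in heads and any(
            gold.get(front) == s or gold.get(s) == front for s in deeper
        ):
            return REDUCE
        return SHIFT
    return choose


# Invented example: word 2 is the root; 1 and 4 depend on 2; 3 depends on 4.
gold_tree = {1: 2, 2: 0, 3: 4, 4: 2}
print(parse(4, make_oracle(gold_tree)))
```

Each step looks only at the top of the stack and the next input word, which is why such a parser runs in a single pass and why dependencies between distant words are hard for it to recover.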

【Abstract】 Syntactic analysis is a crucial part of natural language processing and machine translation. Its task is to output the structure and relations of a sentence automatically, turning a linear word sequence into a structured syntax tree. The form of the analysis result depends on the grammar formalism. This thesis studies dependency syntax.

The Chinese dependency relations produced by dependency analysis, subject to dependency constraints, represent the deeper syntactic relations between the words of a sentence. The word is the smallest element of a sentence, and word-based dependency analysis can represent deep syntactic relations, so this thesis studies Chinese dependency relations between words. Based on the dependency axioms, the Natural Language Processing Laboratory of Dalian University of Technology has defined a standard of thirty-eight dependency types between Chinese words for corpus annotation.

This thesis proposes a deterministic Chinese dependency analysis method that considers long-distance dependencies. The Nivre algorithm has been used for English dependency analysis, and the syntactic structures of Chinese and English resemble each other, so the Nivre algorithm is chosen as the deterministic algorithm. Deterministic Chinese dependency analysis parses a sentence by deciding only whether the current word modifies the words immediately beside it. In some Chinese sentences, however, the children of the focused word may be far away from it, which is difficult for a conventional deterministic dependency parser to handle. The proposed method parses a sentence deterministically while taking long-distance dependencies into account. Support vector machines are applied to identify Chinese dependency relations.

Experiments on the Harbin Institute of Technology dependency corpus show that the method outperforms the previous system by 5.32% in accuracy. The dependency accuracy reaches 78.30%, and the closed-test dependency accuracy reaches 97.64%. The results show that the proposed method fits the characteristics of Chinese and achieves better parsing accuracy.
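The abstract reports "dependency accuracy" without defining it. A common reading, assumed here, is the fraction of words whose predicted head matches the annotated head. The sketch below computes that figure; the function name `dependency_accuracy` and the sample trees are hypothetical and are not taken from the thesis.

```python
from typing import Dict


def dependency_accuracy(predicted: Dict[int, int], gold: Dict[int, int]) -> float:
    """Fraction of words whose predicted head equals the gold head.

    Both arguments map a word position to the position of its head
    (0 stands for an artificial root). This definition is an assumption.
    """
    if not gold:
        raise ValueError("gold annotation is empty")
    correct = sum(1 for word, head in gold.items() if predicted.get(word) == head)
    return correct / len(gold)


# Invented example: three of the four heads agree.
gold_heads = {1: 2, 2: 0, 3: 4, 4: 2}
predicted_heads = {1: 2, 2: 0, 3: 2, 4: 2}
print(dependency_accuracy(predicted_heads, gold_heads))
```

Under this reading, a closed test scores the parser on the sentences it was trained on, while the open figure is computed on held-out sentences.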

  • 【Classification No.】TP391.1
  • 【Times cited】3
  • 【Downloads】261