
基于PCFG-HDSM模型的语义句式识别

Semantic Structure Identification Based on PCFG-HDSM Model

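【Author】 Xu Bin (徐斌)

【Advisor】 Gu Hongbin (顾宏斌)

【Author information】 Nanjing University of Aeronautics and Astronautics (南京航空航天大学), Vehicle Operation Engineering, 2008, Master's thesis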

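【Abstract, translated from Chinese】 Semantic understanding occupies an important place in natural language processing, and semantic sentence patterns cannot be overlooked in the expression of meaning in Chinese. To let computers process aircraft maintenance information automatically, and to extend that processing to the semantic level, this thesis emphasizes the foundational role of semantic sentence patterns in semantic understanding. For a computer to understand the semantic information in a text, it must first identify the semantic sentence patterns and then process them; this thesis therefore centers on the automatic identification of semantic sentence patterns. To achieve this, the thesis first performs syntactic analysis on the text and then converts the syntactic patterns into semantic patterns. For syntactic analysis, a syntactically annotated corpus was built by hand: the lexical analysis system developed by the Institute of Computing Technology, Chinese Academy of Sciences, was used to segment and part-of-speech tag the raw corpus text; the part-of-speech tags were then corrected where appropriate to reduce ambiguity; finally, syntactic annotation was added manually. From this annotated corpus, 385 syntactic rules were learned for use in the syntactic analysis system. To improve the parser's disambiguation ability and accuracy, the thesis combines the respective strengths of the probabilistic context-free grammar model (PCFG) and the head-driven statistical model (HDSM), proposes a new probabilistic model, PCFG-HDSM, and implements a new Chinese syntactic parser based on the GLR algorithm. In an open test, precision and recall reached 80.8% and 74.3% respectively, a modest improvement over the results of the Prop parser from the Chinese Academy of Sciences, which indicates that the new PCFG-HDSM model does improve the parser's disambiguation ability. For semantic sentence pattern analysis, the thesis first summarizes the correspondence between syntactic and semantic elements, then analyzes the syntactic parsing results to obtain semantic patterns. After testing on a large number of sentences, it reports the distribution of the individual semantic patterns.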

【Abstract】 Semantic understanding plays an important role in natural language processing, and semantic structure cannot be ignored in the expression of meaning in Chinese. To process aircraft maintenance information automatically, and to extend the scope of processing to the semantic level, this paper emphasizes the basic role of semantic structure in semantic understanding. To understand the semantic information of a text, a system must first identify the semantic structure and then perform semantic processing on it. This paper therefore focuses on identifying semantic structure automatically. To that end, the text is first parsed syntactically, and the resulting syntactic patterns are then converted into semantic patterns. For syntactic analysis, a syntactic treebank is constructed as follows: ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) performs word segmentation and part-of-speech tagging on the raw corpus; the tagging results are then corrected to reduce ambiguity; finally, syntactic information is annotated manually. From this annotated corpus, 385 rules are learned and used in syntactic analysis. To improve the parser's disambiguation ability and precision, this paper proposes a syntactic parsing model, PCFG-HDSM, which combines the strengths of PCFG (Probabilistic Context-Free Grammar) and HDSM (Head-Driven Statistical Models), and implements a new Chinese syntactic parser based on this model and the GLR algorithm. In an open test, labeled precision and labeled recall are 80.8% and 74.3% respectively, a modest improvement over the results of the Prop program from the Chinese Academy of Sciences. This indicates that the new PCFG-HDSM model improves the parser's disambiguation ability.
For semantic structure parsing, the results of syntactic analysis are processed to obtain the semantic patterns. A large amount of text is tested, and statistics are compiled on the distribution of each semantic pattern.
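
The abstract reports labeled precision and labeled recall without defining them. As an illustration only, the sketch below assumes the common bracket-based definition: a predicted constituent counts as correct when its label and word span both match a constituent in the gold tree, with part-of-speech nodes excluded. The bracketed tree format, the function names, and the toy labels are assumptions made for this example and are not taken from the thesis.

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree such as '(S (NP (n a)) (VP (v b)))'
    into nested (label, children) tuples; words are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def constituents(tree):
    """Collect (label, start, end) spans for phrase-level nodes.
    Nodes that dominate only words (part-of-speech tags) are skipped."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        end = start
        for child in children:
            if isinstance(child, str):
                end += 1  # a word covers exactly one position
            else:
                end = walk(child, end)
        if any(not isinstance(c, str) for c in children):
            spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def labeled_pr(gold_text, pred_text):
    """Return (labeled precision, labeled recall) for one sentence."""
    gold = constituents(parse_tree(gold_text))
    pred = constituents(parse_tree(pred_text))
    matched = sum((gold & pred).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = "(S (NP (n a)) (VP (v b) (NP (n c))))"
    pred = "(S (NP (n a)) (VP (v b) (n c)))"
    print(labeled_pr(gold, pred))
```

In this toy pair the predicted tree omits the inner NP, so all three of its constituents match the gold tree while only three of the four gold constituents are recovered. Corpus-level figures such as those in the abstract would normally pool matched, predicted, and gold counts over all test sentences before dividing, rather than averaging per-sentence scores.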
