A Study on Chinese Chunk Parsing (汉语组块识别的研究)

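【Author】 Wang Yingying (王莹莹)

【Advisor】 Huang Degen (黄德根)

【Author information】 Dalian University of Technology, Computer Application Technology, 2006, Master's thesis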

【Abstract】 As lexical analysis in natural language processing (NLP) matures, syntactic parsing has become a central and difficult research problem. Chunk parsing was proposed to reduce the complexity of full syntactic parsing: following a divide-and-conquer strategy, parsing is split into chunk identification and the analysis of relations between chunks, so that processing moves from the word level to the chunk level. The goal of this thesis is to identify the chunks in Chinese sentences on the basis of lexical analysis, providing a foundation for full syntactic parsing and other NLP tasks.

The thesis first reviews the state of research on chunk parsing and its significance and, drawing on earlier work, gives a detailed definition of Chinese chunks. It then describes the design and implementation of two chunk parsing systems, one based on a specialized hidden Markov model (HMM) and one based on a support vector machine (SVM), and applies an error-driven learning mechanism to correct their output.

By introducing different kinds of contextual information into the HMM, five bigram specialized HMMs are built for chunking Chinese sentences. For the SVM, statistical models are trained using various combinations of chunk features and different multi-class decomposition methods. To further improve the results, error-driven learning is applied to correct the output of both the specialized HMM and the SVM. The thesis presents the algorithms for both models and reports their chunking results before and after error-driven correction.

Experiments show that both models perform well: the specialized HMM achieves an F-value of 84.99% and the SVM an F-value of 89.75%, which supports the effectiveness of both approaches. With error-driven learning, the F-values of the two models improve by 1.05% and 0.66%, respectively.

The chunk parsing approaches presented here can be applied in practical machine translation systems to simplify sentence structure and improve overall system performance. They may also be applied to other NLP tasks such as information retrieval and text classification.
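The abstract reports chunking quality as F-values but does not spell out how they are computed. The following is a minimal sketch of one common way to score chunking, assuming that chunks are encoded with BIO tags, that a predicted chunk counts as correct only when its boundaries and type exactly match a gold chunk, and that the F-value is the balanced harmonic mean of precision and recall. The tag names (`B-NP`, `I-NP`, and so on) and the function names are illustrative assumptions, not taken from the thesis.

```python
from typing import List, Set, Tuple

# A chunk is (start index, end index exclusive, chunk type).
Chunk = Tuple[int, int, str]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect chunk spans from a BIO tag sequence (assumed encoding)."""
    chunks: Set[Chunk] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            chunks.add((start, i, label))
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray "I-" tag is treated leniently as the start of a chunk.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return chunks


def chunk_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return chunk-level precision, recall and balanced F-value."""
    gold_chunks = extract_chunks(gold)
    pred_chunks = extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value
```

Scoring whole chunks rather than individual tags penalizes boundary errors: a prediction that gets most of a chunk's tags right but ends it one word early earns no credit for that chunk.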

  • 【Classification code】 TP391.43
  • 【Times cited】 17
  • 【Downloads】 346