中文问答系统中问题分类及答案候选句抽取的研究

Research on Question Classification and Candidate Answer Sentences Extraction in Chinese Question Answering System

【Author】 Wen Xu (文勖)

【Advisor】 Zhang Yu (张宇)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2006, Master's thesis

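【Abstract (translated from the Chinese)】 An automatic question answering (QA) system is a new generation of intelligent search engine that combines natural language processing and information retrieval. A QA system comprises question classification, query expansion, text retrieval, answer extraction, and answer selection and ranking, of which question classification and answer extraction are the most critical. This thesis applies natural language processing to two key technologies of Chinese QA systems: question classification and candidate answer sentence extraction. As the first major module of a QA system, question classification serves two purposes. First, it effectively narrows the space of candidate answers and so raises the accuracy of the answers the system returns. Second, the answer type it provides determines the answer extraction strategy. To address the differences between text classification and question classification, the thesis uses dependency parsing to extract the sentence trunk together with the interrogative word and its dependents, combines these with trunk-related word pairs, and classifies with a support vector machine. This approach greatly reduces noise in question classification, highlights its main features, takes the syntactic relations between words into account, and achieves good results. In addition, because ordinary hierarchical classification performs poorly on questions, the thesis proposes a new approach to hierarchical classification of Chinese questions that combines the main features of each class with syntactic features. Classification features are extracted by syntactic parsing, so syntactic information is built into question classification. Overall accuracy reaches 88.25% on coarse classes and 73.15% on fine classes, an improvement of 10 percentage points on each over traditional hierarchical classification, which demonstrates the effectiveness of the method. Candidate answer sentence extraction is an important part of answer extraction in a QA system, and its quality directly affects system performance. To address the difference between text retrieval and sentence retrieval, the thesis applies anaphora resolution as preprocessing and combines an improved edit distance with the vector space model. The method is markedly effective for retrieving answer sentences to factoid questions, with an accuracy of 84.71%. Answer sentence validation matches the tree structures of the question and the candidate answer sentence, bringing syntactic information into candidate sentence extraction and overcoming, to some extent, the shortcomings of a simple bag-of-words model. The thesis also proposes a simplified yet effective tree matching algorithm, which first performs a preorder (root-first) traversal and then applies the improved edit distance; it raises precision and recall by 6.2 and 7.7 percentage points respectively.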
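
The tree matching step named in the abstract (preorder traversal followed by edit distance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Node` structure, the function names, the toy trees, and the normalization into a similarity score are hypothetical, and the plain Levenshtein distance stands in for the thesis's "improved" edit distance, whose details the abstract does not give.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """A node of a dependency tree (hypothetical structure, for illustration only)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder(node: Node) -> List[str]:
    """Flatten a tree into a label sequence by root-first (preorder) traversal."""
    labels = [node.label]
    for child in node.children:
        labels.extend(preorder(child))
    return labels


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance over token sequences with unit costs.

    This is the textbook algorithm, not the improved variant from the thesis.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tree_similarity(question: Node, sentence: Node) -> float:
    """Assumed scoring: 1 minus the distance normalized by the longer sequence."""
    q_seq, s_seq = preorder(question), preorder(sentence)
    longest = max(len(q_seq), len(s_seq))  # at least 1: every tree has a root
    return 1.0 - edit_distance(q_seq, s_seq) / longest


# Toy trees: a question ("who invented the telephone") and a candidate sentence.
question_tree = Node("发明", [Node("谁"), Node("电话")])
candidate_tree = Node("发明", [Node("贝尔"), Node("电话")])
score = tree_similarity(question_tree, candidate_tree)
```

Flattening the trees first keeps the comparison to a single sequence alignment, which is presumably why the abstract calls the algorithm simplified; the cost is that two different trees can share the same preorder sequence.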

【Abstract】 Question answering (QA) is a next-generation search technology that draws on natural language processing and information retrieval. A QA system comprises question classification, query expansion, text retrieval, answer extraction, and answer selection; of these, question classification and answer extraction are the most important. This thesis applies natural language processing to two key techniques: question classification and candidate answer sentence extraction. Question classification, the first major module, serves two functions. First, it narrows the space of candidate answers, which improves the system's accuracy. Second, the question type determines the answer extraction strategy. Because question classification differs from text classification, the thesis proposes a method that uses a support vector machine together with related word pairs from the subject-predicate structure. This method substantially reduces noise and emphasizes the main features of the question, improving performance. In addition, because general hierarchical classification performs poorly on questions, the thesis proposes a new method for hierarchical classification of Chinese questions that combines key class features with syntactic features of the question. By extracting syntactic features and incorporating syntactic information into question classification, the method reaches a precision of 88.25% on coarse classes and 73.15% on fine classes, an improvement of nearly ten percentage points on each over traditional hierarchical classification, which demonstrates its effectiveness. Candidate answer sentence extraction is an important part of answer extraction and directly affects the performance of a QA system. To account for the difference between text retrieval and sentence retrieval, the thesis proposes a method that integrates anaphora resolution, an improved edit distance, and the vector space model. On factoid questions, the precision of answer sentence retrieval reaches 84.71%. Answer sentence validation overcomes the limitations of the bag-of-words model by mapping dependency trees between query and candidate

  • 【Classification number】TP18; TP391.1
  • 【Citations】22
  • 【Downloads】1651