高考地理问答题问句中的事件抽取研究

Study on the Event Extraction of Questions in the Geography Question and Answer of College Entrance Examination

【Author】 Chen Yu (陈宇)

【Advisor】 Zhang Zhizheng (张志政)

【Author Information】 Southeast University (东南大学), Software Engineering, 2017, Master's thesis

【Abstract】 Event extraction is an important research topic in information extraction. The National 863 Program project "Key Technologies and Systems for Open-Domain Knowledge Association, Reasoning and Indexing" aims to build an intelligent geography question-answering system, and extracting information from the question text is one of its important research tasks. Manual analysis shows that geography question-and-answer items have the following characteristics: in most items, the final question sentence provides enough of the information needed to solve the problem; the question sentences follow certain patterns; and the information they contain depends on geographic domain knowledge. Against this background, this thesis studies how to extract events from the question sentences of geography question-and-answer items and convert them into the form of problem-solving templates. The research covers: (1) analyzing many years of college entrance examination (gaokao) geography question-and-answer items and, combining domain experts' summaries with cluster analysis, defining the event types and event elements contained in the question sentences; (2) identifying the event types of the question events and geographic events in exam items with a method that combines support vector machines, an event trigger word list, and rules; (3) identifying the event elements of geographic events with a maximum entropy model, and identifying the event elements of question events with a method based on dependency parsing and an event element word list. The main results are: (1) Events in gaokao geography question sentences are divided into question events and geographic events. Question event types are defined according to item types and problem-solving templates; geographic event types are defined through cluster analysis combined with the chapter structure of geography textbooks. Question event elements and geographic event elements are defined according to the needs of problem solving and the characteristics of the events themselves. The thesis focuses on hierarchical modeling methods for three representative kinds of events, namely location-factor-analysis question events, agricultural events, and transportation geography events, as a reference for modeling events in other geography question sentences. (2) An event type extraction method combining support vector machines, an event trigger word list, and rules is proposed. Because small event categories in gaokao geography question sentences suffer from scarce and sparse data, the thesis proposes, for a geographic event sentence, first using a classifier to identify its geographic event domain type, then identifying its atomic event type from the trigger word list, and finally using rules to identify the hierarchy of parent event types. Experiments show that this method improves event extraction accuracy and compensates for the small volume and sparsity of data in small geographic event categories. (3) A maximum entropy model is used to identify the event elements of geographic events, which makes the method easy to port across different categories of geographic events. Because question events have a uniform sentence form, their event elements are extracted with a method based on dependency parsing and an event element word list; the algorithm is highly interpretable, relatively easy to implement, and does not require a large annotated corpus. (4) An integrated event extraction experiment is designed: given the question sentence of a geography item as input, the system outputs all event types and their corresponding event elements. The integrated experiment evaluates the overall performance of the event extraction system, and its output can be converted into the input of the solving component of the gaokao geography question-answering system, that is, the problem-solving template.
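The three-stage event type identification described in result (2) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the thesis: the domain keywords, the trigger words, the event type names, and the parent rules are hypothetical, English tokens stand in for the Chinese question sentences, and a simple keyword-count function takes the place of the trained support vector machine in stage 1.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Hypothetical keywords used only by the stand-in domain classifier.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "agriculture": ["rice", "crop", "farm", "planting"],
    "transport": ["railway", "port", "road", "highway"],
}

# Hypothetical trigger word list: domain type -> {trigger word: atomic event type}.
TRIGGERS: Dict[str, Dict[str, str]] = {
    "agriculture": {"planting": "CropPlanting", "irrigation": "Irrigation"},
    "transport": {"railway": "RailwayConstruction", "port": "PortConstruction"},
}

# Hypothetical hierarchy rules: event type -> parent event type.
PARENT: Dict[str, str] = {
    "CropPlanting": "AgriculturalProduction",
    "Irrigation": "AgriculturalProduction",
    "AgriculturalProduction": "AgricultureEvent",
    "RailwayConstruction": "TransportLineConstruction",
    "PortConstruction": "TransportNodeConstruction",
    "TransportLineConstruction": "TransportEvent",
    "TransportNodeConstruction": "TransportEvent",
}


def tokenize(sentence: str) -> Set[str]:
    """Lowercase word tokens; a real system would use a Chinese word segmenter."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def keyword_domain_classifier(sentence: str) -> Optional[str]:
    """Stage 1 stand-in for the SVM: choose the domain with the most keyword hits."""
    tokens = tokenize(sentence)
    scores = {d: sum(w in tokens for w in ws) for d, ws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=lambda d: scores[d])
    return best if scores[best] > 0 else None


def extract_event_types(
    sentence: str,
    classify: Callable[[str], Optional[str]] = keyword_domain_classifier,
) -> Optional[List[str]]:
    """Return [atomic type, parent, grandparent, ...] or None if nothing matches."""
    # Stage 1: the classifier narrows the sentence to one geographic domain.
    domain = classify(sentence)
    if domain is None:
        return None
    # Stage 2: the trigger word list for that domain yields the atomic event type.
    tokens = tokenize(sentence)
    atomic = next((t for w, t in TRIGGERS[domain].items() if w in tokens), None)
    if atomic is None:
        return None
    # Stage 3: rules walk upward to recover the parent type hierarchy.
    chain = [atomic]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


if __name__ == "__main__":
    print(extract_event_types(
        "Analyze the favorable conditions for rice planting in this region."))
```

The point of the staging is that only stage 1 needs training data, and it only has to separate a few coarse domains; the fine-grained atomic types, where examples are scarce, are handled by lookup and rules. The `classify` parameter is kept pluggable so that a trained classifier could replace the keyword stand-in without touching the other two stages.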

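  • 【Online Publication Contributor】 Southeast University (东南大学)
  • 【Online Publication Year/Issue】 2018, Issue 12
  • 【Classification Number】 TP391.1
  • 【Times Cited】 1
  • 【Downloads】 140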