DRTE: A Term Extraction Method for K12 Education
【Abstract】 Term extraction automatically extracts domain-specific terms from unstructured text, and it plays an important role in Chinese word segmentation, information extraction, and knowledge base construction. Existing methods rely largely on statistical information about words. However, terms in K12 subjects follow a pronounced long-tail distribution, which makes it hard for statistics-based methods to extract the terms in the tail. Drawing on the characteristics of K12 subjects, this paper proposes DRTE, a term extraction method that mines term definitions and term relations and combines them with term-formation rules and boundary detection. Experiments on middle school and high school math textbooks show that the method achieves an F1 score of 82.7%, an improvement of 40.8% over the current method, and that it can effectively automate term extraction in Chinese K12 education.
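To make the idea of extracting terms from their definitions concrete, here is a minimal Python sketch. It is not the DRTE implementation: the cue phrases in `DEFINITION_CUES`, the function name `extract_defined_terms`, and the punctuation-based boundary rule are all assumptions chosen for illustration. It only shows why a definition sentence can reveal a term that appears too rarely for frequency statistics to catch.

```python
import re
from typing import List

# Hypothetical cue phrases that often introduce a defined term in Chinese
# textbooks (roughly "is called" / "is referred to as"). These are
# illustrative assumptions, not patterns taken from the paper.
DEFINITION_CUES = ("叫做", "称为")

# Capture the shortest run of Chinese characters that follows a cue phrase
# and ends at punctuation or at the end of the sentence. Using punctuation
# as the right boundary is a crude stand-in for real boundary detection.
_PATTERN = re.compile(
    "(?:" + "|".join(DEFINITION_CUES) + ")"
    "([\u4e00-\u9fff]+?)"
    "(?=[,。;、,.;]|$)"
)


def extract_defined_terms(sentence: str) -> List[str]:
    """Return candidate terms introduced by a definition cue in `sentence`."""
    return _PATTERN.findall(sentence)


if __name__ == "__main__":
    # "A triangle with two equal sides is called an isosceles triangle."
    print(extract_defined_terms("有两条边相等的三角形叫做等腰三角形。"))
```

Because the candidate comes from a single definition sentence, it does not need to occur often in the corpus, which is the property that matters for long-tail terms.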
- 【Source】 Journal of Chinese Information Processing (中文信息学报), 2018, Issue 03
- 【Classification No.】 TP391.1
- 【Times Cited】 11
- 【Downloads】 381