
基于主动学习的中文依存句法分析

Active Learning for Chinese Dependency Parsing

【Authors】 Chen Xin, Che Wanxiang, Liu Ting

【Institution】 Research Center for Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

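【Abstract (translated from the Chinese)】 Dependency parsing still relies mainly on supervised machine learning, which requires a large, high-quality treebank as training data. Chinese dependency treebanks are currently relatively scarce, and treebank annotation is time-consuming and labor-intensive. Given a large amount of unannotated text, this paper applies active learning to Chinese dependency parsing: instances that the parsing model predicts with low confidence are selected first and passed to human annotators. The paper proposes and compares several criteria for measuring the confidence of the dependency parsing model's predictions. Experiments show that, compared with random selection of instances for annotation, active learning improves Chinese dependency parsing performance by up to 0.8% when the same number of training instances is used. Conversely, active learning reaches the same accuracy with fewer annotated instances, reducing the manual annotation effort by up to 30%.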

【Abstract】 Building a statistical dependency parser requires a large annotated treebank. Acquiring such a treebank is time-consuming, tedious, and expensive. This paper presents a method that reduces this demand using active learning, which selects the most uncertain samples for annotation instead of blindly annotating the whole training corpus. Experiments are carried out on the HTT-CIR-CDT. Our results show that parsing accuracy rises by about 0.8 percent with active learning when the same number of training samples is used. Conversely, to reach about the same parsing accuracy, only 70% of the samples required by the usual random selection method need to be annotated.
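The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which confidence criteria the authors use, so the average per-token entropy below is only one common uncertainty measure chosen for illustration, not the paper's method. The sketch also assumes that the parser can expose, for each token, a probability distribution over candidate heads. All names (`token_entropy`, `sentence_uncertainty`, `select_for_annotation`, the toy `pool`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's distribution over candidate heads."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(head_dists: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy, so that length alone does not favour long sentences."""
    if not head_dists:
        return 0.0
    return sum(token_entropy(d) for d in head_dists) / len(head_dists)


def select_for_annotation(pool: Dict[str, List[List[float]]], k: int) -> List[str]:
    """Return the ids of the k most uncertain sentences in the unlabeled pool."""
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]


# Toy unlabeled pool (assumed data): sentence id -> per-token head distributions.
pool = {
    "s1": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # parser is confident
    "s2": [[0.40, 0.35, 0.25], [0.50, 0.50, 0.00]],  # parser is uncertain
    "s3": [[0.90, 0.05, 0.05], [0.60, 0.30, 0.10]],  # in between
}

selected = select_for_annotation(pool, k=2)
print(selected)  # the two least confidently parsed sentences: ['s2', 's3']
```

In a full active learning loop, the selected sentences would be annotated by hand, added to the training set, and the parser retrained before the next selection round. A random-selection baseline would simply sample `k` ids from the pool instead of ranking them.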

【Key words】 active learning; dependency parsing; uncertainty-based sampling; query-by-committee

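【Funding】 Supported by the National Natural Science Foundation of China (60803093; 60975055), the Harbin Institute of Technology Scientific Research Innovation Fund (HITNSRIF.2009069), and the Fundamental Research Funds for the Central Universities (HIT.KLOF.2010064)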

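【Proceedings】 中国计算语言学研究前沿进展（2009-2011） (Frontier Advances in Chinese Computational Linguistics Research, 2009-2011)

【Conference】 第十一届全国计算语言学学术会议 (11th National Conference on Computational Linguistics)

【Conference date】 2011-08-20
【Conference location】 Luoyang, Henan, China
【Classification codes】 TP391.1; TP181

【Organizer】 Chinese Information Processing Society of China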
