Active Learning for Chinese Dependency Parsing (基于主动学习的中文依存句法分析)
【Author】 Chen Xin, Che Wanxiang, Liu Ting; Research Center for Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
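【Institution】 Research Center for Information Retrieval, School of Computer Science and Technology, Harbin Institute of Technology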
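【Abstract (translated from Chinese)】 Dependency parsing still relies mainly on supervised machine learning, which requires a large, high-quality treebank as training data. Chinese dependency treebank resources are currently relatively scarce, and treebank annotation is time-consuming and labor-intensive. Given a large amount of unannotated text, this paper applies active learning to Chinese dependency parsing, giving priority to the instances that the parsing model predicts least reliably and passing them to human annotators. The paper proposes and compares several criteria for measuring the confidence of a dependency parsing model's predictions. Experiments show that, compared with randomly selecting instances for annotation, active learning improves Chinese dependency parsing performance by up to 0.8% when the same number of training instances is used. Conversely, active learning needs fewer annotated instances to reach the same accuracy, reducing the manual annotation effort by up to 30%.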
【Abstract】 Building a statistical dependency parser requires a large annotated treebank, and acquiring such a treebank is time-consuming, tedious, and expensive. This paper presents a method that reduces this demand using active learning, which selects the most uncertain samples for annotation instead of blindly annotating the whole training corpus. Experiments on HIT-CIR-CDT show that active learning raises parsing accuracy by about 0.8 percent when the same number of training samples is used. Conversely, to reach about the same parsing accuracy, only about 70% of the samples need to be annotated compared with the usual random selection method.
【Key words】 active learning; dependency parsing; uncertainty-based sampling; query-by-committee;
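- 【Proceedings】 Frontier Advances in Chinese Computational Linguistics Research, 2009-2011 (中国计算语言学研究前沿进展)
- 【Conference】 The 11th National Conference on Computational Linguistics (第十一届全国计算语言学学术会议)
- 【Conference date】 2011-08-20
- 【Conference location】 Luoyang, Henan, China
- 【Classification code】 TP391.1; TP181
- 【Organizer】 Chinese Information Processing Society of China (中国中文信息学会)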
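
The selection step described in the abstract, which sends the sentences the parser is least sure about to human annotators first, can be sketched roughly as follows. This is only an illustrative sketch: the record does not give the paper's actual confidence criteria, so the `confidence` function, the `select_for_annotation` helper, and the toy scores are hypothetical placeholders, not the authors' method.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_annotation(
    pool: Sequence[str],
    confidence: Callable[[str], float],
    batch_size: int,
) -> Tuple[List[str], List[str]]:
    """Split an unlabeled pool into (to_annotate, remaining).

    The `batch_size` sentences with the lowest confidence score are
    chosen for manual annotation; the rest stay in the pool.
    """
    # Sort ascending so the least confident sentences come first.
    ranked = sorted(pool, key=confidence)
    return list(ranked[:batch_size]), list(ranked[batch_size:])


# Hypothetical scores standing in for a real parser's confidence output.
toy_scores = {"s1": 0.91, "s2": 0.42, "s3": 0.77, "s4": 0.35}

to_annotate, remaining = select_for_annotation(
    list(toy_scores),
    lambda sentence_id: toy_scores[sentence_id],
    batch_size=2,
)
print(to_annotate)
print(remaining)
```

In a full active-learning loop, the selected sentences would be annotated, added to the training set, and the parser retrained before the next round of selection. The random-selection baseline mentioned in the abstract would replace the sort with a random sample of the same size.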