Dual-Learning Mongolian-Chinese Machine Translation Based on Iterative Knowledge Refining
(Original title: 基于迭代知识精炼的对偶学习蒙汉机器翻译)
【Abstract】 Deep learning has made great progress in machine translation owing to its capacity for deep semantic understanding. For low-resource languages, however, the lack of large-scale bilingual corpora easily leads to model overfitting. To address the data sparsity of low-resource neural machine translation, we propose a dual-learning training method with iterative knowledge refining. Back translation is used to expand the bilingual parallel corpus, and the proportion of pseudo to real corpus is adjusted iteratively so that the model learns language representations while reducing the risk of noise. Finally, rewards for translation quality and fluency are combined to optimize model parameters in both the source-to-target and target-to-source directions, thereby improving translation quality. Experiments on the Mongolian-Chinese translation task of the 15th China Conference on Machine Translation (CCMT 2019) show that the proposed method achieves significant improvements over the baselines, demonstrating its effectiveness.
【Key words】 neural machine translation; low-resource language; dual-learning; back translation; knowledge refining
- 【Source】 Journal of Xiamen University (Natural Science) (厦门大学学报(自然科学版)), 2021, Issue 04
- 【CLC Number】 TP391.2
- 【Cited By】 2
- 【Downloads】 114
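
The abstract above outlines three mechanisms: back translation to create pseudo parallel data, an iteratively adjusted pseudo-to-real corpus ratio, and a dual-learning reward that combines fluency and translation-quality signals. The Python sketch below only illustrates that control flow and is not the paper's implementation: the function names (`pseudo_ratio`, `dual_reward`, `mix_batch`), the linear annealing schedule, the `alpha=0.5` weighting, and the placeholder scores are all assumptions introduced here for illustration.

```python
# Hypothetical sketch (not from the paper): dual-learning training flow with
# back-translated (pseudo) data whose share of each batch is annealed over
# iterations. Model calls are replaced by placeholder scores.
import random


def pseudo_ratio(iteration, total_iterations, start=0.8, end=0.2):
    """Linearly anneal the share of back-translated pairs in a batch.

    Early iterations lean on pseudo data to learn language representations;
    later iterations shift toward real parallel data to limit noise. The
    linear schedule is an assumption; the paper only states that the ratio
    is adjusted iteratively.
    """
    t = iteration / max(1, total_iterations - 1)
    return start + (end - start) * t


def dual_reward(fluency, reconstruction, alpha=0.5):
    """Combine a fluency reward (target-side language-model score) with a
    reconstruction reward (log-likelihood of recovering the source with the
    reverse model), in the style of the standard dual-learning objective."""
    return alpha * fluency + (1.0 - alpha) * reconstruction


def mix_batch(real_pairs, pseudo_pairs, ratio, batch_size=32):
    """Sample a training batch with `ratio` pseudo pairs and the rest real."""
    n_pseudo = int(batch_size * ratio)
    batch = random.sample(pseudo_pairs, min(n_pseudo, len(pseudo_pairs)))
    batch += random.sample(real_pairs, min(batch_size - len(batch), len(real_pairs)))
    random.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Toy data standing in for Mongolian-Chinese sentence pairs.
    real = [("mn_real_%d" % i, "zh_real_%d" % i) for i in range(100)]
    pseudo = [("mn_backtrans_%d" % i, "zh_mono_%d" % i) for i in range(100)]

    for it in range(5):
        ratio = pseudo_ratio(it, 5)
        batch = mix_batch(real, pseudo, ratio)
        # Placeholder rewards; a real system would score fluency with a
        # language model and reconstruction with the reverse translation model.
        r = dual_reward(fluency=-1.2, reconstruction=-0.8)
        print(f"iter {it}: pseudo ratio {ratio:.2f}, batch {len(batch)}, reward {r:.2f}")
```

In an actual system the placeholder scores would come from a target-side language model (fluency) and from the reverse-direction model's log-probability of reconstructing the source sentence (quality), and the combined reward would drive parameter updates in both translation directions.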