Mongolian-Chinese Neural Machine Translation with Prior Information
【Abstract】 Neural machine translation (NMT) has achieved good results on the Mongolian-to-Chinese translation task. However, NMT models learn word vectors only from bilingual corpora, and the limited size of these corpora constrains the quality of the word representations. This paper incorporates prior information into NMT. First, word vectors trained on a large-scale monolingual corpus are used as the initial word vectors of the translation model, and part-of-speech features are added to the word vectors to alleviate grammatical ambiguity. Second, the target vocabulary is usually restricted in size to reduce the computational complexity of the decoder and the training time, which produces a large number of out-of-vocabulary (OOV) words. To alleviate this problem, the paper uses the part-of-speech-augmented word vectors to compute the similarity between words and replaces each OOV word with the most similar word in the target vocabulary. Experiments on Mongolian-to-Chinese translation show an improvement of 2.68 BLEU points.
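The out-of-vocabulary replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how part-of-speech features are attached to the word vectors, so the one-hot concatenation below is an assumption, as are the tag inventory `POS_TAGS`, the helper names, and the toy vectors. Cosine similarity is likewise assumed as the similarity measure.

```python
import math

# Hypothetical POS tag inventory; the source does not specify one.
POS_TAGS = ["NOUN", "VERB", "ADJ"]

def add_pos_feature(vec, pos):
    """Append a one-hot POS indicator to a word vector (assumed scheme)."""
    one_hot = [1.0 if tag == pos else 0.0 for tag in POS_TAGS]
    return list(vec) + one_hot

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def replace_oov(word, vectors, vocab):
    """Return word unchanged if it is in vocab or has no vector;
    otherwise return the in-vocabulary word with the highest cosine similarity."""
    if word in vocab or word not in vectors:
        return word
    return max(vocab, key=lambda w: cosine(vectors[word], vectors[w]))

# Toy data (illustrative only): 2-dimensional vectors with a POS tag per word.
raw = {
    "cat": ([1.0, 0.0], "NOUN"),
    "dog": ([0.5, 0.5], "NOUN"),
    "run": ([0.0, 1.0], "VERB"),
    "kitten": ([0.9, 0.1], "NOUN"),  # treated as out-of-vocabulary
}
vectors = {w: add_pos_feature(v, p) for w, (v, p) in raw.items()}
vocab = ["cat", "dog", "run"]  # restricted target vocabulary

print(replace_oov("kitten", vectors, vocab))
```

In this toy setup the shared NOUN indicator raises the similarity between "kitten" and the two nouns relative to the verb, which is the kind of effect the part-of-speech feature is meant to have on the replacement choice.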
【Key words】 recurrent neural network; out-of-vocabulary; word embedding; part-of-speech
- 【Source】 Journal of Chinese Information Processing (中文信息学报), 2018, Issue 06
- 【Classification No.】 TP391.2
- 【Times cited】 17
- 【Downloads】 244