A Mongolian-Chinese Neural Machine Translation Model Incorporating Prior Information

Mongolian-Chinese Neural Machine Translation with Priori Information


【Author】 FAN Wenting; HOU Hongxu; WANG Hongbin; WU Jing; LI Jinting; College of Computer Science, Inner Mongolia University

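【Summary】 Neural machine translation (NMT) models have achieved good results on Mongolian-to-Chinese translation. However, an NMT model learns its word vectors from the bilingual corpus alone, and the limited size of that corpus constrains the quality of the word representations. This paper incorporates prior information into NMT. First, word vectors trained on a large-scale monolingual corpus are used as the initial word vectors of the translation model, and part-of-speech features are added to the word vectors to alleviate grammatical ambiguity. Second, the target vocabulary is usually limited in size to reduce the computational complexity of the decoder and the training time, which produces a large number of out-of-vocabulary words. The paper uses the part-of-speech-augmented word vectors to compute the similarity between words and replaces each out-of-vocabulary word with the most similar word in the target vocabulary, which alleviates the out-of-vocabulary problem. Experiments on Mongolian-to-Chinese translation show an improvement of 2.68 BLEU points.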

【Abstract】 Neural machine translation (NMT) has become a prominent approach to the Mongolian-Chinese translation task. We implement a neural machine translation model that incorporates prior information. On the one hand, we train word representations on a large-scale monolingual corpus and use them as the initial word vectors. On the other hand, we add a part-of-speech feature to each word vector to reduce grammatical ambiguity. To address the out-of-vocabulary problem, we use word embeddings to calculate the similarity between words, then replace each out-of-vocabulary word with the most similar word that is covered by the target vocabulary. On the Mongolian-Chinese machine translation task, experimental results show that BLEU improves by 2.68 points.
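
The out-of-vocabulary replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot encoding of the part-of-speech feature, the tag list `POS_TAGS`, the function names, and the English toy words are all assumptions made for illustration, since this record does not say how the feature is encoded.

```python
import math

# Hypothetical tag inventory; the paper's actual tag set is not given here.
POS_TAGS = ["NOUN", "VERB", "ADJ"]


def with_pos(vector, pos_tag):
    """Append a one-hot part-of-speech feature to a word vector (assumed encoding)."""
    one_hot = [1.0 if tag == pos_tag else 0.0 for tag in POS_TAGS]
    return list(vector) + one_hot


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def replace_oov(tokens, embeddings, vocabulary):
    """Replace each out-of-vocabulary token with its nearest in-vocabulary word.

    tokens:     list of words in a sentence
    embeddings: dict mapping word -> POS-augmented vector; assumed to cover
                more words than the vocabulary (e.g. trained on monolingual data)
    vocabulary: set of words kept in the restricted vocabulary
    """
    # Only in-vocabulary words that have a vector can serve as replacements.
    candidates = sorted(w for w in vocabulary if w in embeddings)
    result = []
    for token in tokens:
        if token in vocabulary or token not in embeddings or not candidates:
            result.append(token)  # nothing to replace, or nothing to compare with
            continue
        nearest = max(candidates,
                      key=lambda w: cosine(embeddings[token], embeddings[w]))
        result.append(nearest)
    return result


# Toy data: "pony" is out of vocabulary. Its raw vector equals that of the
# verb "gallop", but the POS feature pulls it toward the noun "horse".
embeddings = {
    "horse": with_pos([1.0, 0.0], "NOUN"),
    "pony": with_pos([0.9, 0.1], "NOUN"),
    "gallop": with_pos([0.9, 0.1], "VERB"),
    "run": with_pos([0.0, 1.0], "VERB"),
}
vocabulary = {"horse", "gallop", "run"}

print(replace_oov(["pony", "run"], embeddings, vocabulary))
```

In this toy setup the raw vectors alone would rank "gallop" as the closest match to "pony"; with the part-of-speech feature appended, the noun "horse" scores higher, which is the kind of ambiguity the abstract says the feature is meant to reduce.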

【Key words】 recurrent neural network; out-of-vocabulary words; word embedding; part-of-speech tagging

【Funding】 National Natural Science Foundation of China (61362028)

【Source】 Journal of Chinese Information Processing (中文信息学报), 2018, Issue 06

【Classification code】 TP391.2
【Citations】 17
【Downloads】 244
