基于逆向最大匹配分词算法的汉盲翻译系统
CHINESE-BRAILLE TRANSLATION SYSTEM BASED ON REVERSE MAXIMUM MATCHING WORD SEGMENTATION
【Abstract】 Chinese-Braille translation automatically converts Chinese source text into the corresponding Braille text. Current approaches face several challenges: polyphonic characters (characters with more than one reading) are confused, out-of-vocabulary (OOV) words cannot be added, and the output does not follow the Braille rules for word segmentation and word joining. We build a Chinese-Braille translation system based on the reverse maximum matching word segmentation algorithm (CTB-RMM). Using its lexicon, the system distinguishes most polyphonic characters, produces segmentation results that conform more closely to the Braille word-segmentation and word-joining rules, and can independently add OOV words to the lexicon, which in turn improves the accuracy of Chinese word segmentation and of Chinese-Braille translation. Experimental results show that the system reduces the sentence ambiguity caused by Chinese word segmentation errors, reduces the translation errors caused by polyphonic-character confusion, and avoids the excessive number of Braille cells caused by a scattered syllable structure. The system offers a degree of openness and practicality.
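The core idea of reverse maximum matching can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the `max_len` window, the single-character fallback, and the names `rmm_segment` and `word_readings` are all assumptions made for illustration. The sketch scans from the end of the sentence, takes the longest lexicon word ending at the current position, and shows how adding an OOV word to the lexicon changes the result. It also hints at why a word-level lexicon helps with polyphonic characters: the reading is looked up per word rather than per character.

```python
def rmm_segment(text, lexicon, max_len=4):
    """Segment text by reverse maximum matching against a set of known words.

    max_len is an assumed upper bound on word length; unknown single
    characters are emitted as one-character words (an assumed fallback).
    """
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected right to left
    return words


def word_readings(words, readings):
    """Look up a toneless pinyin reading per word; '?' marks a missing entry."""
    return [readings.get(word, "?") for word in words]


# Hypothetical toy lexicon, not taken from the paper.
lexicon = {"研究", "研究生", "生命", "的", "起源", "翻译", "系统", "银行", "行走"}

# Scanning from the right avoids the greedy left-to-right match "研究生".
print(rmm_segment("研究生命的起源", lexicon))  # ['研究', '生命', '的', '起源']

# An OOV word is split into pieces until it is added to the lexicon.
print(rmm_segment("汉盲翻译系统", lexicon))  # ['汉', '盲', '翻译', '系统']
lexicon.add("汉盲翻译")
print(rmm_segment("汉盲翻译系统", lexicon))  # ['汉盲翻译', '系统']

# Word-level readings disambiguate the polyphonic character 行.
readings = {"银行": "yin hang", "行走": "xing zou"}
print(word_readings(rmm_segment("银行行走", lexicon), readings))  # ['yin hang', 'xing zou']
```

A production system would also need the mapping from readings to Braille cells and the Braille word-joining rules, which this sketch deliberately leaves out.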
【Key words】 Chinese-Braille translation; Chinese word segmentation; out-of-vocabulary words; reverse maximum matching (RMM)
- 【Source】 计算机应用与软件 (Computer Applications and Software), 2021, No. 10
- 【Classification number】 TP391.2
- 【Times cited】 6
- 【Downloads】 239