Korean Named Entity Recognition Based on Syllable-Morpheme Fusion
【Abstract】 Named entity recognition (NER) is one of the most important fundamental tasks in Korean natural language processing. To address the unclear entity boundaries and low accuracy of Korean NER, this paper proposes a Transformer-based Korean NER model that fuses syllable and morpheme representations. First, a pre-trained BERT model produces embeddings for syllables and for morphemes separately. Second, the syllable vectors and morpheme vectors are fused by two different methods: simple vector concatenation, and a heuristic fusion method that takes into account the connections and differences between the two vectors. Finally, the fused vectors are fed into the model to perform NER. On the Korean NER dataset released by KLUE, the model achieves an F1 score of 88.78%, about 3 to 4 percentage points higher than the single-granularity experiments.
【Key words】 Korean; named entity recognition; syllable-morpheme fusion; pre-training
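The two fusion strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula of the heuristic fusion, so the form used here, [s; m; s - m; s * m] (concatenation plus element-wise difference and product), is an assumption borrowed from common heuristic-matching practice, not the paper's confirmed method. The function names are hypothetical, and the sketch also assumes the syllable and morpheme sequences have already been aligned to the same length and dimension.

```python
from typing import Callable, List

Vector = List[float]


def concat_fusion(syllable: Vector, morpheme: Vector) -> Vector:
    """Simple fusion: concatenate the two vectors, [s; m]."""
    return syllable + morpheme


def heuristic_fusion(syllable: Vector, morpheme: Vector) -> Vector:
    """Assumed heuristic fusion: [s; m; s - m; s * m].

    The element-wise difference and product stand in for the
    "differences" and "connections" between the two vectors.
    This exact form is an assumption, not taken from the paper.
    """
    if len(syllable) != len(morpheme):
        raise ValueError("vectors must have the same dimension")
    diff = [s - m for s, m in zip(syllable, morpheme)]
    prod = [s * m for s, m in zip(syllable, morpheme)]
    return syllable + morpheme + diff + prod


def fuse_sequence(
    syllables: List[Vector],
    morphemes: List[Vector],
    fuse: Callable[[Vector, Vector], Vector],
) -> List[Vector]:
    """Fuse two position-aligned sequences of vectors (alignment is assumed)."""
    if len(syllables) != len(morphemes):
        raise ValueError("sequences must be aligned to the same length")
    return [fuse(s, m) for s, m in zip(syllables, morphemes)]
```

With embedding dimension d, concatenation yields vectors of size 2d and the assumed heuristic form yields 4d, so the input layer of the downstream tagger would need to be sized accordingly.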
- 【Source】 Journal of Chinese Information Processing (中文信息学报), 2023, Issue 04
- 【Classification No.】 H55; TP391.1