
An Eight-word-position Tag for Tibetan Word Segmentation via BiLSTM_CRF


【Author】 CHANG Fangyu; CAI Zhijie

【Corresponding author】 CAI Zhijie

【Affiliation】 College of Computer Science and Technology, Qinghai Normal University; The State Key Laboratory of Tibetan Intelligent Information Processing and Application

【Abstract】 Tibetan word segmentation is a fundamental task in Tibetan natural language processing, and its performance affects downstream applications such as automatic summarization, automatic classification, and search engines. Tibetan word segmentation methods based on word-position tagging typically use a four-word-position tag set. To extract feature information more comprehensively and capture deeper semantic information, this paper proposes an eight-word-position tag set and combines it with a BiLSTM_CRF model, yielding a BiLSTM_CRF Tibetan word segmentation method based on eight-word-position tags. Experimental results show that the method segments well, achieving a precision of 95.07%, a recall of 95.57%, and an F₁ score of 95.32% on the test dataset.
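
As a rough illustration of the word-position tagging idea mentioned in the abstract, the sketch below uses the conventional four-tag scheme (B = begin, M = middle, E = end, S = single-unit word). This record does not define the paper's eight-tag set, so it is not reproduced here. The function names and the placeholder units `a` to `f` are hypothetical; for Tibetan the tagging units would presumably be syllables. The last function checks that the reported F₁ value follows from the other two figures, assuming the first reported figure (95.07%) is precision.

```python
def words_to_tags(words):
    """Convert segmented words (each a list of units) into B/M/E/S tags."""
    tags = []
    for word in words:
        n = len(word)
        if n == 1:
            tags.append("S")  # single-unit word
        else:
            # first unit, any interior units, last unit
            tags.extend(["B"] + ["M"] * (n - 2) + ["E"])
    return tags


def tags_to_words(units, tags):
    """Rebuild words from a unit sequence and its B/M/E/S tags."""
    words, current = [], []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):  # these tags close a word
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Placeholder units standing in for syllables (hypothetical data).
segmented = [["a"], ["b", "c"], ["d", "e", "f"]]
units = [u for word in segmented for u in word]
tags = words_to_tags(segmented)

print(tags)
print(tags_to_words(units, tags) == segmented)
print(round(f1_score(95.07, 95.57), 2))
```

In a tagging-based segmenter, a sequence model such as BiLSTM_CRF predicts one tag per unit, and a decoding step like `tags_to_words` turns the predicted tags back into words. A larger tag set changes only the label inventory and the mapping rules, not this overall flow.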

【Key words】 NLP; Tibetan word segmentation; BiLSTM_CRF; eight-word-position tag

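【Funding】 National Natural Science Foundation of China (61966031, 61866032); Qinghai Provincial Department of Science and Technology projects (2019-SF-129, 2021-ZJ-727); Qinghai Provincial Key Laboratory of Tibetan Information Processing and Machine Translation (2020-ZJ-Y05); Key Laboratory of Tibetan Information Processing, Ministry of Education (2013-Z-Y17, 2014-Z-Y32, 2015-Z-Y03)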

【Source】 Journal of Chinese Information Processing (中文信息学报), 2024, Issue 10

【Classification code】 TP391.1
【Downloads】 33
