
Entity Recognition Method for Judicial Documents Based on BERT Model


【Author】 CHEN Jian; HE Tao; WEN Ying-you; MA Lin-tao

【Institution】 School of Computer Science and Engineering / Neusoft Research Institute, Northeastern University

【Abstract】 Manual analysis of case files is prone to omitting case entities and extracts features inefficiently. To address this, a pre-trained bidirectional encoder representations from Transformers (BERT) model is used: its parameters are fine-tuned on a manually annotated corpus, and the semantic encodings output by the encoder are then decoded by a long short-term memory (LSTM) network with a conditional random field (CRF) layer to complete entity extraction. The pre-trained model's large parameter count, strong feature-extraction ability, and multi-dimensional semantic representation of entities effectively improve entity extraction. Experimental results show that the proposed model achieves over 89% entity-extraction accuracy, significantly outperforming traditional recurrent neural network and convolutional neural network models.
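The abstract's final decoding step — a CRF layer choosing the best tag sequence over the encoder's per-token scores — can be illustrated with a standalone Viterbi search. This is a hypothetical sketch, not the authors' implementation: the BIO tag set, emission scores (which would come from the BERT + LSTM encoder), and transition matrix below are all made up for illustration.

```python
# Hypothetical sketch of CRF decoding as described in the abstract:
# Viterbi search over per-token emission scores and a tag-transition
# matrix recovers the most likely BIO tag sequence. All scores here
# are invented; in the paper they would come from the trained model.

TAGS = ["O", "B-PER", "I-PER"]  # minimal BIO tag set for illustration

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions: one row of per-tag scores per token.
    transitions: transitions[i][j] = score of moving from tag i to tag j.
    """
    n_tags = len(emissions[0])
    # scores[j] = best score of any path ending in tag j at the current token
    scores = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        step_back, new_scores = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + row[j])
        scores = new_scores
        backpointers.append(step_back)
    # Trace the best path backwards from the highest-scoring final tag
    best = max(range(n_tags), key=lambda j: scores[j])
    path = [best]
    for step_back in reversed(backpointers):
        best = step_back[best]
        path.append(best)
    path.reverse()
    return [TAGS[t] for t in path]

# Toy scores for a two-token person name:
emissions = [
    [0.1, 2.0, 0.0],   # token 1 strongly prefers B-PER
    [0.2, 0.0, 1.5],   # token 2 prefers I-PER
]
# Discourage I-PER unless it follows B-PER or I-PER:
transitions = [
    [0.0, 0.0, -5.0],  # O     -> {O, B-PER, I-PER}
    [0.0, 0.0,  1.0],  # B-PER -> {O, B-PER, I-PER}
    [0.0, 0.0,  1.0],  # I-PER -> {O, B-PER, I-PER}
]
print(viterbi_decode(emissions, transitions))  # -> ['B-PER', 'I-PER']
```

The transition scores are what let the CRF enforce label consistency (e.g. I-PER cannot start a span), which per-token classification alone cannot guarantee.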

【Funding】 National Key R&D Program of China (2018YFC0830601); Key R&D Program of Liaoning Province (2019JH2/10100027); Fundamental Research Funds for the Central Universities (N171802001); Liaoning Revitalization Talents Program (XLYC1802100)
  • 【Source】 Journal of Northeastern University (Natural Science), No. 10, 2020
  • 【CLC Number】 TP391.1
  • 【Cited by】 22
  • 【Downloads】 553