A Discourse-based Chinese Chunkbank

(Original Chinese title: 基于篇章的汉语句法结构树库)

【Authors】 LU Lu; JIAO Hong-Yan; LI Meng; XUN En-Dong (卢露, 矫红岩, 李梦, 荀恩东)

【Corresponding author】 XUN En-Dong

【Affiliation】 College of Information Science, Beijing Language and Culture University

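【Abstract (Chinese version, translated)】 To rapidly build a large-scale, multi-domain, high-quality treebank, we propose an annotation scheme based on chunks defined by phrasal function and syntactic role, designed to make multi-level structure easy to annotate. At the discourse level, punctuation, syntactic structure, and expression function are used jointly as criteria for sentence boundaries, so that sentence boundaries and hierarchical levels are set appropriately. Within a sentence, chunks are described primarily by their syntactic function, with reference to their textual and interpersonal functions: 4 category tags, 8 function tags, and 4 sentence tags describe 5 kinds of chunks in 3 classes, annotating the basic sentence-pattern skeleton and highlighting head-word information. So far, we have built a quality-assured shallow-parse treebank of about ten million Chinese characters, including a sentence-pattern base of more than 9,000 patterns covering more than 600,000 clauses; the corpus comprises more than 10,000 texts from application domains such as encyclopedias, news, and patents. We have also explored an efficient management model for crowdsourced annotation.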

【Abstract】 To provide large-scale annotation of Chinese functional chunks for linguistic research and syntactic parsing, we present a method for quickly building a high-quality, multi-domain, discourse-based Chinese chunkbank. First, we use punctuation, syntax, and the expression functions of VPs and NPs to segment complex sentences into independent simple sentences. Second, based on the syntactic, textual, discourse, and interpersonal functions of chunks, we design 4 phrase tags, 8 functional tags, and 4 sentence-boundary tags to describe the chunks, which are classified into 3 types and 5 kinds. Annotators mark the skeleton structure of every simple sentence and highlight the head word of its predicate. So far, we have annotated more than 10 million Chinese characters, yielding more than 9 thousand skeleton structures for over 600 thousand clauses. The chunkbank covers a range of text genres, including Baidu Baike entries, internet news, and patents. We also explored an efficient model for managing crowdsourced annotation.
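
The first step above segments complex sentences partly by punctuation. The following is a minimal sketch of only that punctuation pass, assuming illustrative punctuation sets (`SENTENCE_FINAL`, `CLAUSE_LEVEL`) and a hypothetical helper `candidate_clauses`; none of these names come from the paper. The authors additionally consult syntactic structure and the expression functions of VPs and NPs, which this sketch does not model, so its output is at best a set of candidate boundaries.

```python
import re

# Illustrative punctuation sets (an assumption, not the paper's exact rules).
SENTENCE_FINAL = "。！？"  # marks that usually close a full sentence
CLAUSE_LEVEL = "；，："    # marks that usually separate clauses within it

# Capturing group so re.split keeps each boundary mark in its output.
_SPLIT_RE = re.compile(f"([{SENTENCE_FINAL}{CLAUSE_LEVEL}])")


def candidate_clauses(text: str) -> list[tuple[str, str, str]]:
    """Split text at punctuation into (clause, mark, level) triples.

    level is "sentence" for sentence-final marks, "clause" for
    clause-level marks, and "open" for trailing text with no mark.
    """
    parts = _SPLIT_RE.split(text)
    # With a capturing group, re.split alternates text, mark, text, ..., text.
    result = []
    for i in range(0, len(parts) - 1, 2):
        clause, mark = parts[i].strip(), parts[i + 1]
        if not clause:
            continue  # skip empty spans, e.g. between doubled marks
        level = "sentence" if mark in SENTENCE_FINAL else "clause"
        result.append((clause, mark, level))
    tail = parts[-1].strip()
    if tail:
        result.append((tail, "", "open"))
    return result


if __name__ == "__main__":
    for item in candidate_clauses("语料涉及百科、新闻、专利等文本，共1万余篇。"):
        print(item)
    # ('语料涉及百科、新闻、专利等文本', '，', 'clause')
    # ('共1万余篇', '。', 'sentence')
```

The enumeration comma 、 is deliberately left out of the split set so that coordinated items stay inside one clause. Keeping each mark with the clause it closes preserves the boundary type, so a later pass (not shown) could revise boundaries that punctuation alone gets wrong.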

【Keywords】 corpus annotation; treebank; chunk; syntactic parsing
【Funding】 National Social Science Fund of China (16AYY007); Graduate Innovation Fund of Beijing Language and Culture University (19YCX121)
  • 【Source】 Acta Automatica Sinica (自动化学报), 2022, Issue 12
  • 【CLC classification number】 TP391.1