
A Bidirectional Part-of-Speech Tagging Method Based on a Statistical Language Model

(Published English title: A Two-Directions Method of Chinese Corpus Tagging Based on Statistical Language Model)


【Authors】 LIU Qi-He, ZHAN Si-Yu, YANG Guo-Wei (Computer School, UEST, Chengdu 610054)

【Affiliation】 School of Computer Science, University of Electronic Science and Technology of China, Chengdu 610054

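【Abstract (Chinese, excerpt from the introduction)】 1 Introduction. In natural language processing, part-of-speech (word-class) tagging is an important task: it supplies grammatical knowledge for syntactic parsing, machine translation, natural language understanding, and other applications. Because many words belong to more than one word class and take different classes in different contexts, Chinese part-of-speech tagging is essentially a process of word-class disambiguation. Current tagging methods fall into two categories: rule-based and statistics-based. Rule-based methods tag words using the system's knowledge base, but gaps in that knowledge base limit their effectiveness. Statistics-based methods instead tag word classes using probabilities computed from a corpus; their …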
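The statistics-based approach starts from probabilities estimated on a tagged corpus. As a minimal sketch, assuming the corpus is given as lists of (word, tag) pairs (the toy sentences, the tag labels r/v/n/d/a, and the function names are hypothetical), relative frequencies give P(class | word) and expose the multi-class words that make tagging a disambiguation problem:

```python
from collections import Counter, defaultdict


def word_class_distribution(tagged_sentences):
    """Relative-frequency estimate of P(word class | word) from a tagged corpus."""
    counts = defaultdict(Counter)  # counts[word][tag]
    for sent in tagged_sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: {t: n / sum(c.values()) for t, n in c.items()} for w, c in counts.items()}


def multi_class_words(dist):
    """Words observed with more than one word class: the cases that need disambiguation."""
    return sorted(w for w, tags in dist.items() if len(tags) > 1)


# Toy corpus with hypothetical tags: r pronoun, v verb, n noun, d adverb, a adjective.
corpus = [
    [("我", "r"), ("学习", "v"), ("汉语", "n")],
    [("学习", "n"), ("很", "d"), ("重要", "a")],
    [("他", "r"), ("学习", "v"), ("数学", "n")],
]
dist = word_class_distribution(corpus)
print(multi_class_words(dist))  # ['学习']
print(dist["学习"])  # v with probability 2/3, n with 1/3
```

Choosing the most probable class for each word in isolation would tag 学习 as a verb even in 学习很重要, where it is a noun, which is why a context model such as the bi-gram model is needed.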

【Abstract】 In this paper, we introduce part-of-speech tagging of Chinese corpora based on a statistical language model (the bi-gram model) and Huang-Yu's smoothing method. In particular, we propose a two-direction method based on the statistical language model: we compute not only the probability $P(C \mid W)$ of the tag sequence $C$ given $W = w_1 w_2 \cdots w_m$, but also $P(C \mid w_m w_{m-1} \cdots w_1)$, i.e. the same sentence read from right to left. Our experiments show that computing in both directions improves the accuracy of Chinese corpus tagging.
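The bi-gram model and the two directions can be sketched as a bigram hidden Markov model decoded with the Viterbi algorithm. This is a sketch under stated assumptions, not the authors' implementation. Because $W$ is fixed, maximizing $P(C \mid W)$ is equivalent to maximizing the joint probability, which the left-to-right pass factors as $\prod_i P(c_i \mid c_{i-1})\,P(w_i \mid c_i)$; the right-to-left pass is the same model trained on reversed sentences, so it uses $P(c_i \mid c_{i+1})$. Add-one smoothing stands in for Huang-Yu's method, which the abstract does not describe; the rule that combines the two directions (keep the higher-scoring path) is hypothetical; no end-of-sentence transition is modeled; `BigramTagger`, `two_direction_tag`, the toy corpus, and the tags are illustrative.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # sentence-start pseudo-tag (an assumption of this sketch)


class BigramTagger:
    """Bigram HMM: score(C, W) = prod_i P(c_i | c_{i-1}) * P(w_i | c_i)."""

    def __init__(self, tagged_sentences):
        self.emit = defaultdict(Counter)   # emit[tag][word] counts
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag] counts
        vocab = set()
        for sent in tagged_sentences:
            prev = BOS
            for word, tag in sent:
                self.emit[tag][word] += 1
                self.trans[prev][tag] += 1
                vocab.add(word)
                prev = tag
        self.tags = sorted(self.emit)
        self.vocab_size = len(vocab) + 1  # +1 keeps some mass for unseen words

    # Add-one smoothing: a simple stand-in, NOT Huang-Yu's smoothing method.
    def log_emit(self, tag, word):
        c = self.emit[tag]
        return math.log((c[word] + 1) / (sum(c.values()) + self.vocab_size))

    def log_trans(self, prev, tag):
        c = self.trans[prev]
        return math.log((c[tag] + 1) / (sum(c.values()) + len(self.tags)))

    def viterbi(self, words):
        """Return (log score, best tag path) for a non-empty word list."""
        best = {t: (self.log_trans(BOS, t) + self.log_emit(t, words[0]), [t])
                for t in self.tags}
        for w in words[1:]:
            # Extend every surviving path by one tag and keep the best per tag.
            best = {t: max((s + self.log_trans(p, t) + self.log_emit(t, w), path + [t])
                           for p, (s, path) in best.items())
                    for t in self.tags}
        return max(best.values())


def two_direction_tag(tagged_sentences, words):
    """Tag `words` left-to-right and right-to-left, then combine (hypothetical rule)."""
    forward = BigramTagger(tagged_sentences)
    # Right-to-left model: trained on reversed sentences, so transitions are P(c_i | c_{i+1}).
    backward = BigramTagger([list(reversed(s)) for s in tagged_sentences])
    f_score, f_path = forward.viterbi(words)
    b_score, b_rev = backward.viterbi(list(reversed(words)))
    b_path = b_rev[::-1]
    # Assumed rule: keep whichever direction scores higher (not taken from the paper).
    best = f_path if f_score >= b_score else b_path
    return best, f_path, b_path


corpus = [
    [("我", "r"), ("学习", "v"), ("汉语", "n")],
    [("学习", "n"), ("很", "d"), ("重要", "a")],
    [("他", "r"), ("学习", "v"), ("数学", "n")],
]
print(two_direction_tag(corpus, ["学习", "很", "重要"])[0])  # ['n', 'd', 'a']
```

On this toy corpus both directions agree, and context resolves 学习 as a noun before 很重要 and as a verb in 我学习数学. How the authors reconcile directions that disagree is not stated in the abstract.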

【Key words】 Natural language processing； Statistical language model； Smoothing method； Chinese corpus tagging；

【Source】 计算机科学 (Computer Science), 2003, Issue 09

【Classification (CLC)】 TP391.1
【Times cited】 7
【Downloads】 199
