An n-gram Statistics Algorithm for Arbitrary n in Large-Scale Chinese Corpora, and Knowledge Acquisition Methods

Algorithm of n gram Statistics for Arbitrary n and Knowledge Acquisition Based on Statistics

【Author】 Zhang Min, Li Sheng and Zhao Tiejun (Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001)

【Institution】 Department of Computer Science and Engineering, Harbin Institute of Technology

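【Abstract (translated from the Chinese original)】 This paper proposes and implements an algorithm for character-level and word-level n-gram statistics, for arbitrary n, over large-scale Chinese corpora. In a single run the algorithm counts all character-level and word-level n-grams of order no greater than a chosen n (n is set to 256 in this paper). It reduces the exponential space cost of conventional n-gram counting to a linear one that is independent of the n-gram order being counted. Based on these n-gram statistics, the paper also computes the information entropy of Chinese and studies knowledge acquisition at the character and word level. The algorithm and the results of this work have been applied in the machine translation system developed by the authors.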

【Abstract】 A new algorithm of n-gram statistics for arbitrary n at the word or phrase level is proposed and implemented in this paper, with which the n-grams for all n at the word or phrase level can be calculated at the same time. Based on these n-grams, Chinese information entropy and knowledge acquisition at the word or phrase level have also been studied. The algorithm and its results have been integrated into an MT system.
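
The abstract does not say how the linear-space counting is achieved. One well-known way to obtain counts for every order up to a bound with space linear in the corpus length is to sort the suffix start positions once and read each n-gram count off as the length of a run of adjacent suffixes sharing the same first n tokens. The Python sketch below illustrates that idea only; it is an assumption for illustration, not the authors' algorithm. The function names (`sorted_suffix_positions`, `iter_ngram_counts`, `block_entropy`) are hypothetical, and the entropy shown is a plain block entropy over n-gram relative frequencies, which may differ from the measure used in the paper.

```python
import math
from functools import cmp_to_key


def sorted_suffix_positions(tokens, max_n):
    """Sort suffix start positions, comparing at most max_n tokens.

    Only one integer per corpus position is stored, so the index size
    does not depend on max_n (illustrative sketch, not the paper's method).
    """
    size = len(tokens)

    def compare(i, j):
        for k in range(max_n):
            i_done = i + k >= size
            j_done = j + k >= size
            if i_done or j_done:
                # The suffix that runs out of tokens first sorts first.
                return int(j_done) - int(i_done)
            if tokens[i + k] != tokens[j + k]:
                return -1 if tokens[i + k] < tokens[j + k] else 1
        return 0

    return sorted(range(size), key=cmp_to_key(compare))


def iter_ngram_counts(tokens, positions, n):
    """Yield (n-gram, count) pairs for one order n <= max_n.

    Equal n-grams are adjacent in the sorted order, so each count is
    the length of a run; no per-order table has to be built.
    """
    previous, run = None, 0
    for p in positions:
        if p + n > len(tokens):
            continue  # this suffix is shorter than n tokens
        gram = tuple(tokens[p:p + n])
        if gram == previous:
            run += 1
        else:
            if previous is not None:
                yield previous, run
            previous, run = gram, 1
    if previous is not None:
        yield previous, run


def block_entropy(tokens, positions, n):
    """Entropy in bits of the empirical n-gram distribution (assumed measure)."""
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for _, count in iter_ngram_counts(tokens, positions, n)
    )


if __name__ == "__main__":
    # Character-level tokens; a word-level run would pass a list of words.
    corpus = list("中文信息处理与中文信息检索")
    index = sorted_suffix_positions(corpus, max_n=4)
    for order in (1, 2, 3):
        repeated = [(g, c) for g, c in iter_ngram_counts(corpus, index, order) if c > 1]
        print(order, repeated, round(block_entropy(corpus, index, order), 3))
```

The same sorted index serves every order from 1 up to `max_n`, which is what makes the storage independent of n in this sketch. The comparison-based sort is chosen for brevity; a production suffix-array construction would be faster.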

【Key words】 n-gram; statistics; information entropy; knowledge acquisition

【Source】 情报学报 (Journal of the China Society for Scientific and Technical Information), 1997, No. 01

【Classification number】 TP391.1
【Times cited】 17
【Downloads】 346
