Unsupervised Translation Disambiguation by Using Semantic Dictionary and Mining Language Model from Web

【Authors】 LIU Peng-Yuan, ZHAO Tie-Jun

【Institution】 Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

【Abstract】 To address the data sparseness and knowledge acquisition problems that hamper word sense disambiguation (WSD) and translation disambiguation, this paper proposes an unsupervised disambiguation method based on an n-gram statistical language model mined from the Web. The method assumes that a word sense corresponds to the n-gram language model of the words that express it. Under this assumption, HowNet is first used to map each English translation of an ambiguous Chinese word to a HowNet DEF (concept definition) and to collect the set of words filed under that DEF. A search engine is then queried, and its results are used to estimate the probabilities of the n-grams formed with the words of each DEF; the disambiguation decision is made from these probabilities. Evaluated on the test set of the Multilingual Chinese English Lexical Sample Task of SemEval-2007, the method achieves state-of-the-art performance (Pmar = 55.9%), 12.8% higher than the best unsupervised system that participated in that task.
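The abstract describes the pipeline only at a high level: map each translation to a HowNet DEF, collect the DEF's word set, substitute those words into the ambiguous word's context, count the resulting n-grams on the Web, and pick the best-scoring DEF. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The exact probability formula is not given here, so the score used below (add-one smoothed hits(n-gram) / hits(w), averaged over a DEF's words) is an assumption. The search engine is replaced by a dictionary of invented hit counts, and the names `def_score`, `disambiguate`, `HitCounter` and the DEF labels `DEF_beat` / `DEF_communicate` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical search-engine interface: exact-phrase query -> number of web hits.
HitCounter = Callable[[str], int]


def def_score(left: str, right: str, words: List[str], hits: HitCounter) -> float:
    """Mean of the estimated P(context | w) over the words w listed under one DEF.

    Each word w replaces the ambiguous word inside its context, giving the
    n-gram left + w + right (Chinese text, so no separators). P(context | w)
    is approximated as (hits(n-gram) + 1) / (hits(w) + 1); the add-one terms
    (an assumption, not from the paper) avoid zero scores and division by zero.
    """
    if not words:
        return 0.0
    total = 0.0
    for w in words:
        ngram = left + w + right
        total += (hits(ngram) + 1) / (hits(w) + 1)
    return total / len(words)


def disambiguate(left: str, right: str,
                 translation_to_def: Dict[str, str],
                 def_words: Dict[str, List[str]],
                 hits: HitCounter) -> str:
    """Return the English translation whose DEF word set best fits the context."""
    scores = {
        translation: def_score(left, right, def_words[d], hits)
        for translation, d in translation_to_def.items()
    }
    return max(scores, key=scores.get)


# Toy usage: ambiguous 打 in "打电话". DEF labels, word sets and hit counts
# are all invented for illustration; a real system would consult HowNet and
# query a search engine instead of these dictionaries.
translation_to_def = {"beat": "DEF_beat", "call": "DEF_communicate"}
def_words = {"DEF_beat": ["揍", "殴打"], "DEF_communicate": ["拨", "拨打"]}
fake_counts = {"揍": 1000, "殴打": 5000, "揍电话": 0, "殴打电话": 10,
               "拨": 8000, "拨打": 20000, "拨电话": 900, "拨打电话": 15000}

best = disambiguate("", "电话", translation_to_def, def_words,
                    lambda q: fake_counts.get(q, 0))
print(best)  # call
```

Averaging over the DEF's word set, rather than scoring the translation alone, is what lets synonyms that are common on the Web compensate for a rare ambiguous form; with real data the hit counts would come from live search-engine queries and would be far noisier than this toy table.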

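【Funding】 National Natural Science Foundation of China (No. 60435020); National High Technology Research and Development Program of China (863 Program) (Nos. 2006AA01Z150, 2006AA010108)
  • 【Source】 Journal of Software (软件学报), 2009, Issue 05
  • 【CLC Number】 TP391.1
  • 【Times Cited】 8
  • 【Downloads】 419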