Method of Web Document Clustering Based on Mutual Information
【Abstract】 With the rapid growth of information on the Web, making full use of that information and serving Web users effectively has become an urgent problem. Related research shows that Web document clustering can narrow the scope of information retrieval and improve query accuracy. After analyzing the characteristics of Web documents and the strengths and weaknesses of commonly used Web document clustering methods, this paper proposes a Web document clustering method based on mutual information theory. During clustering, the mutual information between feature terms is computed, and a threshold determines whether two terms belong to the same category. Experimental results show that the method improves both precision and recall compared with the K-Means clustering algorithm.
【Key words】 information retrieval; document clustering; mutual information; term selection; vector space model
- 【Source】 Journal of Guangxi Normal University (Natural Science Edition), 2007, Issue 02
- 【Classification No.】 TP391.1
- 【Times cited】 9
- 【Downloads】 277
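
The abstract describes computing mutual information between feature terms and using a threshold to decide whether they belong to the same category, but it does not give the estimator or the grouping rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes pointwise mutual information estimated from document co-occurrence counts, and it assumes that terms whose score reaches the threshold are merged transitively (single-link grouping). The function names `term_pmi` and `group_terms`, the toy corpus, and the threshold value are all hypothetical.

```python
import math
from itertools import combinations


def term_pmi(docs):
    """Pointwise mutual information for every co-occurring term pair.

    Assumption: probabilities are estimated from document frequencies,
    i.e. P(a) = df(a) / n and P(a, b) = df(a, b) / n.
    """
    n = len(docs)
    df = {}       # term -> number of documents containing it
    pair_df = {}  # (term_a, term_b) -> number of documents containing both
    for doc in docs:
        terms = set(doc)
        for t in terms:
            df[t] = df.get(t, 0) + 1
        for a, b in combinations(sorted(terms), 2):
            pair_df[(a, b)] = pair_df.get((a, b), 0) + 1
    # log( P(a,b) / (P(a) P(b)) ) = log( df(a,b) * n / (df(a) * df(b)) )
    return {
        (a, b): math.log((c * n) / (df[a] * df[b]))
        for (a, b), c in pair_df.items()
    }


def group_terms(docs, threshold):
    """Group terms whose PMI reaches the threshold (assumed single-link merge)."""
    parent = {t: t for doc in docs for t in doc}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), score in term_pmi(docs).items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy corpus: each document is a list of feature terms.
docs = [
    ["web", "cluster", "document"],
    ["web", "cluster"],
    ["stock", "market"],
    ["stock", "market", "price"],
]
print(group_terms(docs, threshold=0.5))
```

In this sketch, raising the threshold makes the grouping stricter: term pairs that never co-occur receive no score and are never merged directly, and pairs that co-occur only weakly fall below the cutoff.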