ISTC: A New Method for Clustering Search Results

【Author】 ZHANG Wei (1), XU Baowen (1,2), ZHANG Weifeng (3), XU Junling (1)
(1) School of Computer Science and Engineering, Southeast University, Nanjing 211189, Jiangsu, China
(2) State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China
(3) Department of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China

【Abstract】 A new method for scoring common phrases is proposed, based on the term frequency-inverse document frequency (TFIDF) and the independence of a phrase. Combining the two properties helps identify more reasonable common phrases, which improves clustering accuracy. An equation for measuring the independence of a phrase is also proposed. The new algorithm, which improves on the suffix tree clustering (STC) algorithm, is named improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system was implemented and used to cluster several groups of web search results obtained from the Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
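
The abstract does not reproduce the paper's scoring equations, so the following is only a minimal sketch of how a phrase score might combine TFIDF with an independence term. The function names, the smoothed IDF, and the independence measure (the share of distinct neighbouring words around a phrase's occurrences) are assumptions made for illustration, not the ISTC definitions.

```python
import math


def find_occurrences(tokens, phrase):
    """Return the start indices where `phrase` (a tuple of words) occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase]


def phrase_tfidf(docs, phrase):
    """TFIDF of a phrase over a list of tokenized snippets.

    Assumption: tf is the total occurrence count and idf is smoothed as
    log(1 + N / df); the paper may define both differently.
    """
    counts = [len(find_occurrences(doc, phrase)) for doc in docs]
    tf = sum(counts)
    df = sum(1 for c in counts if c > 0)
    if df == 0:
        return 0.0
    return tf * math.log(1 + len(docs) / df)


def independence(docs, phrase):
    """Hypothetical independence measure, not the paper's equation.

    A phrase that is always preceded or followed by the same word is likely
    a fragment of a longer phrase. This returns the average, over the left
    and right sides, of (distinct neighbours / occurrences). A snippet
    boundary is represented by None and counts as one neighbour value.
    """
    left, right = [], []
    for doc in docs:
        for i in find_occurrences(doc, phrase):
            left.append(doc[i - 1] if i > 0 else None)
            j = i + len(phrase)
            right.append(doc[j] if j < len(doc) else None)
    if not left:
        return 0.0
    return (len(set(left)) / len(left) + len(set(right)) / len(right)) / 2


def phrase_score(docs, phrase):
    """Combine the two properties multiplicatively (an assumed combination)."""
    return phrase_tfidf(docs, phrase) * independence(docs, phrase)
```

Under these assumptions, a fragment such as ("suffix",) that is always followed by "tree" scores lower than the full phrase ("suffix", "tree") when both have the same TFIDF, which is the kind of preference the abstract attributes to the independence property.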

【Funding】 Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086); Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714); Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082); National Natural Science Foundation of Jiangsu (BK2006094).
  • 【Source】 Wuhan University Journal of Natural Sciences (English edition), 2008, Issue 4
  • 【Classification code】 TP393.09
  • 【Times cited】 4
  • 【Downloads】 35