SHITS: A Webpage Ranking Method Based on Hyperlink and Content
【Abstract】 This paper reviews the webpage ranking algorithms that currently dominate large-scale search engines, improves one of them, the ARC algorithm, and proposes a ranking algorithm based on hyperlinks and content: SHITS (Similarity-HITS). Instead of the anchor text that ARC uses, SHITS evaluates the importance of each hyperlink by the content of the webpage the link points to. This change improves the algorithm's ability to distinguish important links from unimportant ones, and it also avoids analyzing the large volume of anchor text in web pages. In comparative experiments against related algorithms, SHITS achieved significantly higher ranking precision than the other algorithms. SHITS is also efficient: its computational cost is lower than that of ARC and comparable to that of HITS.
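The abstract does not state SHITS's exact link-weighting formula, so the following is only a minimal sketch of the general idea: a weighted HITS iteration in which each link is scored by the content of its target page rather than by anchor text. It assumes, hypothetically, that the weight of a link (s, t) is the cosine similarity between the query and a bag-of-words vector of page t's text. The function `similarity_hits`, the whitespace tokenization, and the toy pages are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _normalize(scores: dict) -> dict:
    """Scale scores to unit L2 norm (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


def similarity_hits(pages, links, query, iterations=50, uniform=False):
    """Hypothetical SHITS-style weighted HITS.

    pages: dict page_id -> page text; links: list of (source, target) pairs.
    ASSUMPTION: link (s, t) is weighted by cosine(query, text of t).
    uniform=True sets every weight to 1.0, which reduces to plain HITS.
    """
    q_vec = Counter(query.lower().split())
    weights = {}
    for s, t in links:
        t_vec = Counter(pages[t].lower().split())
        weights[(s, t)] = 1.0 if uniform else cosine(t_vec, q_vec)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for (s, t), w in weights.items():
            new_auth[t] += w * hub[s]
        auth = _normalize(new_auth)
        # Hub update: weighted sum of authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for (s, t), w in weights.items():
            new_hub[s] += w * auth[t]
        hub = _normalize(new_hub)
    return hub, auth


# Toy example (hypothetical data): "c" is off-topic for the query "ranking".
pages = {
    "h": "useful links",
    "a": "my home page",
    "b": "search engine ranking",
    "c": "cooking recipes",
}
links = [("h", "b"), ("h", "c"), ("a", "b")]
hub, auth = similarity_hits(pages, links, "ranking")
plain_hub, plain_auth = similarity_hits(pages, links, "ranking", uniform=True)
```

In this toy graph, the content-weighted run gives the link to the off-topic page `c` zero weight, so `c` receives no authority, whereas plain HITS (`uniform=True`) still passes authority to `c` from hub `h`. Under this assumption, only one similarity per linked target page has to be computed, with no anchor-text parsing.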
- 【Source】 Journal of Chinese Computer Systems (小型微型计算机系统), 2006, Issue 12
- 【CLC number】 TP391.3
- 【Times cited】 18
- 【Downloads】 266