节点文献

Web信息检索中基于超链接的网页评估算法的研究

Research on the Hyperlink-based Ranking Algorithms in Web Information Retrieval

【作者】 董志勇

【导师】 刘惠义;

【作者基本信息】 河海大学 , 计算机应用技术, 2004, 硕士

【摘要】 互联网包含有海量网页,越来越多的用户通过搜索引擎寻找特定信息。Web信息检索的目的是在网页集合中找到与用户查询相关的所有网页,而网页评估算法将对这些网页进行评估后显示给用户。Google的PageRank算法通过对超链接结构的分析,有效地提高了搜索结果的排序质量。外推法(Extrapolation)是减少PageRank算法迭代次数的有效方法,而Aitken Extrapolation算法和基于特征值直接求解的算法是其典型代表。 Aitken Extrapolation算法使用Aitken变换以减少迭代次数,但Aitken变换在多数情况下无法确保算法收敛。为了弥补该算法的缺陷,本文在分析了该算法不稳定性的根源后提出了改进算法FNN-Aitken Extrapolation。与原算法相比,改进算法不频繁调用Aitken变换,而执行选择适当时机对Aitken变换调用一次的策略。理论分析和实验结果表明改进算法不仅能够弥补原算法的不稳定性而且通常比Power Method算法具有更少的迭代次数。 与Aitken Extrapolation算法相比,基于特征值求解的算法不借助Aitken变换,而通过特征值直接求解马尔可夫超链接矩阵的主特征向量,从理论上比Aitken Extrapolation算法更高效。然而基于特征值直接求解算法的迭代次数与参数d的选择密切相关,而参数d的确定目前无明显规律可寻。另一方面,Adaptive方法通过将马尔可夫超链接矩阵稀疏化以达到节省迭代时间的目的。本文将Adaptive方法分别与基于特征值直接求解算法和Aitken Extrapolation算法相结合,实验结果初步证明了这两个新算法能够缩短迭代时间。

【Abstract】 With the enormous volume of Web pages in Internet, it comes that Internet users are increasingly using search engines to find specific information. The goal of Web information retrieval is to find all Web pages for a user query in a collection of Web Pages and the task of ranking algorithm is evaluating related Web pages for users. By analysis the structure of hyperlinks, PageRank algorithm which is applied in Google efficiently promote the result of search. Extrapolation Method, composed of Aitken Extrapolation and eigenvalues-based algorithm mainly, can notably reduce iterations commonly .Aitken Extrapolation algorithm use Aitken transform to reduce iterations. Because Aitken transform can not converge in many conditions, this paper presents an improved algorithm FNN-Aitken Extrapolation algorithm by analysis the reason of instability. Instead of calling Aitken transform repeatedly, the improved algorithm call Aitken transform in choosing proper occasion only once. Theoretical analysis and experiment results show that the improved algorithm not only avoid the instability of original algorithm but also reduce iterations compared to Power Method.Compared with Aitken Extrapolation, eigenvalues-based algorithm bypass Aitken transform and perform more effectively than Aitken Extrapolation algorithm theoretically in the process of iterating hyperlink-based Markov matrix. Yet the performance of eigenvalues-based algorithm has a closed relation with d parameter which is chosen with no evident rule. On the other hand, Adaptive method can save running time of iterations by increasing the sparsity of Markov hyperlink matrix. For the purpose of saving running time of iterations, this paper apply Adaptive method to Aitken Extrapolation algorithm and eigenvalues-based algorithm respectively. Experimental results elementarily show that these two new algorithms can speed up the performance of iterations.

  • 【网络出版投稿人】 河海大学
  • 【网络出版年期】2004年 03期
  • 【分类号】TP393.09
  • 【被引频次】3
  • 【下载频次】263
节点文献中: 

本文链接的文献网络图示:

本文的引文网络