Evaluating the Importance of Website Subject Catalogs

Evaluation Significance of Website’s Subject Catalog

【Author】 Wang Fang (王芳)

【Advisor】 Yu Hao (于浩)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2006, Master's

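【Summary】 With the rapid development of Internet/Intranet technologies, the volume of online information is growing quickly. To make effective use of network information resources, evaluating their quality and value has become an increasingly urgent research topic. Researchers in China and abroad have proposed evaluation objects, indicators, and methods for network information resources. Work on specific kinds of network objects has also produced results: manual evaluation of academic websites; link-based evaluation of web page importance, which gave rise to the classic PageRank and HITS algorithms; and machine-learning approaches to evaluating the importance of blocks within a web page. Portal websites, however, usually organize their pages by subject. Subject catalogs are an important component of a website, and the catalogs differ in importance. This paper therefore studies how to evaluate the importance of website subject catalogs. It takes a quantitative approach, evaluating each subject catalog by indicators such as the number of pages it contains, the monthly update rate of those pages, and the mean importance of all pages it contains. Each indicator is quantified, and the larger a catalog's quantified value for an indicator, the more important the catalog. The paper first proposes an algorithm for automatically extracting website subject catalogs, then merges the catalogs, and then evaluates their importance according to the different indicators. The navigation links on a portal's home page carry a large amount of subject catalog information, but the catalogs stand in containment relations to one another, and the navigation links also include non-catalog items such as advertisement-like links. The paper therefore extracts the important, highly general links from the navigation links as the website's subject catalogs. Automatic merging assigns the remaining, non-catalog navigation links to the subject catalogs. In essence it exploits the subject categories that the site's navigation links already give to its pages, in order to improve the precision with which pages are classified into the subject catalogs. Finally, the subject catalogs and their corresponding page sets are automatically evaluated for importance according to the indicators above. In addition, borrowing the methods used in information retrieval to evaluate the ranking of retrieved documents, and taking as the reference answer the subject catalog rankings that the Alexa website derives from click-through rate, the paper presents a method for assessing the catalog evaluation results together with the corresponding assessment results. The best result reaches a precision above 83%. The system helps advance research on network resource evaluation, gives ordinary users more guidance in finding information online, and helps website operators keep improving site quality. It also supports the development of subject-catalog-based services and information retrieval, offers clues for discovering the subjects of potential hub websites, and facilitates the classification of Web resources on specific subjects.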
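
The indicator-based ranking described above can be illustrated with a minimal Python sketch. It assumes each catalog has already been reduced to three numbers (page count, monthly update rate, mean page importance) and ranks catalogs separately per indicator, largest value first, since the abstract does not say how, or whether, the indicators are combined. The `Catalog` class, the `rank_by` helper, and all sample values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Catalog:
    """One website subject catalog with its quantified indicators (hypothetical layout)."""
    name: str
    page_count: int               # number of pages filed under the catalog
    monthly_update_rate: float    # share of its pages updated per month
    mean_page_importance: float   # mean importance of its pages, e.g. a link-based score


def rank_by(catalogs, indicator):
    """Return catalog names sorted by one indicator, most important (largest) first."""
    ordered = sorted(catalogs, key=lambda c: getattr(c, indicator), reverse=True)
    return [c.name for c in ordered]


# Made-up sample data, for illustration only.
catalogs = [
    Catalog("news", 1200, 0.60, 0.012),
    Catalog("sports", 800, 0.75, 0.020),
    Catalog("finance", 450, 0.30, 0.015),
]

for indicator in ("page_count", "monthly_update_rate", "mean_page_importance"):
    print(indicator, rank_by(catalogs, indicator))
```

Ranking per indicator keeps the sketch within what the abstract states; a weighted combination of indicators would be a further assumption.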

【Abstract】 With the rapid development of Internet/Intranet technology, the amount of information on the Internet is increasing dramatically. Evaluating the quality and value of Internet information resources has become an urgent task if they are to be used more effectively. Researchers have proposed evaluation targets, indicators, and methods. Much research has also focused on specific kinds of Internet resources: manual evaluation of academic websites, evaluation of web page significance based on link analysis using the PageRank and HITS algorithms, and appraisal of the blocks partitioned from a web page using machine learning algorithms. Web portals, however, organize their pages by subject, and all of them offer a large amount of subject-classified information. Subject catalogs are a vital part of web portals, and they differ from one another in importance. This paper therefore proposes a way to evaluate the significance of website subject catalogs automatically. A quantitative method is used to evaluate subject catalogs on indicators including the number of pages, the monthly update rate of pages, and the mean significance of all pages in each subject catalog. After the indicators are quantified, the evaluation results are sorted by indicator value in descending order. The paper first proposes an algorithm to extract website subject catalogs automatically and then describes how to merge them. Finally, the subject catalogs are evaluated according to the indicators. The navigation links on a portal's home page usually provide a great deal of catalog information, but the catalogs contain one another and the links include many advertisements. We therefore have to pick out the general and significant navigation links as the website's subject catalogs, and the remaining links must be merged into them. The key insight behind merging is that many web pages already carry a categorization from the subjects of the navigation links, and the accuracy with which pages are classified into subject catalogs can be improved by factoring in the implicit information in this source categorization. Finally, the subject catalogs and their pages are evaluated according to the indicators. In addition, the paper uses evaluation criteria for ranking the result documents
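
To make the idea of scoring a catalog ranking against a reference ranking concrete, here is a minimal Python sketch of precision at k, a common information-retrieval measure. The abstract does not give the exact formula the thesis uses, so this measure is an assumption, and the `precision_at_k` helper and both sample rankings are hypothetical.

```python
def precision_at_k(predicted, reference, k):
    """Share of the top-k predicted catalogs that also appear in the reference top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    reference_top = set(reference[:k])
    hits = sum(1 for name in predicted[:k] if name in reference_top)
    return hits / k


# Made-up rankings: one produced by an indicator, one standing in for a
# click-based reference ranking.
predicted = ["sports", "news", "finance", "games"]
reference = ["news", "sports", "games", "finance"]

print(precision_at_k(predicted, reference, 2))
print(precision_at_k(predicted, reference, 3))
```

This measure ignores the order within the top k; an order-sensitive measure could be substituted if the reference ranking is trusted at that level of detail.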

  • 【Classification number】TP393.092
  • 【Downloads】273