Internet Robot Based on Automatic Classification
【Abstract】 With the spread and development of the Internet, the information resources available on the Web keep growing, and efficient, intelligent tools are needed to collect them. A page collector (crawler) on the WWW is also called a robot. This paper discusses combining a robot with an automatic text classifier to collect Web pages in the subject domains a user requests. The goal is to selectively seek out pages that are relevant to a predefined set of topics: the robot identifies and follows the links most likely to be relevant and avoids crawling irrelevant links and regions of the Web. This saves hardware and network resources and makes the robot more efficient.
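As an illustration only, the combination of a classifier with a robot described above can be sketched as a best-first ("focused") crawl. This is a minimal sketch under stated assumptions, not the paper's implementation: the topic keyword profiles (`TOPICS`), the relevance `threshold`, the in-memory toy Web (`WEB`), and the choice to score an unfetched link by classifying its anchor text are all hypothetical. Pages and anchors are represented as term-frequency vectors in a vector space model and compared with cosine similarity.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical topic profiles: each predefined topic is a bag of keywords.
TOPICS = {
    "sports": "football match team goal league player score",
    "computing": "computer software network internet algorithm data program",
}

def vectorize(text):
    """Term-frequency vector (vector space model) over lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

TOPIC_VECTORS = {name: vectorize(words) for name, words in TOPICS.items()}

def classify(text):
    """Return (best_topic, similarity) for a piece of text."""
    vec = vectorize(text)
    return max(((t, cosine(vec, tv)) for t, tv in TOPIC_VECTORS.items()), key=lambda p: p[1])

def focused_crawl(web, seeds, wanted, threshold=0.1, max_fetches=50):
    """Best-first crawl that only enqueues links whose anchor text classifies into `wanted`.

    `web` maps url -> (page_text, [(anchor_text, target_url), ...]); it stands in
    for real HTTP fetching (an assumption of this sketch).
    """
    frontier = [(-1.0, url) for url in seeds]  # min-heap on negated score; seeds first
    heapq.heapify(frontier)
    queued = set(seeds)
    relevant, fetched = [], 0
    while frontier and fetched < max_fetches:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue
        fetched += 1
        text, links = web[url]
        topic, score = classify(text)
        if topic == wanted and score >= threshold:
            relevant.append(url)  # keep pages classified into the wanted topic
        for anchor, target in links:
            if target in queued:
                continue
            link_topic, link_score = classify(anchor)
            if link_topic != wanted or link_score < threshold:
                continue  # prune: never fetch links that look off-topic
            queued.add(target)
            heapq.heappush(frontier, (-link_score, target))
    return relevant

# Hypothetical toy Web used in place of real pages.
WEB = {
    "/": ("portal home page news",
          [("internet software news", "/tech"), ("football league results", "/sport")]),
    "/tech": ("network algorithm and software for internet data",
              [("computer program tutorial", "/code"), ("team score today", "/sport")]),
    "/code": ("a computer program that uses an algorithm on data", []),
    "/sport": ("football match: the team scored a goal in the league", []),
}

print(focused_crawl(WEB, ["/"], "computing"))  # ['/tech', '/code']
```

With the wanted topic set to `"computing"`, both links to `/sport` are classified as sports and pruned, so that page is never fetched; this is the resource saving the abstract refers to. A real robot would replace `WEB` with HTTP fetching and HTML link extraction, and would likely weight terms (for example with TF-IDF) rather than use raw counts.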
【Key words】 Internet robot; Automatic text classification; Vector space model
- 【Source】 Computer Engineering (计算机工程), 2003, Issue 21
- 【CLC number】 TP393.09
- 【Times cited】 8
- 【Downloads】 161