Crawler Recognition Technology Based on Decision Tree Algorithm

【Authors】 LIU Yu; CHENG Xue-lin

【Affiliation】 Software College, Zhejiang University

【Abstract】 A web crawler is a program or script that automatically fetches World Wide Web information according to certain rules [1]. In practice, crawlers fall into two groups: legitimate and illegitimate. Legitimate crawlers obtain website information and data through proper channels and means. Illegitimate crawlers, also called malicious crawlers, are mainly used to steal data illegally, add load to website servers, and snoop on sensitive information. This paper designs a new crawler detection technique based on the decision tree algorithm and, building on the detection results, provides anti-crawling mechanisms such as blocking malicious crawlers. The goal is to protect the website, its servers, and part of its data and information, and to reduce the overlap of Internet resources.

  • 【Source】 Computer Engineering & Software (软件), 2017, Issue 07
  • 【Classification No.】 TP301.6; TP393.092
  • 【Citations】 8
  • 【Downloads】 311