Study on Algorithm of BBS Data Mining Based on URL Location Information

【Authors】 Zhao Zhe; Ma Xiaojun

【Affiliations】 School of Computer and Information Engineering, Anyang Normal University; Department of Public Computer Teaching, Anyang Normal University

【Abstract】 The proposed topic-partitioned web crawler algorithm uses the collection order of Web pages, together with the related information and topics of the retrieved pages, to partition and integrate as much of the Web as possible by topic. URL location information is used to improve the efficiency of page retrieval. Simulation experiments on a real BBS show that the proposed topic-focused crawler algorithm can cross the fracture zones between URL pages in the BBS, which improves the recall of URL pages and keeps retrieval from terminating when a page link is broken. Precision analysis shows that the misjudged points all lie close to the dividing line with small deviations, indicating that the algorithm's precision is high.
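
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of crossing a "fracture zone": a focused crawler that is allowed to pass through a limited number of consecutive off-topic pages (often called tunneling) instead of stopping at the first one. It is not the authors' method. The link graph, the `RELEVANT` set standing in for a topic classifier, and the names `focused_crawl`, `recall`, and `max_tunnel` are all hypothetical.

```python
from collections import deque

# Hypothetical toy BBS link graph: URL -> linked URLs.
GRAPH = {
    "/board/1": ["/thread/10", "/redirect/a"],
    "/thread/10": [],
    "/redirect/a": ["/thread/11"],   # off-topic page between relevant pages
    "/thread/11": ["/thread/12"],
    "/thread/12": [],
}
# Assumed ground truth; a real crawler would use a topic classifier instead.
RELEVANT = {"/board/1", "/thread/10", "/thread/11", "/thread/12"}


def focused_crawl(graph, relevant, seed, max_tunnel=1):
    """Breadth-first crawl that tolerates up to `max_tunnel` consecutive
    off-topic pages before abandoning a path."""
    visited = {seed}
    found = []
    queue = deque([(seed, 0)])  # (url, consecutive off-topic pages so far)
    while queue:
        url, off_topic_run = queue.popleft()
        if url in relevant:
            found.append(url)
            off_topic_run = 0        # back on topic: reset the gap counter
        else:
            off_topic_run += 1
            if off_topic_run > max_tunnel:
                continue             # gap too wide: stop following this path
        for nxt in graph.get(url, []):
            # Simplification: a page is only ever reached via the first path
            # that discovers it, which is sufficient for this tree-shaped toy.
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, off_topic_run))
    return found


def recall(found, relevant):
    """Fraction of the relevant pages that the crawl retrieved."""
    return len(set(found) & relevant) / len(relevant)


if __name__ == "__main__":
    for budget in (0, 1):
        pages = focused_crawl(GRAPH, RELEVANT, "/board/1", max_tunnel=budget)
        print(budget, recall(pages, RELEVANT))
```

In this toy graph, a crawler with `max_tunnel=0` stops at `/redirect/a` and misses the two threads behind it, while `max_tunnel=1` crosses the gap and reaches them. That is the kind of recall improvement the abstract describes, shown here under the stated assumptions only.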

  • 【Source】 Bulletin of Science and Technology (科技通报), 2014, Issue 04
  • 【Classification No.】 TP393.092; TP391.3
  • 【Times Cited】 2
  • 【Downloads】 113