基于URL定位信息的BBS数据挖掘方法研究
Study on Algorithm of BBS Data Mining Based on URL Location Information
【Abstract】 A topic-partitioned web crawler algorithm is proposed. It uses the order in which Web pages are collected, together with the relevant information and topics of the retrieved pages, to partition and integrate as much of the Web as possible by topic, and it uses URL location information to make page retrieval more efficient. Simulation experiments on a real BBS show that the proposed topic-focused crawler can cross the fracture zones between URL pages in the BBS, which improves the recall of URL pages and keeps retrieval from stopping when a chain of pages is broken. Precision analysis shows that the misjudged points lie close to the bisecting (correct-judgment) line with only small deviations, which indicates that the algorithm is fairly accurate.
【Keywords】 network crawler algorithm; URL location information; BBS information retrieval; data mining
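The abstract does not give the algorithm itself, so the following is only a minimal sketch of one common way a topic-focused crawler can cross a "fracture zone": it keeps following links through a bounded number of consecutive off-topic pages. Everything here is an assumption made for illustration and not the authors' method: the in-memory `PAGES` graph standing in for fetched BBS pages, the keyword test in `is_relevant`, the `max_tunnel` parameter, and the `recall` helper.

```python
import heapq

# Hypothetical toy BBS: url -> (page text, outgoing links). Not data from the paper.
PAGES = {
    "bbs/board": ("crawler topics index", ["bbs/t1", "bbs/nav"]),
    "bbs/t1": ("focused crawler design", []),
    "bbs/nav": ("login register", ["bbs/t2", "bbs/ads"]),
    "bbs/t2": ("url crawler recall", []),
    "bbs/ads": ("buy now", ["bbs/ads2"]),
    "bbs/ads2": ("more offers", ["bbs/t3"]),
    "bbs/t3": ("crawler precision analysis", []),
}


def is_relevant(text, topic_terms):
    """Assumed relevance test: the page shares at least one word with the topic."""
    return bool(set(text.lower().split()) & topic_terms)


def focused_crawl(pages, seeds, topic_terms, max_tunnel=1):
    """Collect on-topic pages, crossing at most `max_tunnel` consecutive
    off-topic pages (the assumed stand-in for a fracture zone)."""
    # Frontier entries are (gap, url); gap counts the consecutive off-topic
    # pages on the path that led to url. Smaller gaps are explored first.
    frontier = [(0, url) for url in seeds]
    heapq.heapify(frontier)
    best_gap = {}   # smallest gap with which each page has been expanded
    relevant = []   # on-topic pages in order of first discovery
    while frontier:
        gap, url = heapq.heappop(frontier)
        if url not in pages or best_gap.get(url, max_tunnel + 1) <= gap:
            continue  # unknown page, or already expanded via an equal or better path
        first_visit = url not in best_gap
        best_gap[url] = gap
        text, links = pages[url]
        if is_relevant(text, topic_terms):
            if first_visit:
                relevant.append(url)
            next_gap = 0  # an on-topic page resets the gap
        else:
            next_gap = gap + 1
            if next_gap > max_tunnel:
                continue  # gap is too wide; stop expanding this path
        for link in links:
            heapq.heappush(frontier, (next_gap, link))
    return relevant


def recall(found, pages, topic_terms):
    """Fraction of all on-topic pages in the graph that the crawl found."""
    total = [u for u, (text, _) in pages.items() if is_relevant(text, topic_terms)]
    return len(set(found) & set(total)) / len(total)


if __name__ == "__main__":
    topic = {"crawler"}
    for tunnel in (0, 1):
        found = focused_crawl(PAGES, ["bbs/board"], topic, max_tunnel=tunnel)
        print(tunnel, found, recall(found, PAGES, topic))
```

In this toy graph, a crawler that stops at the first off-topic page never reaches `bbs/t2`, while allowing one off-topic hop does; `bbs/t3` sits behind three consecutive off-topic pages and stays out of reach at that setting. This mirrors the recall gain the abstract reports only in spirit.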
- 【Source】 科技通报 (Bulletin of Science and Technology), 2014, Issue 04
- 【Classification No.】 TP393.092; TP391.3
- 【Times cited】 2
- 【Downloads】 113