A Contrast Learning Based Approach for Customer Reviews Crawling from Dynamic Web Pages

【Authors】 Ran Xilu, Duan Lei, Lv Guangyi, Chen Ke, Li Zhongqi, Huang Donglan, and Tang Changjie

【Affiliation】 School of Computer Science, Sichuan University, Chengdu 610065

【Abstract】 With the development of Web 2.0 technologies, traditional crawlers can no longer effectively collect customer reviews from dynamic Web pages. The main contributions of this paper are: 1) an analysis of the challenges of crawling customer reviews from dynamic Web pages; 2) a novel review-crawling approach named ReviewCrawler, which works on the DOM tree of a Web page, applies the idea of contrast learning to discover the nodes that contain customer reviews, and learns new feature words from the reviews it crawls; 3) an evaluation of the accuracy and effectiveness of ReviewCrawler on real product reviews from online shopping Web sites across multiple domains. Experiments show that both the recall and the precision of ReviewCrawler exceed 98%. Moreover, ReviewCrawler scales well and can meet the requirement of crawling massive numbers of customer reviews.

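【Funding】 National Natural Science Foundation of China (61103042); Specialized Research Fund for the Doctoral Program of Higher Education (20100181120029)
  • 【Proceedings】 Proceedings of the 29th National Database Conference of China (Volume B) (NDBC2012)
  • 【Conference】 The 29th National Database Conference of China (NDBC2012)
  • 【Date】 2012-10-12
  • 【Location】 Hefei, Anhui, China
  • 【Classification】 TP393.092
  • 【Organizer】 China Computer Federation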