基于Web元数据的定题信息采集
Topic-specific information gathering based on Web metadata
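【Abstract (translated from the Chinese)】 Addressing topic-specific Web retrieval, this paper studies the important role of metadata in topic-specific Web information gathering. A topic expansion system and a topic-specific information gathering system, both based on Web metadata, are designed, and the concrete steps of their implementation are given. Several topic-specific information gathering strategies based on Web metadata are also proposed. Experiments show that topic-expanded Web metadata can serve as an important criterion for judging the topical relevance of Web pages, and that the heuristic gathering strategy based on the gain-augmented average metadata weight performs well.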
【Abstract】 Topic-specific Web search engines are a new direction in information retrieval. Rather than collecting and indexing all accessible Web documents, a topic-specific Web search system restricts its crawl boundary to links that are likely to be most relevant to the given topic. Topic-specific information gathering is the key component of the full system. The significance of Web metadata in topic-specific information gathering is discussed. Based on Web metadata, a topic expansion system and a topic-specific information gathering system are designed, and a new approach for guiding crawlers to gather topic-relevant pages is proposed. Experimental results indicate that the proposed approach performs well.
【Key words】 data processing; network information; information gathering; gathering approach
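The abstract names a heuristic gathering strategy based on the average weight of metadata terms, with a gain, but gives no formula. The following is only an illustrative sketch of how such a strategy might look, not the authors' algorithm. It assumes that the expanded topic is a dictionary of term weights, that page metadata means the `<title>` text plus the `keywords` and `description` `<meta>` tags, and that the "gain" is a fraction of the parent page's score added to the child's. All names (`MetaExtractor`, `average_weight`, `score_with_gain`, `Frontier`) and the default `gain=0.5` are hypothetical.

```python
import heapq
import re
from html.parser import HTMLParser


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class MetaExtractor(HTMLParser):
    """Collects terms from <title> and from keywords/description <meta> tags."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.terms += tokenize(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.terms += tokenize(data)


def extract_metadata_terms(html):
    parser = MetaExtractor()
    parser.feed(html)
    return parser.terms


def average_weight(terms, topic_weights):
    """Mean topic weight over all metadata terms; off-topic terms count as 0."""
    if not terms:
        return 0.0
    return sum(topic_weights.get(t, 0.0) for t in terms) / len(terms)


def score_with_gain(terms, topic_weights, parent_score=0.0, gain=0.5):
    """Assumed form of the gain: add a fraction of the parent page's score."""
    return average_weight(terms, topic_weights) + gain * parent_score


class Frontier:
    """Best-first crawl queue: the highest-scoring URL is popped first."""

    def __init__(self):
        self._heap = []

    def push(self, url, score):
        heapq.heappush(self._heap, (-score, url))  # negate: heapq is a min-heap

    def pop(self):
        return heapq.heappop(self._heap)[1]
```

In a crawler built on this sketch, each fetched page would be scored from its metadata, and its outgoing links would be pushed onto the `Frontier` with that score, so that pages reached from on-topic parents are visited first.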
- 【Source】 系统工程与电子技术 (Systems Engineering and Electronics), 2007, Issue 02
- 【Classification number】 TP311.10
- 【Times cited】 6
- 【Downloads】 260