节点文献
一种基于相似度分析的主题提取和发现算法
A Similarity-Based Algorithm for Topic Exploration and Distillation
【摘要】 试图从另一个角度来考察主题提取算法HITS,即提出一种基于相似度的链接分析模型来观察主题提取的过程.通过给出一种一般化的相似度定义,提出了一种仅使用链接分析来改善主题提取的质量的主题提取算法.同时,还将主题发现的功能也结合到了算法的框架中.通过该功能,用户可以搜索到次流行的主题.实验结果显示了这一新算法的两个优点:不必使用内容分析即能改善主题提取的质量以及能够进一步发现在查询结果中显现出来的不同主题.
【Abstract】 In this paper, the authors attempt to revisit the behaviour of HITS from a different point of view. Namely, a similarity-based analysis model is proposed to observe the distillation procedure. By defining a generalized similarity, an algorithm is presented, which can improve the quality of distillation using only hyperlinks. A topic exploration function is also integrated into the algorithm framework, which enables end-users to search less popular topics when multi-topics are involved in queries. The experimental results reveal two benefits from the new algorithm: the improvement of distillation quality without utilizing any content information of pages, and an additional ability to explore the topics emerging in the query results.
【Key words】 topic distillation; topic exploration; linkage analysis; Web searching;
- 【文献出处】 软件学报 ,Journal of Software , 编辑部邮箱 ,2003年09期
- 【分类号】TP393.09
- 【被引频次】86
- 【下载频次】1141