基于优化随机游走模型的文本热点主题探测研究
Research on Text-oriented Hot Topics Detection Based on an Optimized Model of Random Walk
【摘要】 【目的/意义】结合随机游走算法PageRank、词共现和多样性测度指标提出一种改进的热点主题探测方法C_BI-PageRank,该方法有效提高了热点主题探测的效率和模型质量。【方法/过程】首先对PageRank算法进行理论回顾,引入词共现和布里渊系数构建C_BI-PageRank算法模型,然后采用4种不同因素组合的PageRank算法对Web of Science系统2006-2016的应用心理学领域的期刊文献进行实证分析,最后基于波达计数的专家方法进行算法比较与评价,同时也探索其与词频统计之间的相关性问题。【结果/结论】实证表明C_BI-PageRank不仅在运行效率上收敛快、运行时间少且质量评估优势明显。该方法引入不同文本主题因素,一定程度解决传统词频分析和机器学习的不足,为热点主题探测方法提供了新思路。
【Abstract】 【Purpose/significance】This paper proposes C_BI-PageRank, an improved hot topic detection method that combines PageRank, a random walk algorithm, with word co-occurrence and a diversity measure, Brillouin’s Index. The method improves both the efficiency and the model quality of hot topic detection.【Method/process】The paper first reviews the theory of the PageRank algorithm and then constructs the C_BI-PageRank model by introducing word co-occurrence and Brillouin’s Index. Four PageRank variants, each with a different combination of factors, are then applied in an empirical analysis of journal articles in applied psychology indexed in Web of Science from 2006 to 2016. Finally, the algorithms are compared and evaluated with an expert method based on the Borda count, and the correlation between C_BI-PageRank and word frequency statistics is also explored.【Result/conclusion】The empirical results indicate that C_BI-PageRank converges quickly, requires little running time, and shows a clear advantage in quality evaluation. By introducing different text topic factors, the method addresses, to some extent, the shortcomings of traditional frequency analysis and machine learning, and offers a new approach to hot topic detection.
【Key words】 hot topic detection; random walk; PageRank; Brillouin’s Index
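
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C_BI-PageRank model: the abstract does not state how word co-occurrence and Brillouin’s Index enter the ranking formula, so the sketch only shows the standard ingredients separately. It covers a weighted PageRank over a word co-occurrence graph, Brillouin’s Index H = (ln N! − Σ ln n_i!) / N, and a basic Borda count for merging rankings. All function names, the damping factor of 0.85, and the choice of document-level co-occurrence are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_weights(docs):
    """Count how often each unordered word pair shares a document (assumed unit)."""
    weights = Counter()
    for words in docs:
        for a, b in combinations(sorted(set(words)), 2):
            weights[(a, b)] += 1
    return weights


def weighted_pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on an undirected, weighted graph."""
    neighbors = {}
    for (a, b), w in weights.items():
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    # Total edge weight per node; every node has at least one edge here.
    strength = {v: sum(neighbors[v].values()) for v in nodes}
    for _ in range(max_iter):
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] * w / strength[u]
                           for u, w in neighbors[v].items())
            new_rank[v] = (1 - damping) / n + damping * incoming
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def brillouin_index(counts):
    """Brillouin's Index: (ln N! - sum(ln n_i!)) / N, using lgamma for ln k!."""
    total = sum(counts)
    if total == 0:
        return 0.0
    log_parts = sum(math.lgamma(c + 1) for c in counts)
    return (math.lgamma(total + 1) - log_parts) / total


def borda_count(rankings):
    """Each ranking lists items best first; position p among m items earns m-1-p."""
    scores = Counter()
    for ranking in rankings:
        m = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += m - 1 - position
    return dict(scores)


if __name__ == "__main__":
    # Hypothetical keyword lists standing in for article records.
    docs = [["stress", "job", "burnout"],
            ["stress", "burnout"],
            ["job", "leadership"]]
    ranks = weighted_pagerank(cooccurrence_weights(docs))
    for word, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:.4f}")
```

How a diversity score such as Brillouin’s Index would reweight edges or restart probabilities is left open here, since that design is specific to the paper.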
- 【Source】 Information Science (情报科学), 2018, Issue 01
- 【Classification No.】G353.1
- 【Citations】1
- 【Downloads】341