Graph-based Bursty Topic Detection Approach in User-generated Texts
【Abstract】 This paper proposes G-BTD, a graph-clustering-based approach for detecting bursty topics in user-generated text streams. A text collection that contains bursty hot topics is represented as a directed, weighted graph. Its vertices are bursty words, a directed edge expresses the asymmetric correlation between two bursty words (measured by the Tversky index), and the edge weight gives the strength of that correlation. The important topical words of the same topic are linked to each other by high-weight edges in both directions, so together they form a strongly connected subgraph. The approach therefore partitions the bursty-word graph into its strongly connected components and takes each significant component as one bursty topic. Experiments on two real corpora, English blog posts from LiveJournal and Chinese microblog posts from Sina Weibo, show that G-BTD detects hot bursty topics more effectively than both the LDA probabilistic topic model and the bursty-feature-based EGF approach.
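The abstract names two steps: weighting directed edges between bursty words with the Tversky index, and reading topics off the strongly connected components. The Python sketch below illustrates both, but several details are assumptions, not taken from the paper: each text is reduced to a set of bursty words (bursty-word detection itself is out of scope), the Tversky index is computed over the document sets D_w of each word, the weights alpha = 1 and beta = 0 (which make S(D_u, D_v) = |D_u ∩ D_v| / |D_u|), the edge threshold of 0.5 and the rule that a component is "significant" when it has at least two words are all hypothetical choices. The function names are illustrative only.

```python
from itertools import permutations


def tversky(a, b, alpha=1.0, beta=0.0):
    """Tversky index S(a, b) of two sets; asymmetric whenever alpha != beta."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def build_graph(docs, bursty_words, threshold=0.5, alpha=1.0, beta=0.0):
    """Directed weighted graph {u: {v: weight}} over bursty words.

    D_w is the set of ids of documents containing word w; the edge u -> v
    is kept only if S(D_u, D_v) >= threshold (hypothetical pruning rule).
    """
    occ = {w: {i for i, d in enumerate(docs) if w in d} for w in bursty_words}
    graph = {w: {} for w in bursty_words}
    for u, v in permutations(bursty_words, 2):
        weight = tversky(occ[u], occ[v], alpha, beta)
        if weight >= threshold:
            graph[u][v] = weight
    return graph


def strongly_connected_components(graph):
    """Iterative Tarjan's algorithm; returns a list of vertex sets."""
    index, low = {}, {}
    stack, on_stack, comps = [], set(), []
    counter = 0
    for root in graph:
        if root in index:
            continue
        index[root] = low[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(graph[root]))]
        while work:
            node, neighbors = work[-1]
            descended = False
            for nxt in neighbors:
                if nxt not in index:
                    # Unvisited neighbor: descend into it.
                    index[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph[nxt])))
                    descended = True
                    break
                if nxt in on_stack:
                    low[node] = min(low[node], index[nxt])
            if descended:
                continue
            # All neighbors processed: propagate the low-link to the parent.
            work.pop()
            if work:
                parent = work[-1][0]
                low[parent] = min(low[parent], low[node])
            if low[node] == index[node]:
                # node is the root of an SCC: pop its members off the stack.
                comp = set()
                while True:
                    w = stack.pop()
                    on_stack.discard(w)
                    comp.add(w)
                    if w == node:
                        break
                comps.append(comp)
    return comps


def detect_topics(docs, bursty_words, threshold=0.5, min_size=2):
    """Report each SCC with at least min_size words as one topic."""
    graph = build_graph(docs, bursty_words, threshold)
    comps = strongly_connected_components(graph)
    return sorted(sorted(c) for c in comps if len(c) >= min_size)


if __name__ == "__main__":
    docs = [
        {"earthquake", "tsunami", "japan"},
        {"earthquake", "tsunami"},
        {"earthquake", "japan", "tsunami"},
        {"election", "vote"},
        {"election", "vote", "japan"},
    ]
    words = ["earthquake", "tsunami", "japan", "election", "vote"]
    print(detect_topics(docs, words))
```

On this toy input the script prints `[['earthquake', 'japan', 'tsunami'], ['election', 'vote']]`. The asymmetry is what keeps the two groups apart: the edge election → japan has weight 1/2 and survives the threshold, while japan → election has weight 1/3 and is pruned. If every kept edge were treated as undirected, that single edge would join both groups into one component.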
- 【Source】 Journal of Chinese Computer Systems (小型微型计算机系统), 2015, Issue 08
- 【CLC number】 TP391.1
- 【Times cited】 3
- 【Downloads】 153