Method for spectral co-clustering of documents and words based on morphology
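【Summary】 This paper uses morphological methods to determine the number of clusters and, on that basis, improves the spectral co-clustering method for words and documents. The number of clusters is determined in three steps. First, the matrix produced by word-document spectral co-clustering is converted into a gray-scale image by visual assessment of tendency (VAT). Second, the VAT gray-scale image is filtered with image processing techniques including gray-scale morphology, image binarization, and the distance transform. Third, a signal plot is built from the filtered VAT image and smoothed, and the number of clusters in the document set is determined from the number of peaks and valleys in the smoothed signal. Experiments show that the method improves the clustering quality of word-document spectral co-clustering.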
【Abstract】 A major problem in spectral co-clustering analysis is determining the number of clusters in a dataset, which is a basic input for most spectral co-clustering algorithms. In this paper, we propose a new method for automatically estimating the number of clusters and use it to modify the spectral co-clustering of documents and words. The method builds on an existing algorithm for visual assessment of tendency (VAT) of a dataset and uses several common image and signal processing techniques. Determining the number of clusters involves three main steps. First, the VAT algorithm converts the input matrix generated by spectral co-clustering of documents and words into a reordered dissimilarity gray image, which highlights the potential cluster structure in the dataset. Then, sequential image processing operations (gray morphology, image binarization, and distance transform) segment the regions of interest in the gray image and convert the filtered image into a distance-transformed image. Finally, we project the transformed image onto the diagonal axis, which yields a one-dimensional signal; after smoothing this signal, we extract the number of clusters from its major peaks and valleys. Once the number of clusters is determined, we modify the typical spectral co-clustering algorithm. The primary difference between the typical algorithm and our morphology-based method is that the typical algorithm requires the user to set the number of clusters in advance. A user who does not know the true value may set it incorrectly, and incorrect parameter settings may lead to inaccurate clustering. Automatic determination of the cluster number in our proposed algorithm therefore helps to improve the clustering results. Our experimental results show that the proposed method works well in practice.
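The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it starts from a generic square dissimilarity matrix rather than the matrix produced by spectral co-clustering, uses a gray-scale opening as the morphological filter, picks the binarization threshold with Otsu's method, and counts only prominent peaks (not valleys) of the smoothed signal. The function names and the parameters `morph_size`, `sigma`, and `min_prominence` are hypothetical choices made for this example.

```python
import numpy as np
from scipy import ndimage
from scipy.signal import find_peaks


def vat_order(D):
    """Return the VAT (Prim-like) ordering of a square dissimilarity matrix."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    min_dist = D[start].astype(float)  # distance of each point to the ordered set
    for _ in range(n - 1):
        j = int(np.argmin(np.where(remaining, min_dist, np.inf)))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)


def otsu_threshold(img, bins=256):
    """Otsu threshold for an image scaled to [0, 1] (assumed binarization rule)."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)
    mu = np.cumsum(p * centers)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    # Use the upper edge of the best bin so the whole bin falls below the threshold.
    return edges[int(np.argmax(between)) + 1]


def estimate_num_clusters(D, morph_size=3, sigma=3.0, min_prominence=0.1):
    """Estimate the cluster count from the VAT image of dissimilarity matrix D."""
    # Step 1: reorder D with VAT and scale it to a gray image in [0, 1].
    order = vat_order(D)
    img = D[np.ix_(order, order)] / D.max()
    # Step 2: gray-scale morphology, binarization, distance transform.
    img = ndimage.grey_opening(img, size=(morph_size, morph_size))
    dark = img < otsu_threshold(img)  # dark diagonal blocks are candidate clusters
    dist = ndimage.distance_transform_edt(dark)
    # Step 3: project onto the main diagonal by summing each anti-diagonal,
    # smooth the resulting 1-D signal, and count its prominent peaks.
    rows, cols = np.indices(dist.shape)
    signal = np.bincount((rows + cols).ravel(), weights=dist.ravel())
    smooth = ndimage.gaussian_filter1d(signal, sigma=sigma)
    peaks, _ = find_peaks(smooth, prominence=min_prominence * smooth.max())
    return len(peaks)


def demo_dissimilarity(seed=0, per_cluster=30):
    """Hypothetical test data: Euclidean distances between three 2-D blobs."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    pts = np.vstack([c + 0.3 * rng.standard_normal((per_cluster, 2)) for c in centers])
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    print(estimate_num_clusters(demo_dissimilarity()))
```

The estimated count could then be passed as the number of clusters to a standard spectral co-clustering routine instead of a user-supplied value. On noisy real data, the filter size, smoothing width, and prominence cutoff would likely need tuning.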
【Key words】 spectral clustering; visual assessment of tendency; gray morphology
- 【Source】 南京大学学报(自然科学版), Journal of Nanjing University (Natural Sciences), 2012, Issue 02
- 【Classification number】 TP391.1
- 【Downloads】 104