Research on a Gaussian-Weighted Reorganizing K-NN Algorithm
【Abstract】 This paper presents a K-NN text clustering algorithm based on Gaussian-weighted distance and a cluster reorganization mechanism. It introduces the concept of the K-NN nearest domain, elaborates the nearest domain rules, and performs K-NN clustering with a Gaussian-weighted nearest-domain algorithm. A Gaussian function assigns each sample a weight according to its distance from the cluster center, and these weights are used to compute the distances between clusters. The resulting clusters are then reorganized based on nearest-domain weights and cluster density, so that the number of clusters adapts automatically. A splitting operator splits sparse clusters and reassigns abnormal samples, while a merging operator combines similar clusters. Experiments show that the reorganization mechanism effectively improves clustering accuracy and recall and increases cluster density, yielding more reasonable clustering results.
【Key words】 text clustering; K-NN; Gaussian weighting; nearest domain rule; cluster reorganization
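
The abstract's Gaussian weighting step can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: samples are dense numeric vectors, the weight is `exp(-d^2 / (2 * sigma^2))` with a hypothetical bandwidth `sigma`, and the distance between two clusters is taken as the Euclidean distance between their Gaussian-weighted centroids. The function names are likewise hypothetical and are not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points, weights=None):
    """Weighted mean of the points; falls back to the plain mean if the weights sum to zero."""
    if weights is None or sum(weights) == 0:
        weights = [1.0] * len(points)
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(dim)]


def gaussian_weights(points, center, sigma=1.0):
    """Assumed weighting: samples closer to the center get weights closer to 1."""
    return [math.exp(-euclidean(p, center) ** 2 / (2 * sigma ** 2))
            for p in points]


def weighted_cluster_distance(cluster_a, cluster_b, sigma=1.0):
    """Assumed cluster distance: distance between Gaussian-weighted centroids."""
    wa = gaussian_weights(cluster_a, centroid(cluster_a), sigma)
    wb = gaussian_weights(cluster_b, centroid(cluster_b), sigma)
    return euclidean(centroid(cluster_a, wa), centroid(cluster_b, wb))
```

Under these assumptions, an outlying sample contributes little to its cluster's weighted centroid, so cluster-to-cluster distances are driven mainly by the dense core of each cluster. A merging step of the kind the abstract describes could then compare `weighted_cluster_distance` against a threshold, though the paper's actual criterion is not stated here.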
- 【Source】 Journal of Chinese Information Processing (中文信息学报), 2015, Issue 05
- 【Classification number】 TP311.13; TP391.1