节点文献
基于输入点集求解k-Means聚类算法
An Approximate Algorithm for k-Means Problem Based on Input Points
【Author】 Wang Shouqiang1,2,Zhu Daming1,Shi Shiying2 1.School of Computer Science and Technology,Shandong University,Jinan 250010,P.R.China2.Department of Information Engineering,Shandong Jiaotong University,Jinan 250023,P.R.China
【机构】 山东大学计算机科学与技术学院; 山东交通学院信息工程系;
【摘要】 k-Means聚类是聚类划分中应用最广泛的一种方案,但是现在许多关于此问题的研究并没有给出近似比为常数的算法。本文给出了一个随机算法,该算法通过以不同概率选取初始k个点,保证了以一定概率分别属于不同最优聚类簇的k个点。以这k个点作为初始中心点对输入点集进行交换分别执行局部搜索算法,证明了可得到期望近似比至多为2的解。实验结果表明该算法能够取得较优的近似解结果。
【Abstract】 The k-Means clustering is one of the most popular schemes for discovering clusters in data.Its aim is to minimize the mean squared distance from each data to its nearest center.A lot of variants of Lloyd’s heuristic have been studied.Un-fortunately,many research results haven’t given any approximate ratio.In this paper,an algorithm is presented which can ob-tain the optimal clustering with the ration of at most 2.The main idea of this algorithm is that k points are selected by means of a very simple,randomized seeding technique and then the local search is implemented to improve the accuracy.Some ex-amples are selected to verify our algorithm and got better results both the speed and accuracy than the former methods.The main algorithmic contribution is that the input points are used as the candidate sets to obtain the optimal clustering with a con-stant ratio by means of local search technique and one method of selecting initial points.
- 【会议录名称】 第二十六届中国控制会议论文集
- 【会议名称】第二十六届中国控制会议
- 【会议时间】2007-07-26
- 【会议地点】中国湖南张家界
- 【分类号】TP301.6
- 【主办单位】中国自动化学会控制理论专业委员会(Technical Committee on Control Theory,Chinese Association of Automation)