节点文献
基于不定核的大间隔聚类算法研究
Research on Maximum Margin Clustering Based on Indefinite Kernel
【作者】 李森;
【导师】 薛晖;
【作者基本信息】 东南大学 , 计算机科学与技术, 2016, 硕士
【摘要】 核方法是机器学习领域中解决非线性学习问题的一种有效方法,大都要求核函数正定,然而,在实际问题中这样的要求常常很难满足;相反,在某些情况下,使用不定核往往能取得比正定核更好的效果,如基因识别、目标检测问题等。近年来,不定核问题越来越受到研究者们的关注,多种解决不定核分类问题的方法被提出并取得很好的效果,如谱变换方法、正定核替代策略等。然而,关于不定核聚类问题的研究却相对较少,现有基于核的聚类算法也大都基于正定核,不能直接处理核函数不定的情况。鉴于已有不定核方法在分类问题中的优异表现,本文希望借助这些方法研究基于不定核的聚类问题。具体地,本文以经典的基于核的大间隔聚类模型(Maximum Margin Clustering, MMC)为基础,提出了一种基于不定核的大间隔聚类模型(Indefinite Kernel Maximum Margin Clustering, IKMMC)。IKMMC采取正定核替代策略,寻求一个正定核以逼近不定核,并将度量二者差异性的F-范数作为一个正则化项嵌入到MMC模型中,进而得到IKMMC模型。针对该模型,本文选取了迭代优化方法进行优化:首先给样本赋初始类别标记,在每轮迭代中,不定核聚类问题被转化为带有类平衡约束的不定核支持向量机(Indefinite Kernel Support Vector Machine, IKSVM)问题,并被进一步表达为半无限规划(Semi-infinite Programming, SIP)形式求解;本轮优化得到的样本预测标记作为下轮迭代的样本初始标记,直到样本预测错误率不再满足迭代条件;最后,IKMMC以最后一轮的样本预测标记作为聚类的最终结果。实验部分验证了IKMMC及其迭代优化算法的有效性。MMC模型主要用于两类样本聚类,为了使IKMMC能够适应更为复杂的多类情况,本文进一步提出了多类情况下的IKMMC模型,并给出了相关优化算法,通过在多个数据集上的实验证明了IKMMC及其优化算法在多类情况下依然有较好的方法性能。
【Abstract】 Indefinite kernels have attracted more and more attentions in machine learning due to its wider application scope than usual positive definite kernels, such as in gene identification and object recognition. Recently, many indefinite kernel methods have been proposed to solve classification problems and achieved much better performance. However, the research about indefinite kernel clustering is relatively scarce. Furthermore, existing clustering methods are mainly designed based on positive definite kernels which are incapable in indefinite kernel scenarios. In this paper, we will focus on the problem of indefinite kernel clustering. In view of the excellent performance of indefinite kernel classification methods, we aim to utilize these methods to direct indefinite kernel clustering. Consequently, based on the state-of-the-art maximum margin clustering (MMC) model, we propose a novel algorithm termed as indefinite kernel maximum margin clustering (IKMMC). IKMMC tries to find a proxy positive definite kernel to approximate the original indefinite one and thus embeds a new F-norm regularizer in the objective function to measure the diversity of the two kernels, which can be further optimized by an iterative approach. Concretely, at each iteration, given a set of initial class labels, IKMMC firstly transforms the clustering problem into a classification one solved by indefinite kernel support vector machine (IKSVM) with an extra class balance constraint and then the obtained prediction labels will be used as the new input class labels at next iteration until the error rate of prediction is smaller than a pre-specified tolerance. Finally, IKMMC utilizes the prediction labels at the last iteration as the expected indices of clusters. Moreover, we further extend IKMMC from binary clustering problems to more complex multi-class scenarios. Experimental results have shown the superiority of our algorithms.
【Key words】 Indefinite Kernel; Maximum Margin Clustering; Support Vector Machine; Kernel Method;