Fuzzy C-Means Clustering Based Undersampling in Clusters (基于FCM的簇内欠采样算法)
【Abstract】 To address the performance degradation of traditional classifiers on imbalanced data sets, an algorithm named Fuzzy C-Means clustering Based Under Sampling In Clusters (FCMUSIC) is proposed. The fuzzy c-means clustering algorithm (FCM) divides the majority class samples into several clusters. Within each cluster, the reciprocal of the class imbalance ratio (IR) is used as the sampling rate to obtain a new set of majority class samples, which is then merged with the minority class samples to form a new balanced sample set. KNN and Random Forest classifiers are then applied to this set. On five imbalanced data sets, the improved algorithm raises the F1 score by 6.65% and the G-mean by 7.75% on average with the KNN classifier, and raises the F1 score by 5.31% and the G-mean by 6.07% on average with the Random Forest classifier. These results indicate that FCMUSIC can effectively improve the classification performance of traditional classifiers on imbalanced data sets.
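The sampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how samples are chosen inside a cluster, how many clusters are used, or how fuzzy memberships are turned into clusters, so the sketch assumes uniform random selection, a caller-supplied cluster count `c`, and hard assignment by highest membership. IR is taken as the majority count divided by the minority count. The function names `fcm` and `undersample_in_clusters` are hypothetical, and only the Python standard library is used to keep the block self-contained.

```python
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij is proportional to dist_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            sq = [max(sum((points[i][d] - centers[j][d]) ** 2
                          for d in range(dim)), 1e-12) for j in range(c)]
            inv = [s ** (-1.0 / (m - 1.0)) for s in sq]  # sq is squared distance
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def undersample_in_clusters(majority, minority, c=3, seed=0):
    """Keep roughly 1/IR of the majority samples from every FCM cluster."""
    rng = random.Random(seed)
    ir = len(majority) / len(minority)   # imbalance ratio (assumed definition)
    rate = 1.0 / ir                      # per-cluster sampling rate
    _, u = fcm(majority, c, seed=seed)
    # Assumption: assign each sample to its highest-membership cluster.
    clusters = [[] for _ in range(c)]
    for x, row in zip(majority, u):
        clusters[row.index(max(row))].append(x)
    kept = []
    for members in clusters:
        if not members:
            continue
        k = max(1, round(len(members) * rate))
        # Assumption: uniform random choice within the cluster.
        kept.extend(rng.sample(members, k))
    return kept
```

Concatenating the returned majority subset with the minority samples gives an approximately balanced training set, which could then be passed to any KNN or Random Forest implementation. Sampling at the same rate in every cluster keeps the relative sizes of the majority clusters, which is presumably why the sampling is done per cluster rather than over the whole majority class.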
- 【Source】 Journal of Nanchang University (Natural Science) (南昌大学学报(理科版)), 2021, Issue 05
- 【Classification No.】 TP311.13
- 【Times cited】 1
- 【Downloads】 94