Classification Method for Imbalanced Data Based on Novel Sampling Technique

【Authors】 LIU Zitong; LIU Zhenyuan; PANG Na; MA Ming

【Corresponding author】 MA Ming

【Affiliation】 College of Computer Science and Technology, Beihua University

【Abstract】 Data imbalance is common in real-world scenarios and seriously degrades the prediction results of models. The Synthetic Minority Over-Sampling Technique (SMOTE) is one method for addressing imbalanced classification, but it has limitations. To address class imbalance in data, an improved SMOTE random forest classification algorithm based on data distribution and cluster weighting is proposed (Random Forest Using SMOTE Based on Data Distribution and Cluster Weighting, DCSMOTE-RF). The algorithm obtains the distribution information of the samples, partitions the minority class samples into clusters, and assigns each region a different synthesis share according to the amount of information in its cluster. Each minority class sample, combined with its own weight, then generates a corresponding number of target samples. The resulting training data are learned and evaluated with a random forest. Simulation experiments on 10 imbalanced datasets show that DCSMOTE-RF achieves relatively good prediction performance on imbalanced data.
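
The abstract outlines a pipeline: cluster the minority class, give each cluster a synthesis share, then interpolate new samples inside each cluster. The sketch below illustrates that general idea only and is not the authors' DCSMOTE-RF implementation. It assumes the cluster labels are already computed, uses inverse cluster size as a stand-in for the paper's "amount of information" measure, and picks base samples uniformly rather than by the paper's per-sample weights. The function names (`allocate_shares`, `smote_in_cluster`, `cluster_weighted_oversample`) are hypothetical.

```python
import random
from collections import defaultdict


def allocate_shares(cluster_sizes, n_synthetic):
    """Split n_synthetic across clusters; smaller clusters get larger shares.

    Inverse-size weighting is an assumption made for this sketch,
    not the information measure used in the paper.
    """
    inv = {c: 1.0 / size for c, size in cluster_sizes.items()}
    total = sum(inv.values())
    shares = {c: int(n_synthetic * w / total) for c, w in inv.items()}
    # Hand the rounding remainder to the highest-weight clusters.
    remainder = n_synthetic - sum(shares.values())
    for c in sorted(inv, key=inv.get, reverse=True)[:remainder]:
        shares[c] += 1
    return shares


def smote_in_cluster(points, n_new, k, rng):
    """Classic SMOTE interpolation, restricted to the points of one cluster."""
    if len(points) < 2:
        # A singleton cluster has no neighbor; duplicate the point instead.
        return [list(points[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))  # uniform pick (assumed weighting)
        base = points[i]
        others = [p for j, p in enumerate(points) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])  # one of the k nearest neighbors
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def cluster_weighted_oversample(minority, labels, n_synthetic, k=3, seed=0):
    """Generate n_synthetic minority samples, cluster by cluster."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for x, c in zip(minority, labels):
        clusters[c].append(x)
    sizes = {c: len(pts) for c, pts in clusters.items()}
    shares = allocate_shares(sizes, n_synthetic)
    synthetic = []
    for c, pts in clusters.items():
        synthetic.extend(smote_in_cluster(pts, shares[c], k, rng))
    return synthetic


# Toy data: a dense cluster (label 0) and a sparse cluster (label 1).
minority = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10)]
labels = [0, 0, 0, 0, 1, 1]
new_points = cluster_weighted_oversample(minority, labels, n_synthetic=9)
```

In a full pipeline, the synthetic points would be appended to the training set before fitting a random forest classifier. Interpolating only within a cluster keeps new samples from landing in the gap between distant minority regions, which is one of the known limitations of plain SMOTE.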

【Funding】 National Natural Science Foundation of China (42004153); Beihua University Graduate Innovation Program (2022007)
  • 【Source】 Journal of Beihua University (Natural Science), 2024, Issue 05
  • 【Classification code】 TP181