An Over-Sampling Algorithm for Maximum Entropy Optimization Based on Bootstrap Method (基于Bootstrap方法最大熵优化过采样算法)
【Abstract】 With the advent of the data era, the classification of unbalanced data is receiving increasing attention. In such problems, classifiers often produce incorrect results because the ratio of minority class samples to majority class samples is skewed. To address this, an oversampling algorithm based on the Bootstrap method under the maximum entropy principle is proposed. First, the probability distribution of the data samples is obtained with the Bootstrap method and optimized using the maximum entropy principle. Second, because minority class samples differ in their ability to generate new minority class samples, a probability enhancement algorithm based on the minority class sample distribution is proposed. The algorithm fully reflects the randomness of the data and keeps the probability density of the minority class consistent before and after the data set is balanced, thereby improving the effectiveness of the classification algorithm. Finally, experiments on eight data sets selected from the UCI and KEEL databases show that the proposed algorithm is more effective than other existing algorithms.
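The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a per-sample selection probability is estimated from repeated Bootstrap resamples, that Shannon entropy is used to compare this distribution with the uniform one (which maximizes entropy when normalization is the only constraint), and that new minority samples are drawn with replacement according to those probabilities. The function names, the toy data, and the parameter `n_rounds` are all hypothetical.

```python
import math
import random
from collections import Counter


def bootstrap_probabilities(n_samples, n_rounds, rng):
    """Estimate a selection probability per sample index from bootstrap resamples."""
    counts = Counter()
    for _ in range(n_rounds):
        # One bootstrap resample: n_samples index draws with replacement.
        counts.update(rng.randrange(n_samples) for _ in range(n_samples))
    total = n_rounds * n_samples
    return [counts[i] / total for i in range(n_samples)]


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def oversample(minority, n_new, probs, rng):
    """Draw n_new minority samples with replacement, weighted by probs."""
    return rng.choices(minority, weights=probs, k=n_new)


rng = random.Random(0)
# Toy data (assumption): four 2-D minority samples, ten majority samples.
minority = [(0.1, 1.2), (0.3, 0.9), (0.2, 1.1), (0.5, 1.0)]
majority_size = 10

probs = bootstrap_probabilities(len(minority), n_rounds=200, rng=rng)
entropy = shannon_entropy(probs)
max_entropy = math.log(len(minority))  # entropy of the uniform distribution

# Top up the minority class until it matches the majority class size.
synthetic = oversample(minority, majority_size - len(minority), probs, rng)
balanced_minority = minority + synthetic
```

In this sketch `entropy` can never exceed `max_entropy`; the gap between them indicates how far the bootstrap estimate is from uniform. The constraints the paper uses in its maximum entropy optimization, and its probability enhancement step, are not stated in the abstract and are not reproduced here.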
【Key words】 unbalanced data; Bootstrap method; principle of maximum entropy; probability enhancement; classification
- 【Source】 数据采集与处理 (Journal of Data Acquisition and Processing), 2023, Issue 03
- 【Classification No.】 TP311.13