Feature Selection Based on the Minimal Redundancy Principle for Text Classification
【Abstract】 In text classification, common feature selection methods such as information gain (IG) assume that features are conditionally independent in order to reduce computational complexity. This assumption, however, introduces serious redundancy among the selected features. To reduce the redundancy of the selected feature subset, this paper proposes a feature selection method based on the minimal redundancy principle (MRP), which takes the correlations between different features into account and selects a feature subset with lower redundancy. Experimental results show that the MRP-based method improves the effectiveness of feature selection and, in most cases, yields better text classification performance.
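The abstract does not give the MRP scoring formula, so the following is only a minimal sketch of the general idea: choosing features by their relevance to the class while penalizing correlation with features already chosen. It assumes a greedy, mRMR-style criterion (mutual information with the class minus mean mutual information with the selected features), which may differ from the paper's actual method. The function names, the toy term columns, and the labels are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(columns, labels, k):
    """Greedy redundancy-aware selection (assumed criterion, not the paper's):
    score = relevance to the class - mean redundancy with chosen features."""
    relevance = {f: mutual_information(col, labels) for f, col in columns.items()}
    selected = []
    while len(selected) < min(k, len(columns)):

        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(columns[f], columns[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy

        best = max((f for f in columns if f not in selected), key=score)
        selected.append(best)
    return selected


# Hypothetical toy data: term presence in 8 documents, two classes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "goal":   [1, 1, 1, 0, 0, 0, 0, 0],
    "soccer": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of "goal"
    "team":   [1, 1, 0, 1, 0, 1, 0, 0],  # weaker, but not redundant
}

# Relevance-only ranking, as an independence-assuming method would do.
by_relevance = sorted(
    columns, key=lambda f: -mutual_information(columns[f], labels)
)[:2]
by_redundancy_aware = select_features(columns, labels, 2)
print(by_relevance, by_redundancy_aware)
```

On this toy data the relevance-only ranking keeps both copies of the duplicated term, whereas the redundancy-aware selection replaces the second copy with the less relevant but non-redundant term "team".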
【Key words】 computer application; Chinese information processing; conditional independence assumption; minimal redundancy principle; feature selection; text classification
- 【Source】 Journal of Chinese Information Processing (中文信息学报), 2007, Issue 05
- 【Classification No.】 TP391.1
- 【Times cited】 10
- 【Downloads】 270