Research on the Application and Design of GA in Feature Selection
【Abstract】 Selecting a good feature subset from a massive text collection is an NP-hard problem in text categorization, and genetic algorithms are often able to solve NP problems effectively. To overcome the "drift" and "premature convergence" problems of the traditional genetic algorithm, this article first introduces rough sets and, on that basis, designs in detail the fitness function, an adaptive crossover operator, an adaptive mutation operator, and reasonable termination conditions. A feature selection algorithm is then built on the resulting genetic algorithm. The algorithm is validated experimentally on the corpus provided by Fudan University, and the experimental results show that the proposed feature selection algorithm performs well.
【Key words】 text categorization; feature selection; genetic algorithm (GA); rough set
- 【Source】 Computer Engineering and Applications (计算机工程与应用), 2010, Issue 27
- 【Classification number】 TP391.1
- 【Times cited】 4
- 【Downloads】 114