Research on Ensemble Classification Model of Trojan Traffic Detection (用于木马流量检测的集成分类模型)
【Abstract】 Traditional ensemble learning methods, when applied to Trojan traffic detection, depend heavily on the quality of the training samples, struggle to improve classification accuracy, and generalize poorly. To address these problems, an ensemble classification model for Trojan traffic detection is proposed. The differences between Trojan communication and normal communication, as reflected in traffic statistics, are identified, and behavioral statistical features are extracted to build the training set. A mean normalization step (均值化) is introduced to improve the principal component transform in the rotation forest algorithm, and the improved rotation forest algorithm is used to rotate the original training samples. Three substantially different classification algorithms, naive Bayes, the C4.5 decision tree, and the support vector machine, are selected to build the base classifiers. The base classifiers are combined by a weighted voting strategy based on dynamic instance selection, which produces the Trojan traffic detection rules. Experimental results show that the model makes full use of the diversity among the different training sets and the complementarity of the heterogeneous classifiers: it reaches a detection rate of 96.30% with a false positive rate no higher than 4.21%, improving both the accuracy and the generalization ability of Trojan traffic detection.
【Key words】 Trojan traffic; ensemble learning; rotation forest; heterogeneous classifier; weighted voting
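
The abstract does not spell out the voting rule, so the following is only a minimal sketch of one common way to realize weighted voting with dynamic, per-instance weights: each base classifier's vote is weighted by its accuracy on the labelled validation instances nearest to the sample being classified. Everything named here is an assumption made for illustration and is not taken from the paper: the two-dimensional feature vectors, the neighbourhood size `k`, the helper names, and the three toy rules that stand in for the naive Bayes, C4.5, and support vector machine base classifiers.

```python
import math
from collections import defaultdict


def k_nearest(x, validation, k):
    """Return the k (features, label) pairs closest to x by Euclidean distance."""
    return sorted(validation, key=lambda pair: math.dist(x, pair[0]))[:k]


def local_accuracy(clf, neighbours):
    """Fraction of the neighbouring validation instances that clf labels correctly."""
    hits = sum(1 for feats, label in neighbours if clf(feats) == label)
    return hits / len(neighbours)


def dynamic_weighted_vote(x, classifiers, validation, k=3):
    """Combine votes, weighting each classifier by its accuracy near x."""
    neighbours = k_nearest(x, validation, k)
    scores = defaultdict(float)
    for clf in classifiers:
        scores[clf(x)] += local_accuracy(clf, neighbours)
    # Iterating over sorted labels keeps tie-breaking deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical validation set: two made-up, normalized flow statistics per flow.
validation = [
    ((0.2, 0.1), "normal"),
    ((0.3, 0.2), "normal"),
    ((0.1, 0.3), "normal"),
    ((0.8, 0.9), "trojan"),
    ((0.9, 0.7), "trojan"),
    ((0.7, 0.8), "trojan"),
]


# Toy stand-ins for trained base classifiers (assumptions, not the paper's models).
def rule_first_feature(f):
    return "trojan" if f[0] > 0.5 else "normal"


def rule_too_strict(f):
    return "trojan" if f[1] > 0.95 else "normal"


def rule_always_normal(f):
    return "normal"


classifiers = [rule_first_feature, rule_too_strict, rule_always_normal]

print(dynamic_weighted_vote((0.85, 0.8), classifiers, validation))
print(dynamic_weighted_vote((0.2, 0.2), classifiers, validation))
```

For the first query, a plain majority vote would answer "normal" because two of the three toy rules say so; the local weighting gives those two rules zero weight, since both misclassify every nearby validation instance, and the remaining rule decides. The rotation step and the training of the real base classifiers are outside the scope of this sketch.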
- 【Source】 Journal of Xi’an Jiaotong University (西安交通大学学报), 2015, Issue 08
- 【Classification No.】 TP393.08
- 【Times cited】 8
- 【Downloads】 190