Protein-protein interaction sites prediction based on weighted PSSM histogram and random forests ensemble
【Abstract】 To improve the accuracy of protein-protein interaction site prediction, this paper proposes a novel feature representation, the weighted PSSM histogram, derived from a protein's position specific scoring matrix (PSSM). To address the extreme class imbalance in the training data, undersampling is combined with classifier ensembling to train a random forest ensemble classifier. Compared with traditional features, the proposed features have lower dimensionality and better discriminative power. The classifier ensemble mitigates the information loss caused by undersampling and improves classification accuracy. Experimental results show that the method is effective and outperforms other state-of-the-art protein-protein interaction site prediction methods on benchmark datasets.
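The abstract does not give implementation details for the ensemble. The following is a minimal sketch of one common way to combine undersampling with a random forest ensemble, assuming scikit-learn's `RandomForestClassifier`, binary labels (1 = interaction site, the minority class; 0 = non-site) and a precomputed per-residue feature matrix `X`. The helper names `train_undersampled_rf_ensemble` and `predict_proba_ensemble`, the disjoint partitioning of the negatives, and the probability averaging are illustrative assumptions, not the paper's published procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_undersampled_rf_ensemble(X, y, n_members=None, n_trees=100, seed=0):
    """Train one random forest per balanced subset (hypothetical helper).

    Assumes y == 1 marks interaction sites (minority class) and y == 0 marks
    non-sites (majority class). The negatives are shuffled and split into
    disjoint chunks; each member is trained on all positives plus one chunk.
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = rng.permutation(np.flatnonzero(y == 0))
    if n_members is None:
        # Roughly one member per minority-sized slice of the majority class.
        n_members = max(1, len(neg_idx) // len(pos_idx))
    members = []
    for m, neg_chunk in enumerate(np.array_split(neg_idx, n_members)):
        # Each member sees every positive plus one undersampled negative chunk.
        idx = np.concatenate([pos_idx, neg_chunk])
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed + m)
        rf.fit(X[idx], y[idx])
        members.append(rf)
    return members


def predict_proba_ensemble(members, X):
    """Average the members' predicted probability of the positive class."""
    # Column 1 is class 1 because every member was trained on labels {0, 1}.
    return np.mean([rf.predict_proba(X)[:, 1] for rf in members], axis=0)


if __name__ == "__main__":
    # Synthetic stand-in data: 50 "sites" vs 500 "non-sites", 20 features each.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(1.0, 1.0, (50, 20)),
                   rng.normal(-1.0, 1.0, (500, 20))])
    y = np.array([1] * 50 + [0] * 500)
    members = train_undersampled_rf_ensemble(X, y, n_trees=20)
    scores = predict_proba_ensemble(members, X)
    print(len(members), scores.shape)  # prints: 10 (550,)
```

Partitioning the negatives into disjoint chunks means every non-site residue is seen by exactly one member, which is one way an ensemble can recover the information that a single undersampled classifier would discard.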
【Key words】 protein-protein interactions; position specific scoring matrix; weighted position specific scoring matrix histogram; random forests; classifier ensemble
- 【Source】 Journal of Nanjing University of Science and Technology, 2015, Issue 04
- 【CLC number】TP391.41
- 【Times cited】12
- 【Downloads】232