Using Support Vector Machine to Predict Alternative 5′/3′ Splicing Sites of Human Genome

【Author】 YANG Wu-ri-tu, LI Qian-zhong, LIU Li, FAN Guo-liang (Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021, Inner Mongolia Autonomous Region, China)

【Institution】 Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021

【Abstract】 Alternative splicing, which allows a single DNA sequence to produce more than one protein sequence, plays an important role in regulating gene expression. Recognizing alternative splicing sites is an important task in the post-genome era. In this paper, the alternative 5′/3′ splicing sites (alternative donor/acceptor sites) obtained from the latest EBI human alternative splicing database were selected as the positive set, and pseudo splicing sites flanking the splicing sites were selected as the negative set. Pseudo donor and pseudo acceptor sites are GT/AG sites in the DNA sequence at which splicing never occurs. All alternative splicing sites and pseudo splicing sites were randomly divided into two independent parts, a training set and a testing set. The training set included 723 alternative donor sites, 1060 alternative acceptor sites, 727 pseudo donor sites and 755 pseudo acceptor sites; the testing set included 2894 alternative donor sites, 4244 alternative acceptor sites, 38284 pseudo donor sites and 29458 pseudo acceptor sites. A new method, a support vector machine combined with a position weight matrix and the increment of diversity, was introduced to predict alternative splicing sites. Using the training set only, the mononucleotide frequencies at different positions were taken as the parameters of the position weight matrix, and the 3-mer (triplet) frequencies of sequence segments were taken as the parameters of the diversity sources; the resulting scoring function and increment of diversity values served as the inputs to the support vector machine. The classifiers obtained in this way were used to predict the alternative donor sites and alternative acceptor sites in the independent testing set. The prediction accuracies were 88.74% and 90.86% for alternative donor sites and alternative acceptor sites, respectively, and the corresponding specificities were 85.62% and 81.19%. These accuracies are higher than those obtained by other methods for predicting alternative splicing sites, so the method further improves the ability to predict alternative splicing sites theoretically.
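
The abstract describes two kinds of features, a position weight matrix score built from per-position mononucleotide frequencies and an increment of diversity built from 3-mer counts, which are then passed to a support vector machine. The following Python sketch illustrates how such features might be computed. It is not the authors' implementation: the abstract gives no formulas, so the log-odds scoring against a uniform background, the pseudocount, the base-2 logarithm, the commonly used definition D(X) = N log N − Σ n_i log n_i with ID(X, Y) = D(X + Y) − D(X) − D(Y), the toy sequences, and all function names are assumptions made for illustration. The SVM step itself is omitted; the feature vector returned by `site_features` is what would be handed to an SVM library.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 possible 3-mers, in a fixed order so count vectors are comparable.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]


def build_pwm(seqs, pseudocount=1.0):
    """Per-position log2-odds of each base against a uniform background.

    Assumption: equal-length aligned windows, background probability 0.25.
    """
    total = len(seqs) + 4 * pseudocount
    pwm = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        pwm.append({b: math.log2(((counts[b] + pseudocount) / total) / 0.25)
                    for b in BASES})
    return pwm


def pwm_score(pwm, seq):
    """Sum of the per-position log-odds for one candidate window."""
    return sum(column[base] for column, base in zip(pwm, seq))


def triplet_counts(seqs):
    """Overlapping 3-mer counts pooled over a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 2):
            counts[s[i:i + 3]] += 1
    return [counts[t] for t in TRIPLETS]


def diversity(counts):
    """D(X) = N*log2(N) - sum(n_i*log2(n_i)), skipping zero counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log2(n) - sum(c * math.log2(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def site_features(seq, pos_pwm, neg_pwm, pos_source, neg_source):
    """Feature vector for one candidate site (hypothetical layout)."""
    query = triplet_counts([seq])
    return [
        pwm_score(pos_pwm, seq),
        pwm_score(neg_pwm, seq),
        increment_of_diversity(query, pos_source),
        increment_of_diversity(query, neg_source),
    ]


# Toy windows around a GT dinucleotide (invented data, not from the paper).
toy_true = ["AAGGTAAGT", "CAGGTGAGT", "AAGGTAAGA"]
toy_pseudo = ["CCCGTCCCC", "TTCGTTCAC", "GCTGTCCTA"]

pos_pwm, neg_pwm = build_pwm(toy_true), build_pwm(toy_pseudo)
pos_source, neg_source = triplet_counts(toy_true), triplet_counts(toy_pseudo)
example = site_features("AAGGTAAGT", pos_pwm, neg_pwm, pos_source, neg_source)
```

In this sketch the matrices and diversity sources are built from training sequences only, matching the abstract's statement that the method relies solely on the training set. Separate feature sets would be built for donor and acceptor sites, since the abstract reports a classifier for each.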

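【Funding】 Supported by the National Natural Science Foundation of China (30560039), the Specialized Research Fund for the Doctoral Program of Higher Education, and the Natural Science Foundation of Inner Mongolia.
  • 【Source】 Progress in Modern Biomedicine (现代生物医学进展), 2007, Issue 05
  • 【Classification No.】 Q987
  • 【Citations】 5
  • 【Downloads】 221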