Identification of Gene Sequences in Model Organisms

An Identification of the Model Species Genomes

【Authors (Chinese)】 陈翠霞, 李前忠

【Author】 CHEN Cui-xia, LI Qian-zhong (Department of Physics, College of Sciences and Technology, NeiMongol University, Hohhot 010021, PRC)

【Affiliation】 Department of Physics, College of Sciences and Technology, Inner Mongolia University, Hohhot 010021

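【Abstract (Chinese)】 The complete genome sequences of eukaryotes can be divided into three kinds: exons, introns and intergenic sequences. Based on the conservation of the sequence near the splice sites, the compositional features of the sequences, and the 3-periodicity of the reading frame in coding sequences, the standard diversity source of each of the three kinds is built from 184 parameters: the probabilities of the 64 trimers over the sequence, plus the probabilities of the 4 bases at 30 positions around the 5′-end and 3′-end splice sites (4 × 30 = 120). The kind of a given sequence is then determined by the minimum increment of diversity between that sequence and each of the three standard sources. With 184 parameters in the standard sources, the prediction success rate is at least 4.61% higher than with 64 parameters; the success rates with 184 parameters are: C. elegans 88.37%, yeast 90.72%, A. thaliana 91.08%, Drosophila 92.28%, E. coli 92.88%. Comparing correctly and falsely predicted sequences shows that the 184 parameter values of a falsely predicted sequence are very similar to those of the class into which it was predicted.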
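As a minimal sketch of how a 184-parameter standard diversity source of the kind described above could be assembled from a training set of one kind of sequence (exons, introns or intergenic DNA), the Python code below pools overlapping trimer counts into 64 probabilities and tallies base frequencies at 30 end positions. The use of overlapping trimers, the 15 + 15 split of the 30 positions between the 5′ and 3′ ends, and all function names are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

BASES = "ACGT"
# The 64 trimers in a fixed (lexicographic) order.
TRIMERS = ["".join(p) for p in product(BASES, repeat=3)]


def trimer_probs(seqs):
    """Pooled probabilities of the 64 overlapping trimers over a set of sequences."""
    counts = dict.fromkeys(TRIMERS, 0)
    for s in seqs:
        for i in range(len(s) - 2):
            tri = s[i:i + 3]
            if tri in counts:  # windows containing N or other symbols are skipped
                counts[tri] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIMERS]


def site_base_probs(seqs, n5=15, n3=15):
    """Probabilities of A, C, G, T at n5 positions from the 5' end and n3 from the 3' end.

    The 15 + 15 split of the 30 positions is a hypothetical choice;
    sequences shorter than n5 + n3 are ignored.
    """
    probs = []
    for pos in list(range(n5)) + list(range(-n3, 0)):
        counts = dict.fromkeys(BASES, 0)
        for s in seqs:
            if len(s) >= n5 + n3 and s[pos] in counts:
                counts[s[pos]] += 1
        total = sum(counts.values())
        probs.extend([counts[b] / total if total else 0.0 for b in BASES])
    return probs


def standard_source(seqs):
    """184 parameters: 64 trimer probabilities + 30 positions x 4 bases."""
    seqs = [s.upper() for s in seqs]
    return trimer_probs(seqs) + site_base_probs(seqs)


if __name__ == "__main__":
    src = standard_source(["ACGT" * 10, "ACGGTTACGATCGATCGGATCCATGCAAGT"])
    print(len(src))  # 184
```

Each of the 30 positional groups sums to 1 on its own, so the positional part behaves as 30 separate four-letter sources alongside the single 64-letter trimer source.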

【Abstract】 The complete genome sequences of eukaryotes can be grouped into three kinds: introns, exons and intergenic DNA. Based on the conservation of nucleotides around the splice sites, the compositional features of the sequences and the 3-periodicity of the reading frame in coding sequences, the standard source of diversity for each kind is determined by the probabilities of the 64 trimers over the whole sequence and of the 4 bases at 30 positions around the splice sites. The class of a sequence is determined by the least increment of diversity between it and the three standard sources. The results show that combining the frequencies of the 64 trimers with those of the 120 positional bases gives higher rates of correct prediction on both the standard sets and the test sets, and these rates are better in sensitivity (Sn) and specificity (Tn) than those obtained with the 64 trimers alone. The overall rates are as follows: C. elegans 88.37%, S. cerevisiae 90.72%, A. thaliana 91.08%, D. melanogaster 92.28%, E. coli 92.88%. A comparison of correctly and falsely predicted sequences shows that the parameters of a falsely predicted sequence closely resemble those of the class to which it was assigned.
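The least-increment-of-diversity rule above can be sketched as follows. This sketch assumes the usual definitions from the increment-of-diversity literature, D(X) = N ln N − Σ n_i ln n_i for a count vector X with total N, and ID(X, Y) = D(X + Y) − D(X) − D(Y). It further assumes that sources are stored as counts rather than probabilities and that the increments of the trimer sub-source and of each positional sub-source are simply summed. The function names and the toy counts are hypothetical.

```python
import math


def diversity(counts):
    """Measure of diversity D = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    n = sum(counts)
    d = n * math.log(n) if n > 0 else 0.0
    return d - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def classify(query, standards):
    """Return the class whose standard source gives the smallest total increment.

    query and each standards[label] are lists of count vectors, one per
    sub-source (e.g. one 64-trimer vector and thirty 4-base positional
    vectors). Summing the per-sub-source increments is an assumption.
    """
    scores = {
        label: sum(increment_of_diversity(q, s) for q, s in zip(query, parts))
        for label, parts in standards.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy single-sub-source example with made-up counts.
    standards = {
        "exon": [[80, 10, 5, 5]],
        "intron": [[10, 10, 40, 40]],
        "intergenic": [[25, 25, 25, 25]],
    }
    label, scores = classify([[8, 1, 0, 1]], standards)
    print(label)  # exon
```

The increment is zero when the query's composition is proportional to a standard source and grows as the two diverge, which is why the smallest value picks the most similar class. This is also consistent with the observation that a falsely predicted sequence resembles the class it was assigned to.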

【Fund】 National Natural Science Foundation of China (No. 30160025)
  • 【Source】 Journal of Inner Mongolia University (Natural Science Edition), Acta Scientiarum Naturalium Universitatis Neimongol, 2005, No. 04
  • 【Classification number】 Q78