Pattern of Insertion and Deletion in Mammalian Genome

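【Author】 Wang Wenjuan (王文娟)

【Supervisors】 Tao Shiheng (陶士珩); Yuan Zhifa (袁志发)

【Author information】 Northwest A&F University (西北农林科技大学), Applied Mathematics, 2008, Master's thesis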

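【Chinese Abstract】 The completion of genome sequences for many mammals, above all the human genome, marks the entry of contemporary biology into the post-genomic era. Genome sequences measured in gigabases hold a wealth of still-unknown information, and extracting and analyzing that information is one of the major challenges facing biologists in the post-genomic era. A central aim of genome research is to study differences between the genomes of species or individuals, and these differences are caused by genetic variation. One mechanism that generates genetic variation is mutation, which takes many forms: the change of the base at one position of the genetic code to another base is a point mutation; the insertion or deletion of a stretch of DNA in a gene is an insertion or a deletion, respectively; and genes or parts of genes may also be inverted. Insertions, deletions, and substitutions are the three main types of mutation in evolution. Studies have shown that substitutions occur several times more often than insertions and deletions, yet insertions and deletions play the dominant role in genome evolution. Studying the patterns of insertions and deletions is therefore essential for understanding the mechanisms of mammalian genome evolution. Sequence alignment is a powerful tool for studying differences between genomes and the evolution of species. The core of an alignment is its scoring system, comprising a scoring matrix and a gap penalty scheme, both derived mainly from the distributions of insertions, deletions, and substitutions. The main goal of this study is to use the patterns of insertions and deletions to obtain an alignment scoring system that integrates insertions, deletions, and substitutions and rests on a solid evolutionary basis, thereby providing a firm theoretical foundation for obtaining correct sequence alignments. We downloaded the multiple sequence alignments of 28 vertebrate genomes from the UCSC database, wrote programs to select the 15 mammalian genomes among them, counted insertions and deletions, computed their frequencies, and analyzed their distributions. The main results are as follows: (1) Single-nucleotide insertions and deletions are the most frequent, accounting for 28.63% to 41.78% of all insertions and 26.54% to 45.93% of all deletions. (2) Among insertions and deletions no longer than 10 bp in 14 species, opossum is the only species in which insertions occur more often than deletions (total deletion-to-insertion ratio 0.85:1); in all other species deletions are more frequent than insertions, with total deletion-to-insertion ratios ranging from 1.21 to 1.87. Although this study used many times more insertion and deletion data than earlier studies of this kind, the results are largely consistent with theirs. A high frequency of single-nucleotide insertions and deletions, and deletions occurring more often than insertions, therefore appear to be general phenomena of genome evolution. (3) The frequency of insertions and deletions decreases as their length increases. (4) Compared with a power function, the gamma distribution fits better and describes the length distribution of insertions and deletions more accurately.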
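The counting step described above (tallying insertions and deletions by length, then computing the share of single-nucleotide events and the deletion-to-insertion ratio for gaps up to 10 bp) can be sketched in Python as follows. This is a hedged illustration, not the thesis's code. It assumes each event is polarized by parsimony against an outgroup: a gap in the target lineage where both a sister species and an outgroup have bases is called a deletion, and bases in the target where both are gapped are called an insertion. The function names (`runs`, `classify_indels`, `summarize`) and the toy alignment are hypothetical.

```python
from collections import Counter


def runs(mask):
    """Return the lengths of maximal runs of True values in a list of booleans."""
    lengths, current = [], 0
    for flag in mask:
        if flag:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths


def classify_indels(target, sister, outgroup):
    """Count target-lineage insertions and deletions by length.

    Assumed rule (parsimony with an outgroup), not necessarily the thesis's method.
    """
    if not len(target) == len(sister) == len(outgroup):
        raise ValueError("aligned rows must have equal length")
    columns = list(zip(target, sister, outgroup))
    # Deletion in target: target is gapped while sister and outgroup both have bases.
    del_mask = [t == "-" and s != "-" and o != "-" for t, s, o in columns]
    # Insertion in target: target has a base while sister and outgroup are both gapped.
    ins_mask = [t != "-" and s == "-" and o == "-" for t, s, o in columns]
    return Counter(runs(ins_mask)), Counter(runs(del_mask))


def summarize(insertions, deletions, max_len=10):
    """Share of single-nucleotide events and the deletion:insertion ratio up to max_len bp.

    Assumes at least one insertion and one deletion were observed.
    """
    n_ins_short = sum(c for length, c in insertions.items() if length <= max_len)
    n_del_short = sum(c for length, c in deletions.items() if length <= max_len)
    return {
        "single_nt_insertion_share": insertions[1] / sum(insertions.values()),
        "single_nt_deletion_share": deletions[1] / sum(deletions.values()),
        "deletion_to_insertion_ratio": n_del_short / n_ins_short,
    }
```

For the toy alignment `classify_indels("AC-GTAAAC--T", "ACTGT--ACGGT", "ACTGT--ACGGT")`, the target shows one 2 bp insertion and two deletions (1 bp and 2 bp). `summarize` therefore reports a single-nucleotide deletion share of 0.5 and a deletion-to-insertion ratio of 2.0.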

【Abstract】 The completion of many mammalian genomes, especially the human genome, marks the entry of contemporary biology into the post-genomic era. The gigabase-scale sequences of these genomes contain rich and still-unknown information, and extracting and analyzing it is a major challenge for contemporary biologists. One major aim of genomics research is to identify differences between the genomes of species or individuals, and such differences arise from genetic variation. One mechanism that increases genetic variation is mutation, which takes many forms. A mutation in which one "letter" of the genetic code is changed to another is a point mutation. A stretch of DNA inserted into or deleted from a gene is an insertion or a deletion, respectively. Finally, genes or parts of genes can become inverted or duplicated. Nucleotide substitutions, together with insertions and deletions (indels), are the primary types of mutation and the major driving forces of genome evolution. Although substitutions are more numerous than indels, previous research suggests that indels, rather than substitutions, account for the majority of genomic divergence. Studying the patterns of insertions and deletions is therefore necessary for understanding mammalian evolution. Sequence alignment is a powerful tool for studying differences between genomes and the evolution of species. The core of sequence alignment is the scoring system, comprising the scoring matrix and the gap penalties, which is derived mainly from the distributions of insertions, deletions, and substitutions. The purpose of this study is to obtain a scoring system that has a solid evolutionary basis and integrates insertions, deletions, and substitutions, so that sequences can be aligned correctly. In this research, the multiple alignments of 28 vertebrate species were downloaded from the UCSC Genome Bioinformatics website, and only the alignments of 14 mammalian genomes were used to analyze the patterns of insertions and deletions.
We calculated the numbers and frequencies of insertions and deletions and then studied their length distributions. The following results were obtained: (1) Single-nucleotide insertions and deletions are the most frequent events. Single-nucleotide insertions account for 28.63% to 41.78% of all insertions, and single-nucleotide deletions account for 26.54% to 45.93% of all deletions. (2) For insertions and deletions no longer than 10 bp in the 14 mammalian genomes, deletions summed over all gap lengths outnumber insertions in every species except opossum, with deletion-to-insertion ratios ranging from 1.21 to 1.87; in opossum, insertions are more frequent than deletions, with a deletion-to-insertion ratio of 0.85:1. Although this study used many times more insertion and deletion data than previous studies, the results are largely the same. Therefore, a high proportion of single-nucleotide insertions and deletions, and an excess of deletions over insertions, appear to be common phenomena in genome evolution. (3) The numbers of both insertions and deletions decrease rapidly as gap length increases. (4) As a function of gap length, the frequencies of insertions and deletions fit a gamma distribution better than a power law.
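Result (4) compares a power law with a gamma distribution as models of indel length. The abstract does not state the fitting procedure, so the following Python sketch makes its own assumptions. Both models are treated as discrete distributions over lengths 1 to the longest observed gap: p(L) ∝ L^(-b) for the power law and p(L) ∝ L^(k-1) e^(-L/θ) for the gamma shape. Each is fitted by maximum likelihood over a coarse parameter grid and the two are compared by AIC. The function names, grid ranges, and this whole procedure are hypothetical, not the thesis's method.

```python
import math


def log_likelihood(counts, log_weight):
    """Log-likelihood of a {length: count} table under p(L) = w(L) / sum_{L'=1..Lmax} w(L'),
    truncated to the observed range 1..Lmax (an assumed discrete model)."""
    l_max = max(counts)
    log_norm = math.log(sum(math.exp(log_weight(L)) for L in range(1, l_max + 1)))
    return sum(c * (log_weight(L) - log_norm) for L, c in counts.items())


def fit_power_law(counts):
    """Grid-search MLE for p(L) proportional to L**(-b); returns (b, loglik)."""
    best = None
    for i in range(10, 501):  # b = 0.10 ... 5.00
        b = i / 100
        ll = log_likelihood(counts, lambda L: -b * math.log(L))
        if best is None or ll > best[1]:
            best = (b, ll)
    return best


def fit_gamma(counts):
    """Grid-search MLE for p(L) proportional to L**(k-1) * exp(-L/theta); returns ((k, theta), loglik)."""
    best = None
    for i in range(1, 61):  # shape k = 0.05 ... 3.00
        k = i / 20
        for j in range(1, 101):  # scale theta = 0.5 ... 50.0
            theta = j / 2
            ll = log_likelihood(counts, lambda L: (k - 1) * math.log(L) - L / theta)
            if best is None or ll > best[1]:
                best = ((k, theta), ll)
    return best


def compare(counts):
    """AIC for each model (lower is better): the power law has 1 parameter, the gamma shape has 2."""
    _, ll_pow = fit_power_law(counts)
    _, ll_gam = fit_gamma(counts)
    return {"power_law": 2 * 1 - 2 * ll_pow, "gamma": 2 * 2 - 2 * ll_gam}
```

Applied to a `{length: count}` table, `compare` returns one AIC per model. The model with the lower value is preferred, and the gamma model's extra parameter is already penalized. A grid search keeps the sketch dependency-free. A numerical optimizer would give finer parameter estimates.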

【Chinese Keywords】 mammalian genome; insertion; deletion; gamma function
【Key words】 Mammalian Genome; Insertion; Deletion; Gamma Distribution
  • 【Classification Number】Q78
  • 【Times Cited】1
  • 【Downloads】107