A Comparative Study of Two Methods of Transcription Library Construction

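【Author】 Zhang Dong

【Advisors】 Pan Li; Zhang Wenwei

【Author information】 South China University of Technology, Master of Engineering (professional degree), 2015, master's thesis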

【Abstract】 With the emergence of transcriptomics, proteomics, metabolomics, and other omics fields, biological research has entered the post-genome era, and transcriptomics, one of the first of these technologies to mature, is now widely used in frontier biological research. The central dogma states that, for most organisms, genetic information flows from DNA to RNA and from RNA to protein; the transfer of information from DNA to RNA is called transcription. Understanding the transcriptome is essential for interpreting the functional elements of the genome, for characterizing the molecular composition of cells and tissues, and for revealing the molecular mechanisms behind specific biological processes and disease development. Advances in sequencing technology allow deeper transcriptome sequencing and the discovery of more, more reliable, and novel transcripts. Because sequencing platforms are updated rapidly, transcriptome research methods must be updated in step with each platform generation. This study used UHRR (Universal Human Reference RNA) as the experimental material and applied two transcriptome library construction methods, the Short Insert Fragment Method (SIFM) and the Long Insert Fragment Method (LIFM), to build two Hiseq sequencing libraries with insert lengths of 160 bp (TUHRR90) and 300 bp (NUHRR150). TUHRR90 was sequenced on an Illumina Hiseq2000 sequencer with a PE90 strategy, and NUHRR150 was sequenced on an Illumina Hiseq4000 sequencer with a PE150 strategy, giving 16 Gb of data in total. qPCR quantification results for 1000 genes, downloaded from the UHRR supplier's website, served as the reference standard for sequencing-based gene quantification. The NUHRR150 data were trimmed with SOAPnuke to the PE90 format, forming a virtual library, NUHRR90. The three data sets (TUHRR90, NUHRR90, and NUHRR150) were then compared using BGI's standard transcriptome analysis pipeline, RNA_RNAref_version5.0_beta. The comparison covered sequencing quality, data yield, randomness of read distribution, alternative splicing, number of junctions discovered, consistency of gene quantification between data sets, and consistency of gene quantification with the qPCR reference. The results indicate that the Illumina Hiseq4000 delivers read length and accuracy sufficient for transcriptome analysis. A long-insert (about 300 bp) transcriptome library can be sequenced normally with PE150 on the Hiseq4000, with data yield and quality that meet the requirements of transcriptome analysis. Combining long inserts with long reads reveals more alternative splicing events, junctions, and gene fusion events in bioinformatic analysis, which benefits studies of transcriptome structure, while gene quantification accuracy remains very close to that of the short-insert library. The long-insert library therefore makes full use of the high throughput and long read length of the Hiseq4000, and under a PE150 strategy it can replace the original library protocol in transcriptome research.
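The trimming step described in the abstract, in which PE150 reads are cut down to PE90 to form the virtual NUHRR90 library, can be illustrated with a minimal Python sketch. This is not SOAPnuke and not the thesis pipeline; it assumes that trimming keeps the first 90 bases of each mate (the abstract does not say which end was cut), and the function names `parse_fastq`, `trim_record`, and `trim_pair` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header line, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    if buf:
        raise ValueError("truncated FASTQ record")


def trim_record(rec: FastqRecord, length: int = 90) -> FastqRecord:
    """Keep the first `length` bases and matching quality values.

    Assumption: bases are removed from the 3' end; reads that are
    already shorter than `length` are returned unchanged.
    """
    header, seq, sep, qual = rec
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ: " + header)
    return (header, seq[:length], sep, qual[:length])


def trim_pair(
    mate1: Iterable[str], mate2: Iterable[str], length: int = 90
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Trim both mates of each read pair, keeping the pairs in step.

    Note: zip() stops at the shorter input, so mate files of unequal
    length are not detected here.
    """
    for r1, r2 in zip(parse_fastq(mate1), parse_fastq(mate2)):
        yield trim_record(r1, length), trim_record(r2, length)
```

Passing two open FASTQ file handles to `trim_pair` yields trimmed mate pairs that can be written back out as a PE90 data set, which is one way to compare insert-size effects at a fixed read length.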
