Prediction of Stage-specific Gene Clusters in Cell Reprogramming

【Author】 塔娜 (Tana)

【Supervisor】 刘帅 (Liu Shuai)

【Author Information】 Inner Mongolia University, Computer Science and Technology, 2020, Master's thesis

【Abstract】 Cell reprogramming is the process by which differentiated, mature somatic cells are reprogrammed under specific induction conditions into pluripotent or even totipotent cells. Understanding how gene expression changes at each stage of reprogramming is important both for elucidating the reprogramming mechanism of induced pluripotent stem cells (iPSCs) and for improving their induction efficiency. Biological experiments have already converted mature somatic cells into iPSCs by screening, combining, and overexpressing pluripotency transcription factors. However, no existing study has screened stage-specific gene clusters from the perspective of transcription factor binding peaks, histone modifications, and related factors. This thesis therefore builds theoretical prediction models of stage-specific gene clusters in reprogramming from transcription factor binding peaks, histone modifications, and related features. The work has three parts: (1) prediction of stage-specific gene clusters from transcription factor (TF) binding-profile features; (2) prediction from histone modification (HM) features; and (3) prediction from combined TF binding and HM features. The ChIP-seq data and the microarray transcriptome data were obtained from the Gene Expression Omnibus (GEO) database under accession numbers GSE67520 and GSE67462, respectively. The ChIP-seq data cover nine time points in the reprogramming of mouse embryonic fibroblasts (MEFs) into iPSCs. Stage-specific gene clusters were first identified by differential gene expression analysis. The numbers of binding peaks of the TF Oct4, three HMs (H3K4me3, H3K27me3, H3K27ac), and RNA polymerase (RNA Pol) were then counted in the promoters, enhancers, and enhancer subregions of these clusters. Separate prediction models were built relating the stage-specific gene clusters to Oct4, to the HMs, and to Oct4 combined with the HMs. Finally, classifier performance was evaluated with precision, recall, F1-score, the area under the receiver operating characteristic curve (ROC AUC), and accuracy. The results show that combining multi-omics ChIP-seq data with machine learning can effectively improve the prediction of stage-specific gene clusters during cell reprogramming.
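The abstract describes two concrete computational steps: counting the ChIP-seq peaks of each mark (Oct4, H3K4me3, H3K27me3, H3K27ac, RNA Pol) that fall in a gene's regulatory regions, and scoring a classifier with precision, recall, F1-score, accuracy, and ROC AUC. The Python sketch illustrates both steps under stated assumptions. It is not the thesis's implementation. The interval layout (0-based, half-open `(chrom, start, end)` tuples), the 2,000 bp upstream / 500 bp downstream promoter window, the one-vs-rest framing (1 = the gene belongs to the target stage's cluster), and all function names are hypothetical choices made for illustration. The abstract does not name the classifier, so none is shown.

```python
"""Sketch only: peak-count features per gene region, plus the evaluation
metrics named in the abstract. Data layout and window sizes are assumptions."""
from typing import Dict, List, Sequence, Tuple

# Assumed convention: (chrom, start, end), 0-based, half-open [start, end).
Interval = Tuple[str, int, int]


def promoter_region(chrom: str, tss: int, strand: str,
                    upstream: int = 2000, downstream: int = 500) -> Interval:
    """Hypothetical promoter window around a TSS, oriented by strand."""
    if strand == "+":
        return (chrom, max(0, tss - upstream), tss + downstream)
    # On the minus strand, "upstream" lies at higher coordinates.
    return (chrom, max(0, tss - downstream), tss + upstream)


def count_overlapping_peaks(region: Interval, peaks: Sequence[Interval]) -> int:
    """Number of peaks sharing at least one base with the region."""
    chrom, start, end = region
    return sum(1 for c, s, e in peaks if c == chrom and s < end and e > start)


def build_features(regions: Dict[str, Interval],
                   peak_sets: Dict[str, Sequence[Interval]]) -> Dict[str, List[int]]:
    """One feature per mark (e.g. Oct4, H3K4me3): its peak count in each gene's region."""
    marks = sorted(peak_sets)  # fixed, alphabetical column order
    return {gene: [count_overlapping_peaks(region, peak_sets[m]) for m in marks]
            for gene, region in regions.items()}


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy; 1 = gene is in the target stage cluster."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy peaks for two marks; coordinates are invented for illustration.
    peaks = {"Oct4": [("chr1", 100, 300), ("chr1", 900, 1100)],
             "H3K4me3": [("chr1", 4400, 4700)]}
    regions = {"geneA": promoter_region("chr1", 1000, "+"),  # chr1:0-1500
               "geneB": promoter_region("chr1", 5000, "-")}  # chr1:4500-7000
    print(build_features(regions, peaks))  # -> {'geneA': [0, 2], 'geneB': [1, 0]}
```

The resulting per-gene count vectors are the kind of input a classifier could be trained on. The ROC AUC uses the pairwise (Mann-Whitney) formulation, which is easy to verify by hand. For large gene sets, a sort-based implementation or a library routine is preferable to the quadratic loop.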

  • 【Online Publication Submitter】 Inner Mongolia University
  • 【Online Publication Issue】 2021, Issue 01