Bioinformatics-Based Analysis of Gene Expression and Mutation Profiles in Patients with Different Pathological Stages of Hepatocellular Carcinoma

【Author】 Zhang Nan (张楠)

【Supervisor】 Yang Jie (杨洁)

【Author information】 Tianjin Medical University, Immunology, 2021, Master's thesis

【Abstract】 Objectives: Hepatocellular carcinoma (HCC), the most common type of primary liver cancer, is a serious threat to human health because of its high degree of malignancy. Methods for early diagnosis of HCC remain inadequate, and the vast majority of patients are diagnosed at an intermediate or advanced stage, which leads to a poor prognosis. It is therefore necessary to study the early pathogenic mechanisms of HCC and to identify biomarkers of value for clinical diagnosis and prognosis. As high-throughput sequencing and gene chip technologies have matured, large volumes of gene expression and sequencing data have been deposited in biological databases, and mining these data with bioinformatics methods is becoming a powerful approach in medical research. The Cancer Genome Atlas (TCGA) database is currently the largest database of tumor gene information; it contains clinical, expression and other information on as many as 38 malignant tumors, providing a data basis for tumor-related research. No study has yet reported on the relationship between gene expression and prognosis in HCC of different clinicopathological stages or histological grades. Using the 367 HCC cases provided in the TCGA database, this project therefore aimed to analyze gene expression and mutation in HCC patients of different clinicopathological stages or histological grades and their correlation with prognosis, and to screen for genes related to clinicopathological stage/histological grade and prognosis, in order to provide new ideas for studying the pathogenic mechanism of HCC and for clinical diagnosis and treatment.

Methods: The study was divided into five parts.

Part 1: (1) The TCGA database was searched, and the clinical information and gene expression (mRNA and lncRNA) data of patients in the TCGA-LIHC project were downloaded and collated; the correlation of histological grade (grading) and clinicopathological stage (staging) with clinical prognosis was analyzed. (2) The correlation of general clinical factors and biochemical indicators with clinicopathological stage was analyzed. (3) Differentially expressed genes between tumor and normal samples, and between HCC patients at different pathological stages, were screened; protein-protein interaction (PPI) network analysis was performed, and the correlation between the expression of these genes and patient prognosis was analyzed.

Part 2: (1) Copy number variation (CNV) and single nucleotide variation (SNV) data of HCC patients were downloaded from the TCGA database and collated; genomic CNV in HCC patients was analyzed, and genes with differential CNV between the normal and tumor groups were screened. (2) Driver genes whose expression was significantly correlated with copy number were screened and subjected to GO and KEGG enrichment analysis. (3) The "MCODE" (Molecular Complex Detection) module analysis method was used to analyze protein interactions among these driver genes; the hub genes in the PPI networks of the two highest-scoring modules were identified, and the correlation between their expression and copy number variation was analyzed. (4) Gene mutation frequencies in HCC patients were analyzed to screen for frequently mutated genes; the correlation between the expression and mutation of these genes at different pathological stages, and the correlation between gene mutation and prognosis, were analyzed. (5) Single nucleotide polymorphism (SNP) data of HCC patients were extracted and collated to analyze the correlation of gene-specific SNPs with gene expression and prognosis in HCC patients at different pathological stages.

Part 3: (1) Using the clinical and gene expression data of HCC and normal samples in the TCGA database, an HCC classification model was constructed with random forest and decision tree modeling; its classification performance was evaluated, key genes were screened, and the expression of these genes in HCC was analyzed. (2) Classification models for HCC at different pathological stages were constructed from the patients' TNM staging information, and their classification performance was evaluated. (3) The TNM staging information was removed, and classification models for HCC at different pathological stages were constructed again from differentially expressed genes; classification performance was evaluated, key genes were screened, and the correlation between their expression and clinicopathological stage was analyzed.

Part 4: (1) Principal component analysis was performed on the gene expression data of HCC patients in the TCGA database to determine whether this method can effectively distinguish HCC patients at different pathological stages, and genes that play a key role in PC1 and PC2 were screened. (2) The expression levels of these key genes were analyzed in the HCC and control groups and in HCC patients at different pathological stages.

Part 5: (1) Genes of interest were selected from the analyses in the first four parts, and qPCR was performed on liver cancer tissue cDNA chips to test whether their expression differed between the HCC and control groups and to analyze the correlation between gene expression and clinicopathological stage. (2) The genes found by qPCR to be differentially expressed in HCC were analyzed for the correlation between their expression and prognosis. (3) Genes whose expression was significantly related to prognosis were further selected, and expression of the target proteins in HCC and adjacent non-tumor tissues was detected using liver cancer tissue microarrays.

Results:

Part 1: (1) Clinicopathological stage was significantly associated with prognosis in HCC patients, whereas histological grade was not. (2) Total bilirubin, albumin, alpha-fetoprotein and platelet count were significantly correlated with the clinicopathological stage of HCC. (3) Differential expression analysis of tumor versus normal, stage Ⅱ versus stage Ⅰ, and stage Ⅲ+Ⅳ versus stage Ⅱ identified 12 common up-regulated genes. Among them, the expression of DUOX2, HOXB9, IQCA1, KCNH2, NPTX1 and PCSK1 was significantly correlated with the clinicopathological stage of HCC, and CUZD1 and IQCA1 were significantly correlated with patient prognosis.

Part 2: (1) Most of the CNV driver genes identified in HCC patients were involved in the cell cycle or cell division. In patients at different clinicopathological stages, the expression of CCNE2 and GADD45G was significantly correlated with their copy number. (2) "MCODE" module analysis showed that cell cycle-related CNV driver genes were up-regulated in HCC, and that this high expression was significantly positively correlated with the copy number of the corresponding gene. (3) Mutation analysis identified frequently mutated genes, but their mutation frequency was not significantly correlated with clinicopathological stage. In HCC patients at different pathological stages, the expression of CTNNB1, TP53, OBSCN and PCLO was related to their mutation, but gene mutation was not related to prognosis. (4) In HCC patients at different pathological stages, the expression of CTNNB1 and TP53 was related to specific SNPs in these genes.

Part 3: (1) Random forest and decision tree models built on gene expression data distinguished HCC from non-HCC cases well. Among the key genes screened, ECM1, FCN2, ANGPTL6, OIT3 and ADAMTS13 were down-regulated in HCC patients, while LRRC14 was up-regulated. (2) Modeling with TNM staging effectively distinguished HCC cases at different pathological stages. After the TNM staging information was removed, classification based on differentially expressed genes performed poorly, but it still yielded some key genes: FAM99A and GNA14 were down-regulated in HCC, while GAS2L3, CEP55, SEMA3F and PRR11 were up-regulated, and the expression of these genes was closely related to the pathological stage of HCC.

Part 4: (1) Principal component analysis of HCC gene expression data showed that PC1 and PC2 could effectively distinguish HCC from non-HCC cases but could not distinguish HCC cases at different pathological stages. (2) Among the top 10 genes that play a key role in PC1 and PC2, SLC27A5, ALDH2 and DCXR were down-regulated in HCC patients, while LAMTOR4, SNRPA and SNRPD2 were up-regulated. In addition, the expression of SLC27A5, ADAM17, SNRPA, SNRPD2 and ALDH2 was significantly correlated with the clinicopathological stage of HCC.

Part 5: (1) qPCR on liver cancer tissue cDNA chips showed that the expression of GAS2L3, SNRPA and SNRPD2 was significantly increased in HCC patients, and that GAS2L3 was significantly correlated with clinicopathological stage. (2) The expression of GAS2L3, SNRPA and SNRPD2 was significantly negatively correlated with prognosis: patients with high expression had a worse prognosis. (3) Immunohistochemistry on liver cancer tissue microarrays showed that GAS2L3 protein expression was significantly higher in liver cancer tissue than in adjacent control tissue and increased with advancing clinicopathological stage.

Conclusion: Based on data from 367 HCC cases in the TCGA database, gene expression and prognosis analysis, CNV and mutation analysis, random forest and decision tree modeling, and principal component analysis were performed to identify differentially expressed genes associated with pathological stage (Ⅰ, Ⅱ, Ⅲ-Ⅳ) and prognosis. Genes of interest were selected from these analyses, and their expression was then examined at the mRNA and protein levels using liver cancer tissue chips. GAS2L3 expression was significantly increased in HCC patients at both the mRNA and protein levels and was significantly correlated with both clinicopathological stage and prognosis, so GAS2L3 can be studied as a potential biomarker for HCC.
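
One step named in the Methods, screening for driver genes whose expression correlates with copy number, can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the thesis's pipeline: the abstract does not say which correlation measure, cutoff, or software was used, so the Pearson coefficient, the 0.5 threshold, the function names, the placeholder gene labels, and the toy numbers below are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def screen_cnv_driven(copy_number, expression, min_r=0.5):
    """Return {gene: r} for genes whose expression tracks copy number.

    copy_number / expression: dicts mapping gene -> per-sample values,
    with samples listed in the same order in both dicts.
    min_r is a hypothetical cutoff, not a value taken from the thesis.
    """
    hits = {}
    for gene, cn_values in copy_number.items():
        if gene not in expression:
            continue  # skip genes without expression measurements
        r = pearson_r(cn_values, expression[gene])
        if r >= min_r:
            hits[gene] = r
    return hits


# Toy data for four samples (made-up numbers, placeholder gene labels).
toy_copy_number = {"GENE_A": [2, 3, 4, 5], "GENE_B": [2, 2, 3, 3]}
toy_expression = {"GENE_A": [1.0, 2.1, 2.9, 4.2], "GENE_B": [5.0, 1.0, 4.0, 2.0]}

hits = screen_cnv_driven(toy_copy_number, toy_expression)
```

In this toy input only `GENE_A` passes the cutoff, because its expression rises with copy number while `GENE_B` shows no linear relationship. A real analysis would also need a significance test with multiple-testing correction before calling a gene CNV-driven.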

【Key words】 HCC; TCGA; pathological stage; gene expression; genetic mutation
【Classification code】 R735.7; Q811.4