中国人群乳腺癌遗传易感性研究
Genetic Susceptibility Study of Breast Cancer in Chinese Population
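【Author】 Chen Wei (陈伟);
【Supervisor】 Miao Xiaoping (缪小平);
【Author information】 Huazhong University of Science and Technology, Epidemiology and Health Statistics, 2015, PhD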
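【Abstract (translated from the Chinese)】 Background: Breast cancer is the most common malignant tumor in women; in both developed and developing countries it ranks first in incidence among female malignancies. Numerous epidemiological studies have shown that breast cancer is a highly heterogeneous disease caused jointly by environmental and genetic factors. Under the same environmental exposure, individual risk of breast cancer differs, which indicates that genetic factors play an important role in determining susceptibility. Genome-wide association studies (GWAS) of breast cancer have identified many single-nucleotide polymorphisms (SNPs) associated with genetic susceptibility to the disease. However, most GWAS were conducted in European and American populations, data from the Chinese population are relatively scarce, and little is known about the interactions among the GWAS-identified loci.
Objectives: (1) To analyze the relationship between GWAS-identified loci and breast cancer risk in the Chinese population. (2) To analyze the multiplicative gene-environment interaction between the risk loci found in the association analysis and menopausal status. (3) To analyze the effect of gene-gene interactions among these GWAS-identified loci on breast cancer risk.
Methods: We used a hospital-based case-control design and a multi-analytic strategy combining random forest (RF), multifactor dimensionality reduction (MDR) and unconditional logistic regression (LR) to evaluate the relationship between 10 GWAS-identified loci and breast cancer risk, as well as the high-order interactions among these loci. Genotypes were determined with the TaqMan genotyping assay. The χ² test and the t test were used to compare categorical variables (such as menopausal status) and continuous variables (such as age), respectively, between cases and controls. A χ² goodness-of-fit test was used to check whether the genotype distribution of each locus conformed to Hardy-Weinberg equilibrium (HWE). Logistic regression models were used to analyze the association of each locus with breast cancer risk, and Haploview 4.2 was used to assess linkage disequilibrium between loci.
Results: (1) The study included 477 breast cancer cases and 534 healthy female controls. The distributions of age and menopausal status did not differ significantly between cases and controls. Among the cases, 271 (56.8%) were estrogen receptor (ER) positive and 168 (35.2%) were ER negative; 241 (50.5%) were progesterone receptor (PR) positive and 198 (41.5%) were PR negative; ER and PR status was unavailable for a further 38 patients (8.0%) because of missing records. (2) Of the 10 GWAS-identified loci, six (rs1219648, rs3757318, rs1926657, rs6556756, rs2046210 and rs4973768) were associated with breast cancer risk in the Chinese population in single-locus analysis. After false discovery rate (FDR) correction, four loci (rs3757318, rs2046210, rs4973768 and rs1926657) remained associated with breast cancer risk. (3) Interaction analysis between the loci that were significant in single-locus analysis and menopausal status found a multiplicative interaction between rs3757318 and menopausal status (P=0.001). (4) In the random forest analysis, rs3757318, rs2046210 and rs4973768 were ranked as the three most important risk factors for breast cancer and were selected as the optimal risk subset when interactions among loci were taken into account. The subsequent MDR analysis likewise found that the three-way interaction model comprising rs3757318, rs2046210 and rs4973768 best explained the interactions among loci, with a testing accuracy (TA) of 0.6183 and a cross-validation consistency (CVC) of 10/10. Analysis of the cumulative effect of these three loci showed that breast cancer risk rose with the number of risk alleles carried, an allele-dosage relationship (P_trend=9.80×10⁻⁵). Individuals carrying 1, 2, 3 and 4-6 risk alleles had 2 to 3 times the risk of individuals carrying no risk allele, with ORs of 2.06 (95% CI=1.40-3.01), 2.37 (95% CI=1.64-3.44), 2.28 (95% CI=1.48-3.52) and 3.27 (95% CI=1.96-5.48), respectively.
Conclusions: (1) The GWAS-identified loci rs1926657, rs1219648, rs3757318, rs2046210, rs4973768 and rs6556756 are associated with breast cancer risk in the Chinese population, and rs3757318 is the most important locus affecting genetic susceptibility to breast cancer. (2) There is a multiplicative interaction between rs3757318 and menopausal status. (3) rs3757318, rs2046210 and rs4973768 show an important gene-gene interaction in genetic susceptibility to breast cancer, although the underlying biological mechanism requires further study.
Innovations: (1) This is the first study of the relationship between the GWAS-identified loci rs1926657, rs1978503, rs6556756 and rs2075555 and breast cancer risk in the Chinese population. (2) The study combines logistic regression, random forest and multifactor dimensionality reduction to evaluate gene-gene interactions among GWAS-identified loci.
Background: p53 is one of the earliest discovered tumor suppressors and is among the most studied and most important transcription factors. As a transcription factor, p53 determines the physiological function of cells mainly by binding specific DNA sequences in a large number of target genes and regulating their transcription. The DNA sequence bound by p53 is called the p53 response element (p53-RE); the ability of p53 to bind it, and then to regulate transcription of the target gene, is key to the tumor-suppressor function of p53. It is therefore widely held that functional genetic variants at key bases of p53 response elements in target genes may affect p53 binding, thereby altering p53 regulation of target-gene expression and ultimately leading to differences in genetic susceptibility to cancer.
Objectives: (1) Part I analyzes the relationship between the p53 binding-site variant rs4590952 and genetic susceptibility to breast cancer in the Chinese population. (2) Part II studies the relationship between genetic variants in breast-tissue-specific p53 binding sites and genetic susceptibility to breast cancer in the Chinese population.
Methods: First, a large case-control association study was used to examine the relationship between rs4590952 in the KITLG gene and breast cancer risk in the Chinese population. In Part II, ChIP-seq data generated in breast cancer cell lines were extracted from public databases or published articles using bioinformatic methods, and the Mach software was then used to screen for polymorphisms that lie in p53 binding regions of target genes and might affect p53 binding and hence p53 function. A two-stage large case-control association study then examined the relationship between the 3 polymorphisms in p53 response elements selected in this way and genetic susceptibility to breast cancer. Genotyping was performed with the TaqMan assay. The χ² test and the t test were used to compare categorical variables (such as menopausal status, smoking and alcohol use) and continuous variables (such as age), respectively, between cases and controls. A χ² goodness-of-fit test was used to check whether genotype distributions conformed to HWE. Unconditional logistic regression models were used to analyze the association of each locus with breast cancer risk.
Results: (1) Part I included 1241 breast cancer patients and 1259 healthy female controls. Age did not differ significantly between cases and controls (P=0.942), nor did the distributions of menopausal status, smoking and alcohol use (P=0.117, 0.123 and 0.721, respectively). Among the patients, 802 (64.60%) were ER positive and 439 (35.40%) were ER negative; 712 (57.40%) were PR positive and 529 (42.60%) were PR negative. After adjustment for age, menopausal status, smoking and alcohol use, logistic regression did not find that the rs4590952-A allele increased breast cancer risk in the Chinese population compared with the rs4590952-G allele (OR=1.04, 95% CI=0.73-1.46, P=0.839). The heterozygote, homozygous-variant, dominant, recessive and additive models likewise showed no association between this locus and breast cancer susceptibility. In analyses stratified by ER and PR expression, no association was found in either the ER+/PR+ or the ER-/PR- subtype. (2) Stage 1 of Part II included 1274 breast cancer cases and 1255 healthy female controls. Age did not differ significantly between cases and controls (P=0.318), nor did the distributions of menopausal status, smoking and alcohol use (P=0.539, 0.258 and 0.131, respectively). Stage 2 included 753 cases and 1199 controls; age and menopausal status did not differ between cases and controls (P=0.397 and 0.507, respectively). In stage 1, of the three loci, the genotype distribution of rs1295925 in the VMP1 gene differed significantly between cases and controls. After adjustment for age, menopausal status, smoking and alcohol use, individuals carrying the rs1295925-CT and TT genotypes had a 32% (OR=1.32, 95% CI=1.07-1.62) and 41% (OR=1.41, 95% CI=1.13-1.78) higher risk of breast cancer than individuals carrying the rs1295925-CC genotype. Positive associations were also found in the allelic, dominant and additive models, and rs1295925 remained associated with breast cancer risk after FDR correction for multiple testing. No evidence of association with breast cancer was found for the other two loci, which lie in the BCAS1 gene. Stage 2 validation of rs1295925 likewise showed an association with breast cancer susceptibility, and the association persisted when the two stages were combined. Stage 1 analysis stratified by ER and PR expression found that the association of rs1295925 with risk was stronger in the ER-/PR- subgroup than in the ER+/PR+ subgroup. In the ER-/PR- subgroup, carriers of the rs1295925-CT and TT genotypes had 1.69 and 1.79 times the risk of carriers of the CC genotype. In the ER+/PR+ subgroup, no statistically significant association was found.
Conclusions: (1) rs4590952 is not associated with breast cancer risk in the Chinese population. On the one hand, the KITLG gene may not play the role we anticipated in breast carcinogenesis; on the other hand, other loci in this gene that we did not identify may be associated with breast cancer risk. (2) The effect of rs4590952 on cancer susceptibility may differ between tissues. (3) rs1295925 in the VMP1 gene may be a risk factor for breast cancer.
Innovations: (1) This is the first study in the Chinese population of the relationship between rs4590952 in the KITLG gene and breast cancer risk. (2) The study combines bioinformatics with molecular epidemiology, making full use of existing resources, to examine the relationship between variants at three loci in p53 binding regions and breast cancer risk in the Chinese population.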
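The χ² goodness-of-fit check against Hardy-Weinberg equilibrium mentioned in the methods can be sketched as follows. This is a minimal illustration, not the thesis's own code: the function name `hwe_chi_square` and the genotype counts in the usage line are hypothetical, and the sketch assumes a biallelic locus at which both alleles are observed.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts at a biallelic locus.
    Returns (chi2 statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p                      # estimated frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control-group genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

Such a check is typically run in the control group; a small P value suggests departure from equilibrium, which may point to genotyping error or population stratification.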
【Abstract】 Background: Breast cancer (BC) is the most common cancer and the leading cause of cancer death among women worldwide. It is now the most commonly diagnosed cancer in both developing and developed countries. Previous epidemiological studies have shown that BC is a highly heterogeneous disease caused by complex environmental and inherited factors, although the precise pathogenesis is still unclear. Only a fraction of individuals exposed to the same environment develop BC, indicating that genetic susceptibility factors may play important roles in the etiology of this cancer. Recently, genome-wide association studies (GWAS) of BC have identified multiple single-nucleotide polymorphisms (SNPs) associated with BC risk. However, given the diverse genetic architecture across ethnicities, findings from other populations may not represent the true genetic susceptibility to BC in the Chinese population. Moreover, the interactions among these identified polymorphisms are still not well established.
Objectives: (1) To explore the association between 10 GWAS-identified SNPs and BC risk in the Chinese Han population. (2) To investigate the interaction between the significant SNPs and menopausal status in BC susceptibility. (3) To explore the contribution of high-order interactions among the ten GWAS-identified SNPs to genetic susceptibility to BC in Chinese people.
Methods: A hospital-based case-control study was conducted in a Wuhan population. We used a multi-analytic strategy combining random forest (RF), multifactor dimensionality reduction (MDR) and logistic regression (LR) to investigate the associations between polymorphisms recently identified by GWAS and BC risk, and the contribution of high-order interactions among these polymorphisms to BC susceptibility in Chinese people. The χ² test and the t test were used, where appropriate, to test for differences in the distribution of demographic characteristics between the case and control groups.
Odds ratios (ORs) and their 95% confidence intervals (CIs), calculated by unconditional multivariate logistic regression with adjustment for age and menopausal status, were used to estimate the effect of each polymorphism on BC risk. Linkage disequilibrium (LD) was assessed in Haploview 4.2 by determining the r² value.
Results: (1) A total of 477 histopathologically confirmed BC patients and 534 frequency-matched healthy controls were included in this case-control study. There was no significant difference between cases and controls in the distribution of age or menopausal status (P=0.639 and 0.419, respectively). Among the cases, 271 (56.8%) were diagnosed as estrogen receptor (ER) positive and 168 (35.2%) as ER negative. Similarly, 241 (50.5%) cases were progesterone receptor (PR) positive and 198 (41.5%) were PR negative. In addition, the ER and PR status of 38 patients could not be obtained because the relevant information was missing. (2) Among the ten GWAS-identified polymorphisms, six (rs1219648, rs3757318, rs1926657, rs6556756, rs2046210 and rs4973768) were significantly associated with BC risk in single-locus analysis. After false discovery rate (FDR) correction, rs1926657, rs3757318, rs2046210 and rs4973768 remained significantly associated with BC risk. (3) The interactions between the six significant SNPs and menopausal status were subsequently evaluated. There was a significant multiplicative interaction between rs3757318 and menopausal status (P=0.001). (4) In the random forest analysis, rs3757318, rs2046210 and rs4973768 were ranked as the top three risk factors and were selected as the best subset when interactions were taken into consideration.
Subsequently, the MDR analysis of the ten variants found that the three-factor model comprising rs3757318, rs2046210 and rs4973768 was the best interaction model, with the maximum testing accuracy (TA) of 0.6183 and a cross-validation consistency (CVC) of 10/10. Intriguingly, a dose-dependent cumulative effect was observed with increasing numbers of risk alleles (P_trend=9.80×10⁻⁵): individuals carrying 1, 2, 3 and 4-6 risk alleles had a 2- to 3-fold higher risk of BC than those carrying no risk alleles, with ORs of 2.06 (95% CI=1.40-3.01), 2.37 (95% CI=1.64-3.44), 2.28 (95% CI=1.48-3.52) and 3.27 (95% CI=1.96-5.48), respectively.
Conclusions: (1) The SNPs rs1926657, rs1219648, rs3757318, rs2046210, rs4973768 and rs6556756 were significantly associated with BC risk in the Chinese population. Among these polymorphisms, rs3757318, which had the lowest P value, was the most important susceptibility factor for BC. (2) There was a significant multiplicative interaction between rs3757318 and menopausal status. (3) Our findings support the principle that interactions among multiple genetic variants, including rs3757318, rs2046210 and rs4973768, may play important roles in BC susceptibility, although the biological mechanisms underlying the observed associations remain to be elucidated.
Innovations: (1) This study is the first to investigate the associations of rs1926657, rs1978503, rs6556756 and rs2075555 with BC risk in the Chinese population. (2) Our study used a multi-analytic strategy, including random forest, multifactor dimensionality reduction and logistic regression, to investigate the high-order interactions among the ten GWAS-identified SNPs.
Background: p53, one of the best-known and most thoroughly studied transcription factors, has been documented to be an important tumor suppressor. As a transcription factor, p53 exerts its function mainly by regulating the transcription of numerous target genes, which potentially contribute to cancer susceptibility.
It binds directly to a sequence-specific DNA consensus motif called the p53 response element (p53-RE). The ability of p53 to bind the p53-RE and subsequently regulate the transcription of target genes is essential to its tumor-suppressor function. It was therefore hypothesized that polymorphisms in key nucleotides of functional p53 response elements could influence transcriptional regulation by p53 and thus lead to differences in cancer susceptibility.
Objectives: (1) To investigate the association between rs4590952 and BC risk in the Chinese population. (2) To investigate the association between variants in breast-tissue-specific p53 binding sites and BC risk in Chinese women.
Methods: (1) In the part I study, a hospital-based case-control study was performed to explore the association between rs4590952 and BC risk in a Wuhan population. In part II, ChIP-seq data on p53 binding sites in MCF-7 cell lines were extracted to identify candidate variants in p53 target genes. The Mach software was then used to find the functional variants. Finally, a two-stage, large hospital-based case-control study was performed to investigate the association between variants in p53 binding sites and BC risk in a Chinese population. Genotypes were determined with the TaqMan SNP Genotyping Assay (Applied Biosystems, Foster City, CA) on the 7900HT Fast Real-Time PCR System. The χ² test and the t test were used to test for differences between the case and control groups in the distribution of categorical variables (such as menopausal status, smoking status and alcohol use) and quantitative variables (such as age), respectively.
Odds ratios (ORs) and their 95% confidence intervals (CIs), calculated by unconditional multivariate logistic regression with adjustment for age, menopausal status, smoking status and alcohol use, were used to estimate the effect of rs4590952 on breast cancer risk.
Results: (1) In the part I case-control study, a total of 1241 BC cases and 1259 frequency-matched healthy controls were included. The average age of cases and controls was comparable (P=0.942). In addition, there was no significant difference between the case and control groups in the distribution of menopausal status, smoking status or alcohol use (P=0.117, 0.123 and 0.721, respectively). After adjustment for age, smoking status, alcohol use and menopausal status, logistic regression showed that the rs4590952-A allele was not significantly associated with BC risk, with an OR of 1.04 (95% CI=0.73-1.46) compared with the G allele. No association between this variant and BC risk was observed under the heterozygote, homozygote, dominant, recessive or additive model either. In analyses stratified by ER and PR status, no significant association between rs4590952 and BC risk was found in the ER+/PR+ or ER-/PR- subgroups. (2) In stage 1 of the part II study, a total of 1274 BC cases and 1255 frequency-matched cancer-free controls were included. The average age was comparable between the case and control groups (P=0.318). The distributions of menopausal status, smoking and alcohol use were also similar between cases and controls (P=0.539, 0.258 and 0.131, respectively). A further 753 BC cases and 1199 controls were included in the stage 2 study to confirm the significant results. The average age and the distribution of menopausal status were comparable between cases and controls (P=0.397 and 0.507, respectively).
In the stage 1 study, the genotype distribution of rs1295925 differed significantly between the case and control groups. After adjustment for age, menopausal status, smoking and alcohol use, the rs1295925-CT and rs1295925-TT genotypes were significantly associated with increased BC risk compared with the rs1295925-CC genotype (OR=1.32, 95% CI=1.07-1.62 and OR=1.41, 95% CI=1.13-1.78, respectively). Positive associations were also observed under the allelic, dominant and additive models. Furthermore, the association between rs1295925 and BC risk remained significant after FDR adjustment. No significant association with BC risk was identified for the other two SNPs, which lie in the BCAS1 gene. The significant association between rs1295925 and BC risk was confirmed in stage 2, and a positive result was also observed when the two stages were combined. In the stage 1 subgroup analysis by ER and PR expression, a significant association between rs1295925 and BC risk was identified in the ER-/PR- subgroup but not in the ER+/PR+ subgroup.
Conclusions: (1) rs4590952 was not associated with BC risk in the Chinese population. On the one hand, the KITLG gene may not have as strong an effect on the development of BC as we anticipated; on the other hand, other genetic variants in this gene that we did not identify may contribute to BC susceptibility. (2) Our results point to a potentially tissue-specific role of rs4590952 in cancer susceptibility. (3) rs1295925, located in the VMP1 gene, was associated with increased BC risk in the Chinese population.
Innovations: (1) This was the first study to explore the association between the functional variant rs4590952 and BC risk in the Chinese population. (2) Our study combined bioinformatics and molecular epidemiology to explore the associations between three polymorphisms in p53 binding sites and BC risk in the Chinese population.
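As a rough consistency check on the reported estimates, the standard error, Wald z statistic and two-sided P value of a log odds ratio can be approximately recovered from a published OR and its 95% CI. The sketch below assumes the intervals are Wald intervals on the log scale; the function name `wald_from_or_ci` is hypothetical, and because the inputs are rounded to two decimals the output is only approximate. For rs4590952 (OR=1.04, 95% CI=0.73-1.46) the recovered P value should come out around 0.8, in line with the P value of 0.839 reported for this comparison.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate (SE, z, two-sided P) for log(OR) from a reported 95% CI.

    Assumes a Wald interval: log(OR) +/- Z_95 * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value


# Estimates quoted in the abstract (rounded as published).
reported = [
    ("rs4590952 A vs G", 1.04, 0.73, 1.46),
    ("rs1295925 CT vs CC", 1.32, 1.07, 1.62),
    ("rs1295925 TT vs CC", 1.41, 1.13, 1.78),
]
for label, odds_ratio, lo, hi in reported:
    se, z, p_value = wald_from_or_ci(odds_ratio, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P = {p_value:.3f}")
```

An interval that excludes 1 corresponds to P < 0.05 under this approximation, which matches the pattern in the abstract: the rs1295925 comparisons are significant and the rs4590952 comparison is not.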
【Key words】 breast cancer; genome-wide association study; single-nucleotide polymorphism; logistic regression; random forest; multifactor dimensionality reduction; breast cancer; p53 response element; KITLG; VMP1; rs4590952; rs1295925;