节点文献
隐Markov模型在生物信息中的应用及其算法的改进
【作者】 陈晨;
【导师】 武坤;
【作者基本信息】 中南大学 , 概率论与数理统计, 2011, 硕士
【摘要】 隐马尔可夫模型(Hidden Markov Model,简称HMM)是20世纪70年代提出来的一种统计方法。它包括一个不被直接观测的马尔可夫过程和一个与之相关的可观测过程。隐马氏模型需要解决三个问题:识别问题,解码问题和学习问题,对这三个问题的回答构成了隐马氏模型的理论。以前主要用于语音识别,1989年Churchill将其引入计算生物学。随着人类基因组计划(HGP)的开展,作为生命科学的核心学科——生物信息学也在不断地向前发展,目前HMM是生物信息学中应用比较广泛的方法。本文在研究了隐马氏模型的基础上探讨了其在生物信息学中的应用。首先,本文介绍了隐马尔可夫模型的研究现状及发展趋势和生物信息学的基本知识,讲述了生物信息学的发展背景及其主要研究内容。然后重点讨论了隐马尔可夫模型的基本原理与概念,总结介绍关于隐马尔可夫模型应用的三个基本问题及其相应的算法,接着研究了隐马尔可夫模型在生物序列比对和CpG岛位置中的应用。最后,由于经典隐马尔可夫模型在计算状态转移概率和输出观测值概率时只考虑当前状态而不考虑历史,有一定的局限性,于是本文改进了经典隐马尔可夫模型的状态转移和输出观测值的markov假设,建立新模型。本文在经典隐马尔可夫模型和新模型的二阶前向-后向算法的基础上,推导出了新模型下的三阶前向-后向算法,然后将前向-后向算法推广到一般情况,同时导出了新模型下的二阶viterbi算法,并介绍了二阶EM算法。
【Abstract】 The Hidden Markov Model(HMM)is a statistical method, which was introduced in 1970s. Each model consists of a Hidden Markov process and an observation process. There are three problems needed to be solved by using Hidden Markov Models, which are recognizing, decoding and training, The answers for those three problems consist of the theory of Hidden Markov Model. Previously, it was mainly used to recognize the speech, and Churchill put it to computational biology in 1989. With the development of Human Genome Project, bioinformatics, the core discipline of life science, has developed fast. Currently, HMM has a wider application in bioinformatics. This paper focus on the researchment of bioinformatic, which is based on the Hidden Markov Models.In this paper, firstly, the present research situation and developing trends of HMM are introduced, and basic knowledge of bioinformatics is described, which contains the background of biology development and main research contents. Secondly, it introduces the foundational theories and basic concepts of HMM emphatically, then summarizes out three basic problems and algorithms of HMM applications. Next, we analyse the application of the sequence alignment and CpG islands based on HMM in detail. Finally, among the caculation of the state transition probability and the output observation probability, the classical HMM always considers the current state rather than historical states, so this paper improves the markov assuption of state transition and output observation sequences, and establishes the new model.This paper gets the third-order forward-backward algorithm and ordinary circumstances, which is based on HMM and the second-order forward-backward algorithm of new model. In addition, this paper proposes second-order viterbi algorithm and introduces second-order EM algorithm.
【Key words】 Hidden Markov Model; bioinformatics; forward-backward algorithm; viterbi algorithm; EM algorithm;
- 【网络出版投稿人】 中南大学 【网络出版年期】2012年 04期
- 【分类号】Q811.4;TP301.6;O211.62
- 【被引频次】3
- 【下载频次】411