Research on Feature Analysis and Recognition Methods of Escherichia coli Promoter Sequences
【Author】 Ran Linghua
【Supervisor】 Ruan Xiaogang
【Author Information】 Beijing University of Technology, Control Theory and Control Engineering, 2004, Master's thesis
【Abstract】 This project derives from the National Natural Science Foundation of China project "Research on Several Problems in Bioinformatics in the Sense of Complex Systems" (No. 60234020). Taking Escherichia coli promoters as the research object, it applies intelligent information processing methods to analyze E. coli promoter sequences, with emphasis on promoter recognition. The main results are as follows:

1. Neural-network-based E. coli promoter recognition models were built with two input schemes: a variable-length input window and an input window scanned with a sliding gap ("hole"). Guided by the molecular biology of E. coli genes and the statistical properties of the E. coli genome, the promoter sequence elements were studied and analyzed. The results show that, besides the two significantly conserved (canonical) elements, several non-canonical elements also carry feature information that affects E. coli promoter recognition.

2. A data-optimization-based recognition method was proposed, and on this basis, together with a BP neural network, an E. coli promoter recognition model was established (Data Optimization & Neural Network Model, DONN). DONN uses E. coli promoter sequences aligned at the -10 region as positive samples and E. coli coding-region sequences of corresponding length as negative samples. Before the neural network classifier is trained, the training samples are optimized with a weight matrix model (WMM), and the processed data set is used as the network's training set. The results show that the classifier built with data optimization achieves high sensitivity and good overall recognition accuracy.

3. Support vector machines (SVM) were applied to E. coli promoter recognition. Positive and negative sample sequences of a fixed length were selected from a database and split 3:1 into training and test sets, and an SVM-based classifier was constructed. Experiments show that the SVM-based method outperforms conventional neural network recognition models on the test sets, indicating good application prospects for SVMs in bioinformatics.

E. coli promoter recognition is one of the important problems in bioinformatics. The results of this thesis can serve as a reference for research on promoter recognition.
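The abstract does not give the exact WMM construction or the filtering rule used to optimize the DONN training set. The following Python sketch assumes a log-odds weight matrix with pseudocounts against a uniform 0.25 background, and a hypothetical threshold rule that keeps only positives scoring above the threshold and negatives scoring at or below it. The function names and parameters are illustrative, not taken from the thesis.

```python
import math

BASES = "ACGT"


def build_wmm(aligned, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned, equal-length sequences.

    Returns one dict per position mapping each base to log2(p / background).
    Sequences are expected to contain only uppercase A, C, G, T.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    matrix = []
    for i in range(length):
        # Pseudocounts keep unseen bases from producing log2(0).
        counts = {b: pseudocount for b in BASES}
        for seq in aligned:
            counts[seq[i]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return matrix


def wmm_score(matrix, seq):
    """Sum the per-position log-odds weights of seq under the matrix."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return sum(col[base] for col, base in zip(matrix, seq))


def optimize_training_set(positives, negatives, threshold=0.0):
    """Hypothetical filtering rule: drop positives scoring at or below the
    threshold and negatives scoring above it, keeping the rest."""
    wmm = build_wmm(positives)
    kept_pos = [s for s in positives if wmm_score(wmm, s) > threshold]
    kept_neg = [s for s in negatives if wmm_score(wmm, s) <= threshold]
    return kept_pos, kept_neg


# Toy -10-region-like samples; not data from the thesis.
positives = ["TATAAT", "TATAAT", "TAAAAT", "TATGAT", "CCGCGC"]
negatives = ["GCGCGC", "TATAAT"]
kept_pos, kept_neg = optimize_training_set(positives, negatives)
print(kept_pos)  # ['TATAAT', 'TATAAT', 'TAAAAT', 'TATGAT']
print(kept_neg)  # ['GCGCGC']
```

In this toy run the atypical positive "CCGCGC" and the promoter-like negative "TATAAT" are removed, so the filtered set would present the neural network with more clearly separated classes.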
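The abstract mentions scanning the input window with a hole to judge how much each region of the promoter contributes to recognition. One plausible reading, sketched below in Python as an assumption, is to one-hot encode each base and zero out a gap of fixed width that slides across the window. A trained network's accuracy would then be compared across gap positions. The encoding, function names, and gap width are hypothetical, not taken from the thesis.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4 bits per base in A, C, G, T order.

    Characters outside ACGT (e.g. N) encode as four zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def with_hole(seq, start, width):
    """One-hot encode seq, then zero out bases start .. start + width - 1."""
    vec = one_hot(seq)
    end = min(start + width, len(seq))
    for i in range(4 * start, 4 * end):
        vec[i] = 0
    return vec


def sliding_hole_inputs(seq, width, step=1):
    """Yield (start, input vector) for each position of a hole of given width."""
    for start in range(0, len(seq) - width + 1, step):
        yield start, with_hole(seq, start, width)


for start, vec in sliding_hole_inputs("TATAAT", width=2, step=2):
    print(start, vec)
```

If accuracy drops noticeably when the hole covers a given region, that region likely carries recognition-relevant information. Training and evaluating the network itself is omitted from this sketch.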
【Key words】 Escherichia coli promoter; BP neural network; data optimization; support vector machine; recognition
- 【Online Publication Submitter】 Beijing University of Technology 【Online Publication Year/Issue】 2004, Issue 04
- 【Classification Number】 Q933