节点文献

基因表达式编程核心技术研究

【作者】 左劼

【导师】 唐常杰;

【作者基本信息】 四川大学 , 计算机应用技术, 2004, 博士

【摘要】 进化计算是当前人工智能、知识工程,数据挖掘中的研究热点。遗传算法和遗传编程,是众多进化计算模型中的两个最典型的模型。遗传算法采用线性编码、求解普通的优化问题。遗传编程则采用树形编码,试图求出解决问题的程序。F.Candida于2001年草创了新的进化计算模型基因表达式编程(Gene Expression Programming)。同时具有遗传算法的简单性、也具有遗传编程的功能。在对很多问题的求解效率上,比普通的遗传编程高2-4个数量级。F.Candida在草创的同时,留下了大量的理论空白,技术缺陷和遗憾之处。本文在前人工作的基础上对基因表达式编程的核心技术进行了研究,主要结果和贡献如下: (1)对基因表达式编程的基因编码进行了坪沦分析。给出了K-序列和表达式树之间的关系,指出它们之间的表达能力是一致的。随后在给出的定理中指出,基因表达式编程是可靠且完备的。满足t=h·(λ(F)-1)+1的GEP基因一定能够解码为一棵完整的表达式树。这为基因表达式编程的基因编码给出了理论依据。 (2)提出了更有理论背景的基于复相关系数的适应度函数。并对采用复相关系数作为评价函数的基因表达式编程进行了收敛性分析,指出,基于基因表达式编程的符号回归是依概率收敛到全局最优染色体的,针对符号回归中的常数问题,提出了MC常数方法,并进行了理论分析,结果表明,MC方式是简单但是却非常有效的,为了达到指定的精度,MC方法所付出的代价是对数级的。 (3)对基因表达式编程建立了上下文无关文法模型。指出基因表达式编程和仅含有单个非终结符的上下文无关文法在描述能力上是等价的。 (4)根据基因表达式编程的上下文无关文法模型,指出,基因表达式编程不能处理包含多个非终结符的上下文无关文法。提出了扩展的基因表达式编程方法,解决了基因表达式编程的这一重大不足。在扩展的基因表达式编程中给出四川大学博上学位论文了基因构造方法一等位I丈一表达式,多段基因。证明了扩展的基因表达式编程基因编码的有效性,同时指出,基因表达式编程就是扩展的基因表达式编程的特例。 (5)提出了新的概念谓词关联规则.和基于基因表达式编程的挖掘系统。分析挖掘系统的特性,证明了传统关联规则是谓词关联规则的特例。任何传统关联规则可以表示为·系列简单关联规则的与。提出井实现了谓词关联规则挖掘算法,井目.根据启发性知识,设计了特别的适应度函数。两组实验表明,算法是有效了的,能发现·些用传统关联规则挖掘算法不能发现的规则。基因表达式编程应用于谓词关联规则挖掘是成功的。 (6)提出了两种基于GEP的方法进行时间序列预测。滑动窗口预测法直接发现时间序列中历史数据到未来数据的函数关系,并以此进行预测。微分方程预测法则利用训练数据建立关于时间的高阶常微分方程,并在给定的初值条件下进行顶测。为了减小数据中噪声的影响,提出了微分显微插值方法,有效地过滤了数据中的噪声,并且使得一阶导数更加精确,提高了方法的可靠性。大量的实验,特别是在太阳黑子数据上的实验证明系统是有效的,性能是良好的。关键词:基因表达式编程,进化计算,遗传算法,遗传编程,上下文无关文法,谓词关联规则挖掘,时间序列预测

【Abstract】 Evolutionary Computation is a. hot point in the research area of Artificial Intelligence, Knowledge Engineering and Data Mining. Genetic Algorithm and Genetic Programming are the most important computation models in Evolutionary computation. Genetic Algorithm uses linear bit string as chromosome code, and solves ordinary problems. Genetic Programming uses tree structure as chromosome code, and want find out the programs for the problems. In 2001, F. Candida proposes a new Evolutionary Computation method named Gene Expression Programming (GEP). GEP is as simple as Genetic Algorithm, and as functional as Genetic Programming. But for most problems, GEP is much fast than Genetic Programming in 2 - 4 magnitudes. While creating new concepts and algorithms, F. Candida left some theoretic issues as open problems. F. Candida also uses many properties without proof. Some conjectures is still list in the first book about GEP written by F. Candida. This paper try to answer these problems based on our research on GEP. The main result and contribution are as follows:(1) Gives rigorous analysis on GEP encoding. This work reveals the relation between K-expression and the expression tree. It points out that they have the same expression power. The dissertation proofs an important theorem about soundness and completeness of GEP gene encoding. It points out that the gene satisfied t = h (X(F) - 1) + 1 can be decoded into a valid function expression tree systematically. This is the basic theory of GEP.(2) Proposes a new fitness function via Complex Correlation Coefficient with strong statistics theoretic background. This work analyzes convergence of GEP using Complex Correlation Coefficient fitness function. It is proved that Symbol Regression based on GEP will convergence by probability to global optimization status. The dissertation also proposes a new constant mining method named Meta constants(MC). The theoretical analysis and experiment results show thatMC method is simple but powerful enough that it can get the predefined precision with only logarithmic cost.(3) Proposes the context-free grammar model for GEP. Proves that the presentation power of GEP and the context-free grammar with one none-terminal are equivalent.(4) According’ to the context-free grammar model of GEP, the dissertation point out that Candida’s GEP techniques cannot process context-free grammar with multiple none-terminals. The paper proposes the Extended Gene Expression Programming(EGEP) to solve this weakness. It also proposes new concepts about allelic K-expression and multi-segment gene. Proves that the validity of the Extended GEP gene encoding. Proves that GEP is a special case of EGEP.(5) Proposes a new concept named Predicate Association Rule and implements a mining system based on GEP. Proves that the traditional association rule is a special case of Predicate Association Rule. Gives the algorithms about mining Predicate Association Rules. Designs a new fitness function according to the heuristic knowledge. Two groups of experiments show that the application of GEP is successful.(6) Proposes two kinds of time serial prediction methods based on GEP: (a) By Slide Window Method discovers the predication function between history data and future data, (b) By Ordinary Differential Equation Method mines an ordinary differential equation from time serial data, and makes prediction based an initial condition. In order to reduce the affection of the noise, the dissertation proposes Differential by Microscope Interpolation method. This method improves the precision of first order derivative, and improves the reliability of the system. Extensive experiments on real data sets for sun spot prediction show that our new method is effective and powerful.

  • 【网络出版投稿人】 四川大学
  • 【网络出版年期】2005年 02期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络