A Study on Process Industrial Data Mining Based on Support Vector Machines
【Author】 Zhang Ying (张英)
【Author information】 Zhejiang University, Control Theory and Control Engineering, 2005, PhD
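【Abstract, translated from the Chinese】 Against the background of two key processes in the polyester industry chain, the adsorption separation and the oxidation of para-xylene (PX), this dissertation uses SVM as its tool and studies, from a data mining perspective, the application of predictive data mining and exploratory data mining in the PX industry. A data mining software platform, ESP-PISDMS, is then implemented on the basis of the proposed algorithms. The main research work can be summarized as follows.
(1) An improved SVM classification algorithm is proposed. Starting from whether the test samples satisfy the KKT conditions, the algorithm analyzes how the composition of the support vector set changes during the interactive learning of newly added and existing samples, and selects as many of the samples that may contain support vectors as possible into the current working training set, which improves accuracy. With the sample set partitioned into subsets of a reasonable size, the algorithm achieves higher accuracy and training speed than traditional algorithms on large-scale sample sets, and it is well suited to online incremental learning.
(2) Two SVM-based incremental modeling methods, SVMIL and ISVM, are proposed. As time passes, each incremental learning step adds a batch of samples (or a single sample) to the model and, at the same time, uses a heuristic strategy to remove a batch of samples (or a single sample) from the working set. In soft sensor modeling this keeps adding samples that represent new operating conditions while keeping the size of the working sample set under control. The proposed soft sensor modeling methods are applied to predicting PX purity in the PX adsorption separation process and compared with other methods.
(3) Two fuzzy membership functions for fuzzy SVM are proposed: one based on kNN and one based on support vector data description (SVDD). The former determines the membership of a sample from its distance to its nearest neighbors in the feature space. The latter first obtains a data domain description model of the training samples and then assigns each sample a membership according to how far it deviates from the data domain. The proposed membership function models and the associated modeling methods are applied to predicting the 4-CBA concentration in an industrial PX oxidation process and compared with other methods; the proposed models effectively reduce the regression error and improve the noise resistance of SVM.
(4) An SVM-based hyperrectangle rule extraction algorithm, HRE, is proposed. In HRE, the data samples are first mapped into a high-dimensional feature space to obtain the optimal separating hyperplane and the support vectors; hyperrectangle rules are then constructed from the support vectors and the cluster centers, subject to some heuristic conditions. HRE makes it very easy to control the support and the number of rules, and the rules obtained are of higher quality.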
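The KKT-based screening mentioned in item (1) can be illustrated with a minimal sketch. It is not the dissertation's algorithm, whose details the abstract does not give; it shows only the common simplification in which a newly arrived sample, whose multiplier is implicitly zero, satisfies the KKT conditions of the current model exactly when y·f(x) ≥ 1. The linear model `w`, `b`, the toy data, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

# A sample is (feature vector, label), with label in {-1, +1}.
Sample = Tuple[Sequence[float], int]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision function f(x) = w . x + b (assumed model form)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violates_kkt(w: Sequence[float], b: float, x: Sequence[float], y: int,
                 tol: float = 1e-9) -> bool:
    """A new sample has alpha = 0, so KKT requires y * f(x) >= 1.

    Samples that fail this check may become support vectors after retraining.
    """
    return y * decision_value(w, b, x) < 1.0 - tol


def select_working_set(w: Sequence[float], b: float,
                       support_vectors: List[Sample],
                       new_batch: List[Sample]) -> List[Sample]:
    """Keep the current support vectors plus the new samples that violate KKT."""
    violators = [(x, y) for x, y in new_batch if violates_kkt(w, b, x, y)]
    return list(support_vectors) + violators


# Hypothetical trained model f(x) = x[0] and its two support vectors.
w, b = [1.0, 0.0], 0.0
support_vectors = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)]

# Hypothetical incremental batch: two samples lie outside the margin on the
# correct side, two fall inside the margin or on the wrong side.
new_batch = [([3.0, 1.0], 1), ([0.5, 2.0], 1), ([-2.0, 0.0], -1), ([0.2, 0.0], -1)]

working_set = select_working_set(w, b, support_vectors, new_batch)
print(len(working_set), "samples selected for retraining")
```

Samples that already satisfy the KKT conditions are discarded here to keep the retraining set small. A full incremental scheme would also have to consider existing non-support vectors that the new batch may turn into support vectors, which is the interaction the abstract refers to.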
【Abstract】 This dissertation discusses several issues in data mining based on support vector machines (SVM) and their corresponding solutions. Several SVM-based data mining algorithms are proposed and applied to a practical industrial PX process. The main contributions are as follows.
(1) A new incremental SVM learning algorithm (FS-SVM) is proposed. When incremental samples are added to the current working set, the training samples and the incremental samples influence each other. In FS-SVM, as many support vectors as possible are selected into the current working set to increase prediction accuracy. Simulation results on the UCI Adult data set indicate that the proposed algorithm improves both accuracy and speed.
(2) To overcome the model failure problem, a soft sensor modeling method based on incremental SVM (ISVM) is presented. In ISVM, an incremental sample that represents a new operating condition is introduced into the model and, at the same time, an old sample is discarded to control the size of the working set. The method is applied to predicting the purity of PX in a PX adsorption separation process. Simulation results indicate that the proposed soft sensor model improves adaptability to varying operating conditions and solves the model failure problem caused by changes in operating conditions or load.
(3) To overcome the overfitting problem caused by a fixed penalty factor, fuzzy support vector regression (FSVR) and fuzzy least squares support vector machines (FLS-SVM) are proposed. Strategies based on k nearest neighbors (kNN) and support vector data description (SVDD) are adopted to set the fuzzy membership values of the data points. The proposed FSVR and FLS-SVM algorithms based on kNN and SVDD are applied to predicting the concentration of 4-carboxybenzaldehyde (4-CBA) in a practical purified terephthalic acid (PTA) oxidation process. Simulation results indicate that the proposed method reduces the effect of outliers and yields higher accuracy.
(4) SVM is applied in many research fields because of its good generalization ability and solid theoretical foundation. However, because the model generated by SVM is like a black box, it is difficult for users to interpret and understand how the model makes its decisions. A hyperrectangle rule extraction (HRE) algorithm is proposed to extract rules from a trained SVM. A support vector clustering (SVC) algorithm is used to find the prototypes of each class; hyperrectangles are then constructed from the prototypes and the support vectors under some heuristic conditions. Projecting the hyperrectangles onto the coordinate axes yields the if-then rules. Experimental results indicate that HRE extracts rules efficiently from a trained SVM and that the number and support of the rules obtained can be easily controlled through a user-defined minimum support threshold.
(5) A novel data mining method is introduced to solve multi-objective optimization problems in the process industry. A hyperrectangle association rule mining (HARM) algorithm based on support vector machines is proposed. Hyperrectangle rules are constructed on the basis of prototypes and support vectors under some heuristic constraints. The proposed algorithm is applied to a simulated moving bed (SMB) para-xylene adsorption process, and the relationships between the key process variables and objective variables such as the purity and recovery rate of PX are obtained. Most of the association rules obtained can be explained using existing domain knowledge about the PX adsorption process.
(6) To simplify the data mining process, a "5P" data mining model for the process industry is presented and a data mining software system, ESP-PIDMS, is written. ESP-PIDMS is used to build several data mining models that solve real industrial problems.
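The projection step described in contributions (4) and (5), where a hyperrectangle becomes an if-then rule whose support is checked against a minimum threshold, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how HRE or HARM build the hyperrectangles from prototypes and support vectors, so the box is simply given here, and the variable names, bounds, samples, and threshold are hypothetical.

```python
from typing import Dict, List, Tuple

# A hyperrectangle maps each variable name to its (lower, upper) bound.
Box = Dict[str, Tuple[float, float]]
Record = Dict[str, float]


def box_to_rule(box: Box, label: str) -> str:
    """Project the box onto each coordinate axis and join the intervals."""
    conditions = [f"{lo:g} <= {name} <= {hi:g}" for name, (lo, hi) in box.items()]
    return "IF " + " AND ".join(conditions) + f" THEN class = {label}"


def covers(box: Box, sample: Record) -> bool:
    """True when the sample lies inside the box on every axis."""
    return all(lo <= sample[name] <= hi for name, (lo, hi) in box.items())


def support(box: Box, samples: List[Record]) -> float:
    """Fraction of samples covered by the box (0.0 for an empty data set)."""
    if not samples:
        return 0.0
    return sum(covers(box, s) for s in samples) / len(samples)


# Hypothetical hyperrectangle over two made-up process variables.
box: Box = {"temperature": (170.0, 180.0), "flow": (1.5, 2.5)}
samples: List[Record] = [
    {"temperature": 172.0, "flow": 2.0},   # inside
    {"temperature": 179.5, "flow": 1.6},   # inside
    {"temperature": 185.0, "flow": 2.0},   # temperature too high
    {"temperature": 175.0, "flow": 3.0},   # flow too high
]

min_support = 0.3  # hypothetical user-defined threshold
rule = box_to_rule(box, "high_purity")
rule_support = support(box, samples)
if rule_support >= min_support:
    print(rule, f"(support = {rule_support:.2f})")
```

Because each rule is just a set of axis-aligned intervals, raising `min_support` drops rules whose boxes cover too few samples, which is one plausible reading of how the number of rules is controlled.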
【Key words】 process industry; data mining; support vector machines; para-xylene; purified terephthalic acid