
基于商空间的构造性数据挖掘方法及应用

A Structural Method of Data Mining Based on Quotient Space and Its Applications

【Author】 Zhang Yanping (张燕平)

【Advisor】 Zhang Ling (张铃)

【Author information】 Anhui University, Computer Applications, 2003, PhD

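【Abstract, translated from the Chinese】 As computer pattern recognition technology has been studied and developed, its range of applications has kept expanding, making recognition and classification feasible for many practical problems in areas such as financial engineering forecasting and content-based multimedia database retrieval. These problems share three features: (1) they are massive-data problems, with a high-dimensional feature space, a large number of samples, and many classes; (2) many factors are involved, so the problem itself is an incomplete information system; (3) the collected data contain noise that is hard to delimit clearly. The dissertation investigates this class of problems. The main work includes:

1. A feasible solution scheme is analyzed: the quotient space method is used to describe a complex problem at appropriate granularities and to decompose it into solvable subproblems. For the subproblems at different granularities, constructive (structural) machine learning is introduced to first obtain learning rules at each granularity and then compose the related rules, finally yielding an integrated rule for the complex problem.

2. For the question of how to obtain learning rules within a single granularity, the basic framework of the multi-side increasing by degrees algorithm (MIDA) is proposed, and the original hypersphere covering algorithm is improved where necessary. The main strength of the covering method is that the covering domains faithfully reflect the distribution of the samples. Three problems that need further study are identified. The first is resolving the conflict between recognition accuracy and generalization ability; using the samples not covered by any covering domain (the rejected vectors), the multi-side progressive method MIDA (Multi-side increasing by degrees algorithm) is introduced. The second is how to improve the covering method so that covering domains can recognize noise (anomalous vectors); to this end, the number of vectors contained in a covering domain is used as a weight during recognition. The third is how to reduce the number of covering domains obtained; to this end, the repeat cover algorithm (RCA) and the removal of overly small covering domains are introduced, both of which help improve the generalization ability of the network. A further direction worth studying is applying the covering idea (or the covering-based description of data) to principal component analysis for feature selection; for this, the concept of point pairs is introduced and the double-point principal component analysis algorithm (DPCAA) is proposed.

3. Within the quotient space model, a new probabilistic decision data mining rule algorithm (DDMR) is proposed, using the multiple data sources formed by the quotient topology. The author's view is that an object with high-dimensional, massive data can be partitioned and decomposed by the multi-side progressive method, turning a hard problem into easier ones, while a complex database or data warehouse described by multiple tables can be regarded as a complex object whose multiple sides are already given. Under the quotient space model, both can therefore be analyzed, processed, and recognized with the same method.

The stock market is a highly complex, incomplete, nonlinear process, so a nonlinear model that can handle incomplete information is needed in place of traditional statistical models in order to further improve the quality of stock market forecasts. The dissertation applies constructive machine learning algorithms to build a quotient space model for stock market analysis. The main work here includes:

1. For the practical problem of forecasting stock market trends, a quotient space model for stock market analysis is built, and MIDA and DDMR are applied to time series forecasting.

2. For sequence forecasting, it is argued that the processing method should respect the regularities of the data itself and avoid additional artificial preprocessing, so that the essential regularities of the object can be mined. The collected stock market data are formed directly into sequences by fixed time periods and trading volumes and then recognized and classified. The experimental results are satisfactory, and the proposed method is therefore of general significance.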
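The covering idea summarized in the abstract (covering domains, rejected vectors, and the number of covered vectors used as a recognition weight) can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a simplified stand-in that assumes Euclidean balls instead of hyperspherical covers, and the names `build_covers` and `classify`, the choice of centers, and the radius rule are all hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_covers(samples, labels):
    """Greedy covering. Each cover is (center, radius, label, count).

    Assumption: the radius is placed halfway between the farthest
    same-class sample that is closer than any other-class sample and
    the nearest other-class sample.
    """
    covers = []
    uncovered = set(range(len(samples)))
    while uncovered:
        i = min(uncovered)  # deterministic choice of the next center
        center, label = samples[i], labels[i]
        # Nearest sample of a different class bounds the cover from outside.
        d_other = min(
            (dist(center, s) for s, l in zip(samples, labels) if l != label),
            default=math.inf,
        )
        # Farthest same-class sample still closer than any other-class sample.
        d_same = max(
            (d for d in (dist(center, s) for s, l in zip(samples, labels)
                         if l == label) if d < d_other),
            default=0.0,
        )
        radius = d_same if math.isinf(d_other) else (d_same + d_other) / 2
        members = {
            j for j in range(len(samples))
            if labels[j] == label and dist(center, samples[j]) <= radius
        }
        members.add(i)  # the center always belongs to its own cover
        covers.append((center, radius, label, len(members)))
        uncovered -= members
    return covers


def classify(covers, x):
    """Return the label of the heaviest cover containing x, or None.

    None plays the role of a rejected vector: x lies in no cover.
    The cover's sample count is used as the weight when covers overlap.
    """
    hits = [(count, label) for center, radius, label, count in covers
            if dist(center, x) <= radius]
    if not hits:
        return None
    return max(hits, key=lambda h: h[0])[1]


samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
covers = build_covers(samples, labels)
print(classify(covers, (0.5, 0.5)), classify(covers, (20, 20)))
```

In this sketch a point far from every cover is rejected rather than forced into a class, which is the hook the abstract describes for a further, progressive processing stage.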

【Abstract】 As pattern recognition technology has developed, its range of applications has steadily widened, making recognition and classification feasible for practical problems such as forecasting in financial engineering and content-based retrieval from multimedia databases. These problems share several features: the feature space is high-dimensional and the data set contains a large number of samples belonging to many classes; many factors are involved, so the problem is an incomplete information system; and the data contain noise that is difficult to identify. The dissertation proposes the following methods for such problems:

1. A feasible scheme for solving these problems is proposed. A complex problem is represented at different granularities using quotient spaces. Once learning rules have been obtained at each granularity, the related rules are composed into an integrated rule for the complex problem.

2. To obtain learning rules within a single granularity, a multi-side increasing by degrees algorithm (MIDA) is proposed. The main advantage of the covering algorithm is that it faithfully reflects the distribution of the sample set. Three problems that need further analysis are identified: first, resolving the conflict between recognition accuracy and generalization ability; second, improving the covering algorithm so that it can recognize noise; third, reducing the number of covering domains. In addition, the covering idea may be applied to feature selection and principal component analysis. MIDA improves on the original covering algorithm and reduces, to some extent, the conflict between accuracy and generalization ability.

3. Based on structural machine learning, a probabilistic decision data mining rule algorithm (DDMR) is proposed, which composes the related rules of the multiple data sources formed by the quotient topology.

For objects with a high-dimensional feature space and a large number of samples belonging to many classes, it is useful to partition and decompose them with MIDA. A complex database or data warehouse may be regarded as a complex object whose multiple sides are already given. Under the quotient space model, both can therefore be analyzed, processed, and recognized with the same method.

The stock market is a very complex, incomplete, nonlinear process. It is therefore necessary to replace traditional statistical models with a nonlinear model that can handle incomplete information, in order to improve the quality of stock market forecasts. In this dissertation, structural machine learning algorithms are used to build a quotient space model for stock market analysis. The main work includes:

1. For the practical problem of forecasting stock market trends, a quotient space model for stock market analysis is constructed, and MIDA and DDMR are applied to time series forecasting.

2. For sequence forecasting, it is important to use the data directly, without artificial preprocessing, in order to mine the true regularities of the object. The collected stock data are formed into sequences by fixed time periods and trading volumes and then recognized and classified. The experimental results are satisfactory, and the proposed method is therefore of general applicability.
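The last point, forming sequences directly from the raw series by a fixed period, can be sketched as follows. The abstract does not specify the window length or how class labels are assigned, so the function name `make_windows`, the `width` parameter, and the up/down label rule are assumptions made only for illustration.

```python
def make_windows(values, width):
    """Cut a raw series into (window, label) pairs without preprocessing.

    Assumed label rule: 1 if the value right after the window is higher
    than the window's last value, otherwise 0.
    """
    pairs = []
    for start in range(len(values) - width):
        window = tuple(values[start:start + width])
        following = values[start + width]
        pairs.append((window, 1 if following > window[-1] else 0))
    return pairs


# Hypothetical closing prices; real data would come from the market feed.
closing = [10.0, 10.4, 10.1, 10.6, 10.9, 10.7]
for window, label in make_windows(closing, width=3):
    print(window, label)
```

Each pair can then be passed to a classifier as one sample, with the window as the feature vector and the label as the class. The raw values are used as they are, with no smoothing or normalization, in keeping with the point about avoiding artificial preprocessing.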

  • 【Online publication contributor】 Anhui University
  • 【Online publication year and issue】 2003, Issue 03