Research and Applications of Mining Frequent Function Set from Datasets
【Author】 Jia Xiaobin;
【Advisor】 Tang Changjie;
【Author information】 Sichuan University, Computer Applications, 2005, Master's thesis
【Abstract】 Data mining is a hot topic in current database research, development, and application. Function mining, which discovers valid functional relationships hidden in scientific data, is an important research direction within data mining. Traditional function mining, however, has limitations that make it hard to apply in practice: (1) its target is a single function, and a single function has little power to describe regularities in the real world; and (2) it is difficult to apply to complex datasets. To address these limitations, this thesis makes the following contributions: (1) It extends the concept of function mining and proposes a new, more expressive mining target, the Frequent Function Set (FFS), which describes a cluster of functional relationships whose support on a given dataset is not less than the minimum support threshold. (2) It analyzes the properties of FFS. (3) It presents the Configurable Frequent Function Set Discovering Algorithm (CFFSDA) for mining FFS, which is flexible and can be configured with a variety of search algorithms. (4) After analyzing the shortcomings of CFFSDA, it introduces the constraint-based frequent function set (Constrained FFS), which can accommodate different user-defined interests, together with a corresponding mining framework. (5) Gene Expression Programming (GEP) is a new method for mining functional relationships; the thesis uses GEP to configure and instantiate CFFSDA and, for the first time in GEP research, adopts a Precision Threshold Queue (PTQ) strategy, which effectively improves the algorithm's probability of success. (6) It explores applications of FFS to database query optimization and classification, showing by example that, for SQL selection statements whose WHERE clause contains equality conditions or certain comparison conditions, query optimization with FFS is more efficient than traditional strategies. (7) Experiments confirm the descriptive power of FFS and its applicability to classification, and confirm the effectiveness of PTQ, which improves the probability of success 55-fold when mining complex, high-precision functions.
This thesis is organized as follows: Chapter 0 discusses the importance and significance of studying function mining. Chapter 1 introduces data mining and the concept of function mining. Chapter 2 analyzes the limitations of traditional function mining, proposes the Frequent Function Set and the CFFSDA algorithm, and then extends FFS to Constrained FFS with its mining framework. Chapter 3 briefly introduces GEP and the Precision Threshold Queue, and then configures and implements CFFSDA. Chapter 4 illustrates the application of FFS to SQL query optimization and classification. Chapter 5 presents the experimental results. Chapter 6 concludes the thesis and gives directions for future work.
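The abstract describes a Frequent Function Set only informally, as a cluster of functions whose support on a dataset is not less than a minimum threshold. The Python sketch below illustrates one possible reading of that definition; it is not the thesis's CFFSDA algorithm. It assumes that the support of a function is the fraction of rows it fits within a tolerance `eps`, and the names `support`, `frequent_functions`, `ROWS`, and `CANDIDATES` are hypothetical, introduced here for illustration only.

```python
from typing import Callable, Dict, Sequence, Tuple

Row = Tuple[float, float]  # hypothetical two-column dataset: (x, y)


def support(f: Callable[[float], float], rows: Sequence[Row],
            eps: float = 1e-6) -> float:
    """Assumed definition: fraction of rows where |f(x) - y| <= eps."""
    if not rows:
        return 0.0
    hits = sum(1 for x, y in rows if abs(f(x) - y) <= eps)
    return hits / len(rows)


def frequent_functions(candidates: Dict[str, Callable[[float], float]],
                       rows: Sequence[Row], min_support: float,
                       eps: float = 1e-6) -> Dict[str, float]:
    """Keep every candidate whose support reaches min_support."""
    result = {}
    for name, f in candidates.items():
        s = support(f, rows, eps)
        if s >= min_support:
            result[name] = s
    return result


# Toy data mixing two laws: y = 2x + 1 on the first three rows,
# y = x^2 on the last three. No single function fits every row.
ROWS = [(0, 1), (1, 3), (2, 5), (3, 9), (4, 16), (5, 25)]

CANDIDATES = {
    "linear": lambda x: 2 * x + 1,
    "square": lambda x: x * x,
    "identity": lambda x: x,
}

if __name__ == "__main__":
    print(frequent_functions(CANDIDATES, ROWS, min_support=0.4))
```

On this toy data no single candidate explains all rows, but `linear` and `square` each fit half of them, so together they describe the dataset in a way that one function cannot. This mirrors the motivation stated in the abstract; the candidate functions are enumerated by hand here, whereas the thesis searches for them (for example with GEP).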
【Key words】 Data Mining; Frequent Function Set (FFS); Constrained FFS; GEP;
- 【Online publication contributor】 Sichuan University 【Online publication year and issue】 2005, No. 08
- 【Classification number】 TP311.13
- 【Download count】 136