节点文献
基于FP-树的关联规则挖掘算法的设计与实现
【作者】 刘乃丽;
【导师】 李玉忱;
【作者基本信息】 山东大学 , 计算机软件与理论, 2005, 硕士
【摘要】 数据挖掘技术是解决数据丰富而知识贫乏的有效途径,当属信息科学领域的前沿研究课题之一,有关的研究和应用极大提高了决策支持的能力,它已被公认为是数据库研究中一个极富应用前景的领域。本文描述了数据挖掘的概念、功能以及发现模式的分类。在众多的数据挖掘算法中,挖掘关联规则是数据挖掘领域中的重要研究内容,其中挖掘频繁项目集是挖掘关联规则中的关键问题之一,又最大频繁项目集已经隐含了所有的频繁项目集,所以可以将发现频繁项目集的问题转化为发现最大频繁项目集的问题,因而发现最大频繁项目集对数据挖掘具有重大意义。 以前的许多挖掘最大频繁项目集算法是先生成候选集,再进行检验,然而候选项目集产生的代价是很高的,尤其是存在大量长模式的时候。本文主要在以下几个方面对基于FP-树的关联规则挖掘问题进行研究:第一是研究了FP-树的定义和构造过程以及多种改进算法,并分析了基于FP-树进行挖掘的可行性和完整性,然后提出了基于FP-树的快速挖掘最大频繁项目集的算法Max-FI(Maximal Frequent Itemset),该算法不需要生成最大频繁候选项目集。改进的FP-树是单向的,每个节点只保留指向父节点的指针,这大约节省了三分之一的树空间。试验结果表明该算法比同样基于FP-树的DMFIA算法挖掘最大频繁项目集的效率更高。 第二是研究了挖掘有效且无冗余关联规则的问题。传统算法在生成关联规则时,或者生成规则的效率很低,或者生成的关联规则之间存在着大量的冗余,或者挖掘出的规则的支持度和可信度都很高,但却是无趣的、甚至是虚假的规则,且不能产生带有否定项的规则。本文提出了一种新的算法MVNR(Mining Valid and non-Redundant Association Rules Algorithm),在该算法中,首先对频繁项集集合进行检查,删除了那些只能生成冗余关联规则的频繁项集,然后对分析过的频繁项集集合中的每一个频繁项集生成他们的极小子集集合,
【Abstract】 Data mining technology is an effective approach to resolve the problem of abundant data and scanty information.It currently is the research frontier within the information science field.The related researches and applications have greatly improved the ability for decision supporting.It has been deemed to a field that has broad prospect of application in database research.This paper describes the conception,function and patterns of data mining.In many data mining algorithms,mining association rule is an important matter in data mining, in which mining frequent itemsets is a key problem in mining association rule.because maximum frequent itemsets embrace all frequent itemsets,the problem of mining frequent itemset is converted to the problem of mining maximum frequent itemsets.Mining maximal frequent itemsets is very important in data mining.Many of the previous algorithms mine maximal frequent itemsets by producing candidate itemsets firstly,then pruning.But the cost of producing candidate itemsets is very high ,especially when there exist long patterns.This paper studies mostly the problem of mining maiximal frequent pattern based on FP-tree.Firstly, we study the definition and construction of FP-tree and improved algorithms and analyze the feasibility and completeness of FP-tree.Then,we propose the algorithm for mining maximal frequent pattern Max-FI,which need not produce maximal candidate itemsets. The improved FP-Tree is a one-way tree and there is no pointers to point its children in each node,so at least one third of memory is saved.At last,our experimental result shows that the algorithm Max-FI is more effectively than the algorithm DMFIA based on FP-tree for mining maximal frequent patterns.Secondly,we study the problem of mining valid and non-Redundant association rules. The traditional algorithm mining association rules,or slowly produces association rules,or produces too many redundant rules,or it is probable to find an association rule,which posses high support and confidence,but is uninteresting,and even is false.Furthermore,a rule with negative-item can’t be produced.This paper propose a new algorithm MVNR(Mining Valid and non-Redundant Association Rules Algorithm),in this algorithm,firstly,we examine frequent patterns,delete thefrequent patterns which only produce the redundant association rules.Then,we produce the minimal subset of each frequent itemset in the examined frequent itemset,delete the minimal subset of existing in the minimal subset which is super subset of this minimal subset by using the character of minimal subset.At last,we produce association rules according to the conditions which user define.At last,we study the problem of the application of mining maximal frequent patterns algorithm(Max-FI) and mining valid and non-redundant association rules algorithm(MVNR) in the decision supporting system of Gong An.
【Key words】 Association rule; Maximum Frequent Itemset; Frequent Pattern Tree; Correlation; Redundancy;
- 【网络出版投稿人】 山东大学 【网络出版年期】2005年 08期
- 【分类号】TP311.13
- 【被引频次】11
- 【下载频次】510