节点文献
单维关联规则挖掘算法研究
【作者】 李川;
【导师】 范明;
【作者基本信息】 郑州大学 , 计算机软件与理论, 2003, 硕士
【摘要】 关联规则揭示项集间有趣的相联关系,可广泛应用于购物篮分析、相关分析、分类、网络个性化服务等领域,是数据挖掘的重要研究课题。自1993年R.Agrawal,R.Srikant首次提出该问题以来,已出现了许多关联规则挖掘算法。这些算法大多基于Apriori算法,在挖掘频繁模式时需要产生大量候选项集,多次扫描数据库和维护一棵很大的Hash树,时空复杂度过高。 本文提出基于被约束子树的频繁模式挖掘算法Fpmine和基于线索频繁模式树的关联规则产生算法SPF,有效解决了数据库中单维关联规则挖掘问题。实验表明,在相同数据集上,与FP-Growth算法相比,算法Fpmine的挖掘速度提高了一倍以上,而所需的存储空间减少了一半;随着数据库规模的增大,算法Fpmine具有很好的可伸缩性;对于稠密数据集,本文算法也具有良好的性能;SPF算法具有极好的时空效率;Fpmine-SPF算法挖掘关联规则的速度远快于较长期以来广泛使用的Apriori算法,并有相当好的可伸缩性。 关联规则挖掘往往生成过多的规则,使用户很难进行取舍。为此,近年的研究提出一种有效的替代手段—挖掘频繁闭项集规则。频繁闭项集规则蕴含了所有关联规则,数目却大为减少。挖掘频繁闭项集规则大幅提高了挖掘的效率和规则的有效性,解除了用户的负担。 本文实现了频繁闭项集、频繁闭项集规则的挖掘算法FCIS和CI_RULES。实验表明,在相同数据集上,与已发表的最新成果—Han的CLOSET算法相比,本文算法FCIS速度提高两个数量级以上,并有极好的可伸缩性;CI_RULES算法挖掘频繁闭项集规则的运行速度较Fpmine-SPF算法挖掘关联规则的速度快两个数量级以上,并有非凡的可伸缩性。
【Abstract】 Association rules mining, as the most important subject in data mining, reveals the corelations between itemsets and therefore can be widely applied to many fields such as market basket analysis, corelation analysis, classification, web-customised service, etc. Since 1993 when R. Agrawal, R. Srikant firstly proposed the concept of association rules, a lot of algorithms have come up in mining of association rules. While most of these algorithms are based on Apriori line, will generate a huge number of candidate itemsets, need multiple scans over database, and maintain a big hash tree, so the time and space complexity is too high.This paper proposed the constrained-tree based algorithm Fpmine for fast mining of frequent patterns, and the association rules mining algorithm SPF based on threaded frequent pattern tree. The two algorithms dwell on the single-dimension association rules mining in database. Experiments show that Fpmine runs two times as fast as the most recently proposed algorithm Fp-growth and saves half the memory; moreover, our algorithm has a quite good time and space scalability with the number of transactions, and has an excellent performance in dense database mining as well; SPF has excellent time and space efficiency; Fpmine-SPF algorithm has a far taster speed in association rules mining than the widely used Apriori algorithm and has wonderful scalability.Association rules mining always generates too many rules, which makes ii difficult to pick the valuable rules where from. In order to tackle this problem, recent studies raised an effective alternative, mining frequent closed itemsets rules. Frequent closed itemsets rules imply all the association rules but greatly reduce the rules number. Mining of frequent closed itemsets greatly improve the rules mining effiency and effectivity of the resulting rules, hence release the users of the burden.We implement the frequent closeu itemsets mining algorithm FCIS and the frequent closed itemsets rules mining algorithm CI_RULES. Experiments diplay that compared with the latest published algorithm-the CLOSET algorithm proposed by Jiawei Han, our algorithm is more than two malgtitudes faster and has fabulous scalability; CI_RULES algorithm overtakes Fpmine-SPF algorithm by more than two malgtitudes and has extraordinary scalablity.
- 【网络出版投稿人】 郑州大学 【网络出版年期】2004年 01期
- 【分类号】TP311.13
- 【被引频次】3
- 【下载频次】158