基于Apriori算法的混合型数据频繁项集挖掘算法
Hybrid Data Frequent Itemset Mining Algorithm Based on Apriori Algorithm
【Abstract】 Mixed data involve both discrete and continuous attributes, which makes frequent itemset mining computationally expensive. To improve computational efficiency, a frequent itemset mining algorithm for mixed data is proposed. First, the Apriori algorithm is used to analyze the associations among the itemsets in a transaction database; association rules are formulated from the minimum-support calculation, and an undirected graph is generated. An adjacency matrix is then constructed, and the position of each itemset of the transaction database within the matrix is analyzed. All transaction data in the undirected graph are stored in the adjacency matrix, from which the frequent 1-itemsets and frequent 2-itemsets are generated quickly; join operations between itemsets then complete the mining of frequent itemsets. Finally, a filtering algorithm is applied to the frequent itemsets in different storage links to improve mining accuracy. Experimental results show that the proposed method has a relatively small memory footprint and relatively high efficiency when mining frequent itemsets, which the authors regard as significant for the development of data mining technology.
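The adjacency-matrix step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `mine_frequent_itemsets`, the use of an absolute support count, and the list-of-lists matrix are assumptions made for illustration, and the discretization of continuous attributes and the filtering step are omitted because the abstract does not specify them. The sketch reads frequent 1-itemsets from the matrix diagonal and frequent 2-itemsets from the off-diagonal co-occurrence counts, then grows larger itemsets with the standard Apriori join and prune.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support count} for itemsets with count >= min_support.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    index = {item: i for i, item in enumerate(items)}
    n = len(items)

    # Co-occurrence (adjacency) matrix: matrix[i][i] counts item i alone,
    # matrix[i][j] with i < j counts transactions containing both items.
    matrix = [[0] * n for _ in range(n)]
    for t in transactions:
        ids = sorted(index[item] for item in t)
        for i in ids:
            matrix[i][i] += 1
        for i, j in combinations(ids, 2):
            matrix[i][j] += 1

    frequent = {}
    # Frequent 1-itemsets come straight from the diagonal.
    for i in range(n):
        if matrix[i][i] >= min_support:
            frequent[frozenset([items[i]])] = matrix[i][i]

    # Frequent 2-itemsets come straight from the upper triangle.
    level = []
    for i, j in combinations(range(n), 2):
        if matrix[i][j] >= min_support:
            pair = frozenset([items[i], items[j]])
            frequent[pair] = matrix[i][j]
            level.append(pair)

    # Standard Apriori join and prune for k >= 3.
    k = 3
    while level:
        previous = set(level)
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        level = []
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                frequent[c] = count
                level.append(c)
        k += 1

    return frequent
```

In this sketch a single pass over the transactions fills the matrix, so the 1- and 2-itemset levels need no further database scans; only candidates of size three or more are counted against the transactions again.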
【Key words】 Apriori algorithm; Association rule; Undirected graph; Adjacency matrix; Filtering algorithm;
- 【Source】 Computer Simulation (计算机仿真), 2023, Issue 12
- 【Classification code】 TP311.13