Research on Algorithms of Finding Rules Applied to Industry Audit

【Author】 Chen Geng

【Supervisor】 Sun Zhihui

【Author Information】 Southeast University, Computer Application Technology, 2005, Ph.D.

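【Abstract (Chinese)】 Chinese government departments want to regulate the market economy and guard against economic risks of all kinds. To that end, they attach great importance to audit work, carry it out actively, and call for audit supervision of key funds, key sectors and key projects. Rapidly extracting useful audit information from huge database systems, however, has become a serious challenge for audit work. This makes it all the more necessary to explore more effective audit approaches, methods and technologies at multiple levels. This dissertation combines meso-level (industry) audit theory with computing techniques such as data mining. It extracts industry audit hypotheses and association patterns from the database systems of the units in an industry, in order to guide in-depth industry audit work. On this basis it explores theories and techniques for intelligent, automated auditing. The main contributions are as follows:
(1) Driven by the practical needs of industry audit detection, the dissertation proposes the AuditMiner architecture. AuditMiner is a prototype system for association rule mining in a non-peer (asymmetric) distributed database environment, in which local sites and a global site cooperate to complete the mining task.
(2) It proposes B-Gen, a binary-representation-based algorithm for industry audit that generates candidate frequent itemsets and computes their support counts, making frequent itemset generation easier to implement. Combining B-Gen with the Apriori, FUP and FDM algorithms yields BApriori, BFUP and BFDM, which significantly improve the efficiency of association rule generation.
(3) Association rule discovery in large data sets is receiving growing attention, and distributed association rule discovery is an effective way to address it. For the non-peer distributed database environment, the dissertation proposes GFDA, an algorithm that efficiently discovers association rules from distributed data.
(4) Incremental rule updating arises frequently in industry audit detection. Building on the FUP algorithm of D. W. Cheung et al., the dissertation introduces three concepts: candidate support, sub-frequent itemsets and the upper bound of support count. It then proposes IFUP, an incremental association rule discovery algorithm. For incremental updating of association rules in distributed environments, it further proposes the update algorithms LUDA and GUDA, as well as LUDA2 and GUDA2. These update algorithms make full use of previously mined results, generate fewer candidate frequent itemsets, incur low communication cost, and run efficiently.
(5) It applies Benford's law to discover abnormal transactions. It also introduces the concept of difference degree, which compares the abnormal transactions against the global association rules to obtain anomalous patterns, and this effectively raises the audit interestingness of the rules.
(6) It develops AuditMiner, a distributed audit association rule mining prototype system for industry audit detection that focuses on the customs. The prototype verifies the applicability and effectiveness of the algorithms proposed in the dissertation.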
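
The abstract says B-Gen produces candidates and support counts with bitwise operations only, but it does not give the procedure. The Python sketch below shows the general bitmask idea under our own assumptions and is not the author's B-Gen. Each itemset and each transaction is an integer with one bit per item. A transaction supports a candidate when `(t & c) == c`. Two frequent k-itemsets are joined with OR when their XOR has exactly two set bits. The Apriori subset check clears one bit at a time with XOR. All function names are hypothetical.

```python
def to_mask(itemset, index):
    """Encode an itemset as an int with one bit per item (OR-ing bits together)."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask


def support_count(candidate, transactions):
    """A transaction supports the candidate iff AND keeps every candidate bit."""
    return sum(1 for t in transactions if (t & candidate) == candidate)


def join_candidates(frequent):
    """Build (k+1)-candidates from a set of frequent k-itemset masks."""
    candidates = set()
    for a in frequent:
        for b in frequent:
            # a XOR b has exactly two set bits when a and b share k-1 items
            if a < b and bin(a ^ b).count("1") == 2:
                c = a | b
                bits = [1 << i for i in range(c.bit_length()) if (c >> i) & 1]
                # Apriori prune: every k-subset (c with one bit cleared) must be frequent
                if all((c ^ bit) in frequent for bit in bits):
                    candidates.add(c)
    return candidates


def mine_frequent(transactions, items, min_count):
    """Level-wise mining over bitmasks; returns {item tuple: support count}."""
    index = {item: i for i, item in enumerate(items)}
    masks = [to_mask(t, index) for t in transactions]
    level = {1 << i for i in range(len(items))}  # all 1-itemsets
    found = {}
    while level:
        counts = {c: support_count(c, masks) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        found.update(frequent)
        level = join_candidates(set(frequent))
    # Decode bitmasks back into item tuples for readability
    return {
        tuple(items[i] for i in range(len(items)) if (m >> i) & 1): n
        for m, n in found.items()
    }
```

For example, `mine_frequent([["bread", "milk"], ["bread", "beer"], ["bread", "milk", "beer"], ["milk"]], ["bread", "milk", "beer"], 2)` keeps the pairs {bread, milk} and {bread, beer}. It prunes the 3-itemset because {milk, beer} occurs only once.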
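
GFDA's details are not given here. The sketch below illustrates how local sites and a global site can cooperate, using the well-known FDM observation as an assumption of this example. That observation is that an itemset which is globally frequent must be locally frequent at one or more sites. The global site therefore only needs the summed local counts for the union of locally frequent itemsets. The function names are hypothetical. For brevity, each site here counts all of its k-itemsets; a real system would exchange only the counts of surviving candidates.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Run at each local site: count the k-itemsets in that site's partition."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(set(t)), k))
    return counts


def global_frequent(sites, k, min_support):
    """Run at the global site: combine local results into global frequent k-itemsets."""
    total = sum(len(site) for site in sites)
    per_site = [local_counts(site, k) for site in sites]
    # A globally frequent itemset is locally frequent at >= 1 site, so the
    # union of locally frequent itemsets is a complete candidate set.
    candidates = set()
    for site, counts in zip(sites, per_site):
        candidates |= {c for c, n in counts.items() if n >= min_support * len(site)}
    result = {}
    for c in candidates:
        n = sum(counts[c] for counts in per_site)  # Counter returns 0 if absent
        if n >= min_support * total:
            result[c] = n
    return result
```

In this sketch the candidate filter is exact: no globally frequent itemset can be missed. The pigeonhole argument shows why. If every site's count is below its local threshold, the sum is below the global threshold.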

【Abstract】 In recent years, the Chinese government has attached great importance to audit. It has asked audit departments to strengthen supervision of key state funds in order to prevent economic risks. Rapidly mining useful audit information from very large database systems, however, is a great challenge, and it calls for more effective auditing theories, methods and technologies. We combine the theory of industry audit with data mining technology. The aim is to mine association patterns and industry audit assumptions from the databases of enterprises in the same industry, and then to uncover the audit risks hidden behind the data. The work therefore has value for both research and application. The main contributions of the dissertation are as follows:
(1) Based on the needs of industry audit, the dissertation presents the architecture of AuditMiner, a data mining system for a non-peer distributed database environment. In this architecture, a global site and local sites jointly complete the task of mining association rules.
(2) It proposes B-Gen, a binary-representation-based method that efficiently generates candidate frequent itemsets and their support counts. It needs only operations such as AND, OR and XOR. Applying this idea to the existing association mining algorithms Apriori, FUP and FDM yields the improved algorithms BApriori, BFUP and BFDM. Theoretical analysis and experiments show that these algorithms are effective and efficient.
(3) Association rule mining in large data sets is receiving more and more attention, and distributed association mining is an effective way to handle it. The dissertation proposes GFDA, a distributed association mining algorithm based on the distributed architecture of the data.
(4) Building on the FUP algorithm, the dissertation introduces several concepts: the backup support threshold, the minor frequent candidate set and the upper bound of support count. It then presents an improved algorithm, IFUP. It also considers incremental association rule mining in distributed environments and proposes the algorithms LUDA, GUDA, LUDA2 and GUDA2 to solve this problem.
(5) It proposes an algorithm for mining abnormal transactions with Benford's law. It also introduces the concept of difference degree, which compares association rules from abnormal transactions with global association rules in order to extract more interesting rules from the global rules.
(6) It develops AuditMiner, a prototype system for mining distributed association rules from the customs' database systems for industry audit. Tests show that the algorithms presented in the dissertation are effective and efficient.
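
The abstract does not spell out IFUP itself. For orientation, here is a minimal Python sketch of the classic FUP update step that IFUP builds on, restricted to itemsets of one size k. Previously frequent itemsets need only the increment to be scanned. A previously infrequent itemset can become frequent only if it is frequent within the increment, so only those candidates trigger a rescan of the old database. IFUP's backup support threshold, minor frequent candidates and support-count upper bound are not modeled here. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_k(transactions, k):
    """Count every k-itemset occurring in the given transactions."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(set(t)), k))
    return counts


def fup_update(old_db, old_frequent, increment, k, min_support):
    """old_frequent maps each frequent k-itemset of old_db to its support count."""
    new_size = len(old_db) + len(increment)
    inc = count_k(increment, k)  # the increment is always scanned
    updated = {}
    # Case 1: previously frequent itemsets need only the increment's counts.
    for itemset, n in old_frequent.items():
        total = n + inc[itemset]
        if total >= min_support * new_size:
            updated[itemset] = total
    # Case 2: an itemset infrequent in old_db and in the increment cannot be
    # frequent overall, so only itemsets frequent in the increment remain.
    new_cands = [
        c for c, n in inc.items()
        if c not in old_frequent and n >= min_support * len(increment)
    ]
    if new_cands:
        old_counts = count_k(old_db, k)  # rescan old_db only when needed
        for c in new_cands:
            total = old_counts[c] + inc[c]
            if total >= min_support * new_size:
                updated[c] = total
    return updated
```

Consider an example with a 50% threshold. The pair {a, b} is frequent in four old transactions. It drops out once two {a, c} transactions arrive, while {a, c} becomes frequent with a count of 3 out of 6.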
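
Benford's law predicts that the leading digit d of many naturally occurring amounts appears with probability log10(1 + 1/d). The abstract does not say how the author tests for deviations. The sketch below assumes a simple rule of our own. It flags digits whose observed share exceeds the Benford share by more than a hypothetical `tolerance`, then returns the transactions whose amounts start with those digits. The `amount` field and the default threshold are illustrative assumptions. The difference-degree comparison with global association rules is not modeled.

```python
import math
from collections import Counter

# Benford's expected share of leading digit d: log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """First significant digit, read from scientific notation ('9.100000e+03' -> 9)."""
    return int(f"{abs(amount):e}"[0])


def over_represented_digits(amounts, tolerance):
    """Digits whose observed share exceeds the Benford share by more than tolerance."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d for d in range(1, 10) if n and counts[d] / n - BENFORD[d] > tolerance}


def flag_transactions(transactions, tolerance=0.05):
    """Return transactions whose amount starts with an over-represented digit."""
    amounts = [t["amount"] for t in transactions]
    suspicious = over_represented_digits(amounts, tolerance)
    return [
        t for t in transactions
        if t["amount"] != 0 and leading_digit(t["amount"]) in suspicious
    ]
```

With only a handful of amounts, the observed shares are noisy. A first-digit test like this is meaningful only over large volumes of transactions, such as a full customs declaration table.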

  • 【Online Publication Contributor】 Southeast University
  • 【Online Publication Issue】 2007, No. 01