A Method for Conflict Detection of Data Cleaning Rules in Complex Business Areas
【Abstract】 Data cleaning driven by a rule base in complex business domains suffers from frequent logical conflicts among rules and a high error rate. To address this, a data cleaning method based on a hierarchical rule base, the Hierarchical Rule Data Cleaning Method (HRDCM), is proposed. A data cleaning framework built on the hierarchical rule base is designed: the logical relationships among rules are constructed from the top down, a rule conflict detection mechanism is established by having each level constrain the level below it, and the corresponding algorithm is given. Taking poverty alleviation as the example domain, a three-level rule base is constructed, and experiments are carried out on poverty alleviation data from an impoverished county. The results show that HRDCM reduces logical conflicts among rules, which improves cleaning efficiency and significantly lowers the error rate of the cleaning results, supporting the soundness and rationality of the method.
【Key words】 data cleaning; hierarchical rule base; conflict detection; poverty alleviation
- 【Source】 Journal of Kunming University of Science and Technology (Natural Science), 2020, Issue 02
- 【Classification No.】 TP311.13
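
The abstract describes conflict detection by downward constraint but gives no algorithmic detail, so the following is only a minimal sketch of one possible reading, not the HRDCM algorithm itself. It assumes each rule is a numeric range on a field and that a lower-level rule conflicts with its parent when its range is not contained in the parent's range. The `Rule` class, the `detect_conflicts` function, the rule names, and all numeric values are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """A hypothetical range rule: the value of `field` must lie in [low, high]."""
    name: str
    level: int                     # 1 = top level; larger numbers are lower levels
    field: str
    low: float
    high: float
    parent: Optional[str] = None   # name of the upper-level rule that constrains this one


def detect_conflicts(rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (child, parent) name pairs where the child violates its parent.

    Assumption: a child conflicts with its parent when both rules target the
    same field and the child's range is not fully inside the parent's range.
    """
    by_name: Dict[str, Rule] = {r.name: r for r in rules}
    conflicts: List[Tuple[str, str]] = []
    # Visit rules level by level, top-down, so upper levels are checked first.
    for rule in sorted(rules, key=lambda r: r.level):
        if rule.parent is None:
            continue  # top-level rules have nothing above them to violate
        parent = by_name[rule.parent]
        same_field = rule.field == parent.field
        contained = parent.low <= rule.low and rule.high <= parent.high
        if same_field and not contained:
            conflicts.append((rule.name, parent.name))
    return conflicts


# Illustrative three-level rule base; the values are invented, not from the paper.
rules = [
    Rule("level1_income", 1, "annual_income", 0, 4000),
    Rule("level2_income", 2, "annual_income", 0, 3500, parent="level1_income"),
    Rule("level3_income", 3, "annual_income", 0, 3800, parent="level2_income"),
]

conflicts = detect_conflicts(rules)
print(conflicts)
```

In this sketch the level-3 rule allows values up to 3800 while its level-2 parent caps them at 3500, so that pair is reported. Checking each rule only against its direct parent keeps the pass linear in the number of rules. It still catches violations of higher ancestors indirectly, because a rule that escapes an ancestor's range must break containment somewhere along the chain.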