A Method for Conflict Detection of Data Cleaning Rules in Complex Business Areas

【Authors】 HE Jun; ZHANG Dehai; ZHANG Yunfei; YANG Xue

【Affiliations】 College of Information Engineering, Kunming University; College of Software, Yunnan University

【Abstract】 Rule-library-based data cleaning in complex business domains suffers from frequent logical conflicts between rules and a high error rate. To address this, a data cleaning method based on hierarchical rule libraries, the Hierarchical Rule Data Cleaning Method (HRDCM), is proposed. A hierarchical rule-library data cleaning framework is designed, the logical relationships among the rule libraries are built top-down, and a rule conflict detection mechanism is established by letting each level constrain the level below it; the corresponding algorithm is given. Taking poverty alleviation as an example, a three-level rule library is constructed, and experiments are run on poverty alleviation data from an impoverished county. The results show that, with fewer logical conflicts between rules, HRDCM improves cleaning efficiency and lowers the error rate of the cleaning results, which supports the soundness of the method.
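The abstract does not reproduce the algorithm itself, so the following is only a minimal sketch of what level-by-level downward constraint checking could look like. It assumes, purely for illustration, that each rule is a numeric range on one field and that a lower-level rule conflicts with its parent when its range is not contained in the parent's range. The `Rule` class, the `detect_conflicts` function, the field name `annual_income`, and all thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical rule: a numeric range on one field, linked to its parent rule."""
    name: str
    level: int            # 1 = top-level library, larger = more specific
    field_name: str
    low: float
    high: float
    parent: Optional["Rule"] = None


def detect_conflicts(rules: List[Rule]) -> List[str]:
    """Check rules from the top level down and report violations of parent constraints."""
    conflicts = []
    for rule in sorted(rules, key=lambda r: r.level):
        # A rule whose range is empty can never be satisfied.
        if rule.low > rule.high:
            conflicts.append(f"{rule.name}: empty range")
        parent = rule.parent
        if parent is not None and parent.field_name == rule.field_name:
            # Assumed conflict criterion: the child range must lie inside the parent range.
            if rule.low < parent.low or rule.high > parent.high:
                conflicts.append(f"{rule.name}: range exceeds parent {parent.name}")
    return conflicts


# Illustrative three-level rule library (all values invented).
r1 = Rule("L1-income", 1, "annual_income", 0, 100000)
r2 = Rule("L2-income", 2, "annual_income", 0, 50000, parent=r1)
r3 = Rule("L3-income", 3, "annual_income", -100, 60000, parent=r2)

conflicts = detect_conflicts([r3, r1, r2])
print(conflicts)
```

In this sketch the third-level rule is flagged because its range extends beyond the second-level rule that constrains it. Detecting such a rule before cleaning starts is the kind of check the abstract credits with reducing logical conflicts.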

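【Funding】 National Natural Science Foundation of China (61263043); Yunnan Provincial Joint Special Fund for Basic Research of Local Undergraduate Universities (2017FH001-05)
  • 【Source】 Journal of Kunming University of Science and Technology (Natural Science), 2020, Issue 02
  • 【Classification No.】 TP311.13
  • 【Times cited】 2
  • 【Downloads】 169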