A METHOD OF IMPROVING APPROXIMATELY DUPLICATED RECORDS DETECTION PRECISION
【Abstract】 Eliminating approximately duplicated records from a data source is an important problem in data cleaning research. To improve detection precision, this paper builds on an existing algorithm for detecting approximately duplicated records and uses a rank-based weights method to assign an appropriate weight to each field of a record. Finally, an example is used to verify the effectiveness of the method.
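The abstract does not give the weighting formula, so the following is only a minimal sketch of how rank-based field weights might feed a record comparison. It assumes the common rank-sum scheme (a field ranked r among n fields gets a raw weight of n + 1 - r, then all weights are normalized to sum to 1) and uses Python's `difflib.SequenceMatcher` as a stand-in field similarity. The function names, the sample fields, and the 0.85 threshold are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def rank_sum_weights(ranks):
    """Turn field ranks (1 = most important) into weights that sum to 1.

    Assumption: rank-sum weighting; the paper's exact formula may differ.
    """
    n = len(ranks)
    raw = {field: n + 1 - rank for field, rank in ranks.items()}
    total = sum(raw.values())
    return {field: value / total for field, value in raw.items()}


def field_similarity(a, b):
    """Stand-in string similarity in [0, 1]; not the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(rec_a, rec_b, weights):
    """Weighted sum of per-field similarities."""
    return sum(
        weight * field_similarity(rec_a[field], rec_b[field])
        for field, weight in weights.items()
    )


def is_duplicate(rec_a, rec_b, weights, threshold=0.85):
    """Flag a pair as approximately duplicated (threshold is hypothetical)."""
    return record_similarity(rec_a, rec_b, weights) >= threshold


# Hypothetical ranking: name matters most, phone least.
weights = rank_sum_weights({"name": 1, "address": 2, "phone": 3})
base = {"name": "John Smith", "address": "12 Main St", "phone": "5551234"}
diff_name = dict(base, name="Xqz Wvu")
diff_phone = dict(base, phone="9999999")
```

With this ranking, a mismatch in the highly ranked `name` field lowers the record similarity more than a mismatch in the low-ranked `phone` field, which is the effect the weighting is meant to achieve.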
【Key words】 Data mining; Data cleaning; Approximately duplicated records; Rank-based weights method
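【Funding】 Natural Science Research Program of Jiangsu Higher Education Institutions (05KJB520054); National 863 Program of China (2003AA1Z2330); National Natural Science Foundation of China (70371015).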
- 【Source】 Computer Applications and Software, 2006, Issue 10
- 【Classification No.】 TP274.4
- 【Times cited】 21
- 【Downloads】 215