
Missing Data Imputation Approach Based on Generalized Centroids Clustering Algorithm (基于泛化中心聚类的不完备数据集填补方法)

【Authors】 WANG Yan (王妍); WANG Feng-tong (王凤桐); WANG Jun-lu (王俊陆); SONG Bao-yan (宋宝燕); SHI Zhan (石展)

【Affiliations】 School of Information, Liaoning University; School of Computer Science and Engineering, Northeastern University

【Abstract】 With the continued development of information technology, cloud computing, the Internet, and social networks, data volumes are growing explosively. Massive data carries rich information, and how to preprocess it efficiently has become a research focus. Handling missing data is one of the important challenges in data preprocessing. Most traditional imputation methods consider only the case in which values in an incomplete dataset are completely missing. In massive datasets, however, human or mechanical causes damage the data to varying degrees: some values are completely missing, while others are only partially missing. Traditional methods do not distinguish records by degree of damage. They treat every damaged value as completely missing and so ignore the influence of partially missing data on the imputation result. This paper therefore proposes an imputation method based on generalized centroid clustering (GCF). It partitions the data into clusters using the idea of generalized centroid clustering, then imputes the missing data using the randomly damaged data together with the clustering results, with the aim of improving the accuracy of the imputed dataset. Experiments show that the proposed GCF strategy achieves good imputation accuracy on dataset samples with different degrees of missingness.
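The abstract gives only the outline of GCF: cluster the data, then fill gaps using the clustering result together with whatever survives in a damaged record. The following Python sketch illustrates that general cluster-then-impute pattern and is not the authors' algorithm. It assumes numeric records, uses plain k-means in place of generalized centroid clustering, and models a "partially missing" record as one with some observed attributes. The names `kmeans` and `impute` and the seeding rule are hypothetical.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means on complete rows (a stand-in, not the paper's clustering).

    Assumption: the first k points seed the centroids, so results are deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def impute(rows, k):
    """Fill None entries with the matching attribute of the nearest centroid.

    Centroids are learned from complete rows only. A damaged row is matched
    to a centroid using just its observed attributes, so the surviving part
    of the record steers the fill. A row with no observed attributes ties on
    every centroid and falls back to the first one.
    """
    complete = [r for r in rows if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in rows:
        observed = [i for i, v in enumerate(r) if v is not None]

        def partial_dist(c):
            # Squared distance over the observed attributes only.
            return sum((r[i] - c[i]) ** 2 for i in observed)

        best = min(centroids, key=partial_dist)
        filled.append([best[i] if v is None else v for i, v in enumerate(r)])
    return filled


if __name__ == "__main__":
    data = [
        [1.0, 1.0], [1.2, 0.8],    # one group of complete records
        [9.0, 9.0], [8.8, 9.2],    # another group of complete records
        [1.1, None], [None, 9.1],  # records with one missing attribute
    ]
    for row in impute(data, k=2):
        print(row)
```

Matching on observed attributes is the part of the sketch that reflects the abstract's point: a record that still carries some information should not be treated the same as one that carries none.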

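【Funding】 Supported by the National Natural Science Foundation of China (61472169, 61472072); the National Key Technology R&D Program (2012BAF13B08); the National Basic Research Program of China ("973" Program), special project for preliminary research (2014CB360509); and the Liaoning Province Public Welfare Research Fund for Scientific Undertakings (2015003003)
  • 【Source】 Journal of Chinese Computer Systems (小型微型计算机系统), 2017, Issue 09
  • 【Classification No.】 TP311.13
  • 【Times cited】 24
  • 【Downloads】 223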