Semantic compression for data tables based on principal component analysis and matching clustering analysis

【Author】 Feng Jing, Jin Yuanping, Feng Xin (School of Computer Science and Engineering, Southeast University, Nanjing 210096, China)

【Affiliation】 School of Computer Science and Engineering, Southeast University, Nanjing 210096

【Abstract】 PCA-Clustering, a semantic compression method for data tables based on principal component analysis and matching clustering analysis, is proposed. Principal component analysis exploits the correlation between attributes and extracts the principal components to achieve column-wise compression. Matching clustering assigns each row (tuple) to a cluster according to a measure of its matching degree and replaces all rows with a much smaller set of cluster representative rows to achieve row-wise compression, making full use of a small permitted error tolerance to obtain a better compression ratio. Simulation results show that, when the linear correlation between data attributes is strong, the compression ratio of PCA-Clustering is on average about 10% to 15% better than that of Fascicles and ItCompress. Compared with SPARTAN, which uses the CaRT model, PCA-Clustering still achieves a better compression ratio, because CaRT is not very effective for numeric attributes with a strong linear correlation.
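
The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' PCA-Clustering algorithm, whose details the abstract does not give. It assumes NumPy, a purely numeric table, an SVD-based PCA, and a simple greedy rule in which a row joins the first representative that lies within a per-coordinate tolerance. The function names `pca_reduce`, `match_cluster`, and `reconstruct`, the parameter `tol`, and the sample data are all hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Column-wise step: project rows of X onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T
    return scores, components, mean


def match_cluster(scores, tol):
    """Row-wise step (assumed greedy rule, not the paper's matching measure).

    A row joins the first representative whose every coordinate differs
    by at most tol; otherwise the row becomes a new representative.
    """
    reps = []
    labels = np.empty(len(scores), dtype=int)
    for i, row in enumerate(scores):
        for j, rep in enumerate(reps):
            if np.all(np.abs(row - rep) <= tol):
                labels[i] = j
                break
        else:
            reps.append(row)
            labels[i] = len(reps) - 1
    return np.array(reps), labels


def reconstruct(reps, labels, components, mean):
    """Lossy decompression: expand each row from its cluster representative."""
    return reps[labels] @ components + mean


# Hypothetical table whose three columns are exactly linearly correlated.
a = np.array([0.0, 0.01, 5.0, 5.02, 10.0])
X = np.column_stack([a, 2 * a, 3 * a])

scores, components, mean = pca_reduce(X, k=1)
reps, labels = match_cluster(scores, tol=0.1)
X_hat = reconstruct(reps, labels, components, mean)
```

In this sketch the stored data shrinks from five rows of three columns to three representative rows of one column, plus the labels, the component vector, and the column means. The tolerance is applied in the principal-component space, so the error in the original columns has to be checked separately.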

【Key words】 semantic compression; principal component analysis; matching degree

【Funding】 Major Research Program of the National Natural Science Foundation of China (90412014); Southeast University Science Foundation (XJ0409150)

【Source】 Journal of Southeast University (Natural Science Edition), 2006, Issue 06

【Classification code】 TP311.13
【Citation count】 4
【Download count】 174
