Semantic compression for data tables based on principal component and matching clustering analysis

【Authors】 Feng Jing, Jin Yuanping, Feng Xin

【Affiliation】 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China

【Abstract】 This paper proposes PCA-Clustering, a semantic compression method for data tables based on principal component analysis and matching clustering analysis. Principal component analysis exploits the correlation between attributes and extracts principal components to achieve column-wise compression. Matching clustering assigns each row to a cluster according to a measure of matching degree and replaces all rows with a much smaller set of cluster representative rows to achieve row-wise compression, making full use of a small permitted error to obtain a better compression ratio. Simulation experiments show that, when the linear correlation between attributes is strong, the compression ratio of PCA-Clustering is on average about 10% to 15% better than that of Fascicles and ItCompress. Compared with SPARTAN, which uses the CaRT model, PCA-Clustering still achieves a better compression ratio, because CaRT is not very effective for numeric attributes with a strong linear correlation.
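The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the matching measure, the choice of representatives, or how the error tolerance is applied. The sketch assumes a purely numeric table, keeps enough principal components to reach a chosen variance share (`var_kept`), and uses a greedy first-fit rule that assigns a row to the first representative within `tol` on every component. The function names `pca_compress`, `match_cluster`, and `decompress`, and both parameters, are hypothetical.

```python
import numpy as np

def pca_compress(X, var_kept=0.99):
    """Column-wise step: project rows onto the leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered table; rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose cumulative variance share reaches var_kept.
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    comps = Vt[:k]
    return Xc @ comps.T, comps, mean

def match_cluster(scores, tol):
    """Row-wise step (assumed rule): greedy first-fit within tol per component."""
    reps = []
    labels = np.empty(len(scores), dtype=int)
    for i, row in enumerate(scores):
        for j, rep in enumerate(reps):
            if np.all(np.abs(row - rep) <= tol):
                labels[i] = j
                break
        else:
            # No representative matches: this row starts a new cluster.
            reps.append(row)
            labels[i] = len(reps) - 1
    return np.array(reps), labels

def decompress(reps, labels, comps, mean):
    """Rebuild an approximate table from representatives and cluster labels."""
    return reps[labels] @ comps + mean

# Toy table with three perfectly linearly correlated numeric attributes.
base = (np.arange(200) % 5).astype(float)
X = np.column_stack([base, 2 * base + 1, 3 - base])

scores, comps, mean = pca_compress(X)
reps, labels = match_cluster(scores, tol=0.01)
X_hat = decompress(reps, labels, comps, mean)
print(scores.shape[1], "component(s),", len(reps), "representative rows")
```

In this sketch the stored data are the representatives, one label per row, the component matrix, and the column means. Note that `tol` is applied in component space, so it does not translate directly into a per-attribute error bound on the original table.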

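【Funding】 Major Research Program of the National Natural Science Foundation of China (90412014); Southeast University Science Foundation (XJ0409150)
  • 【Source】 Journal of Southeast University (Natural Science Edition), 2006, Issue 06
  • 【Classification code】 TP311.13
  • 【Times cited】 4
  • 【Downloads】 174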