Clustering XML Documents Based on a Structural Summary Tree

【Author】 LIANG Zuo-peng, WU Wen-ming, DONG Yi-sheng (Department of Computer Science & Engineering, Southeast University, Nanjing 210096, China)

【Affiliation】 Department of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China

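【Summary】 An effective method for representing the structural information of XML documents is proposed. A digitized structural summary tree (SST) encodes the structural information of each XML document; a structural distance is defined on this basis, and a genetic algorithm is used to cluster the documents. Experiments show that the method achieves high classification accuracy, is easy to implement, and requires no prior DTD knowledge.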

【Abstract】 This paper proposes an approach for calculating the structural similarity between XML documents. The structural information of an XML document is captured in a structural summary tree (SST). Encoding each element as a number transforms the SST into a digit-labeled tree. After normalization, the numbers at the different tree levels are concatenated to form a vector, so each XML document is represented as an m-dimensional vector. A clustering algorithm based on a genetic algorithm (GA) is adopted because it can provide good results irrespective of the starting configuration. Experimental results show the effectiveness and scalability of the approach.
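
The representation steps described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the tag encoding, the normalization, or the distance formula, so the sorted-tag codes, the per-level sum divided by the total, and the Euclidean distance are all placeholders, and the function names `summarize`, `tags_by_level`, `vectorize`, and `structural_distance` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def summarize(elem, node=None):
    """Merge repeated same-tag siblings into one child (a structural summary)."""
    node = {} if node is None else node
    for child in elem:
        summarize(child, node.setdefault(child.tag, {}))
    return node


def tags_by_level(sst, depth=0, levels=None):
    """Collect the set of distinct tags found at each depth of the summary tree."""
    levels = [] if levels is None else levels
    if sst and len(levels) <= depth:
        levels.append(set())
    for tag, subtree in sst.items():
        levels[depth].add(tag)
        tags_by_level(subtree, depth + 1, levels)
    return levels


def vectorize(xml_texts):
    """Return one m-dimensional vector per document (m = deepest summary tree)."""
    per_doc = []
    for text in xml_texts:
        root = ET.fromstring(text)
        per_doc.append(tags_by_level({root.tag: summarize(root)}))
    # Assumed encoding: number the distinct tags 1..k in sorted order.
    all_tags = sorted(set().union(*(lvl for doc in per_doc for lvl in doc)))
    code = {tag: i + 1 for i, tag in enumerate(all_tags)}
    total = sum(code.values())
    m = max(len(doc) for doc in per_doc)
    vectors = []
    for doc in per_doc:
        # Assumed normalization: level value = sum of tag codes / sum of all codes.
        vec = [sum(code[t] for t in lvl) / total for lvl in doc]
        vectors.append(vec + [0.0] * (m - len(vec)))  # pad shallow trees with zeros
    return vectors


def structural_distance(u, v):
    """Euclidean distance between two document vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


docs = [
    "<book><title/><author/><author/></book>",
    "<book><author/><title/></book>",
    "<paper><title/><section><para/><para/></section></paper>",
]
vecs = vectorize(docs)
print(structural_distance(vecs[0], vecs[1]))  # the two book documents share one summary tree
print(structural_distance(vecs[0], vecs[2]))  # the paper document has a different structure
```

In this sketch the two `book` documents collapse to the same summary tree, so sibling order and repetition do not affect their vectors; the resulting vectors could then be handed to any clustering routine, such as the GA-based one the paper adopts.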

  • 【Source】 应用科学学报 (Journal of Applied Sciences), 2005, No. 01
  • 【Classification code】 TP393
  • 【Times cited】 9
  • 【Downloads】 158