With the rapid growth of data, the shortcomings of HTML have become increasingly obvious: traditional web technology can no longer satisfy the needs of the Internet, and semi-structured XML addresses this problem. XML is highly extensible and readable. It can effectively describe all kinds of data, and it plays an increasingly important role in data representation and data exchange. KDD and database systems must therefore support XML. The similarity between XML documents is the foundation of document clustering, KDD and informat...