Design and Realization of the XML Parser Based on DOM
(Original Chinese title: 一种基于DOM的XML解析器的设计与实现)
【Author】 Zhao Hui (赵辉);
【Advisor】 Tao Shiqun (陶世群);
【Author information】 Shanxi University, Computer Software and Theory, 2005, Master's thesis
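【Abstract (Chinese original, translated)】 With the rapid growth of the Internet and e-commerce, the Extensible Markup Language (XML) has been widely adopted as a new carrier and standard for exchanging and computing information on the Internet, and it plays an irreplaceable role in data exchange. The XML parser is the foundation of XML-based information processing: to make full use of the information in an XML document, the document must be parsed efficiently. There are currently two main parsing approaches: DOM parsing and SAX parsing. DOM is a tree-based parsing technique that builds a complete parse tree in memory, which allows full, dynamic access to the entire XML document. SAX is an event-driven "push" model for processing XML and a lightweight interface. It is not a W3C standard, but it is a widely recognized API, and most SAX parsers follow it as a standard in their implementations. Unlike DOM parsing, SAX parsing does not build a tree representation of the whole document; instead it fires a series of events as it reads the document. These events are pushed to event handlers, which provide access to the document content. Its drawbacks are that it cannot access the XML document randomly and does not support modifying the XML in place. When an XML document is parsed with the W3C-standard DOM approach, a complete tree must be built in memory from the elements in the document. As the document grows, however, parsing consumes a very large amount of memory and takes too long. Finding ways to reduce memory usage and processing time during parsing is therefore valuable for improving parsing efficiency. To address these shortcomings of DOM parsing, this thesis proposes a new DOM-based parsing algorithm, the "deferred expansion and redundancy reduction algorithm", which largely solves the problem of heavy memory consumption, reduces system redundancy, and improves parsing efficiency. The main work of this thesis is as follows: (1) Following the principles of DOM parsing, a general-purpose XML parser based on DOM parsing was designed, implementing document validity checking, the various operations for manipulating the document tree, and serialized output of the document tree. (2) Based on an analysis of the general DOM parsing approach, an improved parsing algorithm, the deferred expansion and redundancy reduction algorithm, is proposed to address the heavy memory usage that occurs with very large documents. This algorithm largely solves the problem of heavy memory consumption and reduces system redundancy…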
【Abstract】 With the rapid development of the Internet and e-commerce, the Extensible Markup Language (XML) has been widely used as a new carrier and standard for exchanging and computing information on the Internet. It plays an irreplaceable role in the field of data exchange. The XML parser is the foundation of XML-based information processing: if the information in an XML document is to be fully used, the document must be parsed efficiently. Current parsing approaches fall mainly into two types: DOM (Document Object Model) parsing and SAX (Simple API for XML) parsing. DOM is a tree-based parsing technique that constructs a complete parse tree in memory, through which the whole XML document can be accessed fully and dynamically. SAX is an event-driven "push" model for processing XML and a lightweight interface. It is not a W3C standard, but it is a widely acknowledged API that most SAX parsers follow as a standard in their implementations. Unlike a DOM parser, which builds a tree representation of the whole document, a SAX parser fires a series of events as it reads the document. These events are pushed to event handlers, which in turn provide access to the document contents. The drawbacks of SAX are that it cannot offer random access to the XML document and does not support modifying the XML in place. At present, when the W3C-standard DOM approach is used to parse an XML document, a complete tree has to be built in memory from the elements in the document. As the size of the XML document grows, the memory consumed by parsing becomes considerable and the processing time becomes excessive.
Therefore, it is of great significance to reduce the memory required during parsing and to cut down the processing time. In light of the shortcomings of DOM parsing, the author proposes a…
【Key words】 XML; parser; DOM (document object model); SAX (simple API for XML); hash table;
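The contrast the abstract draws between DOM and SAX parsing can be illustrated with a minimal sketch. It uses Python's standard `xml.dom.minidom` and `xml.sax` modules rather than the parser built in the thesis, and the sample document, `titles_with_dom`, `titles_with_sax`, and `TitleHandler` are hypothetical names chosen only for this illustration. It does not implement the thesis's deferred expansion and redundancy reduction algorithm, whose details are not given in this record.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, used only for this illustration.
SAMPLE = "<library><book id='1'>XML</book><book id='2'>DOM</book></library>"


def titles_with_dom(text):
    """DOM style: build the whole tree in memory, then navigate and edit it."""
    doc = xml.dom.minidom.parseString(text)
    books = doc.getElementsByTagName("book")
    # The full tree is available, so any node can be reached and modified in place.
    books[0].setAttribute("id", "100")
    titles = [book.firstChild.data for book in books]
    return titles, doc.documentElement.toxml()


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: react to events pushed by the parser; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_book:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._in_book = False


def titles_with_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    dom_titles, modified_xml = titles_with_dom(SAMPLE)
    print(dom_titles)
    print(modified_xml)
    print(titles_with_sax(SAMPLE))
```

The DOM function can change an attribute and serialize the result because the whole tree lives in memory, which is also why memory use grows with document size. The SAX handler keeps only the state it chooses to keep, so it cannot revisit or edit earlier parts of the document.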
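- 【Online publication contributor】 Shanxi University 【Online publication year and issue】 2005, Issue 07
- 【Classification number】 TP311.1
- 【Times cited】 32
- 【Downloads】 1174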