XML DOM
The XML DOM defines a standard for accessing and manipulating XML.
What is the DOM?
The DOM is a W3C (World Wide Web Consortium) standard.
The DOM defines a standard for accessing documents like XML and HTML:
"The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document."
The DOM is separated into three parts (levels):
- Core DOM - standard model for any structured document
- XML DOM - standard model for XML documents
- HTML DOM - standard model for HTML documents
The DOM defines the objects and properties of all document elements, and the methods (interface) to access them.
What is the XML DOM?
The XML DOM is:
- A standard object model for XML
- A standard programming interface for XML
- Platform- and language-independent
- A W3C standard
The XML DOM defines the objects and properties of all XML elements, and the methods (interface) to access them.
In other words: The XML DOM is a standard for how to get, change, add, or delete XML elements.
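The four operations named above (get, change, add, delete) can be sketched with Python's standard-library xml.dom.minidom, which implements a subset of the W3C DOM. The two-book document is a made-up fragment used only for this illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical two-book document, used only for this illustration.
doc = parseString(
    "<bookstore>"
    "<book><title>Everyday Italian</title></book>"
    "<book><title>Harry Potter</title></book>"
    "</bookstore>"
)
root = doc.documentElement

# Get: read the text of the first <title> element.
first_title = doc.getElementsByTagName("title")[0]
original_text = first_title.firstChild.data

# Change: overwrite the data held by the title's text node.
first_title.firstChild.data = "Everyday Italian, 2nd Edition"

# Add: build a new <book> with a <title> child and append it to the root.
new_book = doc.createElement("book")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("Learning XML"))
new_book.appendChild(new_title)
root.appendChild(new_book)

# Delete: remove the second <book> (the Harry Potter entry).
second_book = doc.getElementsByTagName("book")[1]
root.removeChild(second_book)

titles = [t.firstChild.data for t in doc.getElementsByTagName("title")]
print(titles)
```

Note that the element's text is itself a node: the title string is reached through firstChild, not through the element directly.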
DOM Nodes
According to the DOM, everything in an XML document is a node.
The DOM says:
- The entire document is a document node
- Every XML element is an element node
- The text inside an XML element is a text node
- Every attribute is an attribute node
- Comments are comment nodes
| Node type | nodeName returns | nodeValue returns |
|---|---|---|
| Document | #document | null |
| DocumentFragment | #document-fragment | null |
| DocumentType | doctype name | null |
| EntityReference | entity reference name | null |
| Element | element name | null |
| Attr | attribute name | attribute value |
| ProcessingInstruction | target | content of node |
| Comment | #comment | comment text |
| Text | #text | content of node |
| CDATASection | #cdata-section | content of node |
| Entity | entity name | null |
| Notation | notation name | null |
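A few rows of this table can be checked with a short sketch using Python's xml.dom.minidom (where null appears as None). The one-element document below is a hypothetical example, not part of the original text.

```python
from xml.dom.minidom import parseString

# Hypothetical document containing an attribute, a comment, and text.
doc = parseString('<note id="n1"><!-- reminder -->Call home</note>')

note = doc.documentElement            # Element node
attr = note.getAttributeNode("id")    # Attr node
comment, text = note.childNodes       # Comment node, then Text node

nodes = {
    "Document": doc,
    "Element": note,
    "Attr": attr,
    "Comment": comment,
    "Text": text,
}

# Map each node type to its (nodeName, nodeValue) pair.
summary = {label: (node.nodeName, node.nodeValue) for label, node in nodes.items()}

for label, (name, value) in summary.items():
    print(f"{label:10} nodeName={name!r:12} nodeValue={value!r}")
```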
Element Object Properties
Browser columns give the earliest supporting version; "No" means not supported. Blank cells are not given in the source.

| Property | Description | IE | Firefox | Opera | W3C |
|---|---|---|---|---|---|
| attributes | Returns a NamedNodeMap of attributes for the element | 5 | 1 | 9 | Yes |
| baseURI | Returns the absolute base URI of the element | No | 1 | No | Yes |
| childNodes | Returns a NodeList of child nodes for the element | 5 | 1 | 9 | Yes |
| firstChild | Returns the first child of the element | 5 | 1 | 9 | Yes |
| lastChild | Returns the last child of the element | 5 | 1 | 9 | Yes |
| localName | Returns the local part of the name of the element | No | 1 | 9 | Yes |
| namespaceURI | Returns the namespace URI of the element | No | 1 | 9 | Yes |
| nextSibling | Returns the node immediately following the element | 5 | 1 | 9 | Yes |
| nodeName | Returns the name of the node, depending on its type | 5 | 1 | 9 | Yes |
| nodeType | Returns the type of the node | 5 | 1 | 9 | Yes |
| ownerDocument | Returns the Document object that owns the element | 5 | 1 | 9 | Yes |
| parentNode | Returns the parent node of the element | 5 | 1 | 9 | Yes |
| prefix | Sets or returns the namespace prefix of the element | No | 1 | 9 | Yes |
| previousSibling | Returns the node immediately before the element | 5 | 1 | 9 | Yes |
| schemaTypeInfo | Returns the type information associated with the element | No | | | Yes |
| tagName | Returns the name of the element | 5 | 1 | 9 | Yes |
| textContent | Sets or returns the text content of the element and its descendants | No | 1 | No | Yes |
| text | Returns the text of the node and its descendants. IE-only property | 5 | No | No | No |
| xml | Returns the XML of the node and its descendants. IE-only property | 5 | No | No | No |
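Several of these properties can be tried out in Python's xml.dom.minidom, as a rough sketch. minidom covers only part of the table: the navigation and naming properties shown here are available, while properties such as textContent, baseURI, text, and xml are not. The two-book document is a shortened, hypothetical variant of a bookstore file.

```python
from xml.dom.minidom import parseString

# Shortened, hypothetical bookstore document written without whitespace
# between elements, so that every child node is an element.
doc = parseString(
    "<bookstore>"
    '<book category="cooking"><title lang="en">Everyday Italian</title><year>2005</year></book>'
    '<book category="web"><title lang="en">Learning XML</title><year>2003</year></book>'
    "</bookstore>"
)
book = doc.getElementsByTagName("book")[0]

props = {
    "tagName": book.tagName,
    "isElement": book.nodeType == book.ELEMENT_NODE,
    "attributes": dict(book.attributes.items()),          # NamedNodeMap -> dict
    "childNodes": [child.nodeName for child in book.childNodes],
    "firstChild": book.firstChild.nodeName,
    "lastChild": book.lastChild.nodeName,
    "nextSibling": book.nextSibling.getAttribute("category"),
    "previousSibling": book.previousSibling,              # None: first child of its parent
    "parentNode": book.parentNode.nodeName,
    "ownerIsDoc": book.ownerDocument is doc,
}

for name, value in props.items():
    print(f"{name:16} {value!r}")
```

If the source XML were indented, childNodes, firstChild, and the sibling properties would also return whitespace-only text nodes, which is why the sample is written on one line per book.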
DOM Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
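As one possible way to work with the bookstore document through the DOM, the sketch below loads it with Python's xml.dom.minidom and collects each book's category, title, authors, and price. The variable names (BOOKSTORE_XML, catalog) are choices made for this example only.

```python
from xml.dom.minidom import parseString

# The bookstore document from this section, passed as bytes so the parser
# honours the encoding named in the XML declaration.
BOOKSTORE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year><price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year><price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author><author>Per Bothner</author>
    <author>Kurt Cagle</author><author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year><price>49.99</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year><price>39.95</price>
  </book>
</bookstore>"""


def text_of(element):
    """Return the data of an element's first child, assumed to be a text node."""
    return element.firstChild.data


doc = parseString(BOOKSTORE_XML)

catalog = []
for book in doc.getElementsByTagName("book"):
    title = text_of(book.getElementsByTagName("title")[0])
    authors = [text_of(a) for a in book.getElementsByTagName("author")]
    price = float(text_of(book.getElementsByTagName("price")[0]))
    catalog.append((book.getAttribute("category"), title, authors, price))

for category, title, authors, price in catalog:
    print(f"[{category}] {title} by {', '.join(authors)}: {price:.2f}")
```

Using getElementsByTagName sidesteps the whitespace-only text nodes that indentation places between the elements.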
SAX
SAX (Simple API for XML) is an event-based, sequential-access parser API for XML documents, developed by the XML-DEV mailing list. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.
Unlike DOM, there is no formal specification for SAX. The Java implementation of SAX is considered to be normative. SAX is used for state-independent processing of XML documents, in contrast to StAX, which processes documents state-dependently.
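To make the event-based model concrete, here is a minimal sketch using Python's standard-library xml.sax module rather than the normative Java API. The EventLogger class and the one-book input are hypothetical names and data chosen for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs is only valid during this call, so copy it into a dict.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventLogger()
xml.sax.parseString(
    b'<book category="web"><title>Learning XML</title></book>', handler
)

for event in handler.events:
    print(event)
```

The handler never sees the document as a tree; it only receives start, text, and end notifications in document order.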
Benefits
SAX parsers have some benefits over DOM-style parsers. A SAX parser only needs to report each parsing event as it happens, and normally discards almost all of that information once reported. It does keep some things, for example a list of all elements that have not yet been closed, in order to catch later errors such as end-tags in the wrong order. Thus, the minimum memory required for a SAX parser is proportional to the maximum depth of the XML file (i.e., of the XML tree) and the maximum data involved in a single XML event (such as the name and attributes of a single start-tag, or the content of a processing instruction).
This much memory is usually considered negligible. A DOM parser, in contrast, typically builds a tree representation of the entire document in memory to begin with, thus using memory that increases with the entire document length. This takes considerable time and space for large documents (memory allocation and data-structure construction take time). The compensating advantage, of course, is that once loaded any part of the document can be accessed in any order.
Because of the event-driven nature of SAX, processing documents is generally far faster than DOM-style parsers, so long as the processing can be done in a start-to-end pass. Many tasks, such as indexing, conversion to other formats, very simple formatting, and the like, can be done that way. Other tasks, such as sorting, rearranging sections, getting from a link to its target, looking up information on one element to help process a later one, and the like, require accessing the document structure in complex orders and will be much faster with DOM than with multiple SAX passes.
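A start-to-end task of this kind might look like the following sketch, which totals book prices in one pass with Python's xml.sax. The PriceTotaler class and the two-book input are assumptions made for the example. The only state kept is a stack of open element names, a small text buffer, and the running totals, so memory use follows document depth rather than document length.

```python
import xml.sax


class PriceTotaler(xml.sax.ContentHandler):
    """Sums the <price> elements of a document in a single pass."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # names of elements not yet closed
        self.max_depth = 0       # deepest nesting seen so far
        self.buffer = []         # text chunks of the current <price>
        self.total = 0.0
        self.count = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.max_depth = max(self.max_depth, len(self.open_elements))
        if name == "price":
            self.buffer = []

    def characters(self, content):
        # Collect text only while the innermost open element is <price>.
        if self.open_elements and self.open_elements[-1] == "price":
            self.buffer.append(content)

    def endElement(self, name):
        if name == "price":
            self.total += float("".join(self.buffer))
            self.count += 1
        self.open_elements.pop()


handler = PriceTotaler()
xml.sax.parseString(
    b"<bookstore>"
    b"<book><title>Everyday Italian</title><price>30.00</price></book>"
    b"<book><title>Harry Potter</title><price>29.99</price></book>"
    b"</bookstore>",
    handler,
)
print(handler.count, round(handler.total, 2), handler.max_depth)
```

A task such as sorting the books by price would not fit this pattern, because it needs all books available at once.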
Some implementations do not neatly fit either category: a DOM approach can keep its persistent data on disk, cleverly organized for speed (editors such as SoftQuad Author/Editor and large-document browser/indexers such as DynaText do this); while a SAX approach can cleverly cache information for later use (any validating SAX parser keeps more information than described above). Such implementations blur the DOM/SAX tradeoffs, but are often very effective in practice.
Due to the nature of DOM, streamed reading from disk requires techniques such as lazy evaluation, caches, virtual memory, persistent data structures, or other techniques (one such technique is disclosed in [US Patent 5,557,722]). Processing XML documents larger than main memory is sometimes thought impossible because some DOM parsers do not allow it. However, it is no less possible than sorting a dataset larger than main memory, using disk space as memory to sidestep this limitation.
Drawbacks
The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.
Virtually any kind of XML validation requires access to the document in full. The most trivial example is that an attribute declared in the DTD to be of type IDREF requires that there be an element in the document that uses the same value for an ID attribute. To validate this in a SAX parser, one must keep track of all ID attributes (any one of them might end up being referenced by an IDREF attribute at the very end), as well as every IDREF attribute until it is resolved. Similarly, to validate that each element has an acceptable sequence of child elements, information about what child elements have been seen for each parent must be kept until the parent closes.
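The bookkeeping described above can be sketched with a Python xml.sax handler. This is an illustration under a simplifying assumption: attributes literally named "id" and "ref" stand in for ID and IDREF, whereas a real validating parser would take the attribute types from the DTD. The IdRefChecker class and the sample document are hypothetical.

```python
import xml.sax


class IdRefChecker(xml.sax.ContentHandler):
    """Tracks IDs and unresolved references across the whole document.

    Assumption: "id" plays the role of an ID attribute and "ref" the role
    of an IDREF attribute; no DTD is consulted.
    """

    def __init__(self):
        super().__init__()
        self.seen_ids = set()      # every ID must be kept until the end
        self.pending_refs = set()  # references not yet matched by an ID
        self.dangling = []         # filled in when the document ends

    def startElement(self, name, attrs):
        id_value = attrs.get("id")
        if id_value is not None:
            self.seen_ids.add(id_value)
            self.pending_refs.discard(id_value)  # resolves forward references

        ref_value = attrs.get("ref")
        if ref_value is not None and ref_value not in self.seen_ids:
            self.pending_refs.add(ref_value)

    def endDocument(self):
        # Only now is it known which references never found a target.
        self.dangling = sorted(self.pending_refs)


checker = IdRefChecker()
xml.sax.parseString(
    b'<doc><link ref="b"/><item id="a"/><item id="b"/><link ref="c"/></doc>',
    checker,
)
print(checker.dangling)
```

The reference to "b" appears before its target and is resolved later, while the reference to "c" can only be reported as dangling once the whole document has been read.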
Additionally, some kinds of XML processing simply require having access to the entire document. XSLT and XPath, for example, need to be able to access any node at any time in the parsed XML tree. Editors and browsers likewise need to be able to display, modify, and perhaps re-validate at any time. While a SAX parser may well be used to construct such a tree initially, SAX provides no help for such processing as a whole.