Thursday, May 30, 2013




XML DOM
The XML DOM defines a standard for accessing and manipulating XML.
What is the DOM?
The DOM is a W3C (World Wide Web Consortium) standard.
The DOM defines a standard for accessing documents like XML and HTML:
"The W3C Document Object Model (DOM) is a platform and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of a document."
The DOM is separated into three different parts:
  • Core DOM - standard model for any structured document
  • XML DOM - standard model for XML documents
  • HTML DOM - standard model for HTML documents
The DOM defines the objects and properties of all document elements, and the methods (interface) to access them.
What is the XML DOM?
The XML DOM is:
  • A standard object model for XML
  • A standard programming interface for XML
  • Platform- and language-independent
  • A W3C standard
The XML DOM defines the objects and properties of all XML elements, and the methods (interface) to access them.
In other words: The XML DOM is a standard for how to get, change, add, or delete XML elements.
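Here is a minimal sketch of those four operations: get, change, add and delete. It uses Python's standard xml.dom.minidom module, which is an assumed choice because the post itself does not name a language. The document string and the function name edit_book are hypothetical.

```python
from typing import Tuple
from xml.dom import minidom

# A small hypothetical document, cut down from the bookstore example.
XML_TEXT = ("<bookstore><book><title>Everyday Italian</title>"
            "<price>30.00</price></book></bookstore>")


def edit_book(xml_text: str) -> Tuple[str, str]:
    doc = minidom.parseString(xml_text)
    book = doc.getElementsByTagName("book")[0]

    # Get: read the text node inside <title>.
    title = book.getElementsByTagName("title")[0].firstChild.nodeValue

    # Change: overwrite the text node inside <price>.
    price = book.getElementsByTagName("price")[0]
    price.firstChild.nodeValue = "25.00"

    # Add: create a <year> element with a text child and append it to <book>.
    year = doc.createElement("year")
    year.appendChild(doc.createTextNode("2005"))
    book.appendChild(year)

    # Delete: detach <title> from its parent, then release the removed subtree.
    old_title = book.removeChild(book.getElementsByTagName("title")[0])
    old_title.unlink()

    return title, doc.documentElement.toxml()


title, updated = edit_book(XML_TEXT)
print(title)    # Everyday Italian
print(updated)  # <bookstore><book><price>25.00</price><year>2005</year></book></bookstore>
```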
DOM Nodes
According to the DOM, everything in an XML document is a node.
The DOM says:
  • The entire document is a document node
  • Every XML element is an element node
  • The text inside an XML element is a text node
  • Every attribute is an attribute node
  • Comments are comment nodes

Node type: nodeName returns / nodeValue returns
  • Document: #document / null
  • DocumentFragment: #document-fragment / null
  • DocumentType: doctype name / null
  • EntityReference: entity reference name / null
  • Element: element name / null
  • Attr: attribute name / attribute value
  • ProcessingInstruction: target / content of node
  • Comment: #comment / comment text
  • Text: #text / content of node
  • CDATASection: #cdata-section / content of node
  • Entity: entity name / null
  • Notation: notation name / null
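The sketch below shows these values in practice, assuming Python's xml.dom.minidom as the DOM implementation. The sample document and the helper name_value_pairs are made up for illustration. Where the table says null, minidom reports Python's None.

```python
from xml.dom import minidom

# Hypothetical sample: one element holding an attribute, a comment, a text run
# and a processing instruction (no whitespace, so no extra text nodes appear).
XML_TEXT = '<book category="web"><!--a note-->Learning XML<?render fast?></book>'


def name_value_pairs(xml_text):
    """Return (nodeName, nodeValue) for the document node, the root element,
    one attribute of the root, and every child node of the root."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    nodes = [doc, root, root.getAttributeNode("category")] + list(root.childNodes)
    return [(node.nodeName, node.nodeValue) for node in nodes]


for name, value in name_value_pairs(XML_TEXT):
    print(name, value)
```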

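Element Object Properties
Browser support: IE = Internet Explorer, F = Firefox, O = Opera (a number is the first version that supports the property); W3C = part of the W3C standard.
  • attributes: Returns a NamedNodeMap of attributes for the element (IE 5, F 1, O 9, W3C: Yes)
  • baseURI: Returns the absolute base URI of the element (IE No, F 1, O No, W3C: Yes)
  • childNodes: Returns a NodeList of child nodes for the element (IE 5, F 1, O 9, W3C: Yes)
  • firstChild: Returns the first child of the element (IE 5, F 1, O 9, W3C: Yes)
  • lastChild: Returns the last child of the element (IE 5, F 1, O 9, W3C: Yes)
  • localName: Returns the local part of the name of the element (IE No, F 1, O 9, W3C: Yes)
  • namespaceURI: Returns the namespace URI of the element (IE No, F 1, O 9, W3C: Yes)
  • nextSibling: Returns the node immediately following the element (IE 5, F 1, O 9, W3C: Yes)
  • nodeName: Returns the name of the node, depending on its type (IE 5, F 1, O 9, W3C: Yes)
  • nodeType: Returns the type of the node (IE 5, F 1, O 9, W3C: Yes)
  • ownerDocument: Returns the document object that owns the element (IE 5, F 1, O 9, W3C: Yes)
  • parentNode: Returns the parent node of the element (IE 5, F 1, O 9, W3C: Yes)
  • prefix: Sets or returns the namespace prefix of the element (IE No, F 1, O 9, W3C: Yes)
  • previousSibling: Returns the node immediately before the element (IE 5, F 1, O 9, W3C: Yes)
  • schemaTypeInfo: Returns the type information associated with the element (IE not listed, F not listed, O No, W3C: Yes)
  • tagName: Returns the name of the element (IE 5, F 1, O 9, W3C: Yes)
  • textContent: Sets or returns the text content of the element and its descendants (IE No, F 1, O No, W3C: Yes)
  • text: Returns the text of the node and its descendants; IE-only property (IE 5, F No, O No, W3C: No)
  • xml: Returns the XML of the node and its descendants; IE-only property (IE 5, F No, O No, W3C: No)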
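The sketch below reads most of the W3C properties listed above through Python's xml.dom.minidom, which is an assumed choice of DOM implementation. minidom does not provide textContent, so the hypothetical helper text_content approximates it by concatenating descendant text nodes. baseURI, schemaTypeInfo and the IE-only text and xml properties are left out. The sample document and element_properties are illustrative only.

```python
from xml.dom import minidom

# Hypothetical sample; the namespace declaration sits on <x:author> so the
# root <book> carries only its two ordinary attributes.
XML_TEXT = ('<book category="web" cover="paperback">'
            '<title>Learning XML</title>'
            '<x:author xmlns:x="urn:example">Erik T. Ray</x:author>'
            '<year>2003</year></book>')


def text_content(node):
    """Concatenate descendant text, approximating the W3C textContent property."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Comments and processing instructions have no children, so they add "".
    return "".join(text_content(child) for child in node.childNodes)


def element_properties(xml_text):
    doc = minidom.parseString(xml_text)
    book = doc.documentElement
    author = book.childNodes[1]  # the <x:author> element
    return {
        "tagName": book.tagName,
        "nodeType": book.nodeType,  # 1 is ELEMENT_NODE
        "attributes": sorted(book.attributes.keys()),
        "childNodes": book.childNodes.length,
        "firstChild": book.firstChild.nodeName,
        "lastChild": book.lastChild.nodeName,
        "nextSibling": author.nextSibling.nodeName,
        "previousSibling": author.previousSibling.nodeName,
        "parentNode": author.parentNode.nodeName,
        "ownerDocument": author.ownerDocument is doc,
        "localName": author.localName,
        "prefix": author.prefix,
        "namespaceURI": author.namespaceURI,
        "textContent": text_content(book),
    }


for key, value in element_properties(XML_TEXT).items():
    print(key, value)
```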




DOM Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
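Once a document like this is loaded into a DOM tree, any part of it can be queried directly. The following is a minimal Python sketch that assumes xml.dom.minidom and uses an abridged copy of the bookstore document (two of the four books). The function summarize is hypothetical.

```python
from xml.dom import minidom

# Abridged copy of the bookstore document (two of the four books).
BOOKSTORE = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>"""


def summarize(xml_text):
    """Return (category, title, number of authors, price) for each book."""
    doc = minidom.parseString(xml_text)
    summary = []
    for book in doc.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0].firstChild.nodeValue
        authors = [a.firstChild.nodeValue for a in book.getElementsByTagName("author")]
        price = float(book.getElementsByTagName("price")[0].firstChild.nodeValue)
        summary.append((book.getAttribute("category"), title, len(authors), price))
    return summary


for row in summarize(BOOKSTORE):
    print(row)
```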

SAX



SAX (Simple API for XML) is an event-based sequential access parser API developed by the XML-DEV mailing list for XML documents. SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.
Unlike DOM, there is no formal specification for SAX. The Java implementation of SAX is considered to be normative. It is used for state-independent processing of XML documents, in contrast to StAX that processes the documents state-dependently.
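The event-based model can be seen with Python's xml.sax module, whose ContentHandler callbacks (startElement, endElement, characters) follow the SAX interface. In this minimal sketch, the EventRecorder class and the sample document are illustrative and not part of any standard. Because a parser may split a run of text across several characters() calls, the handler merges adjacent chunks.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records the callbacks a SAX parser fires, in the order it fires them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Merge consecutive chunks of the same text run into one event.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def record_events(xml_text):
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


for event in record_events('<book category="web"><title>Learning XML</title></book>'):
    print(event)
```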

Benefits

SAX parsers have some benefits over DOM-style parsers. A SAX parser only needs to report each parsing event as it happens, and normally discards almost all of that information once reported (it does, however, keep some things, for example a list of all elements that have not been closed yet, in order to catch later errors such as end-tags in the wrong order). Thus, the minimum memory required for a SAX parser is proportional to the maximum depth of the XML file (i.e., of the XML tree) and the maximum data involved in a single XML event (such as the name and attributes of a single start-tag, or the content of a processing instruction, etc.).
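That bookkeeping can be mirrored at the application level with Python's xml.sax. The underlying parser does its own tracking internally, and this sketch only imitates it. The hypothetical DepthTracker handler keeps nothing but a stack of open element names, so its memory grows with the depth of the tree rather than with the number of elements.

```python
import xml.sax


class DepthTracker(xml.sax.ContentHandler):
    """Keeps only the stack of currently open elements and the deepest level seen."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.open_elements.append(name)
        self.max_depth = max(self.max_depth, len(self.open_elements))

    def endElement(self, name):
        # The parser has already checked that this end-tag matches the
        # innermost open element; here we only drop it from the stack.
        self.open_elements.pop()


def measure_depth(xml_text):
    handler = DepthTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.max_depth, handler.open_elements


# 1000 books, yet the stack never holds more than three names.
many_books = "<bookstore>" + "<book><title>t</title></book>" * 1000 + "</bookstore>"
print(measure_depth(many_books))  # (3, [])
```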
This much memory is usually considered negligible. A DOM parser, in contrast, typically builds a tree representation of the entire document in memory to begin with, thus using memory that increases with the entire document length. This takes considerable time and space for large documents (memory allocation and data-structure construction take time). The compensating advantage, of course, is that once loaded any part of the document can be accessed in any order.
Because of the event-driven nature of SAX, processing a document with SAX is generally faster than with a DOM-style parser, as long as the processing can be done in a single start-to-end pass. Many tasks, such as indexing, conversion to other formats, very simple formatting, and the like, can be done that way. Other tasks, such as sorting, rearranging sections, following a link to its target, or looking up information on one element to help process a later one, require accessing the document structure in complex orders and will be much faster with DOM than with multiple SAX passes.
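Conversion to another format is a typical single-pass task. The following sketch assumes Python's xml.sax and csv modules and turns bookstore-style records into CSV rows. Each row is written as soon as its </book> end-tag arrives, so no tree is ever built. The class BooksToCsv and the chosen columns are hypothetical.

```python
import csv
import io
import xml.sax


class BooksToCsv(xml.sax.ContentHandler):
    """Writes one CSV row per <book> as soon as its end-tag is seen."""

    COLUMNS = ("category", "title", "year", "price")
    CHILD_FIELDS = ("title", "year", "price")

    def __init__(self, out):
        super().__init__()
        self.writer = csv.writer(out)
        self.writer.writerow(self.COLUMNS)
        self.row = None     # values collected for the current <book>
        self.field = None   # child element whose text is being collected
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "book":
            self.row = {"category": attrs.get("category", "")}
        elif self.row is not None and name in self.CHILD_FIELDS:
            self.field, self.chunks = name, []

    def characters(self, content):
        if self.field is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == self.field:
            self.row[name] = "".join(self.chunks).strip()
            self.field = None
        elif name == "book":
            self.writer.writerow([self.row.get(col, "") for col in self.COLUMNS])
            self.row = None


def books_to_csv(xml_text):
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), BooksToCsv(out))
    return out.getvalue()


SAMPLE = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

print(books_to_csv(SAMPLE))
```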
Some implementations do not neatly fit either category: a DOM approach can keep its persistent data on disk, cleverly organized for speed (editors such as SoftQuad Author/Editor and large-document browser/indexers such as DynaText do this); while a SAX approach can cleverly cache information for later use (any validating SAX parser keeps more information than described above). Such implementations blur the DOM/SAX tradeoffs, but are often very effective in practice.
Due to the nature of DOM, streamed reading from disk requires techniques such as lazy evaluation, caches, virtual memory, persistent data structures, or other techniques (one such technique is disclosed in [US Patent 5,557,722]). Processing XML documents larger than main memory is sometimes thought impossible because some DOM parsers do not allow it. However, it is no less possible than sorting a dataset larger than main memory, which is done by using disk space as memory to sidestep this limitation.

Drawbacks

The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.
Virtually any kind of XML validation requires access to the document in full. The most trivial example is that an attribute declared in the DTD to be of type IDREF requires that there be an element in the document that uses the same value for an ID attribute. To validate this in a SAX parser, one must keep track of all ID attributes (any one of them might end up being referenced by an IDREF attribute at the very end), as well as every IDREF attribute until it is resolved. Similarly, to validate that each element has an acceptable sequence of child elements, information about which child elements have been seen for each parent must be kept until the parent closes.
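The ID/IDREF bookkeeping might look like this in a Python xml.sax handler. To keep the sketch short, it assumes that attributes named id are IDs and attributes named ref are IDREFs. A real validator would take those types from the DTD. IdRefChecker and unresolved_refs are hypothetical names. The set of IDs has to be kept until the end of the document, because any of them could still be referenced later.

```python
import xml.sax


class IdRefChecker(xml.sax.ContentHandler):
    """Tracks ID values and unresolved IDREF values across the whole stream.
    Assumption: 'id' attributes are IDs and 'ref' attributes are IDREFs."""

    def __init__(self):
        super().__init__()
        self.ids = set()           # every ID seen so far
        self.pending_refs = set()  # IDREFs not yet matched by an ID
        self.unresolved = []

    def startElement(self, name, attrs):
        ident = attrs.get("id")
        if ident is not None:
            self.ids.add(ident)
            self.pending_refs.discard(ident)  # resolves earlier forward refs
        ref = attrs.get("ref")
        if ref is not None and ref not in self.ids:
            # May still be resolved by an element later in the document.
            self.pending_refs.add(ref)

    def endDocument(self):
        self.unresolved = sorted(self.pending_refs)


def unresolved_refs(xml_text):
    handler = IdRefChecker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.unresolved


# "x" is referenced before it is defined (still fine); "y" is never defined.
print(unresolved_refs('<doc><a ref="x"/><b id="x"/><c ref="y"/></doc>'))
```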
Additionally, some kinds of XML processing simply require having access to the entire document. XSLT and XPath, for example, need to be able to access any node at any time in the parsed XML tree. Editors and browsers likewise need to be able to display, modify, and perhaps re-validate at any time. While a SAX parser may well be used to construct such a tree initially, SAX provides no help for such processing as a whole.
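The following sketch shows SAX being used only to construct a tree first. The hypothetical TreeBuilder handler uses Python's xml.sax and a home-made dict-based node layout rather than a real DOM. It pushes a node for each start-tag and pops it at the matching end-tag. Once parsing ends, nodes can be reached in any order, but all of that access is the application's own code and not SAX.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a minimal in-memory tree from SAX events. Each node is a dict
    with "name", "attrs", "children" and "text" (a layout chosen for this sketch)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # path from the root to the element being built

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs), "children": [], "text": ""}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        if self.stack:
            self.stack[-1]["text"] += content

    def endElement(self, name):
        node = self.stack.pop()
        node["text"] = node["text"].strip()


def build_tree(xml_text):
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


tree = build_tree("<bookstore><book category='cooking'><title>Everyday Italian</title></book>"
                  "<book category='web'><title>Learning XML</title></book></bookstore>")
# Random access after the single parsing pass: last book first, then the first one.
print(tree["children"][-1]["children"][0]["text"])  # Learning XML
print(tree["children"][0]["children"][0]["text"])   # Everyday Italian
```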


