Re-serializing a parsed node and its contents to XML is tedious. One complication is that, when serializing a subtree whose XML namespace is defined in a containing element, it may be desirable to include an explicit namespace declaration in the output. The XmlNode class supports recursively serializing a node with or without the addition of an explicit namespace. The resulting XML string is XML-equivalent to the string source of the parsed node, with runs of whitespace normalized to a single space character.
Attempting to serialize a node with an explicit namespace when the source XML has no namespace declaration is an error.
This XML snippet has no namespace declaration:
<l n="1">Sing, goddess, the rage of <persName n="urn:cite:hmt:pers.pers1">Achilles</persName></l>
If we create an XmlNode from it and then serialize the node to an XML string, we get
<l n="1"> Sing, goddess, the rage of <persName n="urn:cite:hmt:pers.pers1"> Achilles</persName></l>
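The spacing in this output can be reproduced with a short recursive routine. The following is only a sketch in Python, using the standard library's xml.dom.minidom rather than the XmlNode class itself; the function name serialize is hypothetical, and the spacing rule (trim each text node, then put a single space before every child) is inferred from the example above, not taken from the XmlNode implementation.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def serialize(node):
    """Recursively serialize a minidom node (illustrative, not XmlNode's code)."""
    if node.nodeType == node.TEXT_NODE:
        # Collapse every run of whitespace and trim the ends.
        return escape(" ".join(node.data.split()))
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in node.attributes.items()
    )
    parts = [
        serialize(child)
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.ELEMENT_NODE)
    ]
    # Assumed rule: one space before each non-empty child, text or element.
    body = "".join(" " + part for part in parts if part)
    return f"<{node.tagName}{attrs}>{body}</{node.tagName}>"


source = (
    '<l n="1">Sing, goddess, the rage of '
    '<persName n="urn:cite:hmt:pers.pers1">Achilles</persName></l>'
)
result = serialize(minidom.parseString(source).documentElement)
print(result)
```

Under these assumptions the printed string matches the serialized line shown above, including the space inserted after each opening tag.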
This fragment of well-formed XML declares as its default namespace the namespace of the Text Encoding Initiative:
<div xmlns="http://www.tei-c.org/ns/1.0"><l n="1">Sing, goddess></l></div>
If we serialize the root element with its default namespace, we unsurprisingly get the XML-equivalent string:
<div xmlns="http://www.tei-c.org/ns/1.0"> <l n="1"> Sing, goddess></l></div>
If we serialize the child l element separately, the well-formed fragment maintains the default namespace:
<l xmlns="http://www.tei-c.org/ns/1.0" n="1"> Sing, goddess></l>
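The namespace behavior described in this section can be sketched as follows. This is a minimal illustration in Python with xml.dom.minidom, not the XmlNode API: the function name serialize and its with_namespace flag are hypothetical, the input is a simplified version of the TEI fragment, and the sketch handles only default (unprefixed) namespaces.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def serialize(node, with_namespace=False):
    """Serialize a minidom node, optionally declaring its namespace explicitly."""
    if node.nodeType == node.TEXT_NODE:
        return escape(" ".join(node.data.split()))
    attrs = ""
    if with_namespace:
        if node.namespaceURI is None:
            # Mirrors the rule that an explicit namespace cannot be
            # requested when the source declares none.
            raise ValueError("source XML has no namespace declaration")
        if not node.hasAttribute("xmlns"):
            # The declaration lives on an ancestor: copy it onto this element.
            attrs += f" xmlns={quoteattr(node.namespaceURI)}"
    attrs += "".join(
        f" {name}={quoteattr(value)}" for name, value in node.attributes.items()
    )
    parts = [
        serialize(child)
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.ELEMENT_NODE)
    ]
    body = "".join(" " + part for part in parts if part)
    return f"<{node.tagName}{attrs}>{body}</{node.tagName}>"


tei = '<div xmlns="http://www.tei-c.org/ns/1.0"><l n="1">Sing, goddess</l></div>'
root = minidom.parseString(tei).documentElement
child = root.getElementsByTagName("l")[0]

root_xml = serialize(root, with_namespace=True)
child_xml = serialize(child, with_namespace=True)
print(root_xml)
print(child_xml)
```

Only the root of the serialized subtree receives the declaration; descendants inherit it, so repeating it on every element would be redundant.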
Because the serialization of a parsed node is XML-equivalent to the source for the parsed node, the output of the collectText method on any XmlNode created from the serialization is guaranteed to be identical to the output of collectText on the original parsed node, so long as the same configuration of tokenizing markup is applied to both the original and the derivative node. (Refer to the specification for collecting text in a parsed tree for details about tokenizing markup.)
Consider a parsed node created from the following well-formed XML fragment:
<l>Sing, <w ana="token">god<unclear>dess</unclear></w></l>
If we collect the text contents of this node, we get "Sing, god dess". If we define the w element as tokenizing markup, collectText instead returns "Sing, goddess", following the specification for collecting text.
If we serialize the node, we get an XML-equivalent string with slightly different whitespace:
<l> Sing, <w ana="token"> god <unclear> dess</unclear></w></l>
If we create a new XmlNode from this XML-equivalent fragment,
<l> Sing, <w ana="token"> god <unclear> dess</unclear></w></l>
collectText produces identical results. That is, with the default settings it collects "Sing, god dess", but if we define the w element as tokenizing markup, collectText instead returns "Sing, goddess".
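The round-trip guarantee can be checked with a small stand-in for collectText. The sketch below is written in Python with xml.dom.minidom and is only an approximation inferred from the two results quoted in this section: the function collect_text and its tokenizing parameter are hypothetical names, and the actual rules are those of the specification for collecting text, which may differ in cases not shown here.

```python
from xml.dom import minidom


def collect_text(node, tokenizing=frozenset()):
    """Collect text from a minidom node (illustrative stand-in for collectText)."""

    def words(n, in_token):
        if n.nodeType == n.TEXT_NODE:
            text = " ".join(n.data.split())
            return [text] if text else []
        if n.nodeType != n.ELEMENT_NODE:
            return []
        if in_token or n.tagName in tokenizing:
            # Inside tokenizing markup, descendants are joined with no space.
            joined = "".join(w for c in n.childNodes for w in words(c, True))
            return [joined] if joined else []
        # Otherwise each text node contributes a separate word.
        return [w for c in n.childNodes for w in words(c, False)]

    return " ".join(words(node, False))


original = '<l>Sing, <w ana="token">god<unclear>dess</unclear></w></l>'
serialized = '<l> Sing, <w ana="token"> god <unclear> dess</unclear></w></l>'

for xml in (original, serialized):
    node = minidom.parseString(xml).documentElement
    print(collect_text(node), "|", collect_text(node, tokenizing={"w"}))
```

Both fragments yield the same pair of results, because the whitespace that serialization adds is discarded again when each text node is trimmed.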