Serializing parsed XML trees in xmlutils 1.2
24 Nov 2015It is delightfully easy to work with parsed XML in groovy; it’s much less easy to write a parsed node as an XML string. Version 1.2 of the groovy xmlutils library adds a toXml
method to handle this common task.
Consider a well-formed fragment of a TEI document:
String ozymandias = """
<div xmlns='http://www.tei-c.org/ns/1.0'>
<l n="1">I met a Traveler from an antique land,</l>
<l n="2">Who said, "Two vast and trunkless legs of stone</l>
<l n="3">Stand in the desert...</l>
</div>
"""
Parsing it in groovy is a one-liner:
Node root = new XmlParser().parseText(ozymandias)
If you wanted to print out the line numbers (given in the n
attribute of each l
element), it’s as simple as
root.children().each { childNode ->
println childNode.'@n'
}
As you would expect, this produces
1
2
3
The toString()
method does not help if you want to write a node as an XML string. Try this
root.children().each { childNode ->
println childNode.'@n' + ": " + childNode.toString()
}
and you’ll see:
1: {http://www.tei-c.org/ns/1.0}l[attributes={n=1}; value=[I met a Traveler from an antique land,]]
2: {http://www.tei-c.org/ns/1.0}l[attributes={n=2}; value=[Who said, "Two vast and trunkless legs of stone]]
3: {http://www.tei-c.org/ns/1.0}l[attributes={n=3}; value=[Stand in the desert...]]
Use an XmlNode
from the xmlutils
library for this instead.
root.children().each { childNode ->
if (childNode.'@n' == "1") {
println new XmlNode(childNode).toXml()
}
}
This produces
<l n="1"> I met a Traveler from an antique land,</l>
This is fine if your context is a TEI document, but what if you want to embed this in a document with multiple namespaces? You can include a boolean parameter indicating whether or not to add an explicit namespace declaration on the node (true
means “include the declaration”).
print '<foreign xmlns="http://nee
lsmith.github.io/troublesome/namespace">'
root.children().each { childNode ->
if (childNode.'@n' == "1") {
println new XmlNode(childNode).toXml(true)
}
}
println "</foreign>"
This gives you a well-formed fragment with content in two namespaces:
<foreign xmlns="http://neelsmith.github.io/troublesome/namespace">
<l xmlns="http://www.tei-c.org/ns/1.0" n="1"> I met a Traveler from an antique land,</l>
</foreign>
It’s interesting to note that since the collectText
method of XmlNode
(desribed in this post) guarantees identical output for XML-equivalent nodes (disregarding whitespace differences), the following string comparison is true for any instance n
of the XmlNode
class:
String xmlString = n.toXml()
XmlNode derivative = new XmlNode(xmlString)
assert n.collectText() == derivative.collectText()