Neel Smith on github Openly available work in digital classics

Serializing parsed XML trees in xmlutils 1.2

It is delightfully easy to work with parsed XML in groovy; it’s much less easy to write a parsed node as an XML string. Version 1.2 of the groovy xmlutils library adds a toXml method to handle this common task.

Consider a well-formed fragment of a TEI document:

String ozymandias = """
<div xmlns='http://www.tei-c.org/ns/1.0'>
<l n="1">I met a Traveler from an antique land,</l>
<l n="2">Who said, "Two vast and trunkless legs of stone</l>
<l n="3">Stand in the desert...</l>
</div>
"""

Parsing it in groovy is a one-liner:

Node root = new XmlParser().parseText(ozymandias)

If you wanted to print out the line numbers (given in the n attribute of each l element), it’s as simple as

root.children().each { childNode ->
  println childNode.'@n'
}

As you would expect, this produces

1
2
3

The toString() method does not help if you want to write a node as an XML string. Try this

root.children().each { childNode ->
  println childNode.'@n' + ": " + childNode.toString()
}

and you’ll see:

1: {http://www.tei-c.org/ns/1.0}l[attributes={n=1}; value=[I met a Traveler from an antique land,]]
2: {http://www.tei-c.org/ns/1.0}l[attributes={n=2}; value=[Who said, "Two vast and trunkless legs of stone]]
3: {http://www.tei-c.org/ns/1.0}l[attributes={n=3}; value=[Stand in the desert...]]

Use an XmlNode from the xmlutils library for this instead.

root.children().each { childNode ->
  if (childNode.'@n' == "1") {
    println new XmlNode(childNode).toXml()
  }
}

This produces

<l n="1"> I met a Traveler from an antique land,</l>

This is fine if your context is a TEI document, but what if you want to embed this in a document with multiple namespaces? You can include a boolean parameter indicating whether or not to add an explicit namespace declaration on the node (true means “include the declaration”).

print '<foreign xmlns="http://nee
lsmith.github.io/troublesome/namespace">'
root.children().each { childNode ->
  if (childNode.'@n' == "1") {
    println new XmlNode(childNode).toXml(true)
  }
}
println "</foreign>"

This gives you a well-formed fragment with content in two namespaces:

<foreign xmlns="http://neelsmith.github.io/troublesome/namespace">
<l xmlns="http://www.tei-c.org/ns/1.0"  n="1"> I met a Traveler from an antique land,</l>
</foreign>

It’s interesting to note that since the collectText method of XmlNode (desribed in this post) guarantees identical output for XML-equivalent nodes (disregarding whitespace differences), the following string comparison is true for any instance n of the XmlNode class:

String xmlString = n.toXml()
XmlNode derivative = new XmlNode(xmlString)
assert n.collectText() == derivative.collectText()