Neel Smith on github Openly available work in digital classics

Convert XML texts from beta code to Unicode

Classicists have been working digitally since before the first Unicode standard, and long-standing projects like Perseus have a legacy of XML texts that encode Greek in the ASCII convention called “beta code.” Converting the encoding of an XML document’s text content while leaving the markup untouched can be a headache, so I’ve pushed a gist here that reduces the job to a single command. It’s written in Groovy and uses Groovy’s Grape dependency manager to download all supporting libraries. The converted XML is written to standard output, so on a POSIX-like operating system you can create a UTF-8 version of your beta code XML text with

groovy beta2Utf8Xml.groovy BETACODEFILE.xml > UTF8FILE.xml
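
If you are curious why this needs more than a search-and-replace, here is a minimal sketch of the general idea: stream the document through a SAX filter that rewrites only character data and passes element names and attributes through unchanged. This is not the code in the gist. It is an illustration in Python rather than Groovy, the names `BETA_TO_GREEK`, `convert_text`, `TextOnlyFilter` and `convert_xml` are hypothetical, and the mapping is a toy that covers only unaccented lowercase letters (no breathings, accents, capitals or final sigma).

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Toy mapping (an assumption for illustration): unaccented lowercase
# beta code letters only. Real beta code also has breathings, accents,
# capitals and final sigma, none of which are handled here.
BETA_TO_GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}


def convert_text(text):
    """Map each character through the toy table; leave others as they are."""
    return "".join(BETA_TO_GREEK.get(ch, ch) for ch in text)


class TextOnlyFilter(XMLFilterBase):
    """SAX filter that converts character data and nothing else."""

    def characters(self, content):
        # Only text nodes arrive here; element names and attributes are
        # forwarded untouched by the inherited handlers.
        super().characters(convert_text(content))


def convert_xml(xml_text):
    """Return xml_text with its text content converted, markup preserved."""
    out = io.StringIO()
    reader = TextOnlyFilter(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    print(convert_xml('<l n="1">qea</l>'))
```

Note that the element name `l` survives as markup even though a bare `l` in the text would become a lambda, which is exactly what a naive string replacement over the whole file would get wrong.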