The best software for Classics you've never heard of?
17 Oct 2015
I first learned about Harry Schmidt’s Parsley a little over a year ago, when, in preparing for work on a new Greek morphological parser, I was actively searching for examples of parsers for other natural languages. If you know the Parsley end-user application, you’re probably already a fan: it’s a well-conceived program for parsing Latin morphology, and it does exactly what it intends to do, beautifully. But head over to the GitHub repository for the parser to see why I think it’s one of the most impressive pieces of work I’ve found in the world of digital classics.
Distinct tasks are neatly separated, with appropriate technologies chosen for each; the whole project is managed by a rake build system, including a thorough suite of tests. No reinventing wheels: code libraries and data sets are chosen from the best available options. (Crucial examples are the Stuttgart FST toolkit for writing the core parser, and the Latin morphological data from the Perseus project, here very tidily reformatted. Shout-out once again to Perseus for making possible significant work built on its openly licensed data.) No overengineering, either: Schmidt provides his own FST implementations in Go and C/Objective-C, but his rake build just calls a sh process to compile the FST with the Stuttgart tools.
Data are loaded from legible, simple text files, so both code components and data sets can easily be reused or extended. The core parser is a finite state transducer (FST) written in the SFST-PL notation. An FST can be understood as a graph, and Schmidt gives you the compiled transducers in three equivalent graph formats. This serves a pragmatic end: it simplifies working with the transducer using a wide range of tools. But it also implicitly illustrates a more general kind of conceptual advance, since it replaces a complex algorithmic process with a stable declarative statement. (That’s a theme that deserves extended treatment, and I plan to return to it in later blog posts.)
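To make the "transducer as graph" idea concrete, here is a minimal Python sketch of a toy transducer stored as plain data, with a generic function that walks it. Everything in it is an assumption made for illustration: the states, the tag names, and the three verb forms are invented, and none of it is taken from Parsley, SFST-PL, or the Perseus data. The point is only that the linguistic knowledge lives in a declarative table of edges, while the code that traverses the table knows nothing about Latin.

```python
# A toy finite state transducer expressed as a graph.
# Each edge maps (state, input symbol) -> list of (output string, next state).
# States, symbols, and tags are hypothetical, chosen only for illustration.
TRANSITIONS = {
    (0, "a"): [("a", 1)],
    (1, "m"): [("m", 2)],
    (2, "o"): [("o<verb><1st><sg>", 3)],
    (2, "a"): [("o<verb>", 4)],
    (4, "s"): [("<2nd><sg>", 3)],
    (4, "t"): [("<3rd><sg>", 3)],
}
START_STATE = 0
FINAL_STATES = {3}


def analyze(surface):
    """Return every output the transducer produces for the input string."""
    # Each pending item is (current state, position in input, output so far).
    pending = [(START_STATE, 0, "")]
    results = []
    while pending:
        state, pos, out = pending.pop()
        if pos == len(surface):
            # Input consumed: accept only if we stopped in a final state.
            if state in FINAL_STATES:
                results.append(out)
            continue
        # Follow every edge that matches the next input symbol.
        for emitted, nxt in TRANSITIONS.get((state, surface[pos]), []):
            pending.append((nxt, pos + 1, out + emitted))
    return sorted(results)


if __name__ == "__main__":
    for form in ["amo", "amas", "amat", "amant"]:
        print(form, "->", analyze(form))
```

Extending the coverage of this toy means adding edges to the table, not changing `analyze`. That is the sense in which a declarative statement of the graph can stand in for a hand-written parsing procedure, and why the same graph can be handed to different tools.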
The only thing I don’t understand is why Parsley is not being widely discussed and used in the very active DH community in Classics. Studying it has enormously accelerated my initial work this fall on a Greek parser, and I’m excited to be using it directly with students in the Holy Cross Manuscripts, Inscriptions and Documents Club, who are developing an automated validation system for their editorial work on Latin manuscripts.
I don’t want to end an enthusiastic post on a negative note, so rather than dwell on why graduate study in Classics drives away brilliant people like Harry Schmidt, I’ll simply add two observations. Our society is certainly benefiting from the fact that he is giving his exceptional talents to the important work at CaseText, and classicists can be glad that Parsley’s GitHub repository is openly available. Clone it and learn!