Simple text exchange with 82XF, version 2
13 Jul 2016On Monday, I suggested that a convention for a simple delimited text format could capture the complete OHCO2 model of texts, and greatly simplify the exchange of citable text data. DH2016 was a great place to offer the idea: Chris Blackwell and Thomas Köntges have already suggested a substantial (and in retrospect, obvious) improvement.
The semantics of the five columns would remain essentially the same, namely:
- identifier for the node
- identifier for the previous node
- sequence number of node
- identifier for the following node
- text content of node
The difference from Monday’s proposal is that in columns 2 and 4, preceding and succeeding nodes should of course be identified by URN, not referred to with sequence numbers. Given those identifiers, it is true that you don’t absolutely need a sequence number, but it is still a useful value to have precomputed and stored with the record. Among other things, a text file could then be sorted easily into document order.
So using URNs to refer to all nodes in the convention, the example I used Monday might instead look like this (with #
serving as the delimiting character):
Urn#PrevUrn#SequenceIndex#NextUrn#TextContent
urn:cts:greekLit:tlg0012.tlg001.hmt01:10.4#urn:cts:greekLit:tlg0012.tlg001.hmt01:10.3#4#urn:cts:greekLit:tlg0012.tlg001.hmt01:10.5#Sweet sleep did not hold him, as he pondered many things in his mind.
Thomas suggests the name 82XF as an abbreviation for “OHCO2 eXchange Format”. (The etymology perhaps deserves its own blog post.) This would bring us to version 2.0 of the convention, and while it breaks compatibility with Monday’s suggestion, we hope not too many implementations have been completed in the past 36 hours!