Reading an annotated text from a delimited-text source

The readdelimited function takes a Vector of delimited-text strings and parses them into annotations on sentences, verbal expressions, and individual tokens. (See the page on delimited-text format for details of its structure.)

The test/data directory of this repository has a test file with syntactic annotations on sentences from Lysias 1.

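using GreekSyntax
root = pkgdir(GreekSyntax)   # root directory of the installed GreekSyntax.jl package
f = joinpath(root, "test", "data", "Lysias1.6ff.cex")
"/home/runner/work/GreekSyntax.jl/GreekSyntax.jl/test/data/Lysias1.6ff.cex"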

You can read it with the standard Julia function readlines and pass the resulting lines directly to readdelimited. readdelimited returns a tuple of three vectors containing, respectively, annotations for sentences, verbal expressions, and individual tokens.

using GreekSyntax
(sentences, verbalunits, tokens) = readlines(f) |> readdelimited
length(sentences)
3
length(verbalunits)
16
length(tokens)
87
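Because readdelimited only needs a Vector of strings, the lines can come from any source, not just a file. The following is a minimal sketch, assuming GreekSyntax.jl is installed together with its test/data directory (located here with Julia's pkgdir). It writes the pipeline as an ordinary function call, then parses the same text from an in-memory string, relying on the fact that readlines also accepts an IO stream such as an IOBuffer. The variable names parsed, text and the mem_ prefixes are illustrative choices, not part of the package.

```julia
using GreekSyntax

# Assumption: the installed package includes test/data, so pkgdir can locate the file.
f = joinpath(pkgdir(GreekSyntax), "test", "data", "Lysias1.6ff.cex")

# `x |> g` is just `g(x)`, so this is the same call as `readlines(f) |> readdelimited`.
parsed = readdelimited(readlines(f))
(sentences, verbalunits, tokens) = parsed

# Text already held in memory (here read from the same file for comparison)
# can be split into lines by wrapping it in an IOBuffer: readlines accepts any IO.
text = read(f, String)
(mem_sentences, mem_verbalunits, mem_tokens) = readlines(IOBuffer(text)) |> readdelimited

# Both routes should yield the same number of annotations of each kind.
counts = map(length, (sentences, verbalunits, tokens))
mem_counts = map(length, (mem_sentences, mem_verbalunits, mem_tokens))
```

The IOBuffer route is handy when the annotations arrive as a single string from some other API rather than as a file on disk.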

It is equally easy to read annotations from a URL. Here is a set of annotations from the eagl-texts repository:

url = "https://raw.githubusercontent.com/neelsmith/eagl-texts/main/annotations/Lysias1_annotations.cex"

using Downloads
(remote_sentences, remote_verbalunits, remote_tokens) = Downloads.download(url) |> readlines |> readdelimited
length(remote_sentences)
64
length(remote_verbalunits)
268
length(remote_tokens)
1214
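Called with only a URL, Downloads.download saves the content to a new temporary file each time. If you work with the same remote annotations repeatedly, one option is to pass a destination path as the second argument and download only when no local copy exists yet. This is a sketch under stated assumptions: the temporary directory is writable, and the cache path localfile is a hypothetical choice. Since the remote file lives on a live branch, the counts it yields may differ from those shown above.

```julia
using Downloads
using GreekSyntax

url = "https://raw.githubusercontent.com/neelsmith/eagl-texts/main/annotations/Lysias1_annotations.cex"

# Hypothetical cache location: any writable path will do.
localfile = joinpath(tempdir(), "Lysias1_annotations.cex")

# Download only when there is no local copy yet; given a second argument,
# Downloads.download writes to that path (and returns it).
isfile(localfile) || Downloads.download(url, localfile)

# Parse the cached copy exactly as with the local test file.
(remote_sentences, remote_verbalunits, remote_tokens) = readlines(localfile) |> readdelimited
```

Delete localfile to force a fresh download.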