Reading an annotated text from a delimited-text source
The readdelimited function takes a Vector of delimited-text strings and parses them into annotations on sentences, verbal expressions, and individual tokens. (See the page on delimited-text format for details of its structure.)
The test/data directory of this repository has a test file with syntactic annotations on sentences from Lysias 1.
f = joinpath(root, "test", "data", "Lysias1.6ff.cex")
"/home/runner/work/GreekSyntax.jl/GreekSyntax.jl/test/data/Lysias1.6ff.cex"
You can read it with the standard Julia function readlines, and pass this directly to readdelimited. The result is a tuple with three vectors respectively containing annotations for sentences, verbal expressions and individual tokens.
using GreekSyntax
(sentences, verbalunits, tokens) = readlines(f) |> readdelimited
length(sentences)
3
length(verbalunits)
16
length(tokens)
87
It is equally easy to retrieve a source from a URL. Here is a set of annotations from the eagl-texts repository:
url = "https://raw.githubusercontent.com/neelsmith/eagl-texts/main/annotations/Lysias1_annotations.cex"
using Downloads
(remote_sentences, remote_verbalunits, remote_tokens) = Downloads.download(url) |> readlines |> readdelimited
length(remote_sentences)
64
length(remote_verbalunits)
268
length(remote_tokens)
1214
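Both examples follow the same pattern: get a vector of lines, pass it to a parser, and destructure the three-part tuple it returns. When called with only a URL, Downloads.download saves to a temporary file and returns its path, which is why the result can be piped straight into readlines. The sketch below shows the piped form next to an equivalent step-by-step form. So that it runs without GreekSyntax or a network connection, it uses a hypothetical stand-in function, standin_parser, and made-up line prefixes; neither reflects the real readdelimited parser or the actual delimited-text format.

```julia
# Hypothetical stand-in for readdelimited, used only so this sketch runs
# without GreekSyntax installed. It is NOT the package's parser, and the
# "s|", "v|", "t|" prefixes are invented for this illustration.
function standin_parser(lines::Vector{String})
    sentences = filter(l -> startswith(l, "s|"), lines)
    verbalunits = filter(l -> startswith(l, "v|"), lines)
    tokens = filter(l -> startswith(l, "t|"), lines)
    (sentences, verbalunits, tokens)
end

# Write a few made-up lines to a temporary file, standing in for a
# local test file or a downloaded one.
path, io = mktemp()
write(io, "s|one\nv|two\nv|three\nt|four\n")
close(io)

# Piped form, as in the examples above.
(s1, v1, t1) = readlines(path) |> standin_parser

# Equivalent step-by-step form, convenient for inspecting intermediate values.
lines = readlines(path)
parsed = standin_parser(lines)
(s2, v2, t2) = parsed

rm(path)  # remove the temporary file when done
println((length(s1), length(v1), length(t1)))
```

With the real package, replacing standin_parser with readdelimited and path with f (or with the path returned by Downloads.download(url)) should give the same shape of result: one tuple holding three vectors.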