Reading an annotated text from a delimited-text source
The readdelimited function takes a Vector of delimited-text strings and parses them into annotations on sentences, verbal expressions, and individual tokens. (See the page on delimited-text format for details of its structure.)
The test/data directory of this repository contains a test file with syntactic annotations on sentences from Lysias 1.
f = joinpath(root, "test", "data", "Lysias1.6ff.cex")
"/home/runner/work/GreekSyntax.jl/GreekSyntax.jl/test/data/Lysias1.6ff.cex"
You can read it with the standard Julia function readlines and pass the resulting lines directly to readdelimited. The result is a tuple of three vectors containing, respectively, annotations for sentences, verbal expressions, and individual tokens.
using GreekSyntax
(sentences, verbalunits, tokens) = readlines(f) |> readdelimited
length(sentences)
3
length(verbalunits)
16
length(tokens)
87
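The pipe expression above relies on two ordinary Julia idioms: |> passes the value on its left to the function on its right, and a returned tuple can be destructured in a single assignment. The following is a minimal sketch of those idioms using only Base Julia. The helper headtail is a hypothetical stand-in for a tuple-returning parser; it is not part of GreekSyntax and does not parse delimited-text annotations.

```julia
# Create a small temporary file so the example is self-contained.
path, io = mktemp()
write(io, "line one\nline two\nline three\n")
close(io)

# Hypothetical stand-in for a function that returns a tuple
# (an assumption for illustration, NOT a GreekSyntax function).
headtail(v) = (first(v), v[2:end])

# These two expressions are equivalent:
# |> passes the left-hand value to the function on its right.
piped = readlines(path) |> length
nested = length(readlines(path))

# Destructure the returned tuple in a single assignment.
(firstline, rest) = readlines(path) |> headtail

rm(path)  # remove the temporary file
```

By the same reasoning, readlines(f) |> readdelimited should behave the same as readdelimited(readlines(f)).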
It is equally easy to retrieve a source from a URL. Here is a set of annotations from the eagl-texts repository:
url = "https://raw.githubusercontent.com/neelsmith/eagl-texts/main/annotations/Lysias1_annotations.cex"
using Downloads
(remote_sentences, remote_verbalunits, remote_tokens) = Downloads.download(url) |> readlines |> readdelimited
length(remote_sentences)
64
length(remote_verbalunits)
268
length(remote_tokens)
1214
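When called with only a URL, Downloads.download saves the response to a temporary file and returns that file's path, which is why its result can be piped to readlines. If you would rather not leave a temporary file behind, one possible alternative is to download into an in-memory buffer. The sketch below assumes a hypothetical helper named remotelines; it is not part of GreekSyntax or Downloads.

```julia
using Downloads

# Hypothetical helper (an assumption for illustration, not a GreekSyntax function):
# fetch a URL into memory and return its contents as a vector of lines.
function remotelines(url::AbstractString)
    buf = IOBuffer()
    Downloads.download(url, buf)  # write the response body into the buffer
    seekstart(buf)                # rewind to the start before reading
    readlines(buf)                # one String per line, newlines stripped
end
```

With this helper defined, remotelines(url) |> readdelimited would take the place of the three-stage pipeline above.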