The LatinSyntax package
Here's a tiny citable corpus, named twopassages, containing two passages of Hyginus.
typeof(twopassages)
CitableCorpus.CitableTextCorpus
map(psg -> psg.text, twopassages.passages)
2-element Vector{String}:
"Achiui cum per decem annos Troi" ⋯ 236 bytes ⋯ "astraque transtulerunt Tenedo."
"id Troiani cum uiderunt arbitra" ⋯ 160 bytes ⋯ "stes, fides ei habita non est."
Let's parse it into sentences using the Latin23 orthography.
using LatinSyntax
using LatinOrthography
parsesentences(twopassages, latin23())
5-element Vector{Any}:
(urn = urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.1.1-108a.1.28a, sequence = 1)
(urn = urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.1.29-108a.1.39a, sequence = 2)
(urn = urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.2.1-108a.2.8a, sequence = 3)
(urn = urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.2.9-108a.2.20a, sequence = 4)
(urn = urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.2.21-108a.2.32a, sequence = 5)
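The printed output suggests that each result carries a urn field and a sequence field. Here is a minimal sketch of working with results of that shape. It uses stand-in data copied by hand from the output above, with the URNs written as plain strings; that is an assumption made so the sketch runs on its own, and the package may well return URN objects instead. The helper passagerange is hypothetical and not part of LatinSyntax.

```julia
# Stand-in for the value returned by parsesentences(twopassages, latin23()).
# ASSUMPTION: URNs are plain strings here; the real results may hold URN objects.
sentences = [
    (urn = "urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.1.1-108a.1.28a", sequence = 1),
    (urn = "urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.1.29-108a.1.39a", sequence = 2),
    (urn = "urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.2.1-108a.2.8a", sequence = 3),
    (urn = "urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.2.9-108a.2.20a", sequence = 4),
    (urn = "urn:cts:latinLit:stoa1263.stoa001.hc_tokens:108a.2.21-108a.2.32a", sequence = 5),
]

# Hypothetical helper (not a LatinSyntax function): take the passage
# component of a CTS URN string, i.e. everything after the last colon.
passagerange(urn::AbstractString) = last(split(urn, ':'))

# Collect the token range that each sentence spans.
ranges = map(s -> passagerange(s.urn), sentences)

# Print each sentence's sequence number next to its token range.
for (s, r) in zip(sentences, ranges)
    println(s.sequence, ": ", r)
end
```

Each range runs from the first to the last token of a sentence, so the two passages of the corpus are split into five sentences identified by token-level URNs.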