Data models
1. The LatinParsedToken
The foundational unit for all other structures in the latincorpus library is the LatinParsedToken.
A LatinParsedToken is a categorized token, with an associated list of morphological analyses. The LatinParsedToken is citable by a CTS URN that extends the canonical citation scheme for the text by one level to create a canonical citation for individual tokens.
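As a rough illustration of the description above, the sketch below models a categorized token with a list of analyses and a token-level URN. The names `Analysis`, `TokenSketch`, `tokenUrn`, and the `urn:cts:demo:...` value are all invented for this example and are not the library's actual API; the sketch also assumes that the extra citation level is a simple token index appended with a period.

```scala
// Hypothetical stand-ins for illustration only; not the latincorpus classes.
final case class Analysis(lemma: String, form: String)

final case class TokenSketch(
  urn: String,                 // token-level CTS URN (plain string here)
  text: String,                // the token as it appears in the text
  category: String,            // e.g. "lexical" or "punctuation"
  analyses: Vector[Analysis]   // zero or more morphological analyses
)

object TokenSketch {
  // Assumption: the token URN is the passage URN plus one more citation level.
  def tokenUrn(passageUrn: String, index: Int): String = s"$passageUrn.$index"
}

object TokenDemo {
  // Made-up URN for a citable passage "1.1" of a demo text.
  val passage = "urn:cts:demo:group.work.ed:1.1"

  // "arma" is morphologically ambiguous, so it carries two analyses.
  val token = TokenSketch(
    TokenSketch.tokenUrn(passage, 3),
    "arma",
    "lexical",
    Vector(
      Analysis("arma", "noun, neuter nominative plural"),
      Analysis("arma", "noun, neuter accusative plural")
    )
  )
}
```

In this sketch the passage citation `1.1` becomes `1.1.3` for the third token, so each token can be cited on its own while still pointing back to its canonical passage.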
See more about the LatinParsedToken.
2. The LatinParsedTokenSequence trait
The LatinParsedTokenSequence trait defines behaviors for an ordered series of LatinParsedTokens.
The following implementations are included in the latincorpus library:
LatinCorpus. This is the default implementation of the LatinParsedTokenSequence. It views a collection of tokens as a single sequence of LatinParsedTokens.
LatinCitableUnit. This represents a sequence of parsed tokens belonging to a single canonically citable unit of text.
LatinEdition. This represents a sequence of tokens belonging to a single version of a single text.
LatinSentence. This represents a sequence of tokens belonging to a single sentence.
LatinNGram. This represents a single n-gram extracted from a longer sequence of parsed tokens.
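The relationship between the trait and its implementations can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `Tok`, `TokSequence`, `CorpusSketch`, `NGramSketch`, and the methods `size`, `text`, and `ngrams` are hypothetical names chosen for the example. The idea it shows is that shared behavior is defined once on the trait, while each implementation only supplies its ordered tokens.

```scala
// Hypothetical simplified token; the real token carries far more information.
final case class Tok(urn: String, text: String)

// Hypothetical trait: behavior is defined once for any ordered series of tokens.
trait TokSequence {
  def tokens: Vector[Tok]

  def size: Int = tokens.size

  def text: String = tokens.map(_.text).mkString(" ")

  // Every run of n consecutive tokens, each wrapped as its own sequence.
  def ngrams(n: Int): Vector[NGramSketch] =
    tokens.sliding(n).filter(_.size == n).map(ts => NGramSketch(ts)).toVector
}

// Two illustrative implementations; each only has to provide `tokens`.
final case class CorpusSketch(tokens: Vector[Tok]) extends TokSequence
final case class NGramSketch(tokens: Vector[Tok]) extends TokSequence

object SequenceDemo {
  // Made-up token URNs for a demo passage.
  val corpus = CorpusSketch(Vector(
    Tok("urn:cts:demo:group.work.ed:1.1.1", "arma"),
    Tok("urn:cts:demo:group.work.ed:1.1.2", "virumque"),
    Tok("urn:cts:demo:group.work.ed:1.1.3", "cano")
  ))

  val bigrams = corpus.ngrams(2)
}
```

Because an n-gram is itself a sequence in this sketch, anything defined on the trait (such as `text`) works on the extracted n-grams as well as on the corpus they came from.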
See more about the LatinParsedTokenSequence and its implementations.
3. The ParsedSequenceCollection
A ParsedSequenceCollection lets you work with a collection of LatinParsedTokenSequences, such as a collection of sentences, citable nodes, or even a collection of whole corpora.
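To make the idea of a collection of sequences concrete, here is a small sketch that groups sentences and queries across them. The names `Word`, `SentenceSketch`, `CollectionSketch`, `tokenCount`, and `containing` are assumptions made for this example; the operations the actual ParsedSequenceCollection offers may differ.

```scala
// Hypothetical simplified types for illustration only.
final case class Word(text: String)
final case class SentenceSketch(words: Vector[Word])

// A collection wraps many sequences and offers operations across all of them.
final case class CollectionSketch(sequences: Vector[SentenceSketch]) {
  // Total number of tokens in every sequence of the collection.
  def tokenCount: Int = sequences.map(_.words.size).sum

  // Sub-collection of the sequences that contain a given surface form.
  def containing(form: String): CollectionSketch =
    CollectionSketch(sequences.filter(_.words.exists(_.text == form)))
}

object CollectionDemo {
  private def sentence(forms: String*): SentenceSketch =
    SentenceSketch(forms.map(f => Word(f)).toVector)

  val sentences = CollectionSketch(Vector(
    sentence("arma", "virumque", "cano"),
    sentence("Gallia", "est", "omnis", "divisa")
  ))

  val withEst = sentences.containing("est")
}
```

Filtering returns another collection rather than a bare list, so queries can be chained in this sketch without leaving the collection type.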
See more about the ParsedSequenceCollection.