Data models

1. The `LatinParsedToken`

The foundational unit for all other structures in the latincorpus library is the LatinParsedToken.

A LatinParsedToken is a categorized token, with an associated list of morphological analyses. The LatinParsedToken is citable by a CTS URN that extends the canonical citation scheme for the text by one level to create a canonical citation for individual tokens.
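As a rough illustration of the idea, the sketch below models a categorized token that carries a list of analyses and is cited by a URN one level deeper than its passage. Every name here (`SketchAnalysis`, `SketchToken`, `tokenUrn`), the category labels, and the example URN with its `ex` version identifier are assumptions made for this example; none of them is the library's actual API.

```scala
// Hypothetical sketch: illustrative names only, NOT the latincorpus API.

// One morphological analysis: a lemma plus a plain-text label for the form.
final case class SketchAnalysis(lemma: String, form: String)

// A categorized token with its analyses, identified by a token-level URN.
final case class SketchToken(
  urn: String,                      // CTS URN citing this single token
  text: String,                     // the token as it appears in the text
  category: String,                 // assumed labels, e.g. "lexical" or "punctuation"
  analyses: Vector[SketchAnalysis]  // zero or more possible analyses
)

// Extend a passage-level URN by one citation level to cite a single token.
def tokenUrn(passageUrn: String, index: Int): String = s"$passageUrn.$index"

// Illustrative passage URN; the "ex" version identifier is made up.
val passage = "urn:cts:latinLit:phi0448.phi001.ex:1.1"

// An unmarked written form can have more than one analysis.
val gallia = SketchToken(
  tokenUrn(passage, 1),
  "Gallia",
  "lexical",
  Vector(
    SketchAnalysis("Gallia", "noun: feminine nominative singular"),
    SketchAnalysis("Gallia", "noun: feminine ablative singular")
  )
)
```

The token's URN is simply the passage URN with one more dot-separated level, so a token remains citable in the same scheme as the passage that contains it.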

See more about the LatinParsedToken.

2. The `LatinParsedTokenSequence` trait

The LatinParsedTokenSequence trait defines behaviors for an ordered series of LatinParsedTokens.

The following implementations are included in the latincorpus library:

LatinCorpus. This is the default implementation of the LatinParsedTokenSequence. It views a collection of tokens as a single sequence of LatinParsedTokens.
LatinCitableUnit. This represents a sequence of parsed tokens belonging to a single canonically citable unit of text.
LatinEdition. This represents a sequence of tokens belonging to a single version of a single text.
LatinSentence. This represents a sequence of tokens belonging to a single sentence.
LatinNGram. This represents a single n-gram extracted from a longer sequence of parsed tokens.
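
The relationship between a shared trait and its implementations can be sketched as follows. This is a minimal illustration under assumed names (`Tok`, `TokSequence`, `WholeCorpus`, `NGram`, `ngrams`); the library's real trait defines its own members, which are not reproduced here.

```scala
// Hypothetical sketch: illustrative names only, NOT the latincorpus API.

// A pared-down token: just a URN and the token's text.
final case class Tok(urn: String, text: String)

// Behavior shared by any ordered series of tokens.
trait TokSequence {
  def tokens: Vector[Tok]
  def size: Int = tokens.size
  def text: String = tokens.map(_.text).mkString(" ")
}

// A whole collection of tokens viewed as one sequence.
final case class WholeCorpus(tokens: Vector[Tok]) extends TokSequence

// A single n-gram cut from a longer sequence.
final case class NGram(tokens: Vector[Tok]) extends TokSequence

// All n-grams of length n, in document order.
// The filter drops the short window that `sliding` yields
// when the sequence has fewer than n tokens.
def ngrams(seq: TokSequence, n: Int): Vector[NGram] =
  seq.tokens.sliding(n).filter(_.size == n).map(NGram(_)).toVector

// Usage: build a four-token sequence and extract its bigrams.
val words = Vector("Gallia", "est", "omnis", "divisa")
val corpus = WholeCorpus(
  words.zipWithIndex.map { case (w, i) =>
    Tok(s"urn:cts:latinLit:phi0448.phi001.ex:1.1.${i + 1}", w) // illustrative URNs
  }
)
val bigrams = ngrams(corpus, 2)
```

Because both classes extend the same trait, anything written against `TokSequence` works equally on a whole corpus and on a single n-gram.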

See more about the LatinParsedTokenSequence and its implementations.

3. The `ParsedSequenceCollection`

A ParsedSequenceCollection lets you work with a collection of LatinParsedTokenSequences, such as a collection of sentences, citable nodes, or even a collection of whole corpora.
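
One way to picture a collection of sequences is a thin wrapper that answers questions across all of its members. The sketch below assumes hypothetical names (`Word`, `WordSequence`, `Sentence`, `SequenceCollection`, `tokenCount`, `containing`) and does not reflect the methods the library actually provides.

```scala
// Hypothetical sketch: illustrative names only, NOT the latincorpus API.

final case class Word(text: String)

// Anything that exposes an ordered series of tokens.
trait WordSequence {
  def tokens: Vector[Word]
}

final case class Sentence(tokens: Vector[Word]) extends WordSequence

// A collection of sequences of one kind (sentences, citable units, corpora).
final case class SequenceCollection[T <: WordSequence](sequences: Vector[T]) {
  // Number of sequences in the collection.
  def size: Int = sequences.size

  // Total number of tokens across all sequences.
  def tokenCount: Int = sequences.map(_.tokens.size).sum

  // The sequences that contain a given written form.
  def containing(form: String): Vector[T] =
    sequences.filter(_.tokens.exists(_.text == form))
}

// Usage: a collection of two sentences.
val sentences = SequenceCollection(Vector(
  Sentence(Vector(Word("Gallia"), Word("est"), Word("omnis"), Word("divisa"))),
  Sentence(Vector(Word("Veni"), Word("vidi"), Word("vici")))
))
val withEst = sentences.containing("est")
```

The type parameter keeps the member type visible, so filtering a collection of sentences returns sentences rather than generic sequences.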

See more about the ParsedSequenceCollection.
