
Data models

1. The LatinParsedToken

The foundational unit for all other structures in the latincorpus library is the LatinParsedToken.

A LatinParsedToken is a categorized token, with an associated list of morphological analyses. The LatinParsedToken is citable by a CTS URN that extends the canonical citation scheme for the text by one level to create a canonical citation for individual tokens.
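As a rough illustration of this idea, the sketch below models a token that carries a category, a list of analyses, and a URN one level deeper than its passage. All names here (`TokenSketch`, `Analysis`, `ParsedToken`, `tokenUrn`) and the `urn:cts:demo:...` value are hypothetical stand-ins chosen for the example; they are not the library's actual classes or data.

```scala
// Hypothetical, simplified model for illustration only;
// these are NOT the latincorpus library's actual classes.
object TokenSketch {
  // Stand-in for one morphological analysis: a lemma plus a form description.
  final case class Analysis(lemma: String, form: String)

  // A categorized token with its analyses, citable by a token-level URN.
  final case class ParsedToken(
    urn: String,
    text: String,
    category: String,
    analyses: Vector[Analysis]
  )

  // Extend a passage-level URN by one citation level to identify a token.
  // Assumes tokens are numbered by position within the passage.
  def tokenUrn(passageUrn: String, index: Int): String =
    s"$passageUrn.$index"

  // Made-up passage URN, used only to show the extra citation level.
  val passage: String = "urn:cts:demo:group.work.edition:1.1"

  val token: ParsedToken = ParsedToken(
    urn = tokenUrn(passage, 1),
    text = "amo",
    category = "lexical",
    analyses = Vector(Analysis("amo", "verb"))
  )
}
```

In this sketch the passage `1.1` becomes `1.1.1` for its first token, which is the sense in which the token URN extends the text's citation scheme by one level.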

See more about the LatinParsedToken.

2. The LatinParsedTokenSequence trait

The LatinParsedTokenSequence trait defines behaviors for an ordered series of LatinParsedTokens.

The following implementations are included in the latin-corpus library:

  • LatinCorpus. This is the default implementation of the LatinParsedTokenSequence. It views a collection of tokens as a single sequence of LatinParsedTokens.
  • LatinCitableUnit. This represents a sequence of parsed tokens belonging to a single canonically citable unit of text.
  • LatinEdition. This represents a sequence of tokens belonging to a single version of a single text.
  • LatinSentence. This represents a sequence of tokens belonging to a single sentence.
  • LatinNGram. This represents a single n-gram extracted from a longer sequence of parsed tokens.
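The relationship between one trait and several implementations can be sketched as follows. This is a minimal model under stated assumptions: `TokenSequence`, `Corpus`, `NGram`, and the `ngrams` method are hypothetical names invented for the example, and the library's real trait may define different members.

```scala
// Hypothetical, simplified model for illustration only;
// the library's real trait and classes may look different.
object SequenceSketch {
  final case class Token(urn: String, text: String)

  // An ordered series of tokens, with behavior shared by every implementation.
  trait TokenSequence {
    def tokens: Vector[Token]

    def size: Int = tokens.size

    // All contiguous runs of exactly n tokens, in document order.
    def ngrams(n: Int): Vector[NGram] =
      if (n <= 0) Vector.empty
      else tokens.sliding(n).filter(_.size == n).map(ts => NGram(ts)).toVector
  }

  // Views any collection of tokens as one sequence.
  final case class Corpus(tokens: Vector[Token]) extends TokenSequence

  // A single n-gram extracted from a longer sequence; itself a sequence.
  final case class NGram(tokens: Vector[Token]) extends TokenSequence
}
```

Because every implementation shares the trait, code written against `TokenSequence` in this sketch works the same way on a whole corpus or on a single n-gram.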

See more about the LatinParsedTokenSequence and its implementations.

3. The ParsedSequenceCollection

A ParsedSequenceCollection lets you work with a collection of LatinParsedTokenSequences, such as a collection of sentences, citable nodes, or even a collection of whole corpora.
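To make the idea concrete, here is a small sketch of a collection that wraps several token sequences and answers questions across all of them. The names `SequenceCollection`, `tokenCount`, `asSingleSequence`, and `longest` are assumptions made for this example and are not taken from the library.

```scala
// Hypothetical, simplified model for illustration only;
// names and methods are assumptions, not the library's API.
object CollectionSketch {
  final case class Token(text: String)
  final case class Sequence(tokens: Vector[Token])

  // A collection of token sequences, e.g. sentences or whole corpora.
  final case class SequenceCollection(sequences: Vector[Sequence]) {
    // Total number of tokens across every member sequence.
    def tokenCount: Int = sequences.map(_.tokens.size).sum

    // Concatenate all member sequences into one, preserving order.
    def asSingleSequence: Sequence = Sequence(sequences.flatMap(_.tokens))

    // The member with the most tokens, or None for an empty collection.
    def longest: Option[Sequence] =
      if (sequences.isEmpty) None
      else Some(sequences.maxBy(_.tokens.size))
  }
}
```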

See more about the ParsedSequenceCollection.

