LatinOrthography.jl: documentation

Implementations of the HCMID OrthographicSystem interface for Latin texts

Latin23

Latin23 is an orthography for Latin texts with 23 alphabetic characters: that is, texts with a single character for vocalic/consonantal i/j and a single character for vocalic/consonantal u/v. The function latin23 creates an instance of this orthography. Its type is a subtype of the HCMID OrthographicSystem; as the output below shows, its immediate supertype is LatinOrthographicSystem.

using LatinOrthography, Orthography
ortho = latin23()
typeof(ortho) |> supertype
LatinOrthography.LatinOrthographicSystem
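The immediate supertype is LatinOrthographicSystem, not OrthographicSystem itself. This still makes the orthography an OrthographicSystem, because Julia subtyping is transitive. The minimal pure-Julia sketch below illustrates that layering. Every name in it (OrthoSketch, LatinOrthoSketch, Latin23Sketch, describe_sketch) is hypothetical: each one only mirrors the idea and is not a type or function defined by Orthography.jl or LatinOrthography.jl.

```julia
# Hypothetical names that mirror, but are not, the packages' types.
abstract type OrthoSketch end                       # stands in for OrthographicSystem
abstract type LatinOrthoSketch <: OrthoSketch end   # stands in for LatinOrthographicSystem
struct Latin23Sketch <: LatinOrthoSketch end        # stands in for the type latin23() returns

# A method written once for the top-level abstract type...
describe_sketch(o::OrthoSketch) = "an orthographic system of type $(typeof(o))"

o = Latin23Sketch()
supertype(typeof(o))       # LatinOrthoSketch, the immediate parent
typeof(o) <: OrthoSketch   # true: subtyping is transitive
describe_sketch(o)         # ...also applies to every concrete subtype
```

The design point is that generic code written against the top-level abstract type keeps working for any orthography added further down the hierarchy.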

Valid characters and token types

The Latin23 type implements the basic functions of the OrthographicSystem interface:

  • codepoints: returns a string containing every codepoint allowed in this orthography
  • tokentypes: returns a vector of the types of tokens recognized in this orthography

codepoints(ortho)
"abcdefghiklmnopqrstuxyzABCDEFGHIKLMNOPQRSTUXYZ.,;:? \n\t+"
tokentypes(ortho)
3-element Vector{DataType}:
 Orthography.LexicalToken
 Orthography.PunctuationToken
 LatinOrthography.EncliticToken

These two functions give us (for free!) implementations of the OrthographicSystem's validcp and validstring functions, which test a single codepoint and a whole string, respectively.

using Orthography
validcp("a", ortho)
true
validcp("β", ortho)
false
validstring("Nunc est bibendum.", ortho)
true
validstring("μῆνιν ἄειδε", ortho)
false
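These checks come for free because both can be answered from the codepoints string alone. The sketch below shows one plausible way to do that in plain Julia. The functions sketch_validcp and sketch_validstring are hypothetical, and this is not necessarily how Orthography.jl implements validcp and validstring. The codepoint string is copied from the codepoints(ortho) output shown earlier.

```julia
# Codepoints copied from the codepoints(ortho) output for Latin23.
const LATIN23_CPS = "abcdefghiklmnopqrstuxyzABCDEFGHIKLMNOPQRSTUXYZ.,;:? \n\t+"

# Hypothetical: a single codepoint is valid if it appears in the allowed set.
sketch_validcp(cp::AbstractString, cps::AbstractString) =
    length(cp) == 1 && only(cp) in cps

# Hypothetical: a string is valid if every one of its characters is allowed.
sketch_validstring(s::AbstractString, cps::AbstractString) =
    all(c -> c in cps, s)

sketch_validcp("a", LATIN23_CPS)                        # true
sketch_validcp("β", LATIN23_CPS)                        # false
sketch_validstring("Nunc est bibendum.", LATIN23_CPS)   # true
sketch_validstring("μῆνιν ἄειδε", LATIN23_CPS)          # false
```

In this sketch the validation logic is driven entirely by the codepoint string, so supplying a different string (for example, one that also allows j and v) is enough to validate text in another orthography.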

Tokenizing a string

The tokenize function returns an array of OrthographicTokens. Each token has a string value (its text field) and a token category (its tokencategory field) drawn from the token types recognized by this orthography.

tkns = tokenize("Nunc est bibendum.", ortho)
4-element Vector{Orthography.OrthographicToken}:
 Orthography.OrthographicToken("Nunc", Orthography.LexicalToken())
 Orthography.OrthographicToken("est", Orthography.LexicalToken())
 Orthography.OrthographicToken("bibendum", Orthography.LexicalToken())
 Orthography.OrthographicToken(".", Orthography.PunctuationToken())
tkns[1].text
"Nunc"
tkns[1].tokencategory
Orthography.LexicalToken()
tkns[4].text
"."
tkns[4].tokencategory
Orthography.PunctuationToken()
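The following deliberately simplified tokenizer in plain Julia makes the shape of that result concrete. It assumes only two categories: runs of ASCII letters are treated as lexical tokens, and single marks from the Latin23 punctuation set are treated as punctuation tokens. All type and function names (SketchTokenType, SketchToken, sketch_tokenize, and so on) are hypothetical. The sketch ignores enclitics and the + character, and it does not validate letters, so it is not the algorithm LatinOrthography.jl uses.

```julia
# Hypothetical token categories, modeled as singleton structs
# like the LexicalToken() and PunctuationToken() values shown above.
abstract type SketchTokenType end
struct SketchLexical <: SketchTokenType end
struct SketchPunctuation <: SketchTokenType end

# A token pairs its text with its category, mirroring the
# text and tokencategory fields of OrthographicToken.
struct SketchToken
    text::String
    tokencategory::SketchTokenType
end

const SKETCH_PUNCT = ".,;:?"

# Match either a run of ASCII letters or one punctuation mark; whitespace
# and characters matching neither pattern are skipped (a simplification).
function sketch_tokenize(s::AbstractString)
    tokens = SketchToken[]
    for m in eachmatch(r"[A-Za-z]+|[.,;:?]", s)
        t = String(m.match)
        category = t[1] in SKETCH_PUNCT ? SketchPunctuation() : SketchLexical()
        push!(tokens, SketchToken(t, category))
    end
    return tokens
end

tkns = sketch_tokenize("Nunc est bibendum.")
[t.text for t in tkns]   # ["Nunc", "est", "bibendum", "."]
tkns[4].tokencategory    # SketchPunctuation()
```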

Latin24

Latin24 is an orthography for Latin texts with 24 alphabetic characters: that is, texts with a single character for vocalic/consonantal i/j but distinct characters for vocalic u and consonantal v. The function latin24 creates an instance of this orthography, whose type is likewise a subtype of the HCMID OrthographicSystem.

ortho24 = latin24()
validcp("v", ortho24)
true
validcp("j", ortho24)
false

Latin25

Latin25 is an orthography for Latin texts with 25 alphabetic characters: that is, texts that distinguish vocalic i and u from consonantal j and v. The function latin25 creates an instance of this orthography, whose type is likewise a subtype of the HCMID OrthographicSystem.

ortho25 = latin25()
validcp("v", ortho25)
true
validcp("j", ortho25)
true
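The three orthographies differ only in which letters of the modern alphabet they leave out. The Latin23 codepoints show that j, v and w are absent. Latin24 restores v, and Latin25 restores j as well. The plain-Julia sketch below derives each lowercase alphabet from that rule, confirms the counts that give the orthographies their names, and reproduces the validcp results for v and j. The EXCLUDED dictionary and the sketch_alphabet helper are hypothetical and not part of the package. The sketch also assumes, consistent with the names and the Latin23 codepoints, that w is excluded from all three.

```julia
# Hypothetical summary: lowercase letters each orthography omits from 'a':'z'.
# Latin23's list matches its codepoints output; omitting w from
# Latin24 and Latin25 is an assumption consistent with their names.
const EXCLUDED = Dict(
    "Latin23" => ['j', 'v', 'w'],
    "Latin24" => ['j', 'w'],
    "Latin25" => ['w'],
)

sketch_alphabet(name) = filter(c -> c ∉ EXCLUDED[name], 'a':'z')

for name in ["Latin23", "Latin24", "Latin25"]
    letters = sketch_alphabet(name)
    println(name, ": ", length(letters), " letters; v allowed: ",
            'v' in letters, "; j allowed: ", 'j' in letters)
end
# Latin23: 23 letters; v allowed: false; j allowed: false
# Latin24: 24 letters; v allowed: true; j allowed: false
# Latin25: 25 letters; v allowed: true; j allowed: true
```

Only lowercase letters are modeled here. The real codepoint sets also include the matching capitals, punctuation and whitespace.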