LatinOrthography.jl: documentation
Implementations of the HCMID OrthographicSystem interface for Latin texts
Latin23
Latin23 is an orthography for Latin texts with 23 alphabetic characters, that is, texts with a single character for vocalic/consonantal i/j and a single character for vocalic/consonantal u/v. The function latin23 creates an instance of this orthography. The Latin23 type is a subtype of the HCMID OrthographicSystem.
using LatinOrthography
ortho = latin23()
typeof(ortho) |> supertype
LatinOrthography.LatinOrthographicSystem
Valid characters and token types
The Latin23 type implements the basic functions of the OrthographicSystem interface:
codepoints: returns a complete list of codepoints allowed in this orthography
tokentypes: returns the types of tokens recognized in this orthography
codepoints(ortho)
"abcdefghiklmnopqrstuxyzABCDEFGHIKLMNOPQRSTUXYZ.,;:? \n\t+"
tokentypes(ortho)
3-element Vector{DataType}:
Orthography.LexicalToken
Orthography.PunctuationToken
LatinOrthography.EncliticToken
These give us (for free!) implementations of the OrthographicSystem's validcp and validstring functions.
using Orthography
validcp("a", ortho)
true
validcp("β", ortho)
false
validstring("Nunc est bibendum.", ortho)
true
validstring("μῆνιν ἄειδε", ortho)
false
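To see why a list of codepoints is enough to get validation for free, here is a minimal, self-contained sketch of the idea. It does not use the package. The names sketch_validcp, sketch_validstring and LATIN23_CODEPOINTS are hypothetical, and the character list is copied from the codepoints(ortho) output above. The package's own implementation may differ.

```julia
# Hypothetical names for illustration only; not part of LatinOrthography.jl.
# The character list is copied from the `codepoints(ortho)` output above.
const LATIN23_CODEPOINTS = "abcdefghiklmnopqrstuxyzABCDEFGHIKLMNOPQRSTUXYZ.,;:? \n\t+"

# A single character is valid if it occurs in the allowed list.
sketch_validcp(c::AbstractChar, allowed::AbstractString) = c in allowed

# A string is valid if every one of its characters is valid.
sketch_validstring(s::AbstractString, allowed::AbstractString) =
    all(c -> sketch_validcp(c, allowed), s)

sketch_validstring("Nunc est bibendum.", LATIN23_CODEPOINTS)  # true
sketch_validstring("μῆνιν ἄειδε", LATIN23_CODEPOINTS)          # false
sketch_validstring("vinum", LATIN23_CODEPOINTS)                # false: Latin23 has no "v"
```

Under this reading, a Latin23 text would spell the last example "uinum", since u covers both the vocalic and the consonantal value.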
Tokenizing a string
The tokenize function returns an array of OrthographicTokens, each of which has a string value and a token type from the set of token types possible for this orthography.
tkns = tokenize("Nunc est bibendum.", ortho)
4-element Vector{Orthography.OrthographicToken}:
Orthography.OrthographicToken("Nunc", Orthography.LexicalToken())
Orthography.OrthographicToken("est", Orthography.LexicalToken())
Orthography.OrthographicToken("bibendum", Orthography.LexicalToken())
Orthography.OrthographicToken(".", Orthography.PunctuationToken())
tkns[1].text
"Nunc"
tkns[1].tokencategory
Orthography.LexicalToken()
tkns[4].text
"."
tkns[4].tokencategory
Orthography.PunctuationToken()
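A common next step is to keep only the lexical tokens and extract their text. The sketch below shows that pattern on stand-in types so that it runs without the package. SketchToken, SketchLexical and SketchPunctuation are hypothetical definitions that only mirror the text and tokencategory fields shown above. With real output from tokenize, the same filter would presumably test against Orthography.LexicalToken instead.

```julia
# Stand-in types that mirror the shape of the tokens printed above.
# These are illustrative definitions, not the ones exported by Orthography.
abstract type SketchCategory end
struct SketchLexical <: SketchCategory end
struct SketchPunctuation <: SketchCategory end

struct SketchToken
    text::String
    tokencategory::SketchCategory
end

# The same four tokens as in the "Nunc est bibendum." example.
tokens = [
    SketchToken("Nunc", SketchLexical()),
    SketchToken("est", SketchLexical()),
    SketchToken("bibendum", SketchLexical()),
    SketchToken(".", SketchPunctuation()),
]

# Keep only lexical tokens, then pull out their string values.
lexical = filter(t -> t.tokencategory isa SketchLexical, tokens)
words = [t.text for t in lexical]   # ["Nunc", "est", "bibendum"]
```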
Latin24
Latin24 is an orthography for Latin texts with 24 alphabetic characters, that is, texts with a single character for vocalic/consonantal i/j but with separate characters for consonantal v and vocalic u. The function latin24 creates an instance of this orthography. The Latin24 type is a subtype of the HCMID OrthographicSystem.
ortho24 = latin24()
validcp("v", ortho24)
true
validcp("j", ortho24)
false
Latin25
Latin25 is an orthography for Latin texts with 25 alphabetic characters. It distinguishes vocalic i and u from consonantal j and v. The function latin25 creates an instance of this orthography. The Latin25 type is a subtype of the HCMID OrthographicSystem.
ortho25 = latin25()
validcp("v", ortho25)
true
validcp("j", ortho25)
true
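The difference between the three orthographies can be summarized by comparing their lowercase alphabets. The following self-contained sketch assumes that Latin24 and Latin25 extend the Latin23 alphabet only by adding v, and j and v, respectively, as the descriptions above suggest. The constant names are hypothetical and the package may store its alphabets differently.

```julia
# Lowercase alphabets as described in the text; constant names are hypothetical.
# ALPHABET23 is the lowercase part of the `codepoints(ortho)` output for Latin23.
const ALPHABET23 = collect("abcdefghiklmnopqrstuxyz")
const ALPHABET24 = sort(vcat(ALPHABET23, ['v']))        # adds consonantal v
const ALPHABET25 = sort(vcat(ALPHABET23, ['j', 'v']))   # adds consonantal j and v

# Each alphabet has the number of letters its name promises.
sizes = (length(ALPHABET23), length(ALPHABET24), length(ALPHABET25))  # (23, 24, 25)

# Letters that each larger alphabet adds to Latin23.
added24 = setdiff(ALPHABET24, ALPHABET23)  # ['v']
added25 = setdiff(ALPHABET25, ALPHABET23)  # ['j', 'v']
```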