LatinOrthography.jl: documentation
Implementations of the HCMID OrthographicSystem interface for Latin texts
Latin23
Latin23 is an orthography for Latin texts with 23 alphabetic characters: a single character covers both vocalic and consonantal i/j, and a single character covers both vocalic and consonantal u/v. The function latin23 creates an instance of this orthography. The instance's type is a subtype of the HCMID OrthographicSystem.
using LatinOrthography
ortho = latin23()
typeof(ortho) |> supertype
LatinOrthography.LatinOrthographicSystem
Valid characters and token types
The Latin23 type implements the basic functions of the OrthographicSystem interface:
codepoints: returns a complete list of the codepoints allowed in this orthography
tokentypes: returns an enumeration of the types of tokens recognized in this orthography
codepoints(ortho)
"abcdefghiklmnopqrstuxyzABCDEFGHIKLMNOPQRSTUXYZ.,;:? \n\t+"
tokentypes(ortho)
3-element Vector{DataType}:
Orthography.LexicalToken
Orthography.PunctuationToken
LatinOrthography.EncliticToken
These give us (for free!) implementations of the OrthographicSystem's validcp and validstring functions.
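To illustrate why a list of codepoints is enough to decide validity, here is a minimal sketch in plain Julia. It does not use the package: sketch_validstring is a hypothetical helper, and the membership test is an assumption about the general idea, not a description of how Orthography.jl implements validstring.

```julia
# Codepoint list copied from the codepoints(ortho) output shown above.
const LATIN23_CPS = "abcdefghiklmnopqrstuxyzABCDEFGHIKLMNOPQRSTUXYZ.,;:? \n\t+"

# Hypothetical helper (not part of Orthography.jl): a string is treated as
# valid when every one of its characters occurs in the codepoint list.
sketch_validstring(s::AbstractString, cps::AbstractString) = all(c -> c in cps, s)

println(sketch_validstring("Nunc est bibendum.", LATIN23_CPS))  # true
println(sketch_validstring("μῆνιν ἄειδε", LATIN23_CPS))         # false
println(sketch_validstring("Juvenis", LATIN23_CPS))             # false: J and v are not in the list
```

Under this reading, any orthography that supplies its codepoints needs no further code to support validity checks.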
using Orthography
validcp("a", ortho)
true
validcp("β", ortho)
false
validstring("Nunc est bibendum.", ortho)
true
validstring("μῆνιν ἄειδε", ortho)
false
Tokenizing a string
The tokenize function returns an array of OrthographicTokens, each of which has a string value and a token type from the set of token types possible for this orthography.
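Because each token carries its type, the result can be filtered by category. The following sketch keeps only the lexical tokens. It assumes the package version documented here, in which tokens expose the text and tokencategory fields and tokenize takes the string first and the orthography second; lexical and words are illustrative names.

```julia
using LatinOrthography
using Orthography

ortho = latin23()
tkns = tokenize("Nunc est bibendum.", ortho)

# Keep only tokens whose category is a LexicalToken, dropping punctuation.
lexical = filter(t -> t.tokencategory isa Orthography.LexicalToken, tkns)

# Collect the string value of each remaining token.
words = [t.text for t in lexical]
println(words)
```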
tkns = tokenize("Nunc est bibendum.", ortho)
4-element Vector{Orthography.OrthographicToken}:
Orthography.OrthographicToken("Nunc", Orthography.LexicalToken())
Orthography.OrthographicToken("est", Orthography.LexicalToken())
Orthography.OrthographicToken("bibendum", Orthography.LexicalToken())
Orthography.OrthographicToken(".", Orthography.PunctuationToken())
tkns[1].text
"Nunc"
tkns[1].tokencategory
Orthography.LexicalToken()
tkns[4].text
"."
tkns[4].tokencategory
Orthography.PunctuationToken()
Latin24
Latin24 is an orthography for Latin texts with 24 alphabetic characters: a single character covers both vocalic and consonantal i/j, but consonantal v is distinguished from vocalic u. The function latin24 creates an instance of this orthography. The instance's type is a subtype of the HCMID OrthographicSystem.
ortho24 = latin24()
validcp("v", ortho24)
true
validcp("j", ortho24)
false
Latin25
Latin25 is an orthography for Latin texts with 25 alphabetic characters. It distinguishes vocalic i and u from consonantal j and v. The function latin25 creates an instance of this orthography. The instance's type is a subtype of the HCMID OrthographicSystem.
ortho25 = latin25()
validcp("v", ortho25)
true
validcp("j", ortho25)
true
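The three orthographies differ only in how they treat j and v, so a side-by-side check can help when choosing one for a given edition. This sketch uses only the constructors and the validcp call shown in this documentation; systems is an illustrative name. For latin23, the codepoints list shown earlier contains neither j nor v, so both checks should come back false.

```julia
using LatinOrthography
using Orthography

# Pair each constructor's name with an instance of its orthography.
systems = [("latin23", latin23()), ("latin24", latin24()), ("latin25", latin25())]

# Report whether each orthography accepts the characters v and j.
for (name, o) in systems
    println(name, ": v => ", validcp("v", o), ", j => ", validcp("j", o))
end
```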