Orthography package, version 1.0.0

Lexical tokens (Greek words)

Lexical tokens are composed only of characters that are valid in Greek strings, excluding white-space characters. A lexical token may be created directly from a string and converted back to a string, following the same rules as for Greek strings.

Examples

Create a lexical token from a string, and convert it back to a string:

Source string    Token as a String
MH=NIN           mh=nin
mh=nin           mh=nin
Mh=nin           mh=nin
*mh=nin          *mh=nin
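
The rows above suggest that the ASCII letters of a source string are folded to lower case, while the asterisk and the accent characters are kept unchanged. A minimal sketch of that round trip follows. The function name make_token is hypothetical and is not part of the package, and the check covers only the white-space rule, not the full set of valid Greek characters.

```python
def make_token(source: str) -> str:
    """Return the string form of a lexical token (hypothetical helper).

    Assumption drawn from the example table: ASCII letters are lower-cased,
    and every other character ("*", "=", ")", "/", ...) is left as it is.
    """
    # A lexical token may not contain white space.
    if any(ch.isspace() for ch in source):
        raise ValueError("a lexical token may not contain white space")
    return source.lower()


if __name__ == "__main__":
    for source in ["MH=NIN", "mh=nin", "Mh=nin", "*mh=nin"]:
        print(source, "->", make_token(source))
```

Under these assumptions, the first three source strings all produce mh=nin, and the fourth keeps its leading asterisk.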

Tokenization

A Greek string may comprise more than one lexical token. White-space characters, which are permitted in GreekStrings but not in GreekWords, delimit the tokens within a Greek string. The GreekString class can create an ordered list of GreekWords from a Greek string.

Tokenizing the following Unicode string

Ζεὺς δ' Ἔριδα προΐαλλε θοὰς ἐπὶ νῆας Ἀχαιῶν

yields this ordered list of GreekWords:

GreekWord
*zeu\s
d'
*)/erida
proi+/alle
qoa\s
e)pi\
nh=as
*)axaiw=n
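
The tokenization described above can be sketched as a split on runs of white space. This is an illustration only: the function name tokenize is hypothetical, the tokens are plain Python strings rather than GreekWord objects, and the input is assumed to be in the ASCII notation used in the list above.

```python
from typing import List


def tokenize(greek_string: str) -> List[str]:
    """Split a Greek string into an ordered list of tokens (hypothetical helper).

    str.split() with no argument splits on any run of white space and
    drops leading and trailing white space, so no empty tokens are produced.
    """
    return greek_string.split()


if __name__ == "__main__":
    # Backslashes (the grave accent in this notation) are escaped for Python.
    line = "*zeu\\s d' *)/erida proi+/alle qoa\\s e)pi\\ nh=as *)axaiw=n"
    for token in tokenize(line):
        print(token)
```

Because white space is the only delimiter in this sketch, the elided form d' stays a single token, as in the list above.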

Like GreekStrings, GreekWords can be represented as Unicode strings in NFC form.

Source string    As Unicode
*zeu\s           Ζεὺς
d'               δʼ
*)/erida         Ἔριδα
proi+/alle       προΐαλλε
qoa\s            θοὰς
e)pi\            ἐπὶ
nh=as            νῆας
*)axaiw=n        Ἀχαιῶν
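
The conversion shown in the table can be approximated by mapping each ASCII character to a Greek letter or a combining mark and then normalizing the result to NFC. The sketch below is not the package's implementation. The name to_unicode is hypothetical, and the character mappings are assumptions inferred from the rows above, which resemble Beta Code. It expects lower-case input, handles final sigma only at the end of a word, and maps the apostrophe to U+02BC.

```python
import unicodedata

# Assumed letter mapping: ASCII letter -> Greek letter, in alphabet order.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# Assumed mapping of ASCII accent characters to Unicode combining marks.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def to_unicode(word: str) -> str:
    """Convert one ASCII-encoded token to an NFC Unicode string (sketch)."""
    out = []
    upper = False   # set by "*"; applies to the next letter
    pending = ""    # marks written between "*" and the letter they belong to
    for ch in word:
        if ch == "*":
            upper = True
        elif ch in MARKS:
            if upper:
                pending += MARKS[ch]   # hold until the capital letter arrives
            else:
                out.append(MARKS[ch])  # attaches to the preceding letter
        elif ch in LETTERS:
            letter = LETTERS[ch].upper() if upper else LETTERS[ch]
            out.append(letter + pending)
            upper, pending = False, ""
        elif ch == "'":
            out.append("\u02bc")       # elision mark, as in the table row for d'
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    text = "".join(out)
    if text.endswith("σ"):
        text = text[:-1] + "ς"         # word-final sigma
    # NFC composes each letter and its marks into a single code point where one exists.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for token in ["*zeu\\s", "*)/erida", "proi+/alle", "e)pi\\", "*)axaiw=n"]:
        print(token, "->", to_unicode(token))
```

The call to unicodedata.normalize is what produces the NFC form. Without it, the output would be the same text as base letters followed by separate combining marks.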