Orthography package, version 1.0.0

ASCII representation of Greek strings

The following characters are allowed in the ASCII representation of Greek strings.

Alphabetic characters

To represent the Greek alphabetic characters alpha-omega, the ASCII characters of the TLG beta-code mapping are permitted in either upper- or lower-case form. To represent upper-case Greek characters, the ASCII character is preceded by an asterisk (Unicode 42).

Iota subscript is treated as a distinct character, and is represented by the vertical bar (or "pipe", Unicode 124). Sigma, on the other hand, is a single character: because presentational variants such as terminal or lunate sigma are not characters, they are not represented in Greek strings.

Examples: lower-case characters

The mapping of ASCII to Unicode Greek transcriptions can be illustrated by creating a GreekString from an ASCII source, and then converting the GreekString to Unicode in NFC form.

Source String    GreekString as Unicode
a                α
b                β
g                γ
d                δ
e                ε
z                ζ
h                η
q                θ
i                ι
k                κ
l                λ
m                μ
n                ν
c                ξ
o                ο
p                π
r                ρ
s                ς
t                τ
u                υ
f                φ
x                χ
y                ψ
w                ω
|                ι
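
As a rough illustration of the letter mapping in the table above, the following Python sketch converts unaccented ASCII letters to Unicode Greek. It is not the package's implementation: the names BETA_TO_GREEK and letters_to_unicode are hypothetical, and the display of word-final sigma as ς is an assumption inferred from the table, where an isolated s is shown as ς.

```python
import re

# Beta-code letter -> lower-case Greek letter, alpha through omega.
BETA_TO_GREEK = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}


def letters_to_unicode(ascii_src: str) -> str:
    """Convert plain letters only; any other character passes through unchanged."""
    out = []
    upper_next = False
    for ch in ascii_src:
        if ch == "*":
            # The asterisk marks the next letter as upper case.
            upper_next = True
            continue
        greek = BETA_TO_GREEK.get(ch.lower())  # ASCII case is ignored
        if greek is None:
            out.append(ch)
            continue
        out.append(greek.upper() if upper_next else greek)
        upper_next = False
    # Assumption: sigma is displayed in its final form when no letter follows it.
    return re.sub(r"σ(?![α-ω])", "ς", "".join(out))


print(letters_to_unicode("logos"))
print(letters_to_unicode("*logos"))
```

The sketch deliberately ignores breathings, accents, diaeresis and iota subscript.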

Examples: iota subscript

Source String    GreekString as Unicode
a|               ᾳ
h|               ῃ
w|               ῳ

Examples: case insensitivity

Source String    GreekString
MH=NIN           mh=nin
mh=nin           mh=nin
Mh=nin           mh=nin
*mh=nin          *mh=nin

Source String    GreekString as Unicode
MH=NIN           μῆνιν
mh=nin           μῆνιν
Mh=nin           μῆνιν
*mh=nin          Μῆνιν
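
The case insensitivity shown in these tables can be sketched in a few lines of Python. The sketch assumes that normalization amounts to lower-casing the ASCII letters while leaving the asterisk and all other marks alone. The name normalize_ascii is hypothetical and is not part of the package.

```python
def normalize_ascii(ascii_src: str) -> str:
    """Lower-case the ASCII letters; '*', '=' and other marks are unaffected."""
    return ascii_src.lower()


for source in ["MH=NIN", "mh=nin", "Mh=nin", "*mh=nin"]:
    print(source, "->", normalize_ascii(source))
```

Only the asterisk carries information about Greek case, so the first three source strings collapse to the same value while the fourth stays distinct.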

Breathings, accents and diaeresis

Breathings, accents and diaeresis are all treated as distinct characters. Rough and smooth breathing are represented by opening and closing parenthesis characters, respectively (Unicode 40 and 41). Acute, grave and circumflex accents are represented by the solidus (Unicode 47), reverse solidus (Unicode 92), and equals sign (Unicode 61), respectively. The diaeresis is represented by the plus sign (Unicode 43).

Individual breathings, accents and diaeresis follow the vowel over which they are traditionally written.

Examples

The grave accent in qea\ is valid. Formatting this as a Unicode string produces θεὰ.

The acute accent in ou)lome/nhn is valid. Formatting this as a Unicode string produces οὐλομένην.

The circumflex accent in mh=nin is valid. Formatting this as a Unicode string produces μῆνιν.

The diaeresis character in basilh=i+ is valid. Formatting this as a Unicode string produces βασιλῆϊ.
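
One way to reproduce these conversions is to map each ASCII mark to a Unicode combining character and then normalize to NFC. The Python sketch below does this for lower-case strings only. The names LETTERS, COMBINING and lower_case_to_unicode are hypothetical, and the choice of combining characters is an assumption about how such a conversion could work, not a description of the package's internals.

```python
import unicodedata

# Beta-code letters paired with the lower-case Greek alphabet, alpha through omega.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# ASCII marks paired with Unicode combining characters (an assumed mapping).
COMBINING = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def lower_case_to_unicode(ascii_src: str) -> str:
    """Convert a lower-case ASCII Greek string to NFC Unicode.

    Marks follow their vowel in the source, so a one-to-one replacement
    already yields base letter plus combining marks. Final sigma and
    upper-case letters are not handled here. An accent followed by a
    diaeresis on the same vowel is emitted in source order, which NFC
    may not fully compose; this sketch does not reorder them.
    """
    decomposed = "".join(
        LETTERS.get(ch) or COMBINING.get(ch, ch) for ch in ascii_src.lower()
    )
    return unicodedata.normalize("NFC", decomposed)


for source in ["qea\\", "ou)lome/nhn", "mh=nin", "basilh=i+"]:
    print(source, "->", lower_case_to_unicode(source))
```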

White space

GreekStrings may include any of the following "white space" characters: space (Unicode 32), tab (Unicode 9), new line (Unicode 10), carriage return (Unicode 13). They are preserved unchanged both in the underlying representation and in conversions to string values.

Example

If we initialize a GreekString from the source string *Mh=nin a)/eide, then converting it back to an ASCII string will preserve the white space:

*mh=nin a)/eide
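
The character classes described so far (alphabetic characters, the asterisk, iota subscript, breathings, accents, diaeresis and white space) can be checked with a short Python sketch. The function name invalid_characters is hypothetical. Punctuation is deliberately left out because the list of valid punctuation marks is not given here, so any punctuation mark is reported as invalid by this sketch.

```python
ALPHABETIC = set("abgdezhqiklmncoprstufxyw")  # beta-code letters, alpha through omega
MARKS = set("*|()/\\=+")                      # upper-case flag, iota subscript, breathings, accents, diaeresis
WHITE_SPACE = set(" \t\n\r")                  # space, tab, new line, carriage return

ALLOWED = ALPHABETIC | MARKS | WHITE_SPACE


def invalid_characters(ascii_src: str) -> list:
    """Return the characters of ascii_src that fall outside the classes above."""
    # Letters are checked case-insensitively; lower() leaves marks and white space unchanged.
    return [ch for ch in ascii_src if ch.lower() not in ALLOWED]


print(invalid_characters("*Mh=nin a)/eide"))
print(invalid_characters("mh=nin; j"))
```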

Punctuation

The following punctuation marks are valid

Examples: punctuation

TBA

Sequences of characters

When a breathing, an accent, and a diaeresis are adjacent, they are always written in that order.

For lower-case vowels, this sequence of breathing, accent, and diaeresis follows the vowel. When the vowel is part of a diphthong, the sequence follows the second vowel of the diphthong. For upper-case vowels, the sequence follows the asterisk marking upper case, and precedes the vowel character. While this may seem illogical, it simplifies interoperation with legacy ASCII-only encodings of polytonic Greek.

Examples: sequences of breathing, accent and diaeresis

Source String    GreekString as Unicode    Comment
a)/eide          ἄειδε                     lower case vowel followed by a breathing, then an accent
ou)lome/nhn      οὐλομένην                 diphthong: breathing following second vowel
e)u+knh/mides    ἐϋκνήμιδες                not a diphthong (due to diaeresis), so breathing follows first vowel
*)axilh=os       Ἀχιλῆος                   upper case vowel with breathing
*)/enq'          Ἔνθʼ                      upper case vowel with breathing and accent
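
The ordering rule for upper-case vowels can be made concrete with a Python sketch that first moves the marks behind the letter they belong to and then converts as for lower case. All names here (LETTERS, COMBINING, UPPER, to_unicode) are hypothetical. The combining-character mapping and the display of word-final sigma as ς are assumptions chosen to match the table above, and punctuation such as the elision mark is passed through untouched.

```python
import re
import unicodedata

LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))
COMBINING = {
    ")": "\u0313", "(": "\u0314",                 # smooth, rough breathing
    "/": "\u0301", "\\": "\u0300", "=": "\u0342",  # acute, grave, circumflex
    "+": "\u0308", "|": "\u0345",                 # diaeresis, iota subscript
}

# Asterisk, then optional breathing, accent, diaeresis (in that order), then the letter.
UPPER = re.compile(r"\*([()]?[/\\=]?\+?)([a-z])")


def to_unicode(ascii_src: str) -> str:
    src = ascii_src.lower()
    # Rewrite "*)/e" as "*e)/" so that marks follow the letter, as in lower case.
    src = UPPER.sub(lambda m: "*" + m.group(2) + m.group(1), src)
    out = []
    upper_next = False
    for ch in src:
        if ch == "*":
            upper_next = True
        elif ch in LETTERS:
            out.append(LETTERS[ch].upper() if upper_next else LETTERS[ch])
            upper_next = False
        else:
            # Marks become combining characters; anything else passes through.
            out.append(COMBINING.get(ch, ch))
    decomposed = "".join(out)
    # Assumption: sigma is displayed in its final form when no letter follows it.
    decomposed = re.sub(r"σ(?![α-ω])", "ς", decomposed)
    return unicodedata.normalize("NFC", decomposed)


for source in ["a)/eide", "ou)lome/nhn", "e)u+knh/mides", "*)axilh=os", "*)/enq"]:
    print(source, "->", to_unicode(source))
```

Rewriting the upper-case prefix up front keeps the main loop identical for both cases, which is one simple way to absorb the legacy ordering without special-casing it throughout.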