The following characters are allowed in the ASCII representation of Greek strings.
To represent the Greek alphabetic characters alpha-omega, the ASCII characters of the TLG beta-code mapping are permitted in either upper- or lower-case form. To represent upper-case Greek characters, the ASCII character is preceded by an asterisk (Unicode 42).
Iota subscript is treated as a distinct character, and is represented by the vertical bar (or "pipe", Unicode 124). Sigma, on the other hand, is a single character: because presentational variants such as terminal or lunate sigma are not characters, they are not represented in Greek strings.
The mapping of ASCII to Unicode Greek transcriptions can be illustrated by creating a GreekString from an ASCII source, and then converting the GreekString to Unicode in NFC form.
Source String | GreekString as Unicode |
---|---|
a | α |
b | β |
g | γ |
d | δ |
e | ε |
z | ζ |
h | η |
q | θ |
i | ι |
k | κ |
l | λ |
m | μ |
n | ν |
c | ξ |
o | ο |
p | π |
r | ρ |
s | ς |
t | τ |
u | υ |
f | φ |
x | χ |
y | ψ |
w | ω |
\| | ι |
Source String | GreekString as Unicode |
---|---|
a\| | ᾳ |
h\| | ῃ |
w\| | ῳ |
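The letter and iota subscript mappings tabulated above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GreekString implementation: the names `LETTERS`, `IOTA_SUBSCRIPT` and `letters_to_unicode` are hypothetical, and the choice of the terminal sigma glyph at the end of a word is an assumption inferred from the `s` row of the table.

```python
import unicodedata

# ASCII letter -> Greek base letter, following the table above.
LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))
IOTA_SUBSCRIPT = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI


def letters_to_unicode(ascii_src: str) -> str:
    """Convert lower-case letters and the vertical bar to NFC Unicode.

    Hypothetical helper: accents, breathings and upper case are not handled,
    and any other character raises KeyError.
    """
    out = []
    for i, ch in enumerate(ascii_src):
        if ch == "|":
            out.append(IOTA_SUBSCRIPT)
        elif ch == "s":
            # Sigma is a single character; the terminal glyph is chosen
            # only for display when no letter follows (an assumption).
            following = ascii_src[i + 1 : i + 2]
            out.append("σ" if following in LETTERS else "ς")
        else:
            out.append(LETTERS[ch])
    # NFC composes a base vowel and the combining iota subscript.
    return unicodedata.normalize("NFC", "".join(out))
```

With these assumptions, `letters_to_unicode("a|")` yields the single precomposed character ᾳ (U+1FB3), because NFC normalization combines alpha with the combining iota subscript.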
Source String | GreekString |
---|---|
MH=NIN | mh=nin |
mh=nin | mh=nin |
Mh=nin | mh=nin |
*mh=nin | *mh=nin |
Source String | GreekString as Unicode |
---|---|
MH=NIN | μῆνιν |
mh=nin | μῆνιν |
Mh=nin | μῆνιν |
*mh=nin | Μῆνιν |
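The case handling shown in the two tables above can be sketched as follows. The function name `normalize_case` is hypothetical, and the sketch assumes only what the tables show: the case of the ASCII letters carries no meaning, and the asterisk alone marks an upper-case Greek letter.

```python
def normalize_case(ascii_src: str) -> str:
    """Fold ASCII letters to lower case; the asterisk is left untouched."""
    return ascii_src.lower()


sources = ["MH=NIN", "mh=nin", "Mh=nin", "*mh=nin"]
normalized = [normalize_case(s) for s in sources]
# The first three spellings collapse to one form; only the last keeps "*".
```

Under this reading, the first three sources are equivalent spellings of the same string, and only `*mh=nin` begins with an upper-case mu.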
Breathings, accents and diaeresis are all treated as distinct characters. Rough and smooth breathing are represented by opening and closing parenthesis characters, respectively (Unicode 40 and 41). Acute, grave and circumflex accents are represented by the solidus (Unicode 47), reverse solidus (Unicode 92), and equals sign (Unicode 61), respectively. The diaeresis is represented by the plus sign (Unicode 43).
Individual breathings, accents and diaeresis follow the vowel over which they are traditionally written.
The grave accent in qea\ is valid. Formatting this as a Unicode string produces θεὰ.
The acute accent in ou)lome/nhn is valid. Formatting this as a Unicode string produces οὐλομένην.
The circumflex accent in mh=nin is valid. Formatting this as a Unicode string produces μῆνιν.
The diaeresis character in basilh=i+ is valid. Formatting this as a Unicode string produces βασιλῆϊ.
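One way to picture the conversion of these examples is to map each ASCII mark to a Unicode combining character and let NFC normalization compose the result. The sketch below is an assumption about how such a conversion could work, not a description of the GreekString code; `COMBINING` and `beta_to_unicode` are hypothetical names, and upper case and terminal sigma are deliberately left out.

```python
import unicodedata

LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))

# ASCII mark -> Unicode combining character.
COMBINING = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_unicode(ascii_src: str) -> str:
    """Convert a lower-case word with trailing marks to NFC Unicode."""
    out = []
    for ch in ascii_src.lower():
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in COMBINING:
            out.append(COMBINING[ch])
        else:
            raise ValueError(f"character not handled by this sketch: {ch!r}")
    # Marks follow their vowel, so NFC can compose each vowel with its marks.
    return unicodedata.normalize("NFC", "".join(out))
```

Because each mark follows its vowel in the ASCII form, the combining characters land directly after the right base letter, and `beta_to_unicode("mh=nin")` composes eta and the circumflex into the single character ῆ.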
GreekStrings may include any of the following "white space" characters: space (Unicode 32), tab (Unicode 9), new line (Unicode 10), carriage return (Unicode 13). They are preserved unchanged both in the underlying representation and in conversions to string values.
If we initialize a GreekString from the source string *Mh=nin a)/eide, then converting it back to an ASCII string will preserve the white space:
*mh=nin a)/eide
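A check on the characters listed so far might look like the following sketch. The names `ALLOWED` and `invalid_characters` are hypothetical. Punctuation is omitted because the list of valid punctuation marks is not yet defined, so this sketch would report any punctuation mark as invalid.

```python
# Characters named in this section; punctuation is not yet specified.
LETTER_CHARS = set("abgdezhqiklmncoprstufxyw")
MARK_CHARS = set("*|()/\\=+")
WHITE_SPACE = set(" \t\n\r")
ALLOWED = LETTER_CHARS | MARK_CHARS | WHITE_SPACE


def invalid_characters(ascii_src: str) -> list:
    """Return the characters of ascii_src that fall outside ALLOWED.

    ASCII letters are checked case-insensitively.
    """
    return [ch for ch in ascii_src.lower() if ch not in ALLOWED]


source = "*Mh=nin a)/eide"
problems = invalid_characters(source)  # empty list: every character is allowed
round_trip = source.lower()            # white space passes through unchanged
```

The letter `j`, which has no entry in the mapping table, would be reported by `invalid_characters("juno")`.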
The following punctuation marks are valid:
TBA
Adjacent breathing, accent, and diaeresis are always represented in that sequence.
The sequence breathing, accent, and diaeresis follows lower-case vowels. When the vowel is a diphthong, they follow the second vowel of the diphthong. For upper-case vowels, the sequence follows the asterisk marking upper case, and precedes the vowel character. While this may seem illogical, it simplifies interoperation with legacy ASCII-only encodings of polytonic Greek.
Source String | GreekString as Unicode | Comment |
---|---|---|
a)/eide | ἄειδε | lower case vowel followed by a breathing, then an accent |
ou)lome/nhn | οὐλομένην | diphthong: breathing following second vowel |
e)u+knh/mides | ἐϋκνήμιδες | not a diphthong (due to diaeresis), so breathing follows first vowel |
*)axilh=os | Ἀχιλῆος | upper case vowel with breathing |
*)/enq' | Ἔνθʼ | upper case vowel with breathing and accent |
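The ordering rules above can be expressed as a small tokenizer. The sketch below is hypothetical throughout (`MARKS`, `TOKEN` and `to_unicode` are illustrative names): it accepts marks only in the sequence breathing, accent, diaeresis, expects them after a lower-case vowel or between the asterisk and an upper-case vowel, and then relies on NFC normalization. Iota subscript, punctuation and the elision mark are not handled, and the terminal sigma rule is an assumption.

```python
import re
import unicodedata

LETTERS = dict(zip("abgdezhqiklmncoprstufxyw", "αβγδεζηθικλμνξοπρστυφχψω"))
COMBINING = {")": "\u0313", "(": "\u0314", "/": "\u0301",
             "\\": "\u0300", "=": "\u0342", "+": "\u0308"}

LETTER = "[" + "".join(LETTERS) + "]"
# Breathing, then accent, then diaeresis: each optional, always in this order.
MARKS = r"[()]?[/\\=]?\+?"
# Upper case: asterisk, marks, letter. Lower case: letter, marks.
TOKEN = re.compile(
    rf"\*(?P<umarks>{MARKS})(?P<upper>{LETTER})"
    rf"|(?P<lower>{LETTER})(?P<lmarks>{MARKS})"
)


def to_unicode(ascii_src: str) -> str:
    src = ascii_src.lower()
    out = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            out.append(src[pos])  # white space is preserved as-is
            pos += 1
            continue
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {src[pos]!r}")
        if m["upper"]:
            letter, marks = LETTERS[m["upper"]].upper(), m["umarks"]
        else:
            letter, marks = LETTERS[m["lower"]], m["lmarks"]
        # Unicode wants combining marks after the base letter in both cases.
        out.append(letter + "".join(COMBINING[c] for c in marks))
        pos = m.end()
    text = unicodedata.normalize("NFC", "".join(out))
    # Display choice (assumed): sigma at the end of a word takes its terminal form.
    return re.sub(r"σ\b", "ς", text)
```

A string such as `a/)eide`, with the accent written before the breathing, is rejected by this sketch with a `ValueError`, since the closing parenthesis cannot start a new token.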