Representing strings of characters

Editorial characters

In addition to transcribing the alphabetic and punctuation characters used in Attic inscriptions (specified here), editions of Attic inscriptions conventionally include additional characters that help disambiguate the text: tokenizing characters, markers of vowel quantity, and accents.

Tokenizing characters

Archaic and classical Attic inscriptions do not normally indicate word divisions. Any of the following white-space characters may be included in a string of Attic Greek in either ASCII or Greek Unicode ranges to indicate breaks between syntactic tokens.

| Tokenizing character | Code point (decimal) | Valid character? |
| --- | --- | --- |
| Space | 32 | true |
| Tab | 9 | true |
| Line feed | 10 | true |
| Form feed | 12 | true |
| Carriage return | 13 | true |

In addition, the apostrophe character (decimal 39) may be used to indicate elision at the end of a word token.

Examples of tokenizing characters

The string EDOXSEN TEI BOLEI includes white space tokenizing the text. It is a valid Attic string in the ASCII mapping.

The string εδοχσεν τει βολει includes white space tokenizing the text. It is a valid Attic string in the Greek range of Unicode.

The string TAUTA D'ENAI uses both white space and an elision character to tokenize text. It is a valid Attic string in the ASCII mapping.

The string ταυτα δ'εναι uses both white space and an elision character to tokenize text. It is a valid Attic string in the Greek range of Unicode.

Note in particular the semantics of codepoints 10, 12 and 13: they tokenize a string, exactly like the space or tab character. They do not indicate anything about the structure of physical lines of a text.
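
The tokenizing rules above can be illustrated with a minimal Python sketch. The function name `tokenize` and the decision to keep the apostrophe attached to the elided token are assumptions made for illustration only. They are not part of the library's API.

```python
import re

# A token is a run of characters that are neither one of the five
# tokenizing white-space characters (space, tab, line feed, form feed,
# carriage return) nor an apostrophe, optionally closed by a single
# apostrophe marking elision at the end of the word token.
TOKEN_PATTERN = re.compile(r"[^ \t\n\f\r']+'?")

def tokenize(text: str) -> list[str]:
    """Split a string of Attic Greek into word tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("EDOXSEN TEI BOLEI"))
print(tokenize("TAUTA D'ENAI"))
print(tokenize("εδοχσεν\nτει\rβολει"))
```

Because line feed, form feed, and carriage return are treated exactly like space and tab, the third call yields the same three tokens as a space-separated string would.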

Vowel quantity

Each of the five vowels in the Attic alphabet (A, E, I, O, U) could stand for a short or a long quantity. The vowels E and O could in addition stand for diphthongs resulting from assimilation to a following vowel.

The following characters may be included in a string of Attic Greek to mark vowel quantity.

| Character | Code point (decimal) | Valid character? |
| --- | --- | --- |
| Underscore | 95 | true |
| Caret | 94 | true |

A string of Attic Greek in either ASCII or Greek Unicode ranges may include an underscore character following a vowel to mark it explicitly as having a long value. In the case of E or O, the underscore marks a vowel that is either long by nature or a diphthong. A string of Attic Greek in either ASCII or Greek Unicode ranges may include a caret following a vowel to mark it explicitly as having a short value.

Examples of quantity markers

The string BO_LE_I includes explicit indications that the O and E vowels have long values. It is a valid Attic string in the ASCII mapping.

The string βο_λε_ι includes explicit indications that the O and E vowels have long values. It is a valid Attic string in the Greek range of Unicode.

The string HO_TO^S explicitly marks the first O as long, and the second O as short (e.g., as the masculine nominative singular pronoun written in the Ionic literary alphabet οὗτος). It is a valid Attic string in the ASCII mapping.

The string HO_TO_S explicitly marks both O characters as long (e.g., as the adverb written in the Ionic literary alphabet οὕτως). It is a valid Attic string in the ASCII mapping.
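
As a rough sketch of how the quantity markers could be read programmatically, the following Python function lists every vowel that carries an explicit marker. The name `marked_quantities` and the vowel sets (upper-case A, E, I, O, U for ASCII, and lower-case α, ε, ι, ο, υ for the Greek range) are assumptions for illustration. The authoritative alphabet is the one specified elsewhere in this documentation.

```python
LONG_MARK = "_"   # underscore, decimal 95
SHORT_MARK = "^"  # caret, decimal 94

# Assumed vowel inventory for both mappings (hypothetical, for illustration).
VOWELS = set("AEIOU") | set("αειου")

def marked_quantities(text: str) -> list[tuple[str, str]]:
    """Return (vowel, quantity) pairs for vowels followed by a quantity marker."""
    pairs = []
    # Walk over each character together with the character that follows it.
    for current, following in zip(text, text[1:]):
        if current not in VOWELS:
            continue
        if following == LONG_MARK:
            pairs.append((current, "long"))
        elif following == SHORT_MARK:
            pairs.append((current, "short"))
    return pairs

print(marked_quantities("HO_TO^S"))
print(marked_quantities("HO_TO_S"))
```

Unmarked vowels are simply omitted from the result, since their quantity is left unspecified by the editor.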

Accents

The mappings of Attic strings to the ASCII range and to the Greek range of Unicode have distinct conventions for the editorial addition of accents.

ASCII mapping

The three accents, circumflex, acute, and grave, are mapped to ASCII characters as follows:

| Accent | Character | Code point (decimal) | Valid character? |
| --- | --- | --- | --- |
| Circumflex | = | 61 | true |
| Acute | / | 47 | true |
| Grave | \ | 92 | true |

The accent character must follow either the vowel character it accents, or the vowel quantity character identifying the quantity of the preceding vowel character.

Examples of accents in the ASCII mapping

The string BOLE/ includes oxytone accent. It is a valid Attic string in the ASCII mapping.

The string BOLE=S includes perispomenon accent. It is a valid Attic string in the ASCII mapping.

The string BO_LE_/ includes oxytone accent as well as explicit indications that the O and E vowels have long values. It is a valid Attic string in the ASCII mapping.

The string BO_LE_=S includes perispomenon accent as well as explicit indications that the O and E vowels have long values. It is a valid Attic string in the ASCII mapping.
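
The placement rule for ASCII accents (an accent follows either its vowel or that vowel's quantity marker) can be checked with a short Python sketch. The function name `accents_well_placed` and the assumption that the ASCII vowels are the upper-case letters A, E, I, O, U are illustrative only and are not taken from the library itself.

```python
ACCENTS = {"=": "circumflex", "/": "acute", "\\": "grave"}
QUANTITY_MARKS = {"_", "^"}
VOWELS = set("AEIOU")  # assumed ASCII vowel set (hypothetical)

def accents_well_placed(text: str) -> bool:
    """Check that every accent follows a vowel or a vowel plus quantity marker."""
    for i, ch in enumerate(text):
        if ch not in ACCENTS:
            continue
        previous = text[i - 1] if i >= 1 else ""
        before_previous = text[i - 2] if i >= 2 else ""
        follows_vowel = previous in VOWELS
        follows_marked_vowel = (
            previous in QUANTITY_MARKS and before_previous in VOWELS
        )
        if not (follows_vowel or follows_marked_vowel):
            return False
    return True

for example in ["BOLE/", "BOLE=S", "BO_LE_/", "BO_LE_=S", "B/OLE"]:
    print(example, accents_well_placed(example))
```

This checks placement only. It does not verify that the accented word is otherwise a valid Attic string.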

Unicode mapping

Since all strings are represented in Unicode form NFKC, combinations of accents and vowels are represented by the precomposed form, where one exists, for the combination of one of the specified lower case vowels with the combining accent character (combining grave accent, codepoint x300; combining acute accent, codepoint x301; or combining perispomeni, codepoint x342), possibly with a combining form of either the smooth breathing (codepoint x313, combining comma above) or rough breathing (codepoint x314, combining reversed comma above) characters. Since there is no precomposed form of epsilon or omicron with circumflex in the Ionic alphabet, when the Attic vowel E or O is mapped to epsilon or omicron, circumflex accents are indicated by following the vowel with codepoint x342, the combining perispomeni.
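
The effect of NFKC normalization described here can be observed with Python's standard `unicodedata` module. This is only a sketch of the Unicode behavior. The helper name `nfkc` is introduced for illustration and does not correspond to any function in the library.

```python
import unicodedata

def nfkc(text: str) -> str:
    """Normalize a string to Unicode form NFKC (hypothetical helper)."""
    return unicodedata.normalize("NFKC", text)

ACUTE = "\u0301"         # combining acute accent
PERISPOMENI = "\u0342"   # combining Greek perispomeni
SMOOTH = "\u0313"        # combining comma above (smooth breathing)

# Alpha has precomposed forms, so vowel + mark collapses to one code point.
alpha_circumflex = nfkc("α" + PERISPOMENI)
alpha_smooth_acute = nfkc("α" + SMOOTH + ACUTE)

# Epsilon and omicron have no precomposed circumflex form, so the
# combining perispomeni remains a separate code point after normalization.
epsilon_circumflex = nfkc("ε" + PERISPOMENI)
omicron_circumflex = nfkc("ο" + PERISPOMENI)

for label, value in [
    ("alpha + perispomeni", alpha_circumflex),
    ("alpha + smooth + acute", alpha_smooth_acute),
    ("epsilon + perispomeni", epsilon_circumflex),
    ("omicron + perispomeni", omicron_circumflex),
]:
    print(label, [hex(ord(c)) for c in value])
```

The printed code point lists show one entry for the alpha combinations and two entries for epsilon and omicron with the perispomeni.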

Examples of accents in the Greek range of Unicode

Summary of valid characters

Attic strings are composed exclusively of alphabetic and punctuation characters, as specified here, tokenizing characters, characters indicating vowel quantity, and characters indicating accent, as specified above. This table summarizes all valid characters used in the ASCII mapping.
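
As an illustrative sketch, the editorial characters defined in the sections above for the ASCII mapping can be collected into a single lookup keyed by decimal code point. The names `EDITORIAL_CODE_POINTS` and `is_editorial_char` are hypothetical. The lookup deliberately omits the alphabetic and punctuation characters, which are specified elsewhere.

```python
# Editorial characters of the ASCII mapping, keyed by decimal code point.
EDITORIAL_CODE_POINTS = {
    32: "space (tokenizing)",
    9: "tab (tokenizing)",
    10: "line feed (tokenizing)",
    12: "form feed (tokenizing)",
    13: "carriage return (tokenizing)",
    39: "apostrophe (elision)",
    95: "underscore (long quantity)",
    94: "caret (short quantity)",
    61: "equals sign (circumflex)",
    47: "slash (acute)",
    92: "backslash (grave)",
}

def is_editorial_char(ch: str) -> bool:
    """Report whether a single character is one of the editorial characters."""
    return ord(ch) in EDITORIAL_CODE_POINTS

for ch in ["_", "=", "'", "B"]:
    print(repr(ch), is_editorial_char(ch))
```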

For a full list of the NFKC-form Unicode characters used to map Attic strings to the Greek range of Unicode, please see, in addition to the specified alphabetic and punctuation characters and the tables above, the Unicode Consortium's normalization chart for Greek: http://www.unicode.org/charts/normalization/.