
Orthography

An orthography is defined by the following functional requirements. It is possible to:

  • enumerate its complete character set
  • determine whether a sequence of characters is orthographically valid
  • enumerate a set of token types
  • parse a stream of valid characters into a sequence of classified tokens, each associating a substring of the character stream with a token type
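
The four requirements above can be sketched as a small class. This is only an illustrative sketch, not an implementation from any particular library: the class name `SimpleAsciiOrthography`, its method names, the character set, and the token type labels (`lexical`, `numeric`, `punctuation`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Token:
    text: str        # substring of the character stream
    token_type: str  # one of the orthography's token types


class SimpleAsciiOrthography:
    """Hypothetical toy orthography: lowercase ASCII letters, digits,
    a space, and four punctuation marks."""

    LETTERS = set("abcdefghijklmnopqrstuvwxyz")
    DIGITS = set("0123456789")
    PUNCTUATION = set(".,;?")
    WHITESPACE = set(" ")

    def character_set(self) -> Set[str]:
        # Requirement 1: enumerate the complete character set.
        return self.LETTERS | self.DIGITS | self.PUNCTUATION | self.WHITESPACE

    def is_valid(self, s: str) -> bool:
        # Requirement 2: every character must belong to the character set.
        allowed = self.character_set()
        return all(ch in allowed for ch in s)

    def token_types(self) -> List[str]:
        # Requirement 3: enumerate the token types.
        return ["lexical", "numeric", "punctuation"]

    def _classify(self, ch: str) -> Optional[str]:
        if ch in self.LETTERS:
            return "lexical"
        if ch in self.DIGITS:
            return "numeric"
        if ch in self.PUNCTUATION:
            return "punctuation"
        return None  # whitespace only separates tokens

    def tokenize(self, s: str) -> List[Token]:
        # Requirement 4: parse valid characters into classified tokens.
        if not self.is_valid(s):
            raise ValueError("string contains characters outside the orthography")
        tokens: List[Token] = []
        buffer, buffer_type = "", None
        for ch in s:
            ch_type = self._classify(ch)
            # Close the current token when the class changes; each
            # punctuation mark is always a token of its own.
            if buffer and (ch_type != buffer_type or ch_type == "punctuation"):
                tokens.append(Token(buffer, buffer_type))
                buffer, buffer_type = "", None
            if ch_type is not None:
                buffer += ch
                buffer_type = ch_type
        if buffer:
            tokens.append(Token(buffer, buffer_type))
        return tokens
```

With this sketch, `SimpleAsciiOrthography().tokenize("veni, vidi 42.")` yields five tokens: two lexical tokens, one numeric token, and two punctuation tokens. A string containing an uppercase letter is rejected as invalid, because uppercase letters are outside this toy character set.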

This implies that the orthography can also parse a citable text into a series of classified tokens, associating a token type with each citable token. The resulting text is citable at the level of the token (i.e., the canonical citation hierarchy is extended by one level). This definition is generic enough to apply to many languages (or perhaps any language?).
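
To make the idea of extending the citation hierarchy by one level concrete, here is a minimal sketch. It assumes passages are given as (reference, text) pairs with dot-separated references such as `1.1`; the function names, the regular expression, and the token type labels are hypothetical. For brevity, it silently skips any character the pattern does not match instead of validating the text first.

```python
import re
from typing import List, Tuple

# Hypothetical toy orthography: runs of lowercase letters, runs of digits,
# or a single punctuation mark each form one token.
TOKEN_PATTERN = re.compile(r"[a-z]+|[0-9]+|[.,;?]")


def classify(token: str) -> str:
    """Assign a token type based on the token's first character."""
    if token[0].isalpha():
        return "lexical"
    if token[0].isdigit():
        return "numeric"
    return "punctuation"


def tokenize_citable(
    passages: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (token reference, token text, token type) triples.

    Each token reference appends a 1-based token index to the passage
    reference, adding one level to the citation hierarchy.
    """
    result = []
    for ref, text in passages:
        for i, tok in enumerate(TOKEN_PATTERN.findall(text), start=1):
            result.append((f"{ref}.{i}", tok, classify(tok)))
    return result


passages = [("1.1", "arma virumque cano,"), ("1.2", "troiae qui primus")]
citable_tokens = tokenize_citable(passages)
```

Here the passage cited as `1.1` produces tokens cited as `1.1.1` through `1.1.4`, the last being the comma, so every classified token carries its own citation.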