Datasets

For the four collection Kanones needs:

Forms collection

  • a collection of all possible morphological forms is precompiled as part of the GH repo. These are the only forms Kanones can work with: you cannot change these.

Collection of lexemes

  • a large collection of Greek lexemes is included in two collections: lsjx is a collection of candidate lexemes generated from LSJ articles; lsj is a subset of those that have been verified to be a lexeme.

Inflectional rules

  • for standard literary Greek, datasets/literarygreek-rules should be all you need for Attic. You can add to these if you want to e.g. expand coverage of literary dialects. Best practice: maintain additions in separate files, and please submit pull request to add them to Kanones' gh repo!
  • for Attic alphabet pre 403 BCE, sample rules in datasets/attic.

Stems

In practice, this is the dataset you're most likely to edit.

Identifying lexemes:

  • check LSJ from folio2.furman.edu; use its ID value if you find your item. Otherwise, register your own namespace, create a new id in that namespace
  • use separate files to group things easily. Eg., proper names in a particular text or corpus that do not appear in LSJ