Corpus-specific datasets are organized in subdirectories of a root directory specified by the datasets setting in your configuration file. The kanónes build system includes a task corpus with the syntax:
sbt fst CORPUSNAME
Within the datasets directory, this creates a subdirectory named CORPUSNAME with the full directory layout that kanónes expects when building a parser from your data.
Tabular files defining inflectional rules are in subdirectories of of the rules-tables directory. Tabular files defining lexical items (“stems”) are in subdirectories of the stems-tables directory. In addition to these tables (which may be unique for each corpus to analyze), the orthography directory must include a file named alphabet.fst. (Typically, many corpora might use an identical alphabet.) This is a very simple file in the syntax of the Stuttgart Finite State Tooklkit (SFST).
TBA