FST logic
The final parsing transducer is built from a chain of smaller transducers. In broad outline, they do the following:
- combine all lexical entries (“stems”) with all inflectional rules (that is, generate the cross product of stems and rules)
- pass the resulting pairs to a transducer that accepts only pairings of stems and rules belonging to the same stemtype (defined in `symbols/stemtypes.fst`)
- categorize every symbol as either a “surface” symbol or an “analytical” symbol, and allow only one category to pass through.
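As a rough illustration of how these three steps fit together, here is a toy Python model. It is a sketch under stated assumptions, not the project's implementation: finite lists of symbol tuples stand in for real transducers, and the stemtype symbols (`<a_ae>`, `<o_i>`), the sample stems and rules, and the convention that any symbol in angle brackets is "analytical" are all hypothetical illustrations rather than the actual contents of `symbols/stemtypes.fst` or the project's symbol inventory.

```python
from itertools import product

# Hypothetical stand-in for the stemtypes defined in symbols/stemtypes.fst.
STEMTYPES = {"<a_ae>", "<o_i>"}

# Hypothetical lexical entries ("stems"): surface letters plus one stemtype symbol.
STEMS = [
    ("p", "u", "e", "l", "l", "<a_ae>"),
    ("s", "e", "r", "v", "<o_i>"),
]

# Hypothetical inflectional rules: stemtype symbol, ending letters, analytical tags.
RULES = [
    ("<a_ae>", "a", "<nom>", "<sg>"),
    ("<a_ae>", "a", "e", "<nom>", "<pl>"),
    ("<o_i>", "u", "s", "<nom>", "<sg>"),
    ("<o_i>", "i", "<nom>", "<pl>"),
]


def is_analytical(symbol):
    """Assumed convention: symbols in angle brackets are analytical, all others surface."""
    return symbol.startswith("<") and symbol.endswith(">")


def stemtype_of(symbols):
    """Return the single stemtype symbol in a stem or rule."""
    found = [s for s in symbols if s in STEMTYPES]
    if len(found) != 1:
        raise ValueError(f"expected exactly one stemtype in {symbols}")
    return found[0]


def cross_product(stems, rules):
    """Step 1: pair every stem with every rule."""
    return list(product(stems, rules))


def same_stemtype(pairs):
    """Step 2: keep only pairs whose stem and rule share a stemtype, then join them."""
    return [stem + rule for stem, rule in pairs if stemtype_of(stem) == stemtype_of(rule)]


def project(strings, keep_surface):
    """Step 3: let only one category of symbol pass through."""
    return [
        tuple(s for s in string if is_analytical(s) != keep_surface)
        for string in strings
    ]


if __name__ == "__main__":
    valid = same_stemtype(cross_product(STEMS, RULES))
    surface = ["".join(s) for s in project(valid, keep_surface=True)]
    analysis = [" ".join(s) for s in project(valid, keep_surface=False)]
    for form, tags in zip(surface, analysis):
        print(form, tags)
```

Of the eight stem/rule pairings, only the four with matching stemtypes survive the filter, yielding the surface forms `puella`, `puellae`, `servus`, and `servi`. A real FST never enumerates the cross product like this; composing the transducers lets the stemtype constraint prune mismatched pairings inside the automaton itself.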
The transducers that do this work are organized in the following directories:
The definition of the corpus-specific set of symbols used by the FST is described here.