Parsed output
The FstReader object recognizes the SFST patterns for a token and for a sequence of analyses, and from pairings of tokens and analyses creates AnalyticalTokens.
An AnalyticalToken in turn associates a literal token String with a (possibly empty) Vector of LemmatizedForms. The LemmatizedForm is a trait associating the literal token with a single morphological analysis from a single lexeme. The LemmatizedForm trait requires ID strings for:
- the lexeme
- the applied rule
- the applied stem
- a “part of speech” label
The apply function of the LemmatizedForm object considers a single line of SFST output and makes an Option[LemmatizedForm]. (Its value is None if the LemmatizedForm object cannot parse the string).
The LemmatizedForm trait is implemented by classes for possible analytical pattern, namely:
VerbForm(conjugated verb form): analytical pattern PNTMV. See an example.ParticipleForm: analytical pattern GCNTVInfinitiveForm: analytical pattern TVGerundiveForm: analytical pattern GCNGerundForm: analytical pattern CSupineForm: analytical pattern CAdverbForm: analytical pattern DNounForm: analytical pattern GCNAdjectiveForm: analytical pattern GCNDIndeclinableForm: analytical pattern Pos