
Creating a VerbForm from SFST output

SFST output analyzing the conjugated verb form fecit looks like this (each entry in the Vector is one line of output):

val fstLines = Vector(
  "> fecit",
  "<u>ocremorph.n17516b</u><u>ls.n17516</u><#>fec<verb><pftact><div><pftact><verb>it<3rd><sg><pft><indic><act><u>livymorph.pftact_pft3</u>"
)
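The analysis line packs several kinds of information into one string. The sketch below is only an illustration of that layout, not tabulae's parser. It assumes that the first line echoes the input token after a "> " prefix, that <div> separates the stem half of the analysis from the inflectional half, and that <u>...</u> wraps identifier strings. The helper names (ids, tags, surface) are hypothetical.

```scala
// Illustrative only: NOT how tabulae's FstReader parses output.
// It just takes apart the SFST analysis line shown above.
val fstLines = Vector(
  "> fecit",
  "<u>ocremorph.n17516b</u><u>ls.n17516</u><#>fec<verb><pftact><div><pftact><verb>it<3rd><sg><pft><indic><act><u>livymorph.pftact_pft3</u>"
)

// The first line echoes the input token after a "> " prefix.
val token: String = fstLines(0).stripPrefix("> ")

// Assumption: <div> separates the stem half from the inflectional half.
val parts = fstLines(1).split("<div>", 2)
val stemPart = parts(0)
val inflectionPart = parts(1)

// Identifier strings are wrapped in <u>...</u>.
val idPattern = "<u>([^<]*)</u>".r
def ids(s: String): Vector[String] =
  idPattern.findAllMatchIn(s).map(_.group(1)).toVector

// After removing identifiers, the remaining <...> tags carry class and grammar tags.
val tagPattern = "<([^<>]+)>".r
def tags(s: String): Vector[String] =
  tagPattern.findAllMatchIn(idPattern.replaceAllIn(s, "")).map(_.group(1)).toVector

// Text outside any tag is the surface string: stem "fec", ending "it".
def surface(s: String): String =
  tagPattern.replaceAllIn(idPattern.replaceAllIn(s, ""), "")
```

Here tags(inflectionPart) holds the grammatical tags 3rd, sg, pft, indic and act, and surface(stemPart) + surface(inflectionPart) reassembles "fecit".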

The FstReader object parses a Vector of SFST output lines into a (possibly empty) Vector of AnalyzedTokens. The FST output above should yield exactly one token:

import edu.holycross.shot.tabulae._
val analyzedTokens = FstReader.parseFstLines(fstLines)
assert(analyzedTokens.size == 1)
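Because the input is a flat Vector of lines, a reader has to work out which analysis lines belong to which token. The following is a simplified, hypothetical model of that grouping step, not FstReader's actual code: a line starting with "> " opens a new token, and following lines are treated as its analyses. As an added assumption, lines that do not begin with "<" are skipped as non-analyses. The function name groupFstLines is made up for this sketch.

```scala
// Hypothetical grouping of SFST output lines into (token, analysisLines) pairs.
// A simplified model for illustration; NOT FstReader's implementation.
def groupFstLines(lines: Vector[String]): Vector[(String, Vector[String])] =
  lines.foldLeft(Vector.empty[(String, Vector[String])]) { (acc, line) =>
    if (line.startsWith("> ")) {
      // A "> word" line starts a new token with no analyses yet.
      acc :+ (line.stripPrefix("> ") -> Vector.empty[String])
    } else if (acc.nonEmpty && line.startsWith("<")) {
      // Assumed analysis line: attach it to the most recent token.
      val (tok, analyses) = acc.last
      acc.init :+ (tok -> (analyses :+ line))
    } else {
      // Anything else (or an analysis with no token before it) is ignored.
      acc
    }
  }

val grouped = groupFstLines(Vector(
  "> fecit",
  "<u>ls.n17516</u><#>fec<verb><pftact><div><pftact><verb>it<3rd><sg><pft><indic><act>"
))
```

An empty input produces an empty Vector, which is one way a result can be "possibly empty".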

Each AnalyzedToken pairs the literal token string (literalToken) with a Vector of analyses (analyses), each of which extends the LemmatizedForm trait:

val analysis = analyzedTokens(0)
analysis.literalToken
analysis.analyses
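The analyses are a Vector because one surface form can have more than one morphological reading. A rough model of that shape is sketched below. The types Form, VerbAnalysis, NounAnalysis and Token are hypothetical stand-ins for LemmatizedForm, VerbForm and AnalyzedToken; tabulae's real types carry more information.

```scala
// Hypothetical stand-ins for tabulae's types, for illustration only.
sealed trait Form { def lemma: String }
case class VerbAnalysis(lemma: String, person: String, number: String) extends Form
case class NounAnalysis(lemma: String, grammaticalCase: String, number: String) extends Form

// One token string paired with every analysis produced for it.
case class Token(literalToken: String, analyses: Vector[Form])

// "amor" is ambiguous: the noun "love" or the verb "I am loved".
val amor = Token("amor", Vector(
  NounAnalysis("amor", "nominative", "singular"),
  VerbAnalysis("amo", "first", "singular")
))
```

For an unambiguous form such as fecit, the Vector holds a single element.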

Use Scala pattern matching to get a specific type of LemmatizedForm:

val verbForm: VerbForm = analysis.analyses(0) match {
  case vb: VerbForm => vb
  case _ => throw new Exception("Nope, not a verb")
}
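The match above throws if the first analysis is not a verb, and analyses(0) itself fails on an empty Vector. If you would rather not throw, the standard Scala collection methods collect and collectFirst apply the same type pattern without exceptions. The sketch uses hypothetical stand-in types (Form, VerbAnalysis, NounAnalysis) so it runs on its own; with tabulae you would match on VerbForm instead.

```scala
// Hypothetical stand-ins for LemmatizedForm and two of its subtypes.
sealed trait Form
case class VerbAnalysis(person: Int, number: String) extends Form
case class NounAnalysis(grammaticalCase: String, number: String) extends Form

val analyses: Vector[Form] = Vector(
  NounAnalysis("nominative", "singular"),
  VerbAnalysis(1, "singular")
)

// collect keeps only the elements matched by the partial function.
val verbs: Vector[VerbAnalysis] = analyses.collect { case v: VerbAnalysis => v }

// collectFirst returns Some(first match), or None when nothing matches
// (including when the Vector is empty).
val firstVerb: Option[VerbAnalysis] = analyses.collectFirst { case v: VerbAnalysis => v }
```

Returning an Option leaves it to the caller to decide what a missing verb analysis means.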

You can then use the members specific to that type of form. For a VerbForm, these include person, grammatical number, tense, mood and voice:

assert(analysis.literalToken == "fecit")
assert(verbForm.person == Third)
assert(verbForm.grammaticalNumber == Singular)
assert(verbForm.tense == Perfect)
assert(verbForm.mood == Indicative)
assert(verbForm.voice == Active)
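These values line up with the inflectional tags in the SFST output (3rd, sg, pft, indic, act). The table below records that correspondence for the five tags in this one example only. It is an illustrative sketch, not tabulae's tag table; tags not in the table (such as pftact or verb) are simply skipped, and decodeTags is a hypothetical name.

```scala
// Tag-to-property correspondence taken from the single example above.
// Illustrative only; NOT a complete or authoritative list of tags.
val tagToProperty: Map[String, (String, String)] = Map(
  "3rd"   -> ("person" -> "Third"),
  "sg"    -> ("grammaticalNumber" -> "Singular"),
  "pft"   -> ("tense" -> "Perfect"),
  "indic" -> ("mood" -> "Indicative"),
  "act"   -> ("voice" -> "Active")
)

// Look up each tag; unknown tags produce no entry.
def decodeTags(tags: Vector[String]): Map[String, String] =
  tags.flatMap(tagToProperty.get).toMap

val props = decodeTags(Vector("pftact", "verb", "3rd", "sg", "pft", "indic", "act"))
```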

Note that this example is for conjugated verb forms. Participles, gerunds, gerundives and infinitives have their own analytical patterns.