API documentation
Structures
CitableParserBuilder.AbbreviatedUrn — Type
Short form of a Cite2Urn containing only collection and object ID.
CitableParserBuilder.Stem — Type
Supertype of all concrete Stem structures.
CitableParserBuilder.Rule — Type
Supertype of all concrete Rule structures.
CitableParserBuilder.Analysis — Type
Citable analysis of a string value.
An Analysis has five members: a token string value, and four abbreviated URNs, one each for the lexeme, form, rule and stem.
CitableParserBuilder.StemUrn — Type
Abbreviated URN for a morphological stem.
CitableParserBuilder.RuleUrn — Type
Abbreviated URN for a rule.
CitableParserBuilder.LexemeUrn — Type
Abbreviated URN for a lexeme.
CitableParserBuilder.FormUrn — Type
Abbreviated URN for a morphological form.
CitableParserBuilder.AnalyzedToken — Type
Morphological analyses for a token identified by CTS URN.
Parsing
CitableParserBuilder.parsetoken — Function
Delegate to specific functions based on the type's citable trait value.
parsetoken(s, x; data)
It is an error to invoke parsetoken with a type that is not a parser.
parsetoken(, s, x; data)
Citable parsers must implement parsetoken.
parsetoken(, s, x; data)
Parse String s by looking it up in a given dictionary.
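The delegation described above resembles the common Julia trait-dispatch pattern. The following is a minimal, self-contained sketch of that pattern, not the package's implementation: the names ParserTrait, IsParser, NotParser, parsertrait, DictParser and parse_sketch are all hypothetical, and the data keyword argument is omitted.

```julia
# Hypothetical trait types; the names are illustrative, not the package's own.
abstract type ParserTrait end
struct IsParser <: ParserTrait end
struct NotParser <: ParserTrait end

# By default, a type is not a parser.
parsertrait(::Type) = NotParser()

# A toy parser that looks tokens up in a dictionary (assumed layout).
struct DictParser
    entries::Dict{String, Vector{String}}
end
parsertrait(::Type{DictParser}) = IsParser()

# Entry point: delegate on the trait value of the parser's type.
parse_sketch(s::AbstractString, p) = parse_sketch(parsertrait(typeof(p)), s, p)

# Types without the parser trait raise an error.
parse_sketch(::NotParser, s, p) = throw(ArgumentError("$(typeof(p)) is not a parser"))

# Dictionary parsers look the string up; unknown tokens give an empty vector.
parse_sketch(::IsParser, s, p::DictParser) = get(p.entries, s, String[])

p = DictParser(Dict("dogs" => ["dog:noun:plural"]))
parse_sketch("dogs", p)   # one analysis string
parse_sketch("cats", p)   # no entry, so an empty vector
```

Under these assumptions, calling parse_sketch with an object that is not a parser, such as an integer, throws an ArgumentError instead of failing with a method error.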
CitableParserBuilder.parsepassage — Function
Use a CitableParser to parse a CitablePassage whose text is a single token.
parsepassage(cn, p; data)
Returns a single AnalyzedToken.
Use a CitableParser to parse a CitablePassage whose text is a single token.
parsepassage(ct, p; data)
Returns a single AnalyzedToken.
CitableParserBuilder.parsecorpus — Function
Use a CitableParser to parse a CitableTextCorpus in which each citable node contains a single token of type LexicalToken.
parsecorpus(c, p; data, countinterval)
Returns an AnalyzedTokens object.
Working with vectors of AnalyzedToken objects
CitableParserBuilder.lexemes — Function
Extract a list of lexemes from a Vector of Analysis objects.
lexemes(v)
Extract a list of lexemes from a Vector of AnalyzedToken objects.
lexemes(v)
Extract a list of lexemes from an AnalyzedTokens object.
lexemes(atokens)
CitableParserBuilder.stringsforlexeme — Function
Find token string values for all tokens in a vector of AnalyzedToken objects parsed to a given lexeme.
stringsforlexeme(v, lexstr)
CitableParserBuilder.lexemedictionary — Function
From a vector of AnalyzedToken objects and an index of tokens in a corpus, construct a dictionary keyed by lexemes, mapping to a further dictionary of surface forms to passages.
lexemedictionary(parses, tokenindex)
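To illustrate the nested structure described above, here is a rough sketch that uses plain strings in place of the package's types. The function lexemedict_sketch, the (token, lexeme) tuples and the layout of tokenindex are assumptions made for illustration only; the actual argument types of lexemedictionary may differ.

```julia
# Hypothetical inputs: (token, lexeme) pairs stand in for parsed tokens,
# and the index maps each token string to the passages where it occurs.
parses = [("dogs", "lex.n1"), ("dog", "lex.n1"), ("runs", "lex.n2")]
tokenindex = Dict(
    "dogs" => ["1.1", "2.3"],
    "dog" => ["1.2"],
    "runs" => ["1.3"],
)

function lexemedict_sketch(parses, tokenindex)
    result = Dict{String, Dict{String, Vector{String}}}()
    for (token, lexeme) in parses
        # Create the inner dictionary the first time a lexeme is seen.
        forms = get!(result, lexeme) do
            Dict{String, Vector{String}}()
        end
        # Record the passages for this surface form (empty if not indexed).
        forms[token] = get(tokenindex, token, String[])
    end
    result
end

d = lexemedict_sketch(parses, tokenindex)
d["lex.n1"]["dogs"]   # passages where the form "dogs" occurs
```

The outer dictionary is keyed by lexeme, and each inner dictionary maps a surface form to the passages where it appears.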
Working with AbbreviatedUrns
CitableParserBuilder.abbreviate — Function
Constructs an AbbreviatedUrn string from a Cite2Urn.
abbreviate(urn)
Example:
julia> abbreviate(Cite2Urn("urn:cite2:kanones:lsj.v1:n123"))
"lsj.n123"
Example: a pipeline abbreviating a Cite2Urn and forming a LexemeUrn from the abbreviated string value.
julia> Cite2Urn("urn:cite2:kanones:lsj.v1:n123") |> abbreviate |> LexemeUrn
LexemeUrn("lsj", "n123")
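The abbreviation step can be approximated on plain strings. This sketch assumes a five-part CITE2 URN string and keeps only the collection identifier (without its version) and the object ID; abbreviate_sketch is a hypothetical helper, not the package's abbreviate function, which operates on Cite2Urn values.

```julia
function abbreviate_sketch(urn::AbstractString)
    parts = split(urn, ":")
    # Expected components: "urn", "cite2", namespace, collection, object ID.
    length(parts) == 5 || throw(ArgumentError("not a 5-part CITE2 URN: $urn"))
    # Drop the version from the collection component, e.g. "lsj.v1" becomes "lsj".
    collection = first(split(parts[4], "."))
    objectid = parts[5]
    string(collection, ".", objectid)
end

abbreviate_sketch("urn:cite2:kanones:lsj.v1:n123")   # "lsj.n123"
```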
CitableParserBuilder.expand — Function
Constructs a Cite2Urn from an AbbreviatedUrn and a dictionary mapping the collection identifiers in AbbreviatedUrns to full Cite2Urns for a versioned collection.
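As a rough illustration of expansion, the sketch below works on plain strings. It assumes that each registry value is the full URN of a versioned collection ending in a colon, so that appending the object ID yields a complete URN; that layout, and the helper expand_sketch, are assumptions rather than documented behavior.

```julia
function expand_sketch(abbr::AbstractString, registry::Dict{String, String})
    parts = split(abbr, ".")
    length(parts) == 2 || throw(ArgumentError("expected collection.objectid, got $abbr"))
    collection = String(parts[1])
    objectid = String(parts[2])
    # A collection missing from the registry cannot be expanded.
    haskey(registry, collection) || throw(KeyError(collection))
    string(registry[collection], objectid)
end

# Assumed registry layout: collection identifier => versioned collection URN.
registry = Dict("lsj" => "urn:cite2:kanones:lsj.v1:")
expand_sketch("lsj.n123", registry)   # "urn:cite2:kanones:lsj.v1:n123"
```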
CitableParserBuilder.fstsafe — Function
Compose SFST representation of an AbbreviatedUrn.
fstsafe(au)
Example:
julia> LexemeUrn("lexicon.lex123") |> fstsafe
"<u>lexicon\.lex123</u>"
Working with Stems and Rules
CitableParserBuilder.lexeme — Function
Function required to get lexeme value of a Stem implementation.
CitableParserBuilder.id — Function
Function required to get ID value of a Stem implementation.
Function required to get ID value of a Rule implementation.
CitableParserBuilder.inflectiontype — Function
Function required to get string value for inflection class of a Stem implementation.
Function required to get string value for inflection class of a Rule implementation.
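The required functions above amount to a small interface that each concrete stem or rule type implements. A minimal sketch of that idea, using stand-in names (SketchStem, NounStem, sketch_id, sketch_lexeme, sketch_inflectiontype) so as not to clash with the package's own definitions, might look like this; the field names and string-valued identifiers are assumptions.

```julia
# Stand-in for the abstract supertype of all stems.
abstract type SketchStem end

# Fallbacks: a concrete stem type that omits a method fails loudly.
sketch_id(s::SketchStem) = error("sketch_id not implemented for $(typeof(s))")
sketch_lexeme(s::SketchStem) = error("sketch_lexeme not implemented for $(typeof(s))")
sketch_inflectiontype(s::SketchStem) = error("sketch_inflectiontype not implemented for $(typeof(s))")

# A hypothetical concrete stem with plain string fields.
struct NounStem <: SketchStem
    stemid::String
    lexemeid::String
    inflclass::String
end

# Implementing the three required accessors for the concrete type.
sketch_id(s::NounStem) = s.stemid
sketch_lexeme(s::NounStem) = s.lexemeid
sketch_inflectiontype(s::NounStem) = s.inflclass

stem = NounStem("stems.n1", "lsj.n123", "noun_class_a")
sketch_inflectiontype(stem)   # "noun_class_a"
```

A rule type would follow the same shape, implementing the ID and inflection-class accessors for its own concrete struct.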
Serialization
CitableParserBuilder.readfst — Function
Read SFST output from file f, and parse into a dictionary keying tokens to a (possibly empty) array of SFST strings.
readfst(f)
CitableParserBuilder.relationsblock — Function
Compose a CEX relationset block for a set of analyses.
relationsblock(urn, label, v)
relationsblock(urn, label, v, delim; registry)
CitableParserBuilder.delimited — Function
Serialize an Analysis to delimited text. Abbreviated URNs are expanded to full CITE2 URNs using registry as the expansion dictionary.
delimited(a; delim, registry)
Serialize a Vector of Analysis objects as delimited text.
delimited(v; delim, registry)
Serialize a single AnalyzedToken as one or more lines of delimited text.
delimited(at; delim, registry)
Serialize a Vector of AnalyzedToken objects as delimited text.
delimited(v; delim, registry)
Serialize an AnalyzedTokens object as delimited text (required for Citable interface).
delimited(atcollection; delim, registry)
Uses abbreviated URNs. These can be expanded to full CITE2 URNs when read back with a URN registry, or the delimited function can be used with a URN registry to write full CITE2 URNs.