Abbreviated URN values

The AbbreviatedUrn is an abstract type supporting an abbreviated notation for Cite2Urns. It allows you to work with objects uniquely identified by collection identifier and object identifier, when the collection is registered in a dictionary that can expand the collection identifier to a full Cite2Urn.

The module implements the AbbreviatedUrn for each uniquely identified component of an Analysis:

  1. LexemeUrn
  2. FormUrn
  3. StemUrn
  4. RuleUrn

An AbbreviatedUrn has a collection identifier and an object identifier. You can construct an AbbreviatedUrn from a dot-delimited string of the form collection.objectid.

using CitableParserBuilder
lexurn = LexemeUrn("lsj.n125")
lexurn.collection

# output

"lsj"

lexurn.objectid

# output

"n125"

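As a rough picture of what construction from a dot-delimited string involves, the following sketch splits the string into its two parts without using the package at all. The names SketchUrn and parseabbr are hypothetical, and the rule that the string must contain exactly one dot is an assumption made for this illustration, not a statement about the package's own validation.

```julia
# Hypothetical stand-in type for illustration; not the package's definition.
struct SketchUrn
    collection::String
    objectid::String
end

# Split "collection.objectid" at the dot.
# Assumption: exactly one dot is allowed, so anything else is rejected.
function parseabbr(s::AbstractString)
    parts = split(s, ".")
    length(parts) == 2 || throw(ArgumentError("expected collection.objectid, got \"$s\""))
    SketchUrn(parts[1], parts[2])
end

u = parseabbr("lsj.n125")
u.collection  # "lsj"
u.objectid    # "n125"
```
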
Abbreviated URNs and Cite2Urns

You can use the abbreviate function to create an abbreviation string from a Cite2Urn. The string combines the collection identifier and the object identifier of the Cite2Urn.

using CitableParserBuilder, CitableObject
conjunctionurn = Cite2Urn("urn:cite2:kanones:morphforms.v1:1000000001")
abbreviate(conjunctionurn)

# output

"morphforms.1000000001"

You can in turn use this string to instantiate an AbbreviatedUrn structure.

formurn = abbreviate(conjunctionurn) |> FormUrn
typeof(formurn)

# output

FormUrn

formurn.objectid

# output

"1000000001"

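The abbreviation step can be pictured as plain string handling on the URN value. The sketch below is package-independent and only illustrative: sketchabbreviate is a hypothetical name, and it assumes a CITE2 URN with five colon-delimited fields whose fourth field is the collection identifier, optionally followed by a dot and a version.

```julia
# Hypothetical helper for illustration; not the package's abbreviate function.
# Assumption: the URN has the shape urn:cite2:namespace:collection[.version]:objectid
function sketchabbreviate(urnstring::AbstractString)
    fields = split(urnstring, ":")
    length(fields) == 5 || throw(ArgumentError("expected 5 colon-delimited fields in \"$urnstring\""))
    # Keep only the collection identifier, dropping any version suffix.
    collection = first(split(fields[4], "."))
    objectid = fields[5]
    string(collection, ".", objectid)
end

abbr = sketchabbreviate("urn:cite2:kanones:morphforms.v1:1000000001")
# abbr is "morphforms.1000000001"
```
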
To convert an AbbreviatedUrn to a full Cite2Urn, pass the expand function a dictionary that maps each collection identifier to the full URN string for that collection:

registry = Dict(
    "morphforms" => "urn:cite2:kanones:morphforms.v1:"
)
expanded = expand(formurn, registry)
typeof(expanded)

# output

Cite2Urn

expanded.urn

# output

"urn:cite2:kanones:morphforms.v1:1000000001"

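Conceptually, expansion is a dictionary lookup followed by string concatenation. The following package-independent sketch shows that idea on plain strings. The name sketchexpand is hypothetical, and raising a KeyError for an unregistered collection is an assumption of this sketch rather than documented behavior of expand.

```julia
# Hypothetical helper for illustration; not the package's expand function.
function sketchexpand(abbr::AbstractString, registry::AbstractDict)
    parts = split(abbr, ".")
    length(parts) == 2 || throw(ArgumentError("expected collection.objectid, got \"$abbr\""))
    collection = String(parts[1])
    # Assumption: an unregistered collection is treated as an error.
    haskey(registry, collection) || throw(KeyError(collection))
    # The registry value is the URN prefix ending in a colon, so appending
    # the object identifier yields the full URN string.
    string(registry[collection], parts[2])
end

sketchregistry = Dict("morphforms" => "urn:cite2:kanones:morphforms.v1:")
full = sketchexpand("morphforms.1000000001", sketchregistry)
# full is "urn:cite2:kanones:morphforms.v1:1000000001"
```
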
Abbreviated URNs and SFST-PL

The fstsafe function composes an SFST-PL expression for an AbbreviatedUrn. It assumes that your SFST alphabet includes the tokens <u> and </u> to mark the beginning and end of URN values, and it escapes characters that are valid in URNs but reserved in the Stuttgart FST toolkit.
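
The escaping and wrapping described here can be approximated with ordinary string replacement. The sketch below is package-independent, and sketchfstsafe is a hypothetical name. It escapes only the dot and the underscore, which is an assumption of this sketch; the full set of characters that fstsafe escapes is not specified here.

```julia
# Hypothetical helper for illustration; not the package's fstsafe function.
# Assumption: only "." and "_" need a backslash escape.
function sketchfstsafe(abbr::AbstractString)
    escaped = replace(replace(abbr, "." => "\\."), "_" => "\\_")
    # Wrap the escaped value in the URN boundary tokens.
    string("<u>", escaped, "</u>")
end

wrapped = sketchfstsafe("nouninfl.h_hs1")
print(wrapped)  # prints <u>nouninfl\.h\_hs1</u>
```

When Julia displays the resulting string rather than printing it, each backslash appears doubled, because the display form escapes the backslash itself.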

rule = RuleUrn("nouninfl.h_hs1")
fst = fstsafe(rule)

# output

"<u>nouninfl\\.h\\_hs1</u>"