LSJMining: overview

Quarry morphological data from the Perseus edition of Liddell-Scott-Jones, Greek Lexicon.

Data in this repository

Source

The LSJMining package takes it XML data from Giuseppe Celano's conversion of beta-code Greek in the Perseus Liddell-Scott-Jones Greek Lexicon to Unicode Greek, available here.

Organization

files in the directory sourcedata contain TEI entryFree elements from the Perseus LSJ, formatted with one entry per line.
files in the cex directory named morphdata_*.cex are delimited-text files with morphological data extracted from Perseus LSJ using the package LexiconMining. The morphdata_*.cex files are created from hints in the TEI markup to identify seven items: a unique identifier, a tidied-up label, the lemma string in the lexicon, a part of speech, an inflectional type, a genitive form for nouns, and a mood for verbs. (The latter is not extracted from the LSJ markup, and is always an empty string in this repository.)

License

The data here are available under vesion 4.0 of the Creative Commons BY-NC-SA license.

Code

The goal of this package is to construct a morphological lexicon formatted for use with Kanones.

The kanones function morphological data from this repository, and writes tables of stems data formatted for Kanones in a target directory. It consults a specified clone of the Kanones.jl repository to filter out stems for any lexemes already defined in the core datasets included in Kanones.jl.

The kanonesdata/lsjx directory contains the current output of the kanones function. It is created by cloning the Kanones.jl repository in an adjacent directory, and executing the following lines:

using LSJMining

src = "cex"
target = joinpath(pwd(), "kanonesdata","lsjx")
kroot = joinpath("..","greek2021","Kanones.jl")

kanones(src, target, kroot)

Other things you can do

If you want to work directly with the extracted morphological data, you can read the cex directory's files into a vector of MorphData objects.

using LSJMining
cexdir = joinpath(repo, "cex")
morph = loadmorphdata(cexdir)

111711-element Vector{LexiconMining.MorphData}:
 LexiconMining.MorphData("n0", "Α", "Α", "", "indecl.", "τό", "")
 LexiconMining.MorphData("n1", "ἀ", "ἀ1", "", "α στερητικόν", "", "")
 LexiconMining.MorphData("n2", "ἀ", "ἀ2", "", "ϝ", "", "")
 LexiconMining.MorphData("n3", "ἆ", "ἆ3", "", "", "", "")
 LexiconMining.MorphData("n4", "ἃ", "ἃ", "", "", "", "")
 LexiconMining.MorphData("n6", "ἀάατος", "ἀάατος", "", "ον", "", "")
 LexiconMining.MorphData("n8", "ἀαγής", "ἀαγής", "", "ές", "", "")
 LexiconMining.MorphData("n10", "ἀάζω", "ἀάζω", "", "", "", "")
 LexiconMining.MorphData("n13", "ἀακίδωτος", "ἀακίδωτος", "", "ον", "", "")
 LexiconMining.MorphData("n15", "ἀανές", "ἀανές", "", "", "", "")
 ⋮
 LexiconMining.MorphData("n116497", "ὠχρός", "ὠχρός1", "", "ά", "", "")
 LexiconMining.MorphData("n116498", "ὦχρος", "ὦχρος2", "", "ου", "", "")
 LexiconMining.MorphData("n116499", "ὠχροσύνη", "ὠχροσύνη", "", "", "ἡ", "")
 LexiconMining.MorphData("n116500", "ὠχρότης", "ὠχρότης", "", "ητος", "ἡ", "")
 LexiconMining.MorphData("n116501", "ὤχρωμα", "ὤχρωμα", "", "ατος", "τό", "")
 LexiconMining.MorphData("n116502", "ὤψ", "ὤψ", "", "", "ἡ", "")
 LexiconMining.MorphData("n116503", "ὠψά", "ὠψά", "", "", "τά", "")
 LexiconMining.MorphData("n116504", "ὠώ", "ὠώ", "", "", "", "")
 LexiconMining.MorphData("n116505", "ᾠώδης", "ᾠώδης", "", "ες", "", "")

Other material in this repository

scripts: some Julia scripts for managing data in this repository