Goals

Goals shared by both sections:

  • work collaboratively
  • cultivate habits of test-driven thinking:
    • hypotheses are wrong until proven otherwise
    • to know if a test works, you must first fail it. That’s why the first word of our embedded theme is “fail.”
    • iteratively improve
  • develop a reproducible research project from an initial question to implementation, including
    • an explicit license for reuse
    • clearly identified source material and analytical methods
  • oral presentation
  • written presentation: one source, multiple formats

Specific objectives in CLAS 199-S05:

  • read texts from URLs or local files either as a single string or as a series of sections (chapters in a book, lines in a poem)
  • tokenize a text and construct frequency histograms
  • plot histograms and analyze frequencies in relation to Zipf’s Law
  • extract features like named entities from a corpus
  • identify significant terms with metrics like TF-IDF
  • select an appropriate data model for features in Julia, and organize feature data as graphs, matrices, and dictionaries
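
As a rough illustration of the tokenizing and frequency objectives above, here is a minimal sketch in Julia that uses only the base language. The sample text and the function names `countfrequencies` and `zipfexpected` are hypothetical, the whitespace tokenizer is deliberately naive, and the course itself may well use different tools or packages.

```julia
# Hypothetical sample text; in practice this would be read from a URL or a local file.
text = "the cat and the dog and the bird"

# Naive tokenizer: lowercase the text, then split on whitespace.
# It does not strip punctuation, which a real corpus would require.
tokens = [String(t) for t in split(lowercase(text))]

# Count how often each token occurs, storing the counts in a dictionary.
function countfrequencies(tokens)
    counts = Dict{String,Int}()
    for t in tokens
        counts[t] = get(counts, t, 0) + 1
    end
    return counts
end

counts = countfrequencies(tokens)

# Rank tokens from most to least frequent; ties are broken alphabetically
# so that the ordering is reproducible.
ranked = sort(collect(counts), by = p -> (-p.second, p.first))

# Zipf's Law (in its simplest form) predicts that the token at rank r
# occurs about 1/r times as often as the top-ranked token.
zipfexpected(topcount, rank) = topcount / rank

topcount = ranked[1].second
for (rank, pair) in enumerate(ranked)
    expected = round(zipfexpected(topcount, rank), digits = 2)
    println(rank, "\t", pair.first, "\t", pair.second, "\t", expected)
end
```

The dictionary here is one possible data model for frequencies. The sorted vector of pairs is what you would hand to a plotting function to draw a histogram, or to compare observed counts against the Zipf prediction in the last column.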

Classics 199, Papyrus to Pixels. All material on this web site is available on GitHub under the Creative Commons Attribution-ShareAlike license (CC BY-SA 4.0).