Goals
Goals shared by both sections:
- work collaboratively
- cultivate habits of test-driven thinking:
  - hypotheses are wrong until proven otherwise
  - to know whether a test works, you must first see it fail. That’s why the first word of our embedded theme is “fail.”
- improve work iteratively
- develop a reproducible research project, from an initial question through implementation, including:
  - an explicit license for reuse
  - identified source material and analytical methods
  - an oral presentation
  - a written presentation: one source, multiple formats
Specific objectives in CLAS 199-S05:
- read texts from URLs or local files either as a single string or as a series of sections (chapters in a book, lines in a poem)
- tokenize a text and construct frequency histograms
- plot histograms and analyze frequencies in relation to Zipf’s Law
- extract features like named entities from a corpus
- identify significant terms with metrics like TF-IDF
- select an appropriate Julia data model for features and organize feature data as graphs, matrices, and dictionaries