Zipf’s law
In preparation for our third assignment, we will begin to look at different models of language. How do we identify “interesting” or “significant” patterns of language? Should we focus on vocabulary? (We’ll see how the “Bag of Words” model takes this approach.) Are predictable sequences of vocabulary items important? (We’ll look at n-gram models of text, and more general vector models of texts.) What is the meaning of rare or exceptional patterns vs. frequent or “normal” patterns?
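To make the first two ideas concrete, here is a minimal Python sketch, not the method we will use in the assignment. The sample sentence and the whitespace tokenization are assumptions chosen only for illustration. A bag of words keeps counts and discards word order; an n-gram model (here n = 2, "bigrams") keeps short sequences of adjacent words.

```python
from collections import Counter

# Hypothetical sample text; any string would do.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.lower().split()  # naive whitespace tokenization (an assumption)

# Bag of words: word order is discarded, only the counts remain.
bag_of_words = Counter(tokens)

# Bigrams (n-grams with n = 2): adjacent pairs of tokens, so local order is kept.
bigrams = Counter(zip(tokens, tokens[1:]))

print(bag_of_words.most_common(3))
print(bigrams.most_common(3))
```

Notice that the bag of words cannot tell "the cat sat" from "sat the cat", while the bigram counts can.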
To prompt your thinking about language, we’ll look at a remarkable phenomenon known as “Zipf’s Law”.
Please watch this video introducing “The Zipf Mystery”.
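If you would like to experiment afterwards, the sketch below ranks the words of a text by frequency. Zipf's Law is the observation that, in a large corpus, a word's frequency is roughly inversely proportional to its rank, so the product of rank and count should stay roughly constant. The function name `rank_frequency`, the simple regular-expression tokenizer, and the sample sentence are all assumptions for illustration; a sample this small will not show the pattern clearly, so try it on a long text of your own.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Simplistic tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Hypothetical toy sample; replace with a long text to see the pattern emerge.
sample = "the quick fox and the lazy dog and the bird"
for rank, word, count in rank_frequency(sample):
    # Under Zipf's Law, rank * count is roughly constant in a large corpus.
    print(rank, word, count, rank * count)
```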