DNA and texts

Background: modeling features

The DNA sequences of biological organisms can be thought of as analogous to the text of a book, such as an instruction manual. A gene in our genome is like the instruction manual for building a table. The sequence of DNA nucleotide bases is like the words in the steps for building the table, the amino acids are like the wood and nails, and the protein is like the completed table. Each of these parts can be considered a FEATURE that we can investigate, model, and compare among organisms. We can use DNA and protein features to classify organisms and to reconstruct their evolutionary history.

After learning a little about DNA and proteins, we will use the same kinds of data modeling and analysis techniques we used for texts like Lincoln’s Gettysburg Address to explore questions about the DNA of the Cytochrome Oxidase I gene, and the amino acids of the protein it encodes, in beetles and humans, two very different organisms.

Class preparation: DNA

Please read this brief article, “Mutations Are the Raw Materials of Evolution,” before class.

Be sure you are familiar with the main concepts we’ll work with in our lab assignment: DNA is a sequence of nucleotide bases; each group of three bases (a codon) can be read as an amino acid; a longer sequence can make up an entire gene.
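
To make the idea of codons concrete, here is a minimal sketch in Julia of how a sequence of bases might be divided into groups of three. The sequence and the function name `codons` are made up for illustration; they are not real Cytochrome Oxidase I data and are not taken from the lab notebook.

```julia
# A hypothetical example sequence (not real COI data).
dna = "ATGTTTGGCCTG"

# Split a DNA string into consecutive groups of three bases (codons).
# One or two leftover bases at the end, if any, are dropped.
function codons(seq::AbstractString)
    n = div(length(seq), 3)              # number of complete codons
    return [seq[(3i - 2):(3i)] for i in 1:n]
end

codons(dna)   # ["ATG", "TTT", "GGC", "CTG"]
```

The result is an ordinary vector of strings, so each codon can be treated as a feature in the same way as a word in a text.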

Prof. Ober’s slides for today’s class are available from your course Google Drive in a slides folder, in a file named BIOCLAS199DNAinfo.pptx. You may use them ahead of time and for review after class.

Class preparation: Julia dictionaries

In our next lab, we’ll continue to use vectors for collections of data, but will also introduce a new structure in Julia, the dictionary. Read through this Pluto notebook before class. It’s saved as a web page, just like the template notebook for your lab assignment, so feel free to open it in Pluto, or just start a Julia REPL and copy and paste some of the code there to try out dictionaries.
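
If you would like a small example to paste into a REPL, the following sketch shows two ways a dictionary might be used with DNA data: counting how often each base occurs, and looking up the amino acid for a codon. The sequence, the function name `countbases`, and the four-entry codon table are assumptions made for illustration; the notebook may organize things differently, and a real codon table has 64 entries.

```julia
# A hypothetical example sequence (not real COI data).
dna = "ATGTTTGGCCTG"

# Count how often each base occurs, using a dictionary keyed by character.
function countbases(seq::AbstractString)
    counts = Dict{Char, Int}()
    for base in seq
        # get returns the current count, or 0 if the base has not been seen yet
        counts[base] = get(counts, base, 0) + 1
    end
    return counts
end

basecounts = countbases(dna)
basecounts['T']               # 5

# A tiny, deliberately incomplete codon table: codon => amino acid.
aminoacids = Dict("ATG" => "Met", "TTT" => "Phe", "GGC" => "Gly", "CTG" => "Leu")

aminoacids["TTT"]             # "Phe"
haskey(aminoacids, "AAA")     # false: this codon is not in our small table
get(aminoacids, "AAA", "?")   # "?" is returned as a default for a missing key
```

Unlike a vector, which you index by position, a dictionary is indexed by key, so you can ask directly for the count of 'T' or the amino acid for "TTT".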


Classics 199, Papyrus to Pixels. All material on this web site is available under the Creative Commons Attribution Share-Alike license CC BY-SA 4.0 on GitHub.