Analyzing linguistic data, and programming for linguists

Spring 2021 | Instructor: Katrin Erk | TTH 11-12:30 | Hybrid: Zoom and UTC 4.110 | Canvas

Today, huge amounts of text are available in electronic form. We can poke these electronic text collections to answer questions about language -- and questions about the people who use it. For example, we can test whether passive constructions are increasingly falling out of favor in English, and we can trace how words change their meaning over time. We can also study a politician's word choices in political debates to find out more about their personality, or we can see how inaugural addresses have changed over time.

This course provides a hands-on introduction to working with text data. It includes an introduction to programming in Python, with a focus on text processing and data exploration, and a "cookbook" of programming examples that will enable you to analyze texts on your own very quickly. Most of the conclusions that we want to draw from text are "risky conclusions": they are trends rather than yes-or-no answers. The course therefore also includes an introduction to statistical techniques for data exploration and for making and assessing such conclusions. Finally, the course includes a project where you can test your text analysis skills on a question of your own choice.
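To give a flavor of what a "cookbook"-style text processing example might look like, here is a minimal sketch that counts word frequencies in a short text. It is an illustration only and is not taken from the course materials: the function name word_frequencies and the sample sentence are made up, and the simple pattern used to split the text into words is an assumption that real analyses would likely refine.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word.

    The regular expression is a deliberately simple tokenizer (an
    assumption for this sketch): it keeps runs of letters and apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, invented for this illustration.
sample = "We the people ask what we can do, and what the people can do for us."

counts = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in counts.most_common(3):
    print(word, count)
```

The same function could be applied to a longer text, for example the contents of a file read with open(...).read(), to explore which words an author uses most often.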

This course carries an Independent Inquiry flag as well as a Quantitative Reasoning flag.

In Spring 2021, this course is hybrid: see the FAQ!

Prerequisites: Upper-division standing.


P. R. Hinton (2004): Statistics Explained: A Guide for Social Science Students. Psychology Press; 3rd edition, 2014