Today, huge amounts of text are available in electronic form. We
can probe these electronic text collections to answer questions about
language -- and questions about the people who use it. For example, we
can test whether passive constructions are increasingly falling out of
favor in English, and we can trace how words change their meaning over
time. We can also study a politician's word choices in political debates
to find out more about their personality, or we can see how inaugural
addresses have changed over time. This course
provides a hands-on introduction to working with text data. It
includes an introduction to programming in Python, focused on text
processing and data exploration, along with a "cookbook" of programming
examples that will enable you to analyze texts on your own very quickly.
Most of the conclusions that we want to draw from text are "risky
conclusions": they are trends rather than yes-or-no answers. The
course therefore also includes an introduction to statistical techniques for data
exploration and for making and assessing "risky conclusions". The course
also includes a course project where you can test your text analysis
skills on a question of your own choice.

This course carries an Independent Inquiry flag as well as a Quantitative Reasoning flag.

In Spring 2021, this course is hybrid: see the FAQ!

Prerequisites: Upper-division standing.

Textbook: P. R. Hinton (2004): Statistics Explained: A Guide for Social Science Students. Psychology Press; 3rd edition, 2014.