# LIN 392 Analyzing Linguistic Data: Schedule

This schedule is subject to change.

Assignments are due by the end of the day (midnight) on their due date.

Readings can be done either before or after class; they are chosen to support the material covered in class.

Readings: "ALD" is Baayen's "Analyzing Linguistic Data," "SE" is Hinton's "Statistics Explained."

### Week 1, January 16

*Class canceled because of inclement weather.*

### Week 2, January 23

Introduction

LL breakfast experiments: passive voice, analyzing real estate listings, and more of the same

Please install the R statistics package on your laptop before this lecture. I recommend RStudio. Please use the free version, not the commercial license.

Readings: ALD ch. 1

### Week 3, January 30

Descriptive statistics: central tendency and spread. Data visualization

Readings: ALD ch. 2; SE ch. 2

R worksheets: merge and aggregate

Data for exploration: meaning annotation (zipped archive), Fisher word counts

### Week 4, February 6

Descriptive statistics: we look again at the worksheets.

Probability distributions, samples and population

*See documents on Canvas: Sampling, and the central limit theorem*

Statistical tests: the basic principle; significance thresholds

Readings: SE ch. 3, 4, 5; ALD ch. 3 pp. 44-63

**Homework 1 due Thursday February 8 end of day**

### Week 5, February 13

The t-test

R code: reading, preprocessing and counting text.

Readings: SE ch. 6-9

**Homework 2 due on Thursday February 15 end of day**

### Week 6, February 20

Statistical testing: t-test exercises, chi-squared. Comparing more than two data sets: ANOVA

Readings: SE ch. 10-12, 19

Data: Fisher word counts; Fisher telephone corpus, analyzed using LIWC (warning: big dataset!); metadata for the Fisher telephone corpus

Pitfalls of statistical analysis: The problem of multiple testing and the famous Green Jelly Bean story: https://xkcd.com/882/

**Discussion in class about your project ideas**

**Homework 3 due on Thursday February 22 end of day**

### Week 7, February 27

Correlation and linear regression

Readings: SE ch. 20

**Project proposal due on Thursday March 1 end of day**

### Week 8, March 6

*Guest lectures:*

*2pm: Scott Myers*

*3:30: Danny Law*

### Week 9, March 13: Spring Break

### Week 10, March 20

Linear regression in R; multiple regression, and different types of predictors

R code: more linear regression

Readings: ALD ch. 4 pp. 84-101; SE ch. 21

### Week 11, March 27

Logistic regression

Readings: ALD ch. 6 pp. 195-199, 202-203

**Thursday March 29: Homework 4 due**

### Week 12, April 3

Logistic regression and model comparison

Readings: ALD ch. 6 pp. 174-188

**Progress report due**

### Week 13, April 10

Mixed-effects models in R; practicing regression

Readings: ALD ch. 7 pp. 241-284

Model criticism: See this tutorial on linear regression

**Thursday April 12: Homework 5 due**

### Week 14, April 17

More mixed-effects models in R

Model criticism: R worksheet: influential datapoints

See the mixed effects worksheet available on Canvas.

### Week 15, April 24

Clustering for data exploration

Readings: ALD ch. 5 up to p. 148

For the topic modeling demo, we use two datasets available on Canvas: r8-train-data and r8-train-meta

### Week 16, May 1

**Project presentations:**

2pm: Laura Faircloth

2:11: Javier Jasso

2:22: Tracy Adams

2:33: Brendon Kaufman

2:44: Frances Cooley

2:55: Katherine Winters

3:06: Siddharth Kumar

3:17: Lorena Orjuela

3:28: Sandy Keerstock

3:39: Gladys Camacho Rios

3:50: Kristen Meemann

4:01: Marylise Rilliard

4:12: Rachel Tessmer

**Final report due: May 10**