
Working with Corpora: Schedule

Schedule subject to change.

All assignments are due at the end of the day on their due date.

Week 1

Aug 28: Introduction to working with corpora and programming in Python

Week 2

Sep 2: Using programming to explore corpora -- a lightning tour. We use chapter 1 of the NLTK book.

Sep 4: Python basics.

Week 3

Sep 9: What corpora are available? Some examples and their uses

Sep 11: Strings and functions in Python

Week 4

Sep 16: Python lists and loops

Sep 18: Python lists and loops continued. Also, Python files

Week 5

Sep 23: Text formats and text encoding

Sep 25: Python list comprehensions

Week 7

Oct 7: Annotating corpora: quality control

Oct 9: Sample Python programs

  • Sample Python programs here
  • We also talk in class about your project ideas.

Week 8

Oct 14: Guest lecture

Oct 16: Guest lecture

Week 9

Oct 21: Python dictionaries

Oct 23: Regular expressions

Week 10

Oct 28: Regular expressions continued

  • Assignment 3 due

Oct 30: Searching annotated corpora

Week 11

Nov 4: Searching annotated corpora, continued.

Nov 6: Automatic analysis of text with the Natural Language Toolkit

Week 12

Nov 11: Sample programs for text analysis.

Nov 13: A glimpse at the R statistics package.

Week 13

Nov 18: A very short introduction to some concepts from statistics.

  • Harald Baayen's Analyzing Linguistic Data
  • Assignment 4 due
  • Additional material we may use: Data exploration, Fisher telephone conversation corpus: table 1, table 2
  • Extra questions about the Inaugural data:
    • What was the longest speech in the 19th century? (Give the year, the president's name, and the length of the speech.) What was the longest speech in the 20th century?
    • Who used "I" the most? (That is, which president used "I" the most?) Who used "I" and "me" combined the most?
    • How many speeches used the term "freedom" at least 5 times?
    • How many speeches that used the term "freedom" at least 5 times were later than 1980?
    • How long is the longest name of a president? (nchar() gives you the length of a string, and remember to use as.character() to convert from factor to string.)
    • DIFFICULT: How many presidents served twice? (For this you need to use the code from the worksheet that adds in the name of the previous president.)
  • Worksheet
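
The extra questions above are meant to be answered in R with the worksheet. For comparison, here is a rough Python sketch of the same kinds of queries. It runs on three made-up records, not on the real Inaugural data, and the names SPEECHES, count_term, longest_speech, speeches_with_term and longest_name_length are invented for this illustration.

```python
import re

# Hypothetical toy records for illustration only; NOT the real Inaugural data.
SPEECHES = [
    {"year": 1801, "president": "Alpha",
     "text": "I speak of freedom and of peace"},
    {"year": 1985, "president": "Betatron",
     "text": "freedom freedom freedom freedom freedom for me and I"},
    {"year": 1993, "president": "Gamma",
     "text": "I think I know what I owe"},
]


def tokens(text):
    """Split a text into word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[A-Za-z']+", text)


def count_term(text, term, ignore_case=True):
    """Count how many tokens of the text are equal to the given term."""
    words = tokens(text)
    if ignore_case:
        return sum(1 for w in words if w.lower() == term.lower())
    return sum(1 for w in words if w == term)


def longest_speech(speeches, first_year, last_year):
    """Return the record with the most tokens in the given year range, or None."""
    in_range = [s for s in speeches if first_year <= s["year"] <= last_year]
    if not in_range:
        return None
    return max(in_range, key=lambda s: len(tokens(s["text"])))


def speeches_with_term(speeches, term, min_count, after_year=None):
    """Return the records that use the term at least min_count times,
    optionally restricted to years later than after_year."""
    return [
        s for s in speeches
        if count_term(s["text"], term) >= min_count
        and (after_year is None or s["year"] > after_year)
    ]


def longest_name_length(speeches):
    """Length in characters of the longest president name (like nchar() in R)."""
    return max(len(s["president"]) for s in speeches)


# Who used "I" the most? Match case-sensitively so only the pronoun counts.
most_i = max(SPEECHES, key=lambda s: count_term(s["text"], "I", ignore_case=False))
```

With real data, longest_speech(SPEECHES, 1801, 1900) would cover the 19th century and longest_speech(SPEECHES, 1901, 2000) the 20th, and len(speeches_with_term(SPEECHES, "freedom", 5, after_year=1980)) corresponds to the question about speeches later than 1980. Counting speech length in word tokens is one possible choice; counting characters would be another.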

Nov 20: A very short introduction to some concepts from statistics, continued

Week 14

Nov 25: A very short introduction to some concepts from statistics, continued.

Nov 27: Thanksgiving

Week 15

Dec 2: Project presentations 

  • 12:30 Hoyoung Yi
  • 12:42 Adam McBride
  • 12:55 Rozen Neupane
  • 13:07 Maxim Baryshevtsev
  • 13:20 Whitman Suarez
  • 13:32 Liang Sun

Dec 4: Project presentations

  • 12:30 Viola Green
  • 12:42 Ethan Cooper
  • 12:55 Margo Blevins
  • 13:07 Magdalena Saldana
  • 13:20 Saif Shahin
  • 13:32 Shannon McGregor
  • 13:45 Patrick Schultz

Final report due: Monday December 8, end of the day
