Working with Corpora: Schedule
Schedule subject to change.
All assignments are due at the end of the day on their due date.
Aug 28: Introduction to working with corpora and programming in Python
Sep 2: Using programming to explore corpora -- a lightning tour. We use chapter 1 of the NLTK book.
Sep 4: Python basics.
Sep 9: What corpora are available? Some examples and uses
Readings: NLTK chapter 2
Sep 11: Strings and functions in Python
Sep 16: Python lists and loops
Sep 18: Python lists and loops continued. Also, Python files
Readings: Think Python chapter 10 (files)
Assignment 1 due
Sep 23: Text formats and text encoding
Sep 25: Python list comprehensions
Some more list and loop problems
Sep 30: Making corpora
Oct 2: Annotating corpora: formats
Oct 7: Annotating corpora: quality control
Assignment 2 due
Oct 9: Sample Python programs
Sample Python programs here
We also talk in class about your project ideas.
Oct 14: Guest lecture
Oct 16: Guest lecture
Oct 21: Python dictionaries
Readings: Think Python chapter 11 (dictionaries), NLTK chapter 1 sec. 1.3
Project proposal due. Information on format and contents is available on the Working with Corpora: Assignments page.
Oct 23: Regular expressions
Oct 28: Regular expressions continued
Assignment 3 due
Oct 30: Searching annotated corpora
Slides: searching annotated corpora
We make use of the Opus Europarl interface at http://opus.lingfil.uu.se/cwb/Europarl/frames-cqp.html
We use tregex, available at http://nlp.stanford.edu/software/tregex.shtml#Download
Nov 4: Searching annotated corpora, continued.
Nov 6: Automatic analysis of text with the Natural Language Toolkit
Nov 11: Sample programs for text analysis.
Project presentation: Rebecca Kurlak
Project progress report due
Nov 13: A glimpse at the R statistics package.
Nov 18: A very short introduction to some concepts from statistics.
Assignment 4 due
Extra questions about the Inaugural data:
What was the longest speech in the 19th century? (Give the year, the president's name, and the length of the speech.) What was the longest speech in the 20th century?
Who used "I" the most? (That is, what is the name of the president who used "I" the most?) Who used "I" and "me" combined the most?
How many speeches used the term "freedom" at least 5 times?
How many speeches that used the term "freedom" at least 5 times were later than 1980?
How long is the longest name of a president? (nchar() gives you the length of a string, and remember to use as.character() to convert from factor to string.)
DIFFICULT: How many presidents served twice? (For this you need to use the code from the worksheet that adds in the name of the previous president.)
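The questions above are meant to be answered in R with the class worksheet. As a rough illustration of the same filter-and-count pattern in Python, here is a minimal sketch. The records in toy_speeches are invented for illustration and are not the Inaugural data, and the helper names count_term and filter_speeches are hypothetical.

```python
import re


def count_term(text, term):
    """Count case-insensitive whole-word occurrences of term in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return words.count(term.lower())


def filter_speeches(speeches, term, min_count, after_year=None):
    """Return speeches using term at least min_count times,
    optionally restricted to years later than after_year."""
    return [
        s for s in speeches
        if count_term(s["text"], term) >= min_count
        and (after_year is None or s["year"] > after_year)
    ]


# Invented toy records (assumption): stand-ins for real speeches.
toy_speeches = [
    {"year": 1900, "speaker": "Speaker A",
     "text": "Freedom, freedom, freedom! Freedom and freedom."},
    {"year": 1990, "speaker": "Speaker B",
     "text": "freedom " * 7},
    {"year": 2000, "speaker": "Speaker C",
     "text": "I speak of liberty, not freedom."},
]

# Speeches that use "freedom" at least 5 times, overall and after 1980.
frequent = filter_speeches(toy_speeches, "freedom", 5)
recent = filter_speeches(toy_speeches, "freedom", 5, after_year=1980)

# Longest speech, measured in whitespace-separated words.
longest = max(toy_speeches, key=lambda s: len(s["text"].split()))

print(len(frequent), len(recent), longest["speaker"])
```

With real data, the list of records would be built from the corpus files rather than typed in by hand. The word-splitting pattern here is deliberately simple and treats possessives such as "freedom's" as separate words.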
Nov 20: A very short introduction to some concepts from statistics, continued
Nov 25: A very short introduction to some concepts from statistics, continued.
Nov 27: Thanksgiving
Dec 2: Project presentations
12:30 Hoyoung Yi
12:42 Adam McBride
12:55 Rozen Neupane
13:07 Maxim Baryshevtsev
13:20 Whitman Suarez
13:32 Liang Sun
Dec 4: Project presentations
12:30 Viola Green
12:42 Ethan Cooper
12:55 Margo Blevins
13:07 Magdalena Saldana
13:20 Saif Shahin
13:32 Shannon McGregor
13:45 Patrick Schultz
Final report due: Monday December 8, end of the day