LIN 392 Working with Corpora: Schedule

This schedule is current as of the “Last modified” date at the bottom of the page. Subject to change.

Assignments are due by class time (12pm) on their due date. The project proposal, progress report, and final report are due at midnight (12am) at the end of their due date.

Week 1

Aug 25: Introduction to working with corpora and to programming in Python

  • Readings: HTTLCS chapters 1 and 2 (Read these after class.)
  • Worksheet: Intro to Python and Unix: course archive, intro_python.pdf

Week 2

Sep 1: Python functions, conditionals, loops, and lists

  • We will finish up the previous week's worksheet, discussing Python functions. Then we move on to the new Worksheet: Python conditionals, loops, and lists: course archive, python_lists_conditionals.pdf
  • Readings: HTTLCS chapters 3, 4, 5, 6, and 9
  • If you are done with the worksheet ahead of time: The chapters in HTTLCS above contain exercises at the end, which are highly recommended!

Week 3

Sep 8: Making corpora

Week 4

Sep 15: Annotating corpora

Week 5

Sep 22: Working with Python strings and files

Week 6

Sep 29: Python data structures

Week 7

Oct 6: Sample Python programs

  • We also talk about your project ideas
  • Assignment 2 due
  • Worksheet: sorting lists in Python. course archive, python_sorting.ppt
  • Two sample Python programs. course archive, python_sample_programs.zip

Week 8

Oct 13: An introduction to the Natural Language Toolkit

Week 9

Oct 20: Regular expressions

Week 10

Oct 27: Searching annotated corpora

Week 11

Nov 3: Automatic analysis of text with the Natural Language Toolkit

Week 12

Nov 10: Finishing up NLTK for using and writing language processing tools. Then: Statistical analysis of linguistic data: a very short introduction.

  • In discussing probability theory, we used a program for estimating conditional probabilities of words given preceding words: course archive, conditionalprobs.zip
  • Project progress report due
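
For reference, the idea of estimating conditional probabilities of words given the preceding word can be sketched in a few lines of Python. This is a minimal illustration, not the program in conditionalprobs.zip: the function name conditional_probs and the toy sentence are made up for this example, and the estimate is plain relative frequency (bigram count divided by the count of the preceding word) with no smoothing.

```python
from collections import Counter, defaultdict


def conditional_probs(tokens):
    """Estimate P(word | previous word) by relative frequency.

    Hypothetical helper for illustration only; not the course program.
    """
    # For each preceding word, count the words that follow it.
    follower_counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        follower_counts[prev][word] += 1

    # Divide each bigram count by the total number of bigrams
    # that start with the same preceding word.
    probs = {}
    for prev, followers in follower_counts.items():
        total = sum(followers.values())
        probs[prev] = {word: count / total for word, count in followers.items()}
    return probs


# Toy data (made up): "the" is followed by "cat" twice and "mat" once.
tokens = "the cat sat on the mat and the cat slept".split()
probs = conditional_probs(tokens)
print(probs["the"])
```

With real corpus data, many word pairs never occur, so a practical version would usually add some form of smoothing.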

Week 13

Nov 17: Finishing up the short introduction to statistical analyses of linguistic data. Then: Analyzing linguistic data: a short glimpse at the R statistics package

  • Worksheet: a short introduction to R for people who know Python. course archive, r_and_python.pdf
  • Assignment 4 due

Week 14

Nov 24:

  • Doing some statistics with R: Worksheet, course archive, r_stats.pdf
  • Processing XML with Python: Slides and sample file. course archive, python_xml.pdf, crocodile.zip

Week 15

Dec 1: Project presentations

  • 12:00 Ding
  • 12:12 Parry
  • 12:24 Kim
  • 12:36 Ganeshan
  • 12:48 Schultz
  • 13:00 Lee
  • 13:12 Blanco
  • 13:24 break
  • 13:36 Wendorf
  • 13:48 Zaheed
  • 14:00 Cope
  • 14:12 Richter
  • 14:24 Bohmann
  • Project final reports are due on Dec 6, 2010
Last modified: Katrin Erk, Jan 6, 2012, 9:58 AM