Schedule: LIN353C Introduction to Computational Linguistics
This class meets in person four times this semester: August 27, September 24, October 22, and November 19. In-person meeting days are marked in yellow in the schedule below. The in-person sessions will be used to discuss larger issues in computational linguistics and your experiences with the hands-on exercises. All other class sessions will be held online synchronously, with in-class interaction via Zoom. The online sessions will also be recorded, and recordings will be available on Canvas for two weeks after each session. You may attend the in-person meetings either in person or via Zoom; remote participants in the discussion sessions will be accommodated.
This class includes an introduction to programming in Python. If you already know how to program in Python, you do not need to attend these sessions. The programming introduction takes place in dedicated sessions that cover only programming. Introduction to programming sessions are marked in blue in the schedule below.
Assignments are due right before class (12:30pm) on their due date unless noted otherwise. Assignment due dates are marked in orange in the schedule.
Unless otherwise noted, all readings can be done after class time.
This schedule is subject to change.
Aug 27: In-person meeting: Introduction and welcome.
Introductory slides are on Canvas
We discuss the syllabus
Try out a language technology application. Options:
Sep 1: Introduction to programming: first steps
Due: Food for Thought 1.
We'll be using Jupyter notebooks in class. Please download the notebooks to your computer. Then either open them with Jupyter Notebook through Anaconda, or open a terminal, go to the directory containing the notebooks, and run the command jupyter notebook
Sep 3: Text normalization and regular expressions
Sep 8: Introduction to programming: Conditions and lists
Sep 10: Regular expressions and text normalization, continued. Probabilities and language modeling.
We will use regular expressions on the OPUS online corpus collection.
Sep 15: Introduction to programming: Loops
Due: Assignment 1
Sep 17: Language modeling.
Readings: Jurafsky and Martin (2nd edition) chapter 4: n-grams, sections 4.1-4.3. You can find a shorter version, without the discussion of counting words, in Jurafsky and Martin 3rd edition, chapter 3, section 3.1.
Sep 22: Introduction to programming: Dictionaries.
Sep 24: In-person meeting: Discussing language models.
Due: Food for Thought 2
Sep 29: Introduction to programming: Making your own functions.
Due: Assignment 2
Oct 1: Classification using Naive Bayes; Sentiment analysis.
Oct 6: Naive Bayes, continued.
Due: Assignment 3.
Oct 8: Classification using logistic regression.
Oct 13: Word embeddings: Characterizations of word usage
Due: Assignment 4
Oct 15: Neural word embeddings, and neural language models.
Readings: Jurafsky and Martin 3rd edition, chapter 7. This chapter is again very math-heavy. Read section 7.1, and skim section 7.3 and section 7.4 through 7.4.2.
Oct 20: Part-of-speech tagging.
Due: Assignment 5.
Readings: Jurafsky and Martin 2nd edition chapter 5 through 5.4
Oct 22: In-person meeting: Discussing neural models, and word embeddings
Due: Food for Thought 3.
Here is a word2vec demo that can visualize embeddings in 3-D space. To use it, type a word into the "Search" field on the right and click the "Isolate selection" button.
Oct 27: Part-of-speech tagging continued
Readings: Jurafsky and Martin 2nd edition chapter 5 section 5.5
Oct 29: Describing syntactic structure with phrase-structure grammar
Readings: Jurafsky and Martin 2nd edition chapter 12 through 12.3.5, 12.4.1, and 12.7.1
We will also use chapter 7 of the NLTK book, in particular section 7.2
Nov 3: Parsing
Due: Assignment 6
Readings: Jurafsky and Martin 2nd edition chapter 13 through 13.2
Nov 5: Parsing continued
Readings: Jurafsky and Martin 2nd edition chapter 13, sections 13.3 and 13.4.1
Nov 10: Describing sentence meaning: logic-based representations and semantic roles
Semantic role resources:
Large-scale representation of natural language meaning with logic: the Groningen Meaning Bank, which has a tool for inspecting sentence representations
Nov 12: Semantic role labeling. Also: Recurrent models for part-of-speech tagging and for semantic role labeling
Readings: Jurafsky and Martin 3rd edition chapter 9, up to and including section 9.2; feel free to skim the math.
Nov 17: Encoder-decoder models
Due: Assignment 7
Nov 19: In-person meeting: Discussing sentence structure, and the Chomsky hierarchy
Due: Food for Thought 4
Nov 24: Machine translation
Nov 26: Thanksgiving break
Dec 1: Information extraction
Dec 3: Coreference resolution
Due: Assignment 8