LIN350 Computational semantics

Fall 2021 | Instructor: Katrin Erk | MWF 10-11 | RLP 1.108

Course overview

How can we describe the meaning of words and sentences in such a way that we can process them automatically? That seems like a huge task. There are so many words, all with their individual nuances of meaning -- do we have to define them all by hand? And there are so many things we want to do with sentences: Translate them. Answer questions. Extract important pieces of information. Figure out people's opinions. Can we even use one single meaning description to do all these tasks?

In this course, we discuss methods for automatically learning what words mean (at least to some extent) from huge amounts of text -- for example, from all the text that people have made available on the web. And we discuss ways of representing the meaning of words and sentences in such a way that we can use them in language technology tasks.

We will look at two different kinds of meaning representations. Distributional representations, also called embeddings, characterize the meaning of a word or passage as an object in a "meaning space" that is learned automatically from data, in such a way that words with similar meanings will be close together in space. Logical representations translate sentences into formal representations of the things, people, and events that are mentioned and the connections between them, so that we can automatically reason with them and draw conclusions.
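
To give a flavor of the first idea: in a meaning space, each word is represented as a vector of numbers, and similarity is typically measured as the cosine of the angle between two vectors. Here is a minimal sketch with made-up three-dimensional vectors (real embeddings have hundreds of dimensions, and the numbers below are purely illustrative):

    import numpy as np

    # Made-up toy vectors; real embeddings are learned from large corpora.
    vec = {
        "cat": np.array([0.9, 0.8, 0.1]),
        "dog": np.array([0.85, 0.75, 0.2]),
        "car": np.array([0.1, 0.2, 0.9]),
    }

    def cosine(a, b):
        # Cosine similarity: close to 1 means the vectors point the same way.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(vec["cat"], vec["dog"]))  # high: similar meanings
    print(cosine(vec["cat"], vec["car"]))  # lower: less similar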

Prerequisites: Upper-division standing. LIN350 Analyzing Linguistic Data, or a different introduction to programming, or consent of instructor.

Readings will be made available for download from the course website.

Flags: Quantitative Reasoning, Independent Inquiry

Syllabus

Course overview

Instructor and TA contact information

    • Instructor: Katrin Erk

      • Office hours: Tuesday 1-3pm, Friday 11am-12pm. Office hours will be via Zoom, the link is on Canvas.

      • Office: RLP 4.734

      • email: katrin dot erk at utexas dot edu

    • Teaching Assistant: Hongli Zhan

      • Office hours: 2:30pm-4pm, Mondays and Fridays. Office hours will be via Zoom, the link is on Canvas.

      • email: honglizhan at utexas dot edu

Prerequisites

  • Upper-division standing.

  • LIN350 Analyzing Linguistic Data, or a different introduction to programming, or consent of instructor.

Syllabus and text

This page serves as the syllabus for this course.

There is no textbook for this course.

Readings will be made available for download from the course website, in the Schedule section.

Content overview

Semantics is a very active area of computational linguistics -- but also a very diverse one. People work on word sense, semantic roles, selectional preferences, logic-based semantics, as well as on many semantics-related tasks and task-specific semantic representations. But there are problems that come up again and again in different tasks, and representation ideas that come up again and again in different variants. In this course, we focus on two influential classes of representations: structured (logic-based) semantics and distributional semantics, and on central phenomena that they address.

This course focuses on two frameworks in semantics, distributional models and logic-based semantics. Topics include:

  • Embeddings / Distributional representations:

    • Using distributional representations to analyze similarity in meaning

    • Making embeddings by counting words or using neural networks

  • Logic-based semantics:

    • Translating natural language sentences to logic

    • Automatic reasoning

    • Automatically constructing the logical representation of a sentence: semantics construction

  • Structured semantic representations more generally:

    • Tasks: word sense disambiguation, semantic role labeling, coreference, and so on

    • Variants of structured representation

    • Knowledge graphs, and their integration with embeddings
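
As a preview of the last point above, here is a toy illustration of scoring a knowledge-graph triple with entity and relation embeddings, in the spirit of translation-based models such as TransE: a triple (head, relation, tail) counts as plausible if the head vector plus the relation vector lands near the tail vector. The vectors below are invented for illustration; in practice they are learned from a large graph.

    import numpy as np

    # Toy embeddings for entities and relations (invented numbers).
    entity = {"Austin": np.array([1.0, 0.0]), "Texas": np.array([1.0, 1.0])}
    relation = {"located_in": np.array([0.0, 1.0])}

    def score(head, rel, tail):
        # TransE-style score: smaller distance = more plausible triple.
        return float(np.linalg.norm(entity[head] + relation[rel] - entity[tail]))

    print(score("Austin", "located_in", "Texas"))  # close to 0: plausible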

A detailed schedule for the course, with topics for each lecture, is available in the Schedule section.

Flags

This course carries the Quantitative Reasoning flag. Quantitative Reasoning courses are designed to equip you with skills that are necessary for understanding the types of quantitative arguments you will regularly encounter in your adult and professional life. You should therefore expect a substantial portion of your grade to come from your use of quantitative skills to analyze real-world problems.

This course also carries the Independent Inquiry flag. Independent Inquiry courses are designed to engage you in the process of inquiry over the course of a semester, providing you with the opportunity for independent investigation of a question, problem, or project related to your major. You should therefore expect a substantial portion of your grade to come from the independent investigation and presentation of your own work. See the Course Project section for details.

Course requirements and grading policy

  • Assignments: 60% (4 assignments, 15% each)

  • Course project:

    • Initial project description: 5%

    • Intermediate project report: 10%

    • Course presentation: 5%

    • Final report: 20%

Course projects should be done by teams of 2 students. Projects done by 1 or 3 students are only possible with prior approval of the instructor.

Project presentations will be in the final week of classes, in the order given on the schedule page (which will be generated via Python's random.shuffle()). If possible, all members of a project team should get some time to speak.
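
For the curious, the shuffling amounts to something like the following sketch (the team names and the seed are placeholders):

    import random

    teams = ["Team A", "Team B", "Team C"]  # placeholder team names
    random.seed(2021)                       # hypothetical seed, for reproducibility
    random.shuffle(teams)                   # shuffles the list in place
    print(teams)                            # presentation order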

Assignments will be posted on Canvas. There will be 4 assignments. A tentative schedule for the entire semester is posted in the Schedule section. Readings may change up to one week in advance of their due dates.

This course does not have a midterm or final exam.

Options for course projects, and more details on the project requirements are listed in the Project section.

The course will use plus-minus grading, using the following scale (showing Grade and Percentage):

  • A >= 93%

  • A- >= 90%

  • B+ >= 87%

  • B >= 83%

  • B- >= 80%

  • C+ >= 77%

  • C >= 73%

  • C- >= 70%

  • D+ >= 67%

  • D >= 63%

  • D- >= 60%

Attendance is not required. However, given that we will do a lot of hands-on exercises in class, and the homework assignments and the project address the material covered in class, good attendance is essential for doing well in this class.

Extension Policy

If you turn in your assignment late and we have not agreed on an extension beforehand, expect points to be deducted. Extensions will be considered on a case-by-case basis. I urge you to let me know if you are in need of an extension, so that we can make sure you get the time necessary to complete the assignments.

If an extension has not been agreed on beforehand, then for assignments, by default, 5 points (out of 100) will be deducted for lateness, plus an additional 1 point for every 24-hour period beyond the second that the assignment is late. For example, an assignment turned in four days late would lose 5 + 2 = 7 points.

Note that there are always some points to be had, even if you turn in your assignment late. The last day in the semester on which the class meets is the last day to turn in late assignments for grading. Homework assignments submitted after that date will not be graded.

Classroom safety and Covid-19

To help preserve our in-person learning environment, the university recommends the following.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates. But all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. Please contact the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

Emergency Evacuation Policy

Occupants of buildings on The University of Texas at Austin campus are required to evacuate buildings when a fire alarm is activated. Alarm activation or announcement requires exiting and assembling outside. Familiarize yourself with all exit doors of each classroom and building you may occupy. Remember that the nearest exit door may not be the one you used when entering the building. Students requiring assistance in evacuation shall inform their instructor in writing during the first week of class. In the event of an evacuation, follow the instruction of faculty or class instructors. Do not re-enter a building unless given instructions by the following: Austin Fire Department, The University of Texas at Austin Police Department, or Fire Prevention Services office. Information regarding emergency evacuation routes and emergency procedures can be found at http://www.utexas.edu/emergency.

Senate Bill 212 and Title IX Reporting Requirements

Under Senate Bill 212 (SB 212), the professor and TAs for this course are required to report for further investigation any information concerning incidents of sexual harassment, sexual assault, dating violence, and stalking committed by or against a UT student or employee. Federal law and university policy also require reporting incidents of sex- and gender-based discrimination and sexual misconduct (collectively known as Title IX incidents). This means we cannot keep confidential information about any such incidents that you share with us. If you need to talk with someone who can maintain confidentiality, please contact University Health Services (512-471-4955 or 512-475-6877) or the UT Counseling and Mental Health Center (512-471-3515 or 512-471-2255). We strongly urge you to make use of these services for any needed support and to report any Title IX incidents to the Title IX Office.


Adapting the class format to deal with the ongoing pandemic

Here is the plan as of September 15, 2021:

  • The first three weeks of class will be fully online. We meet on zoom. The links are on the class Canvas page. Email me if you cannot access it.

  • For the following two weeks, the Wednesday classes (September 22 and September 29) are in person. Monday and Friday classes are on zoom.

  • Starting October 4, all classes will be offered in person. All in-person classes will be zoom-streamed and zoom-recorded. You can either come in person or participate via zoom; either way is fine. We'll continue to make class zoom recordings available.

Schedule


This schedule is subject to change.

Assignments are due at the end of the day on their due date. Please submit assignments online on Canvas unless the assignment tells you otherwise.

Readings can be done either before or after class (unless noted otherwise); they are chosen to support the material covered in class.

Week 1: Aug 25, Aug 27

  • Wednesday: Computational semantics: an overview

  • Friday: Meaning as a space in which you can walk from word to word: an introduction

Week 2: Aug 30, Sep 1, Sep 3


  • Wednesday: Pre-computed meaning spaces, and how to use them

  • Friday: What can I do for my course project?

Week 3: Sep 6, 8, 10

Week 4: Sep 13, 15, 17

  • Monday: How to make a count-based space and use it

  • Wednesday: Continuing with count-based spaces: using matrix methods for efficiency, and doing dimensionality reduction

  • Friday: Towards prediction-based spaces. Step 1: Classification
    Class on zoom.

Week 5: Sep 20, 22, 24

Week 6: Sep 27, Sep 29, Oct 1

Week 7: Oct 4, 6, 8

Week 8: Oct 11, 13, 15

  • Monday: Structured meaning representations: word senses and ontologies

Week 9: Oct 18, 20, 22

  • Monday: Structured meaning representations: Events: arguments, subevents, coreference

    • Additional reading: slideset on Canvas

Week 10: Oct 25, 27, 29

  • Monday: Logic and automatic inference

    • Here is an online demo of Robinson Resolution: https://logictools.org/prop.html
      To use it, choose "Propositional logic" from the tabs at the top. Then in the first row of choices below the example window, choose using: "resolution:naive" showing "html trace". Then hit the "Solve" button.
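
If you would like to play with the idea offline, here is a minimal sketch of propositional resolution in Python. It assumes clauses are represented as frozensets of literal strings, with "-" marking negation; deriving the empty clause means the clause set is unsatisfiable. This is a naive illustration, not the algorithm behind the demo.

    from itertools import combinations

    def negate(lit):
        return lit[1:] if lit.startswith("-") else "-" + lit

    def resolve(c1, c2):
        # All resolvents of two clauses: drop one complementary pair of literals.
        return [frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
                for lit in c1 if negate(lit) in c2]

    def unsatisfiable(clauses):
        clauses = set(clauses)
        while True:
            new = set()
            for c1, c2 in combinations(clauses, 2):
                for res in resolve(c1, c2):
                    if not res:        # empty clause derived: contradiction found
                        return True
                    new.add(res)
            if new <= clauses:         # nothing new: saturated without contradiction
                return False
            clauses |= new

    # (p or q), (not p), (not q) are jointly unsatisfiable:
    print(unsatisfiable([frozenset({"p", "q"}), frozenset({"-p"}), frozenset({"-q"})]))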

Week 11: Nov 1, 3, 5

  • Monday: Translating natural language to logic, continued

  • Wednesday: Translating natural language to logic, continued

    • Homework 3 due.

  • Friday: Translating natural language to logic, continued

Week 12: Nov 8, 10, 12

  • Monday: Semantics construction: automatically constructing a logical representation for a sentence. Lambda calculus: lego for semantics construction

  • Wednesday: More lambda calculus. We go through some examples together.

    • Intermediate project report due.

  • Friday: Final piece of lambda calculus: transitive verbs. Then: Semantics construction in practice with the Natural Language Toolkit
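
As a taste of the NLTK part, here is a minimal sketch of beta-reduction with nltk.sem; the predicate and constant names are made up for illustration:

    from nltk.sem.logic import Expression

    read_expr = Expression.fromstring

    # A lambda term (an intransitive-verb-style meaning) applied to an argument.
    expr = read_expr(r'\x.(walk(x) & talk(x))(kim)')

    # simplify() performs beta-reduction: the argument is substituted for x.
    print(expr.simplify())   # -> (walk(kim) & talk(kim))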

Week 13: Nov 15, 17, 19

Week 14: Nov 22

  • Monday: Knowledge graphs

    • Homework 4 due.

  • Wednesday, Friday: Thanksgiving break

Week 15: Nov 29, Dec 1, Dec 3

  • Monday: Knowledge graphs

  • Wednesday: Knowledge graphs: construction through Information Extraction, extension through link prediction (using embeddings of graph nodes and edges!)

  • Friday: Project presentations

    • 10:00 John Steinman

    • 10:07 Aubrey Hinchman and Grey Sandstrum

    • 10:14 Sebastian Mancha and Grace Huang

    • 10:21 Sydney Willett and Kabin Moon

    • 10:28 Riddhi Bhave and Pooja Chivukula


Week 16: Dec 6

  • Monday: Project presentations, and some final words

    • 10:00 Alyssa Cantu

    • 10:07 Matthew Pabst

    • 10:14 Fei Guo

    • 10:21 Katrina Gavura and Nicolette Warren

    • 10:28 Misty Peng and Manaasa Darisi

    • 10:35 Vittoria Byland


Final report due: Friday December 10, end of day

Course Project


Course projects should be done by teams of 2 students. Project groups consisting of 1 or 3 students are possible only with prior approval of the instructor.

Course project requirements

Initial project description

This is a 1-2 page document (single-spaced, single column) that describes what your project will be about. It needs to contain the following information:

    • Research questions: What are the main questions that you want to answer, the main language phenomena you want to address, or the main ideas you want to explore?

    • Method: What distributional model will you use, or what kinds of rules are you planning to state? Be as detailed as you can. (Yes, I know you will not have worked out every detail at this point, but strive to work out as many as you can.)

    • Data: If you do a distributional project, it is vital that you figure out as early as possible what data you can use to learn your model. Is there enough data? Is it freely available? Do you have to contact someone to get it?

Intermediate report

This is a 1-2 page document (single-spaced, single column) that describes the status of your project at this point. This is a revised version of your initial project description. It needs to contain the following information:

    • Research questions: any changes?

    • Method: any changes?

    • Status:

      • Describe the data that was obtained: source, size, anything else that is relevant

      • Describe at least two (smaller, and preliminary) concrete results that you have at this point

You also need to take into account the feedback that you got on the Initial project description.

Short presentation

This is a short presentation to the class. You should discuss:

    • Research questions/linguistic phenomena/main ideas you wanted to model

    • Why is this relevant? (Spend a lot of time on the research questions and their relevance. Describing the big picture is important!)

    • Data, if you are using a data-driven approach: source, size

    • Results

You will need to prepare slides for this, which you submit to the instructor ahead of time.

Final report

This is a 5-6 page document (single-spaced, single column) that describes the results of your project. This is a revised version of your intermediate report. It needs to contain the following information:

    • Research questions/linguistic phenomena covered/main ideas pursued

    • Data: source, size, other relevant statistics

    • Method

    • Findings

If you build on previous work, you need to discuss it, and give references. Published papers (at conferences, in journals) go into the references list at the end of the paper. Links to blog posts and the like go in a footnote. Also, links to websites containing data go in a footnote, not in the references list.

You need to take into account the feedback that you got on the Initial project description and Intermediate report.

Course project ideas

Context-based vectors/embeddings for words

Use pre-trained vectors (a small gensim sketch follows after this list):

  • Comparing general and specific terms (hyponyms and hypernyms) in vector spaces

  • Exploring prejudice in vector spaces, and possibly removing it

  • Exploring analogy reasoning in vector spaces

  • Make vectors for occurrences of words, and group (cluster) them into senses

  • What clusters of words (clustered by vector representations) are used a lot in a politician's speech, or in top-10 songs?

  • Comparing general and specific words (like "animal" versus "dog") in vector spaces: can you detect which specific words go with which general words? How well does this work in different spaces?
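
One easy way to get started with several of these ideas is gensim's downloader module, which provides various pre-trained spaces. Here is a minimal sketch; "glove-wiki-gigaword-50" is just one of the available models, and which space you pick should depend on your project:

    import gensim.downloader as api

    # Downloads a small pre-trained GloVe space the first time it is used.
    vectors = api.load("glove-wiki-gigaword-50")

    # Nearest neighbors in the space: words with similar meanings.
    print(vectors.most_similar("dog", topn=5))

    # Analogy-style reasoning: king - man + woman is (often) close to queen.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))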

Compute your own (a count-based sketch follows after this list):

  • How do people use emojis? That is, what are the context vectors of emojis?

  • Compute vector representations from two different time periods: How have word meanings changed? Or, how has the discourse/use around the words changed?

  • Compute vector representations from two different corpus collections, and do the same kind of analysis
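
For the "compute your own" route, a count-based space can start from nothing more than counting which words occur near which other words. Here is a toy sketch over a tiny invented corpus; a real project would use far more text and typically apply some weighting (for example PPMI) and dimensionality reduction:

    from collections import Counter, defaultdict

    # Tiny invented corpus; a real project would read millions of tokens.
    sentences = [
        "the cat chased the mouse".split(),
        "the dog chased the cat".split(),
        "the car drove down the road".split(),
    ]

    window = 2
    counts = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    counts[word][sent[j]] += 1   # co-occurrence within the window

    print(counts["cat"])   # the context profile of "cat"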

Topic modeling for documents

  • Automatically determine topics (word groups) that occur a lot in a collection of documents. Can you see patterns in which documents tend to have which topics?
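
If you go the topic-modeling route, gensim's LDA implementation is one option; here is a minimal sketch on a toy corpus (real projects need many more documents and proper preprocessing):

    from gensim import corpora, models

    # Toy documents, already tokenized; invented for illustration.
    docs = [
        ["vector", "space", "embedding", "similarity"],
        ["logic", "inference", "resolution", "proof"],
        ["embedding", "neural", "network", "vector"],
    ]

    dictionary = corpora.Dictionary(docs)                 # word <-> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in docs]    # bag-of-words vectors

    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
    for topic in lda.print_topics():
        print(topic)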

Structured meaning representations

  • Build a system for automatic word sense disambiguation or semantic role labeling using machine learning

  • Build a system that automatically identifies events in text, using a tool that gives you the syntactic structure of a sentence and using rules that identify events in that syntactic structure (a spaCy sketch follows after this list)

  • Build a system that automatically identifies medication names, or illness names, in medical texts
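
For the rule-over-syntax ideas, one possible starting point is spaCy's dependency parse. The sketch below treats every verb as a candidate "event" and lists its subject and object dependents, which is a deliberately crude rule; the example sentence is made up:

    import spacy

    # Requires the model: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The committee approved the proposal after the meeting ended.")

    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c.text for c in token.children if c.dep_ == "nsubj"]
            objects = [c.text for c in token.children if c.dep_ in ("dobj", "obj")]
            print("event:", token.lemma_, "| subject:", subjects, "| object:", objects)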


Links and additional readings

Tutorials and texts about distributional models


"Semantic directions" in distributional spaces

Readings:


Pre-computed spaces ready for use

Links that may give you ideas for course projects

  • SemEval, a series of workshops on semantics-related tasks: They come up with 10-ish new tasks every year and offer data for each of them, so this may be an easy way to get data

  • Automatically getting a structured meaning representation: See the Semantic Role Labeling demo and the Open Information Extraction demo from AllenNLP.

  • How can we use computational tools to answer linguistic questions? For ideas about what others have done (often as much-too-large projects!), see the Society for Computing in Linguistics conferences

Logic-based semantics in the Natural Language Toolkit

Some freely available corpora

Data that can be used to build distributional models:

  • The WaCKy corpora, including UKWaC (English web text, 2B words), Wackypedia (an English Wikipedia dump, 2B words), web corpora for French, German, and Italian. Ask me about a parsed version of UKWaC and Wackypedia if you need syntactic analysis.

Structured semantic annotation:

Systems and online demos for logic-based semantic analysis

Systems and online demos for structured semantics and syntactic preprocessing

Knowledge graphs

Additional readings about logic-based computational semantics

An in-depth overview of everything:

Practical guides to building logic-based semantics:

Focusing on the theory:

    • L.T.F. Gamut: Logic, Language, and Meaning (2 volumes). Volume 1