Distributional spaces represent the meanings of words (or word occurrences) as points in some "semantic space" where similar meanings are close together, and different meanings are further apart.
This semantic space is computed from data, often just from written texts produced by many people. So we can view the semantic space as a kind of "compressed corpus", a record of utterances from many speakers. The structure of the space is determined by regularities in word co-occurrences across the utterances. We can probe this space to ask questions about lexical semantics.
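As a rough illustration of how co-occurrence regularities can give rise to such a space, here is a minimal count-based sketch. The toy corpus, the window size of 2, and the helper names `cosine` and `similarity` are assumptions made up for this example; real distributional models use far larger corpora and usually reweight or compress the raw counts.

```python
from collections import Counter
import numpy as np

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the car needs fuel",
    "the truck needs fuel",
]

sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each context word appears within `window` positions of each target word.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in sentences:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[target], index[sent[j]]] += 1

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity(w1, w2):
    """Cosine similarity between the co-occurrence rows of two words."""
    return cosine(counts[index[w1]], counts[index[w2]])

for a, b in [("cat", "dog"), ("cat", "car"), ("car", "truck"), ("car", "milk")]:
    print(f"{a:>5} ~ {b:<5} {similarity(a, b):.3f}")
```

Each row of `counts` is the vector for one word. Words that occur in similar contexts, such as "car" and "truck" in this toy corpus, end up with similar rows and therefore a high cosine similarity.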
In this course, we focus on distributional spaces as a "compressed corpus" and on how we can use them for linguistic analyses. We discuss the ideas and the mathematics behind these models and recent use cases in linguistics, and we talk about best practices for using these models, to the extent that they already exist.
An introduction to distributional models
This is a general introduction to the main underlying ideas of distributional models.
Particularly relevant readings (for after class, to reinforce what we discussed):
Methods for working with distributional models
We cover the main methods for working with distributional models, focusing on methods that work for both word type vectors and word token vectors.
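One method that applies equally to word type vectors and word token vectors is nearest-neighbor search by cosine similarity. The sketch below is only an illustration: the function name `nearest_neighbors`, the labels, and the three-dimensional vectors are all hypothetical, standing in for vectors taken from an actual model.

```python
import numpy as np

def nearest_neighbors(query, vectors, labels, k=3):
    """Return the k (label, similarity) pairs most cosine-similar to `query`."""
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalize to unit length so that a dot product equals cosine similarity.
    unit_rows = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    sims = unit_rows @ unit_query
    order = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [(labels[i], float(sims[i])) for i in order]

# Hypothetical vectors, invented for illustration only. The labels mimic
# token-level entries (two uses of "bank") next to type-level entries.
labels = ["bank (river)", "shore", "bank (money)", "loan"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
])

neighbors = nearest_neighbors(vectors[0], vectors, labels, k=2)
for label, sim in neighbors:
    print(f"{label:<14} {sim:.3f}")
```

Because the query is itself a row of `vectors`, it comes back as its own nearest neighbor; in practice one would usually drop that first hit.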
Recommended readings: no single paper that stands out, but check the slides for many recommendations.
Using neural networks to compute distributional models
How do prediction-based models work, both at the word type level and at the word token level?
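To make the idea of a prediction-based model at the word type level concrete, here is a minimal skip-gram-style sketch in NumPy: each word's vector is trained to predict the words that occur next to it. Everything specific here is an assumption chosen for illustration (the toy corpus, a window of 1, 8 dimensions, a full softmax, plain full-batch gradient descent). Practical implementations such as word2vec avoid the full softmax for efficiency, for example by using negative sampling.

```python
import numpy as np

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
]
sentences = [s.split() for s in corpus]
vocab = sorted({w for s in sentences for w in s})
index = {w: i for i, w in enumerate(vocab)}

# Training pairs: (center word, one neighboring context word).
window = 1
pairs = [
    (index[sent[i]], index[sent[j]])
    for sent in sentences
    for i in range(len(sent))
    for j in range(max(0, i - window), min(len(sent), i + window + 1))
    if j != i
]
centers = np.array([c for c, _ in pairs])
contexts = np.array([o for _, o in pairs])

rng = np.random.default_rng(0)
dim = 8
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # word (center) vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def loss_and_grads(W_in, W_out):
    """Mean cross-entropy of predicting each context word from its center word."""
    hidden = W_in[centers]                      # (pairs, dim)
    scores = hidden @ W_out.T                   # (pairs, vocab)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)   # softmax over the vocabulary
    n = len(centers)
    loss = -np.log(probs[np.arange(n), contexts]).mean()
    d_scores = probs.copy()
    d_scores[np.arange(n), contexts] -= 1.0     # gradient of softmax cross-entropy
    d_scores /= n
    grad_out = d_scores.T @ hidden
    grad_in = np.zeros_like(W_in)
    np.add.at(grad_in, centers, d_scores @ W_out)  # sum over repeated center words
    return loss, grad_in, grad_out

learning_rate = 0.5
losses = []
for step in range(300):
    loss, grad_in, grad_out = loss_and_grads(W_in, W_out)
    losses.append(loss)
    W_in -= learning_rate * grad_in
    W_out -= learning_rate * grad_out

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

After training, the rows of `W_in` serve as the word vectors. Token-level models differ in that they compute a separate vector for each occurrence of a word from its sentence context, which this sketch does not attempt.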
Helpful readings (recommended after class, to reinforce what we discussed):
Using word token embeddings
Recent neural models give us access to embeddings (vectors) not just for a word, but for a word in a particular sentence context (a word token). What can we do with that? What new kinds of studies are now being done?
Recommended readings (for after class, to review what we discussed):
An overview of where we stand with word token models and semantics:
Readings on technical problems with the semantic space in Transformer language models:
Readings on embedding clusters and interpretable features:
The nature of meaning, and whether real "understanding" requires a denotation, a world:
Here are some additional notebooks that could be useful, though we won't have time to go through them in class. There are: