Project: LIN350 Computational semantics
Course Project Information
See below for examples of previous course projects!
Course projects should be done by teams of 2 students. Project groups consisting of 1 or 3 students are possible only with prior approval of the instructor.
Initial project description
This is a 1-2 page document (single-spaced, single column) that describes what your project will be about. It needs to contain the following information:
Research questions: What are the main questions that you want to answer, the main language phenomena you want to address, or the main ideas you want to explore?
Method: What distributional model will you use, or what kinds of rules are you planning to state? Be as detailed as you can. (Yes, I know you will not have worked out every detail at this point, but strive to work out as many as you can.)
Data: If you do a distributional project, it is vital that you figure out as early as possible what data you can use to learn your model. Is there enough data? Is it freely available? Do you have to contact someone to get it?
Intermediate report
This is a 1-2 page document (single-spaced, single column) that describes the current status of your project. It is a revised version of your initial project description. It needs to contain the following information:
Research questions: any changes?
Method: any changes?
Describe the data that was obtained: source, size, anything else that is relevant
Describe at least two concrete results that you have at this point (they can be small and preliminary)
You also need to take into account the feedback that you got on the Initial project description.
Project presentation
This is a short presentation to the class. You should discuss:
Research questions/linguistic phenomena/main ideas you wanted to model
Why is this relevant? (Spend a lot of time on the research questions and their relevance. Describing the big picture is important!)
Data, if you are using a data-driven approach: source, size
You will need to prepare slides for this, which you submit to the instructor ahead of time.
Final report
This is a 4-5 page document (single-spaced, single column) that describes the results of your project. It is a revised version of your intermediate report. It needs to contain the following information:
Research questions/linguistic phenomena covered/main ideas pursued
Data: source, size, other relevant statistics
If you build on previous work, you need to discuss it, and give references. Published papers (at conferences, in journals) go into the references list at the end of the paper. Links to blog posts and the like go in a footnote. Also, links to websites containing data go in a footnote, not in the references list.
You need to take into account the feedback that you got on the Initial project description and Intermediate report.
Course project ideas
Context-based vectors/embeddings for words
Comparing general and specific terms (hyponyms and hypernyms) in vector spaces
Exploring prejudice in vector spaces, and possibly removing it
In this context, we looked at the paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings".
Exploring analogy reasoning in vector spaces
Make vectors for occurrences of words, and group (cluster) them into senses
What clusters of words (clustered by vector representations) are used a lot in a politician's speech, or in top-10 songs?
Comparing general and specific words (like "animal" versus "dog") in vector spaces: can you detect which specific words go with which general words? How well does this work in different spaces?
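As a starting point for the analogy idea in the list above, here is a minimal sketch of the vector-offset method: to answer "man is to woman as king is to ?", compute vec(woman) - vec(man) + vec(king) and pick the nearest remaining word by cosine similarity. The three-dimensional vectors and the names `VECTORS`, `cosine`, and `analogy` are invented for illustration. A real project would load pretrained or self-trained embeddings instead.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# A real project would load pretrained or self-trained embeddings.
VECTORS = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.3, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("man", "woman", "king", VECTORS))  # prints "queen" for these toy vectors
```

Excluding the three query words from the candidates is a common design choice. Without it, the nearest word is often one of the query words themselves, which is worth keeping in mind when evaluating how well analogies work in a given space.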
Compute your own:
How do people use emojis? That is, what are the context vectors of emojis?
Compute vector representations from two different time periods: How have word meanings changed? Or, how has the discourse/use around the words changed?
Compute vector representations from two different corpus collections, and do the same kind of analysis
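The "compute your own" ideas above all rest on the same step: count which words occur near a target word, then compare the resulting context vectors. The following is a rough sketch under simplifying assumptions. It uses whitespace tokenization, raw counts without weighting, and two invented one-sentence "corpora" standing in for two time periods. The function names `context_vector` and `cosine` are illustrative and not part of any course-provided code.

```python
import math
from collections import Counter

def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Two invented mini-corpora standing in for an earlier and a later period.
early = "the mouse ate the cheese and the mouse ran from the cat".split()
late = "click the mouse and move the mouse to the screen".split()

vec_early = context_vector(early, "mouse")
vec_late = context_vector(late, "mouse")

print(vec_early.most_common(3))  # most frequent context words, earlier period
print(vec_late.most_common(3))   # most frequent context words, later period
print(round(cosine(vec_early, vec_late), 3))
```

Because both vectors are indexed by context words, count-based spaces from different corpora can be compared directly. The same function applies to emojis, provided the tokenizer keeps each emoji as its own token. With realistic data you would likely want to remove or down-weight very frequent function words such as "the", which dominate raw counts.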
Topic modeling for documents
Automatically determine topics (word groups) that occur a lot in a collection of documents. Can you see patterns in which documents tend to have which topics?
Encoding and solving a logic puzzle (using logical form)
Extending semantics construction, for example by adding relative clauses, prepositional phrases
How to resolve anaphora ("Sandy hit Kim. He was angry." Who is "he"?) in Discourse Representation Structures
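To make the anaphora idea more concrete, here is a deliberately simplified sketch of a Discourse Representation Structure as a list of discourse referents plus a list of conditions, with a function that collects possible antecedents for a pronoun. The class `DRS`, the `GENDER` lexicon, and the helper functions are hypothetical names chosen for this illustration. The sketch checks only gender compatibility and does not model accessibility constraints from Discourse Representation Theory.

```python
from dataclasses import dataclass, field

@dataclass
class DRS:
    """A very small Discourse Representation Structure: referents plus conditions."""
    referents: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

# Hypothetical mini-lexicon; None means the word does not fix a gender.
GENDER = {"Sandy": None, "Kim": None, "he": "masc", "she": "fem"}

def add_name(drs, name):
    """Introduce a new discourse referent for a proper name."""
    ref = f"x{len(drs.referents) + 1}"
    drs.referents.append(ref)
    drs.conditions.append(("named", ref, name))
    return ref

def candidate_antecedents(drs, pronoun):
    """Return referents whose known gender does not clash with the pronoun."""
    wanted = GENDER[pronoun]
    result = []
    for cond in drs.conditions:
        if cond[0] == "named":
            _, ref, name = cond
            if GENDER.get(name) in (None, wanted):
                result.append(ref)
    return result

# "Sandy hit Kim."
drs = DRS()
sandy = add_name(drs, "Sandy")
kim = add_name(drs, "Kim")
drs.conditions.append(("hit", sandy, kim))

# "He was angry." Which referents could "he" pick up?
print(candidate_antecedents(drs, "he"))  # both referents remain candidates
```

Since neither name fixes a gender, the pronoun stays ambiguous between the two referents. Deciding between them requires additional information, for example preferences based on grammatical role or world knowledge about who is likely to be angry, and working out such a strategy is the actual project.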