Project: LIN350 Computational semantics



Course Project Information

See below for examples of previous course projects!

Course projects should be done by teams of 2 students. Project groups consisting of 1 or 3 students are possible only with prior approval of the instructor.

Initial project description

This is a 1-2 page document (single-spaced, single column) that describes what your project will be about. It needs to contain the following information:

  • Research questions: What are the main questions that you want to answer, the main language phenomena you want to address, or the main ideas you want to explore?
  • Method: What distributional model will you use, or what kinds of rules are you planning to state? Be as detailed as you can. (Yes, I know you will not have worked out every detail at this point, but strive to work out as many as you can.)
  • Data: If you do a distributional project, it is vital that you figure out as early as possible what data you can use to learn your model. Is there enough data? Is it freely available? Do you have to contact someone to get it? 

Intermediate report

This is a 1-2 page document (single-spaced, single column) that describes the current status of your project. It is a revised version of your initial project description. It needs to contain the following information:

  • Research questions: any changes?
  • Method: any changes?
  • Status:
    • Describe the data that was obtained: source, size, anything else that is relevant
    • Describe at least two (smaller, and preliminary) concrete results that you have at this point

You also need to take into account the feedback that you got on the Initial project description.

Short presentation

This is a short presentation to the class. You should discuss:

  • Research questions/linguistic phenomena/main ideas you wanted to model
  • Why is this relevant? (Spend a lot of time on the research questions and their relevance. Describing the big picture is important!)
  • Data, if you are using a data-driven approach: source, size
  • Results

You will need to prepare slides for this, which you submit to the instructor ahead of time.

Final report

This is a 4-5 page document (single-spaced, single column) that describes the results of your project. It is a revised version of your intermediate report. It needs to contain the following information:

  • Research questions/linguistic phenomena covered/main ideas pursued
  • Data: source, size, other relevant statistics
  • Method
  • Findings

If you build on previous work, you need to discuss it, and give references. Published papers (at conferences, in journals) go into the references list at the end of the paper. Links to blog posts and the like go in a footnote. Also, links to websites containing data go in a footnote, not in the references list.

You need to take into account the feedback that you got on the Initial project description and Intermediate report. 


Course project ideas

Context-based vectors/embeddings for words

Use pre-trained vectors:
  • Comparing general and specific terms (hyponyms and hypernyms) in vector spaces
  • Exploring prejudice in vector spaces, and possibly removing it
  • Exploring analogy reasoning in vector spaces
  • Make vectors for occurrences of words, and group (cluster) them into senses
  • What clusters of words (clustered by vector representations) are used a lot in a politician's speech, or in top-10 songs?
  • Comparing general and specific words (like "animal" versus "dog") in vector spaces: can you detect which specific words go with which general words? How well does this work in different spaces?
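
To give a feel for the analogy idea in the list above, here is a minimal sketch in Python. The 3-dimensional vectors are invented toy values (not taken from any real pre-trained space), and the names TOY_VECTORS, cosine, and analogy are made up for this illustration. An actual project would load pre-trained vectors and would likely need to handle a much larger vocabulary.

```python
import math

# Toy vectors invented for illustration only (dimensions loosely:
# royalty, male, female). A real project would load pre-trained vectors.
TOY_VECTORS = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Answer 'a is to b as c is to ?' using the vector offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is commonly done in analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "king", "woman"))  # prints "queen" for these toy vectors
```

The same cosine function could serve as a starting point for the general-versus-specific comparisons, though plain cosine similarity is symmetric and so does not by itself say which of two words is the more general one.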

Compute your own:
  • How do people use emojis? That is, what are the context vectors of emojis?
  • Compute vector representations from two different time periods: How have word meanings changed? Or, how has the discourse/use around the words changed?
    • Compute vector representations from two different corpus collections, and do the same kind of analysis
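
As a rough sketch of what "compute your own" can mean, the Python code below builds simple count-based context vectors: for each word, it counts which words appear within a small window around it. The two one-sentence "corpora", the window size of 2, and the function names are assumptions made for this illustration; a real project would use far more data and probably some weighting or dimensionality reduction.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two tiny invented "corpora" standing in for two time periods.
old_corpus = ["the mouse ate the cheese"]
new_corpus = ["click the mouse on the screen"]

old_vecs = context_vectors(old_corpus)
new_vecs = context_vectors(new_corpus)

# A similarity below 1.0 means the word's contexts differ between the corpora.
print(cosine(old_vecs["mouse"], new_vecs["mouse"]))
```

Count vectors like these can be compared across corpora directly because their dimensions are the context words themselves. Embeddings trained separately on each corpus generally live in different spaces and need to be aligned before comparison.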

Topic modeling for documents

  • Automatically determine topics (word groups) that occur a lot in a collection of documents. Can you see patterns in which documents tend to have which topics?

Logic

  • Encoding and solving a logic puzzle (using logical form)
  • Extending semantics construction, for example by adding relative clauses, prepositional phrases
  • How to resolve anaphora (Sandy hit Kim. He was angry. Who is "he"?) in Discourse Representation Structures
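
For the logic puzzle idea, one very simple approach is to write the puzzle as propositional constraints and try every truth assignment. The sketch below does this in Python for a small invented "knights and knaves" puzzle (knights always tell the truth, knaves always lie): A says "B is a knave", and B says "We are both knights". The puzzle and the function name solve are assumptions for this illustration; a course project would more likely state the puzzle in first-order logic and use a theorem prover or model builder.

```python
from itertools import product


def solve():
    """Return all assignments consistent with the two statements."""
    solutions = []
    for a_knight, b_knight in product([True, False], repeat=2):
        a_says = not b_knight            # A: "B is a knave"
        b_says = a_knight and b_knight   # B: "We are both knights"
        # A speaker's statement is true exactly when the speaker is a knight.
        if a_knight == a_says and b_knight == b_says:
            solutions.append({"A": a_knight, "B": b_knight})
    return solutions


for solution in solve():
    roles = {name: "knight" if is_knight else "knave"
             for name, is_knight in solution.items()}
    print(roles)  # prints {'A': 'knight', 'B': 'knave'}
```

Brute force works here because there are only four assignments to check. Larger puzzles are where a proper logical encoding and an automated reasoner start to pay off.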
