LIN 389C Schedule

Course outline Fall 2019

This semester we have one meeting per week. Each week, unless noted otherwise, we use one half of the class to discuss the topic of the week and the other half for a round-table discussion of people's research.

I will soon post a list of discussion topics for this semester here.

Week 1: Aug 30

  • Plan for the semester: Which topics to focus on, other activities
  • Extended round table:
    • Research results from the summer
    • Research plans for the fall, publication plans

Week 2: Sep 6

Computational semantics: a look at the data.

We are reading:
To keep things feasible, you are asked to read only one of the AMR papers; choose whichever you prefer. We will discuss both in class, but with the understanding that each person has read only one of them.

Week 3: Sep 13

The Groningen Meaning Bank and Parallel Meaning Bank

We are reading:
  • A description of the Groningen Meaning Bank, with some background on DRT:
    Johan Bos, Valerio Basile, Kilian Evang, Noortje Venhuizen, and Johannes Bjerva (2017): The Groningen Meaning Bank. In: Nancy Ide and James Pustejovsky (eds.): Handbook of Linguistic Annotation, pp. 463–496. Berlin: Springer.
Additional reading (not required, but we'll discuss this too):
Websites -- do explore the data a bit:

Week 4: Sep 20

Decompositional semantics at Johns Hopkins and Rochester

We are reading:
We will also look at Venkat's recent work (congratulations!!):
Venkat will lead the discussion.

Week 5: Sep 27

Uses of in-depth meaning representations: AMR and semantic proto-roles.

We are reading:
Pengxiang will lead the discussion.

Week 6: Oct 4

Semantic parsing

Gabriella will lead the discussion.

Week 7: Oct 11

Compositionality in machines

We are reading: Brenden M. Lake and Marco Baroni (2018): Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In: Proceedings of ICML. https://arxiv.org/abs/1711.00350

Ankur will lead the discussion.

Week 8: Oct 18

Compositionality in machines

We are reading: Jacob Andreas (2019): Measuring Compositionality in Representation Learning. In: Proceedings of ICLR 2019. https://arxiv.org/abs/1902.07181

Su will lead the discussion.

Students who are doing a course project: please submit a short text (2 pages) describing the current status of and plan for your paper. You can either hand me a hard copy in class or submit it to me via email. The text should contain the motivation, a discussion of the literature, the planned data, and the planned model architecture. If you already have preliminary results, you can include them, but you are not required to have preliminary results by then.

Week 9: Oct 25

Jason Baldridge visits our class! Here is what he will talk about:

Practical and Ethical Considerations in Demographic and Psychographic Analysis

Understanding people, how they implicitly and explicitly group, their linguistic patterns, what motivates them and more are all deeply interesting and long-standing questions. Industry and academic developers and researchers today have access to extensive information on people, but the data often lacks many of the core demographic and psychographic variables that pertain to many research questions and which drive some business functions (e.g. marketing). This is certainly true of social media profiles, which typically lack structured demographic information beyond names and locations---and even these are often incomplete or fabricated. As such, there has been a surge of academic and commercial interest in predicting values for gender, age, race, location, interests, personality, and more, given some portion of the information available in data about individuals, including social profiles, customer records, and more. In the past, efforts to study people were primarily localized to the researcher and the individuals they interacted with or requested surveys from. But today, these questions can be explored at massive scale, using the public and private digital exhaust we all create. Findings are no longer simply interpretive, but instead can be additionally translated into automated programs that analyze gender, personality and more. Such programs are informed by research in natural language processing, computer vision, psychology and related fields, and they can be used for positive, negative, and mixed ends. As researchers, we are arguably still waking up to this reality, and we cannot take a neutral stance regarding the potential benefits and harms of our work. We must grapple with hard questions around privacy rights and think actively and creatively about the wider societal implications and impacts of our work. In my talk, I'll discuss specific practical and ethical aspects of such work in the context of text, graph and image analysis for understanding demographics and psychographics, with an eye toward the potential for positive impact that reduces or minimizes risk to individuals.


Bio: Jason is a research scientist at Google, where he works on semantics, discourse and multilingual processing. He was previously an Associate Professor of Computational Linguistics at the University of Texas at Austin, and he co-founded People Pattern, a startup that delivers audience analytics for major brands. His main research interests include categorial grammars, parsing, semi-supervised learning for NLP, reference resolution and text geolocation. He has long been active in the creation and promotion of open source software for natural language processing: he is one of the co-creators of the Apache OpenNLP Toolkit and OpenCCG, and he has contributed to many others, including ScalaNLP, Junto, and TextGrounder. Jason received his Ph.D. from the University of Edinburgh in 2002, where his doctoral dissertation on Multimodal Combinatory Categorial Grammar was awarded the 2003 Beth Dissertation Prize from the European Association for Logic, Language and Information.

Week 10: Nov 1

Language models: BERT

We are reading:
Shrey will lead the discussion.

Week 11: Nov 8

What do language models learn?

We are reading:

Elisa will lead the discussion.

Week 12: Nov 15

Biases in language models.

We are reading Mark Yatskar's work on gender bias.
Eric will lead the discussion.

Info: Yoav Artzi gives a FAI talk

Week 13: Nov 22

Ian Tenney visits our class! Here is what he will talk about:

Title: Probing for Structure in Sentence Representations


Abstract:

With the development of ELMo, BERT, and successors, pre-trained sentence encoders have become nearly ubiquitous in NLP. But what makes these models so powerful? What are they learning? A flurry of recent work - cheekily dubbed "BERTology" - seeks to analyze and explain these models, treating the encoder as an object of scientific inquiry.


In this talk, I'll discuss a few of these analyses, focusing on our own "edge probing" work which looks at how linguistic structure is represented in deep models. Using tasks like tagging, parsing, and coreference as analysis tools, we show that language models learn strong representations of syntax but are less adept at semantic phenomena. Moreover, we find evidence of sequential reasoning, reminiscent of traditional pipelined NLP systems.


This work was jointly conducted with Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick.


Bio:

Ian Tenney is a software engineer on the Language team at Google Research in Mountain View. His research focuses on understanding and analysis of deep NLP models, particularly on how they encode linguistic structure, and how unsupervised or weakly-supervised learning can give rise to complex representations and reasoning. He was a Senior Researcher on the sentence representation team for the 2018 JSALT workshop, and from 2016 to 2018 taught at the UC Berkeley School of Information. He holds an M.S. in Computer Science and a B.S. in Physics from Stanford.


Info: Jacob Andreas gives a FAI talk

Week 14: Nov 29

Thanksgiving break

Week 15: Dec 6

Ethics and NLP

We are reading:
Info: Dan Roth gives a FAI talk

Final course project papers due: TBA.