LIN 389C: Research in Computational Linguistics

Overview

LIN 389C is a research course for students working in computational linguistics. It is aimed at students with advanced knowledge of natural language processing and machine learning techniques who are doing research in the area. In the course, we discuss current research by course participants, review foundational knowledge relevant to participants' research, and talk about big-picture issues and current research in the field.

Syllabus

Course information:

  • LIN 389C Research in Computational Linguistics. Unique number: 40670

  • Fall 2021

  • Course time: Wednesdays 12-3.

  • Course location: Computational linguistics lab, RLP 4.422

  • Course organizer: Katrin Erk

    • Office hours: Tuesdays 1-3 and Fridays 11-12, via Zoom; the link is on Canvas.

  • Reach me at: katrin DOT erk AT utexas DOT edu

Course Purpose

To teach about, encourage, and give students time for research. Also to encourage discussion and collaboration among students interested in the same subfield.

Course Organization

The course serves six constituencies. Their needs differ, so they will not be treated exactly the same:

    • first-year students: need general context and help on first research projects and first-year papers

    • second-year students: need feedback on the research they have done, need to prepare for further research, and need to write second-year papers

    • third-year students: need to write their Qualifying Papers and grant proposals

    • post-candidacy pre-proposal students: need to write and present their dissertation proposals

    • post-candidacy dissertation students: need to write dissertations and get feedback

    • students from other departments: needs will vary

Course Components

The course consists of two main parts: a research seminar, and discussion of ongoing student research.

    • Research seminar: This semester we focus on recent papers in computational linguistics. Topics and readings will be given on the schedule page.

      • Typically one of the students in the class will be responsible for giving a short initial summary of the paper and for preparing some questions to get the discussion going.

    • Discussion of ongoing student research:

      • Round-table: short presentations by all participants about their current research. This will happen almost every week.

      • Ongoing research: longer presentations (30 minutes or an hour, including discussion) by students, faculty, and auditors if they wish

      • Dissertation proposal presentations

      • Dissertation progress talks

      • Practice talks for conference presentations

Requirements

    • First-year students: Attend all classes/activities. Talk about research.

      • First semester: submit a literature discussion sketch halfway through the semester (see schedule), and submit the literature discussion at the end of the semester.

      • Second semester: submit a first-year paper draft halfway through the semester (see schedule), and submit the first-year paper at the end of the semester.

    • Second-year students: Attend all classes/activities. Talk about research.

      • First semester: submit a research discussion draft halfway through the semester (see schedule), and submit the research discussion paper at the end of the semester.

      • Second semester: submit a second-year paper draft halfway through the semester (see schedule), and submit the second-year paper at the end of the semester.

    • Third-year students: Attend all classes/activities. Talk about research.

      • First semester: submit a QP proposal halfway through the semester (see schedule), and submit a QP progress report at the end of the semester.

      • Second semester: submit a QP draft halfway through the semester (see schedule), and submit the QP at the end of the semester.

    • Post-candidacy, pre-proposal students: Attend all presentations. Talk about research.

    • Dissertation-writing students: Attend all presentations. Give at least one presentation on your doctoral research during the semester.

    • Students from other departments: Complete a course project with two documents, an intermediate report (2-3 pages) and a final report (8 pages), due on the deadlines given in the schedule.

Grading policy

Grading will be based on the course requirements listed above.

This course does not have a final exam or midterm exam.

The course will use plus-minus grading, using the following scale:

  • A >= 93%

  • A- >= 90%

  • B+ >= 87%

  • B >= 83%

  • B- >= 80%

  • C+ >= 77%

  • C >= 73%

  • C- >= 70%

  • D+ >= 67%

  • D >= 63%

  • D- >= 60%

FERPA and Class Recordings

Class recordings are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction by a student could lead to Student Misconduct proceedings.

Classroom safety and Covid-19

To help preserve our in-person learning environment, the university recommends the following.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. Please contact the Division of Diversity and Community Engagement, Services for Students with Disabilities, 512-471-6259.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

Emergency Evacuation Policy

Occupants of buildings on The University of Texas at Austin campus are required to evacuate buildings when a fire alarm is activated. Alarm activation or announcement requires exiting and assembling outside. Familiarize yourself with all exit doors of each classroom and building you may occupy. Remember that the nearest exit door may not be the one you used when entering the building. Students requiring assistance in evacuation shall inform their instructor in writing during the first week of class. In the event of an evacuation, follow the instructions of faculty or class instructors. Do not re-enter a building unless given instructions by the following: Austin Fire Department, The University of Texas at Austin Police Department, or Fire Prevention Services office. Information regarding emergency evacuation routes and emergency procedures can be found at http://www.utexas.edu/emergency.

Behavior Concerns Advice Line (BCAL)

If you are worried about someone who is acting differently, you may use the Behavior Concerns Advice Line to discuss by phone your concerns about another individual's behavior. This service is provided through a partnership among the Office of the Dean of Students, the Counseling and Mental Health Center (CMHC), the Employee Assistance Program (EAP), and The University of Texas Police Department (UTPD). Call 512-232-5050 or visit http://www.utexas.edu/safety/bcal


Adapting the class format to deal with the ongoing pandemic

Here is the plan as of August 23, 2021:

  • The first three weeks of class will be fully online.

  • After that, we move to fully in-person classes, or possibly partly in-person and partly online classes, depending on the university rules at that point.

  • All classes will be streamed on Zoom, and students who prefer to take the class fully online will be able to do so. I know some students need to stay online, and we will support that. I also know some students need in-person classes, and we will support that.

Schedule

In the first week, we will talk about topics to cover in this semester's class. Please bring suggestions for topics that are relevant to your research. A collection of topics suggested at the end of the previous semester is listed under Topics below.

Week 1: Aug 25

We will discuss which topics you would like to see covered this semester. Please bring suggestions!

We also do a round-table in which we talk about research done over the summer.

Week 2: Sep 1: Machine learning and language models

We are reading: Schick and Schütze, It's not just size that matters: small language models are also few-shot learners.

Venkat is leading the discussion.

Week 3: Sep 8: Machine learning and language models

Aghajanyan, Zettlemoyer and Gupta, Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

Hongli will lead the discussion.

Also, see below for some introductory material on deep learning.

Week 4: Sep 15: Machine learning and language models

Transformers and Hopfield networks:

Gabriella will lead the discussion.

Week 5: Sep 22: Machine learning and language models

We are reading Zhang, van de Meent and Wallace, Disentangling Representations of Text by Masking Transformers, a recent preprint.

Nafal is leading the discussion.

Week 6: Sep 29: Social computational linguistics


PNAS: Cognitive distortions on the rise, or maybe not? https://www.pnas.org/content/118/30/e2102061118 (with a response by Kyle)


Venkat is leading the discussion.

Week 7: Oct 6: Social computational linguistics

Pennebaker: a Reddit study, or narratives

Week 8: Oct 13: Social computational linguistics, computational linguistics and society

Foundation models

Week 9: Oct 20: Prompt engineering


Survey paper: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing, https://arxiv.org/pdf/2107.13586.pdf

Week 10: Oct 27: Prompt engineering


Asking the right questions: Brenden Lake and colleagues, https://link.springer.com/article/10.1007/s42113-018-0005-5

We will also link this to Jessy's work on question generation.

Week 11: Nov 3: Narratives

story generation

Week 12: Nov 10: Narratives

story generation or event representation

Week 13: Nov 17: Open for emerging topics

... such as phonosyntax, specificity, attention, understudied languages

Week 14: Nov 24: No class (Thanksgiving break)

Week 15: Dec 1: Open for emerging topics

... such as phonosyntax, specificity, attention, understudied languages

Introductory material on deep learning

Michael Nielsen's book Neural Networks and Deep Learning gives a very nice and accessible introduction to deep learning.

I also like the relevant chapters from the upcoming 3rd edition of the Jurafsky and Martin book:

Jay Alammar has some very nice illustrations of key ideas in neural models:

Topics

Machine learning:


Machine learning, language and cognition:

Computational creativity

http://computationalcreativity.net/iccc20/papers/ICCC20_Proceedings.pdf


Narratives:

  • Story generation:

    • https://www.aclweb.org/anthology/2020.coling-main.212.pdf

    • R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi. Defending against neural fake news. In Advances in Neural Information Processing Systems, pages 9054–9065, 2019.

    • Controllable generation:

      • S. Dathathri, A. Madotto, J. Lan, J. Hung, E. Frank, P. Molino, J. Yosinski, and R. Liu. Plug and play language models: A simple approach to controlled text generation. In International Conference on Learning Representations, 2019.

      • N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong, and R. Socher. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858, 2019.

    • Explicit content planning:

      • A. Fan, M. Lewis, and Y. Dauphin. Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 889–898, 2018.

      • L. Yao, N. Peng, R. Weischedel, K. Knight, D. Zhao, and R. Yan. Plan-and-write: Towards better automatic storytelling. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7378–7385, 2019.

      • B. Tan, Z. Yang, M. Al-Shedivat, E. P. Xing, and Z. Hu. Progressive generation of long text. arXiv preprint arXiv:2006.15720, 2020.

      • L. Martin, P. Ammanabrolu, X. Wang, W. Hancock, S. Singh, B. Harrison, and M. Riedl. Event representations for automated story generation with deep neural nets. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

      • A. Fan, M. Lewis, and Y. Dauphin. Strategies for structuring story generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2650–2660, 2019.

      • S. Goldfarb-Tarrant, T. Chakrabarty, R. Weischedel, and N. Peng. Content planning for neural story generation with Aristotelian rescoring. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4319–4338, 2020.

      • E. Orbach and Y. Goldberg. Facts2story: Controlling text generation by key facts. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2329–2345, 2020.

  • Style transfer: https://arxiv.org/abs/2010.05700; see also https://www.wordtune.com/

  • Tasks for narrative reasoning that go beyond narrative cloze

    • Nate Chambers' critique of narrative cloze

  • Event embeddings


Social computational linguistics:

  • characterizing language variation as a function of social groups: World Well-Being Project, http://wwbp.org/

  • personas: Robin Cooper.

  • Natural language generation and personas, NLG and style

Metaphor processing:

Discourse and pragmatics

  • papers in cognition about discourse particles

  • discourse and narrative

Ethical AI

  • https://dl.acm.org/doi/pdf/10.1145/3313831.3376638

  • papers by Maria De Arteaga

  • pattern discrimination and ethics

  • questions from the Social Justice Committee: look at research methods and citation practices in the field from the point of view of discrimination/bias