LIN 392 Working with Corpora

Spring 2022 | Instructor: Katrin Erk | Tues/Thurs 12:30-2 | On Zoom and in RLP 4.422 | Canvas

Course overview

Corpus linguistics is the use of text corpora for exploring, documenting, and modeling linguistic phenomena. This course provides a practical introduction to working with corpora.

The purpose of this course is to provide the student with a basic toolbox for working with corpora. The student will get to know current best practice in the construction and annotation of corpora, get to know search tools for locating occurrences of relevant phenomena in a corpus, and learn to use Python, a high-level programming language, to process text corpora. We will discuss examples of corpus-creation projects and formats for annotating corpora.

This course is designed for students with no prior experience in programming. Its aim is to enable students to perform their own corpus-based studies.

Graduate students from departments other than Linguistics are welcome to take this class.

Syllabus

Course Information

Instructor Contact Information

Prerequisites

Graduate standing.

Syllabus and text

This page serves as the syllabus for this course.

There is no course textbook. Readings will be made available through links in the course schedule below. 

Content overview

This course provides an in-depth introduction to the construction and use of corpora for linguistic analyses, and equips the student with a collection of tools for automatic analysis.

By the end of this course, you will 

Course requirements and grading policy

Attendance is not required, but it will be very helpful in achieving the course goals, particularly because we will do extensive practical exercises in class.

The course will use plus-minus grading, using the following scale (showing Grade and Percentage):

Extension Policy

If you turn in your assignment late and we have not agreed on an extension beforehand, expect points to be deducted. Extensions will be considered on a case-by-case basis. I urge you to let me know if you need an extension, so that we can make sure you get the time necessary to complete the assignments.

If an extension has not been agreed on beforehand, the default penalty for a late assignment is 5 points (out of 100), plus an additional 1 point for every 24-hour period beyond 2 that the assignment is late.
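As an illustration, the default deduction can be sketched in Python, the language used in this course. This is only one reading of the policy, not an official calculator: the function name late_penalty is hypothetical, and the sketch assumes that a partially elapsed 24-hour period counts as a full one, which the policy does not spell out.

```python
import math


def late_penalty(hours_late: float) -> int:
    """Points deducted (out of 100) for a late assignment without an agreed extension."""
    if hours_late <= 0:
        # Turned in on time: nothing is deducted.
        return 0
    # Assumption: any started 24-hour period counts as a whole period.
    periods_late = math.ceil(hours_late / 24)
    # Flat 5 points for lateness, plus 1 point per 24-hour period beyond the first 2.
    return 5 + max(0, periods_late - 2)
```

Under these assumptions, an assignment turned in 49 hours late would lose 6 points: the flat 5 points plus 1 point for the third 24-hour period.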

Note that there are always some points to be had, even if you turn in your assignment late. The last day in the semester on which the class meets is the last day to turn in late assignments for grading. Homework assignments submitted after that date will not be graded.

Classroom safety and Covid-19

To help preserve our in-person learning environment, the university recommends the following.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates, but all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. Please contact the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259.

Notice about missed work due to religious holy days

A student who misses an examination, work assignment, or other project due to the observance of a religious holy day will be given an opportunity to complete the work missed within a reasonable time after the absence, provided that he or she has properly notified the instructor. It is the policy of the University of Texas at Austin that the student must notify the instructor at least fourteen days prior to the classes scheduled on dates he or she will be absent to observe a religious holy day. For religious holy days that fall within the first two weeks of the semester, the notice should be given on the first day of the semester. The student will not be penalized for these excused absences, but the instructor may appropriately respond if the student fails to complete satisfactorily the missed assignment or examination within a reasonable time after the excused absence.

Emergency Evacuation Policy

Occupants of buildings on The University of Texas at Austin campus are required to evacuate buildings when a fire alarm is activated. Alarm activation or announcement requires exiting and assembling outside. Familiarize yourself with all exit doors of each classroom and building you may occupy. Remember that the nearest exit door may not be the one you used when entering the building. Students requiring assistance in evacuation shall inform their instructor in writing during the first week of class. In the event of an evacuation, follow the instruction of faculty or class instructors. Do not re-enter a building unless given instructions by the following: Austin Fire Department, The University of Texas at Austin Police Department, or Fire Prevention Services office. Information regarding emergency evacuation routes and emergency procedures can be found at http://www.utexas.edu/emergency.

Senate Bill 212 and Title IX Reporting Requirements

Under Senate Bill 212 (SB 212), the professor and TAs for this course are required to report for further investigation any information concerning incidents of sexual harassment, sexual assault, dating violence, and stalking committed by or against a UT student or employee. Federal law and university policy also require reporting incidents of sex- and gender-based discrimination and sexual misconduct (collectively known as Title IX incidents). This means we cannot keep confidential information about any such incidents that you share with us. If you need to talk with someone who can maintain confidentiality, please contact University Health Services (512-471-4955 or 512-475-6877) or the UT Counseling and Mental Health Center (512-471-3515 or 512-471-2255). We strongly urge you to make use of these services for any needed support and to report any Title IX incidents to the Title IX Office.


Adapting the class format to deal with the ongoing pandemic

Schedule

This schedule is subject to change.

Assignments are due at the end of the day on their due date. Please submit assignments online on Canvas unless the assignment tells you otherwise.

Readings can be done either before or after class (unless noted otherwise); they are chosen to support the material covered in class.

Week 1: Jan 18, 20: This week fully online

Week 2: Jan 25, 27: This week fully online

Part 1: Introduction to programming

Week 3: Feb 1, 3: Tuesday session this week in person

Week 4: Feb 8, 10: This week in person

Week 5: Feb 15, 17: This week in person

Week 6: Feb 22, 24: 

Part 2: Statistical analyses

Week 7: Mar 1, 3: 

Week 8: Mar 8, 10: 

Week 9: Spring Break

Week 10: Mar 22, 24: 

Week 11: Mar 29, 31: 

Part 3: Annotation

Week 12: Apr 5, 7: 

Part 4: Search

Week 13: Apr 12, 14: 

Week 14: Apr 19, 21: 

Part 5: Automatic linguistic analysis

Week 15: Apr 26, 28: 

Week 16: May 3, 5: Project presentations

Final report due: May 11, end of day.

Links and additional readings

List of software we will use in the class

Python and Python packages:

Alternatively, you can individually install:

Tips and tricks: 

Using Python:

Corpus design: