Working with Corpora: Syllabus

Course Information

Instructor Contact Information

    • Katrin Erk

    • office hours: Tuesday 10-11, Wednesday 2:30-4:30

    • office: CLA 4.734

    • email: katrin dot erk at mail dot utexas dot edu


Prerequisites

Graduate standing.

Syllabus and Text

This page serves as the syllabus for this course.

There is no course textbook. Readings will be from the following resources available online:

Exams and Assignments

There will be four programming assignments and a project. For the project, there will be four separate parts that will be graded: a project proposal halfway through the semester, a progress report, an oral presentation on the project given during the final week of class, and a final report on the project. Evaluation will be based on the project and homeworks. There will be no exams.

Assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted on the schedule page. Readings and exercises may change up to one week in advance of their due dates.

Philosophy and Goal

The goal of this course is to provide the student with the necessary tools and techniques for doing corpus-based studies and annotation projects. It will thus help prepare the student for doing original research using corpora.

Some specific goals of the course are to enable students to:

    • make an informed choice of data for a corpus study

    • specify an appropriate annotation format for an annotation project

    • understand and conduct evaluations of annotator performance

    • write programs for extracting and interpreting corpus data using the Python programming language

    • use the Natural Language Toolkit for automatic processing and analysis of corpus data

    • use search tools to extract occurrences of language phenomena from unannotated or annotated texts

    • take some first steps towards performing a quantitative analysis of some corpus phenomenon

    • complete a non-trivial corpus-linguistic project and write a report

Content Overview

This course provides an in-depth introduction to the construction and use of corpora for linguistic analyses. It will address the following points:

    • Making corpora: Best practice for the collection of corpora.

    • Annotating corpora: We will discuss annotation formats, annotation guidelines, and annotation evaluation, and we will look at concrete examples of corpus annotation projects.

    • Searching corpora: Tools for extracting information from unannotated and annotated corpora.

    • Quantitative analysis: Some first pointers on statistical analyses of corpus phenomena. (There is a separate course offered by the linguistics department on analyzing linguistic data.)

The second topic of the course is an introduction to programming in Python. We will focus on using Python for corpus processing, including the use of the Natural Language Toolkit.

Detailed Course Content

Detailed course content is discussed on the Schedule page.

Course Requirements

    • Assignments (15% each): A series of 4 assignments will be given during the semester. Their purpose is to give you direct experience with the tools and techniques covered in class and the readings. Assignments will be done individually.

    • Project proposal draft (5%): Midway through the semester, you will propose a topic for your final project. There will be an opportunity to discuss your topic in advance during class. The proposal will be in written form and should be roughly 2 single-spaced pages.

    • Project progress report (5%): The progress report is mainly a revision of the proposal. It should take into account comments given on the proposal. Expect it to require significant rewriting, as opposed to just editing the proposal. In addition, it should include an update on progress to date. It should be roughly 2-3 single-spaced pages.

    • Project final report (20%): The final report builds on the progress report and presents the project results and conclusions. It should be roughly 8 single-spaced pages in length.

    • Project presentation (10%): Each student will give a presentation on his or her project.

Attendance is not factored into the grade, but it will be very helpful in achieving the course goals, particularly because we will do extensive practical exercises in class.
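The percentages listed above add up to 100% (four assignments at 15% each, plus 5%, 5%, 20%, and 10% for the project parts). As a minimal sketch of how a weighted course score could be computed from them, assuming every component is scored from 0 to 100: the names `WEIGHTS` and `final_score` are hypothetical and are not part of any official grading tool.

```python
# Weights in percent, taken from the Course Requirements list.
# The dictionary keys are illustrative names, not official labels.
WEIGHTS = {
    "assignment_1": 15,
    "assignment_2": 15,
    "assignment_3": 15,
    "assignment_4": 15,
    "project_proposal": 5,
    "project_progress_report": 5,
    "project_final_report": 20,
    "project_presentation": 10,
}


def final_score(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each key in WEIGHTS to a score between 0 and 100.
    Assumption: a missing component counts as 0.
    """
    total = sum(weight * scores.get(name, 0) for name, weight in WEIGHTS.items())
    return total / 100


# Example: full marks everywhere except the final report (80 out of 100).
example = {name: 100 for name in WEIGHTS}
example["project_final_report"] = 80
print(final_score(example))
```

In this example the final report loses 20 of its 100 points, which costs 20% of 20, or 4 points of the course score.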

Extension Policy

If you turn in your assignment late, expect points to be deducted. Extensions will be considered on a case-by-case basis.

For assignments, by default, 5 points (out of 100) will be deducted for lateness, plus an additional 1 point for every 24-hour period beyond 2 that the assignment is late. For example, an assignment due at 2pm on Tuesday will have 5 points deducted if it is turned in late but before 2pm on Thursday. It will have 6 points deducted if it is turned in by 2pm Friday, etc.
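The default deduction can be written as a short Python function. This is only a sketch of the rule as stated above, not an official grading script: the function name `late_penalty` and the constants are hypothetical, and the treatment of a submission arriving exactly on a 24-hour boundary is an assumption.

```python
import math
from datetime import datetime

BASE_PENALTY = 5        # points deducted for any late submission
FREE_PERIODS = 2        # 24-hour periods covered by the base penalty
SECONDS_PER_DAY = 24 * 60 * 60


def late_penalty(due, submitted):
    """Return the points (out of 100) deducted under the default policy."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # on time: no deduction
    # Count every 24-hour period that has started since the deadline.
    # Assumption: a submission exactly on a boundary belongs to the
    # period that just ended.
    periods = math.ceil(seconds_late / SECONDS_PER_DAY)
    return BASE_PENALTY + max(0, periods - FREE_PERIODS)


# The example from the text: an assignment due at 2pm on a Tuesday.
due = datetime(2024, 1, 2, 14, 0)                     # a Tuesday
print(late_penalty(due, datetime(2024, 1, 4, 13, 0)))  # Thursday 1pm
print(late_penalty(due, datetime(2024, 1, 5, 13, 0)))  # Friday 1pm
```

The two calls correspond to the 5-point and 6-point cases described above. Extensions granted on a case-by-case basis are not modeled here.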

Notify me in advance if you need an extension on a course requirement. The greater the advance notice of a need for an extension, the greater the likelihood of leniency.

Academic Dishonesty Policy

You are encouraged to discuss assignments with classmates. But all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.

Notice about students with disabilities

The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. Please contact the Division of Diversity and Community Engagement, Services for Students with Disabilities, 512-471-6259.

Notice about missed work due to religious holy days

By UT Austin policy, you must notify me of your pending absence at least fourteen days prior to the date of observance of a religious holy day. If you must miss a class, an examination, a work assignment, or a project in order to observe a religious holy day, you will be given an opportunity to complete the missed work within a reasonable time after the absence.

Emergency Evacuation Policy

    • Occupants of buildings on The University of Texas at Austin campus are required to evacuate buildings when a fire alarm is activated. Alarm activation or announcement requires exiting and assembling outside.

    • Familiarize yourself with all exit doors of each classroom and building you may occupy. Remember that the nearest exit door may not be the one you used when entering the building.

    • Students requiring assistance in evacuation shall inform their instructor in writing during the first week of class.

    • In the event of an evacuation, follow the instructions of faculty or class instructors.

    • Do not re-enter a building unless given instructions by the following: Austin Fire Department, The University of Texas at Austin Police Department, or Fire Prevention Services office.

    • Link to information regarding emergency evacuation routes and emergency procedures can be found at:

Behavior Concerns Advice Line (BCAL)

If you are worried about someone who is acting differently, you may use the Behavior Concerns Advice Line to discuss by phone your concerns about another individual's behavior. This service is provided through a partnership among the Office of the Dean of Students, the Counseling and Mental Health Center (CMHC), the Employee Assistance Program (EAP), and The University of Texas Police Department (UTPD). Call 512-232-5050 or visit