Text is everywhere, in huge amounts: books, emails, web pages, scientific papers, and so on. To use the information contained in all this text, we need technology that helps us manage, sort, and make sense of it. Examples include automatically translating texts from one language to another; building better search engines that can handle complex questions instead of just keywords; automatically determining whether blogs are saying good or bad things about a particular product; and extracting useful facts from repositories of scientific papers about medicine.
Computational linguistics is about using mathematical and computational methods to better describe how language works. It is about developing algorithms for automatic language understanding. And it is about building language technology applications. Computational linguistics thus spans a wide range of questions, from linguistics to computer science. It also draws on other fields, such as cognitive science and philosophy of language.
This course provides an introduction to the key methods that we use in computational linguistics, and it discusses some of the main applications.
The course is oriented towards students with some background in linguistics but no prior experience in programming or computer science. The main focus will be on hands-on experience with language processing techniques, with a gentle and thorough introduction to programming and to the relevant theoretical concepts. We will use the Python programming language and the Natural Language Toolkit (http://www.nltk.org/), a Python package made for exploring and processing language data.
Prerequisites: Upper-division standing.
Textbook: Jurafsky, D. and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd Edition). Prentice-Hall, 2008.
Additional required readings will be made available for download from the course website.