DEEPsem: Deep Natural Language Understanding with Probabilistic Logic and Distributional Similarity

This material is based upon work supported by the National Science Foundation under Grant No. 1523637.

The web offers huge amounts of information, but that very abundance also makes it harder to find and extract the information that is relevant. Natural language processing has made great strides in developing tools that extract information and automatically answer questions, often with relatively simple methods aimed at relatively superficial analysis. This project explores methods for deeper analysis and more detailed natural language understanding. Intelligent systems have long used logic to describe precisely what a sentence means and how its pieces connect. But this precision has a downside: logic needs the data to match its expectations exactly, or it breaks down. This is problematic for applications like question answering because language is hugely variable. There are often many different ways to say the same thing, or to say things that are not exactly the same but similar enough to be relevant. This project combines logic with a technology that identifies words and passages that are similar but not exact matches. Also, language often only implies things rather than stating them outright. The project handles this through a mechanism that draws conclusions that are likely but not 100% certain, and that states its level of confidence in each conclusion.

The project is highly interdisciplinary, giving students insight into logic and inference as well as into methods that determine word similarity from word occurrences in large amounts of text. It also forges new links between computational and theoretical linguistics by transferring ideas in both directions. Through its combination of precision and approximation, this project paves the way for language technology that understands language more deeply and thus will enhance societally important applications such as information extraction and automatic question answering.

Tasks in natural language semantics increasingly require complex and fine-grained inferences. This project pursues the dual hypotheses that (a) logical form is best suited for supporting such inferences, and (b) it is necessary to reason explicitly about uncertain, probabilistic information at the lexical level. The project combines logical-form representations of sentence meaning with weighted inference rules derived from distributional similarity. It uses Markov Logic Networks for probabilistic inference over logical form with weighted rules, evaluating on the task of Recognizing Textual Entailment. It also develops new methods for characterizing word meaning in context distributionally, in a way that is amenable to determining lexical entailment.
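To make the approach concrete, the following is a minimal illustrative sketch (in Python) of how a distributional similarity score between two predicates might be turned into the weight of a soft inference rule of the kind used in Markov Logic Networks. The word vectors, predicate names, and the log-odds mapping from similarity to weight are hypothetical choices made for illustration only, not the project's actual weighting scheme; in a full system, such weighted rules would be passed, together with the logical forms of the text and hypothesis, to an MLN inference engine.

    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two distributional word vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def rule_weight(sim, eps=1e-6):
        """Map a similarity score in (0, 1) to a log-odds weight for a soft
        inference rule (one illustrative choice, not the only possible one).
        Higher similarity yields a stronger rule."""
        sim = min(max(sim, eps), 1.0 - eps)
        return float(np.log(sim / (1.0 - sim)))

    # Toy distributional vectors for three predicates (hypothetical values).
    vec = {
        "owns":      np.array([0.90, 0.10, 0.30]),
        "possesses": np.array([0.85, 0.15, 0.35]),
        "sells":     np.array([0.20, 0.90, 0.10]),
    }

    # Soft lexical rules of the form  forall x,y: owns(x,y) => q(x,y),
    # each weighted by the distributional similarity of the two predicates.
    for hypo in ("possesses", "sells"):
        sim = cosine(vec["owns"], vec[hypo])
        print(f"owns(x,y) => {hypo}(x,y)   sim={sim:.3f}  weight={rule_weight(sim):.2f}")

In this toy example the rule linking "owns" to "possesses" receives a much higher weight than the rule linking "owns" to "sells", so a probabilistic reasoner would treat the first inference as likely and the second as doubtful, while still assigning each a degree of confidence rather than a hard yes-or-no answer.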