Graded sense and usage annotation

The vast majority of work on word senses has relied on predefined sense inventories and an annotation schema in which each word instance is tagged with the best-fitting sense. We have examined the case for a graded notion of word meaning in two experiments: one that uses WordNet senses in a graded fashion, contrasted with the “winner takes all” annotation, and one that asks annotators to judge the similarity of two usages. We find that the graded responses correlate with annotations from previous datasets, but sense assignments are used in a way that weakens the case for clear-cut sense boundaries. The responses from both experiments correlate with the overlap of paraphrases from the English lexical substitution task, which bodes well for the use of substitutes as a proxy for word sense.
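As a rough illustration of the last point, the sketch below correlates mean usage similarity ratings with the overlap of substitute sets for the same usage pairs. It is only a sketch under stated assumptions: the record layout (`ratings`, `subs_a`, `subs_b`), the toy values, and the choice of Jaccard overlap are hypothetical and are not taken from the paper or from the dataset files.

```python
from statistics import mean

def jaccard_overlap(subs_a, subs_b):
    """Jaccard overlap of two substitute lists (an assumed overlap measure)."""
    a, b = set(subs_a), set(subs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical toy records: one entry per pair of usages of the same word.
pairs = [
    {"ratings": [5, 4, 5], "subs_a": ["bright", "clever"],
     "subs_b": ["clever", "bright", "smart"]},
    {"ratings": [3, 3, 3], "subs_a": ["bright", "shiny"],
     "subs_b": ["bright", "clever", "smart"]},
    {"ratings": [1, 2, 1], "subs_a": ["shiny", "vivid"],
     "subs_b": ["clever", "smart"]},
]

mean_ratings = [mean(p["ratings"]) for p in pairs]
overlaps = [jaccard_overlap(p["subs_a"], p["subs_b"]) for p in pairs]
rho = spearman(mean_ratings, overlaps)
```

A rank correlation is used here because the ratings are ordinal; other overlap measures could be swapped in for `jaccard_overlap` without changing the rest of the sketch.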

Publication:

Katrin Erk, Diana McCarthy and Nicholas Gaylord: Investigations on Word Senses and Word Usages. Proceedings of ACL 2009.

First round of annotation

Data

To download both the usage similarity annotation and the graded sense annotation datasets, please click here.

We would appreciate it if you could let us know that you downloaded the dataset. Just send an email to katrin.erk@utexas.edu.

Annotation guidelines

Annotation guidelines for graded sense annotation

Annotation guidelines for usage similarity annotation

Second round of annotation

http://www.dianamccarthy.co.uk/downloads/WordMeaningAnno2012/

Related work: Word usage graphs

See also Dominik Schlechtweg's word usage graphs, which are graphs of usage similarity. 

He has also provided a script for transforming this dataset of usage similarity data into word usage graphs.
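To give a sense of what such a transformation involves, here is a minimal sketch that turns pairwise usage similarity judgments into a weighted graph. It is not the script mentioned above: the input format (triples of two usage identifiers and a rating), the function name `build_usage_graph`, and the use of the median to combine several annotators' ratings are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

def build_usage_graph(judgments):
    """Build an undirected weighted graph from (usage_1, usage_2, rating) triples.

    Nodes are usage identifiers. Each edge weight is the median of all
    ratings collected for that pair (an assumed aggregation choice).
    """
    collected = defaultdict(list)
    for u, v, rating in judgments:
        # Treat (u, v) and (v, u) as the same pair.
        key = tuple(sorted((u, v)))
        collected[key].append(rating)

    graph = defaultdict(dict)
    for (u, v), ratings in collected.items():
        weight = median(ratings)
        graph[u][v] = weight
        graph[v][u] = weight
    return dict(graph)

# Hypothetical judgments for three usages of one word.
judgments = [
    ("s1", "s2", 4),
    ("s2", "s1", 5),
    ("s1", "s3", 1),
]
graph = build_usage_graph(judgments)
```

The result is a plain adjacency dictionary, so `graph["s1"]` lists the usages that were compared with `s1` together with their aggregated similarity.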