Graded sense and usage annotation
The vast majority of work on word senses has relied on predefined sense inventories and an annotation schema where each word instance is tagged with the best-fitting sense. We have examined the case for a graded notion of word meaning in two experiments: one that uses WordNet senses in a graded fashion, contrasted with the “winner takes all” annotation, and one that asks annotators to judge the similarity of two usages. We find that the graded responses correlate with annotations from previous datasets, but sense assignments are used in a way that weakens the case for clear-cut sense boundaries. The responses from both experiments correlate with the overlap of paraphrases from the English lexical substitution task, which bodes well for the use of substitutes as a proxy for word sense.
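As a rough illustration of the last point, the sketch below compares graded usage similarity ratings with the overlap of substitutes for the same pairs of usages. It is only a minimal sketch under stated assumptions: the usage ids, substitutes and ratings are invented toy data, Jaccard set overlap stands in for whichever overlap measure the paper uses, and Spearman's rank correlation is chosen here as one common option. None of the names below come from the released dataset.

```python
def jaccard(a, b):
    """Set overlap of two substitute lists (assumed stand-in measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: substitutes per usage, and a graded similarity
# rating (1 = completely different, 5 = identical) per pair of usages.
substitutes = {
    "u1": ["shine", "glow"],
    "u2": ["glow", "shine", "gleam"],
    "u3": ["ignite", "kindle"],
}
ratings = {("u1", "u2"): 4.7, ("u1", "u3"): 1.3, ("u2", "u3"): 1.0}

pairs = sorted(ratings)
overlap = [jaccard(substitutes[a], substitutes[b]) for a, b in pairs]
judged = [ratings[p] for p in pairs]
rho = spearman(overlap, judged)
print(f"Spearman's rho between overlap and ratings: {rho:.3f}")
```

A high positive value would indicate that usages judged similar also tend to share substitutes. With only three pairs the number is not meaningful; the toy data is there to show the shape of the computation.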
Publication:
Katrin Erk, Diana McCarthy and Nicholas Gaylord: Investigations on Word Senses and Word Usages. Proceedings of ACL 2009.
First round of annotation
Data
To download both the usage similarity annotation and the graded sense annotation datasets, please click here.
We would appreciate it if you could let us know that you downloaded the dataset. Just send an email to katrin.erk@utexas.edu
Annotation guidelines
Annotation guidelines for graded sense annotation
Annotation guidelines for usage similarity annotation
Second round of annotation
http://www.dianamccarthy.co.uk/downloads/WordMeaningAnno2012/
Related work: Word usage graphs
See also Dominik Schlechtweg's word usage graphs, which are graphs of usage similarity.
He has also provided a script for transforming this dataset of usage similarity data into word usage graphs.