************************************************************************

   DATASETS ASSOCIATED WITH COLING-2014 PAPER

   "Empirical Analysis of Aggregation Methods for Collective Annotation"

   by Ciyang Qing, Ulle Endriss, Raquel Fernandez and Justin Kruger

************************************************************************

This directory contains the datasets on linguistic judgments collected
via Amazon's Mechanical Turk (AMT) described and analysed in the paper:

(1) Recognising Textual Entailment (RTE)
(2) Preposition Sense Disambiguation (PSD)
(3) Question Dialogue Acts (QDA)

The first annotation dataset (RTE) was collected by Snow et al. (2008);
its gold standard annotation was created by Dagan et al. (2006). The
other two annotation datasets are new. The PSD gold standard was
created by Litkowski and Hargraves (2007) and the QDA gold standard by
Jurafsky et al. (1997).

************************************************************************
SUMMARY OF BASIC PARAMETERS
************************************************************************

The following table provides an overview of some of the basic
parameters of the three datasets:

                          RTE     PSD     QDA

  #categories               2       3       4
  #items                  800     150     300
  #annotators             164      45      63
  #annotators/item         10      10      10
  #items/HIT               20      15      10

************************************************************************
FILES ASSOCIATED WITH DATASETS
************************************************************************

For each of the three datasets (RTE, PSD, QDA) we provide two files:

(1) A CSV file with the annotations collected via AMT. Each row in the
    file corresponds to one individual annotation. There are four
    columns:

    - the AMT Worker ID (Annotator)
    - the ID of the data example (Item)
    - the worker label (Category)
    - the gold standard label (Gold)

    (A minimal Python sketch for loading these CSV files is given at
    the end of this README.)

(2) An additional file with the items themselves. In the case of RTE,
    this is an XML file distributed via the PASCAL RTE Challenge
    website (http://pascallin.ecs.soton.ac.uk/Challenges/RTE/). In the
    case of PSD and QDA, these are HTML files we have generated from
    the gold standard annotations mentioned above.

************************************************************************
REFERENCES
************************************************************************

Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL
recognising textual entailment challenge. In Machine Learning
Challenges, volume 3944 of LNCS, pages 177-190. Springer-Verlag.

Dan Jurafsky, Elizabeth Shriberg, and Debra Biasca. 1997. Switchboard
SWBD-DAMSL shallow-discourse-function-annotation coder's manual.
Technical Report TR 97-02, Institute for Cognitive Science, University
of Colorado at Boulder.

Kenneth C. Litkowski and Orin Hargraves. 2007. SemEval-2007 Task 06:
Word-Sense Disambiguation of Prepositions. In Proc. 4th International
Workshop on Semantic Evaluations (SemEval-2007).

Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008.
Cheap and fast---but is it good? Evaluating non-expert annotations for
natural language tasks. In Proc. Conference on Empirical Methods in
Natural Language Processing (EMNLP-2008), pages 254-263.

************************************************************************
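EXAMPLE: LOADING THE ANNOTATION FILES
************************************************************************

The sketch below shows one possible way to load an annotation CSV file
and compute simple plurality (majority) labels per item, as a starting
point for experimenting with aggregation methods. It is only a sketch:
the file name "rte.csv" and the assumption that the CSV files carry a
header row with the column names Annotator, Item, Category and Gold
are illustrative and may need to be adjusted to the actual files in
this directory.

  # Minimal loading sketch (Python). Assumes a header row with the
  # columns Annotator, Item, Category, Gold; the file name "rte.csv"
  # is only an example.
  import csv
  from collections import Counter, defaultdict

  votes = defaultdict(Counter)   # item ID -> Counter of worker labels
  gold = {}                      # item ID -> gold standard label

  with open("rte.csv", newline="") as f:
      for row in csv.DictReader(f):
          votes[row["Item"]][row["Category"]] += 1
          gold[row["Item"]] = row["Gold"]

  # Plurality aggregation: pick the most frequent worker label per item
  # (ties broken arbitrarily by Counter.most_common).
  plurality = {item: counts.most_common(1)[0][0]
               for item, counts in votes.items()}

  # Compare the aggregated labels against the gold standard.
  accuracy = sum(plurality[i] == gold[i] for i in gold) / len(gold)
  print("Plurality accuracy against gold: %.3f" % accuracy)

************************************************************************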