GermEval 2015: LexSub (German Lexical Substitution Shared Task)

29 Sep 2015
Essen, Germany

==== Call for Participation ====

GermEval 2015: LexSub
(German Lexical Substitution Shared Task)

29 September 2015 at GSCL 2015, Essen, Germany

https://sites.google.com/site/germeval2015


---- Introduction ----

Word sense disambiguation (WSD) has been a core research problem in 
computational linguistics since the very inception of the field. In recent 
years, there has been considerable interest in using lexical substitution as an 
extrinsic evaluation of WSD systems. This has led to a number of mono- and 
crosslingual evaluation competitions at SemEval and EVALITA. We now invite all 
researchers and industry professionals to participate in GermEval 2015: LexSub, 
the first lexical substitution task for the German language.  The task is 
associated with the GSCL 2015 conference in Essen, and will take place as a 
workshop there on 29 September 2015.


---- Task description ----

Lexical substitution is the task of identifying an appropriate substitute for a 
target word in a given context. For example, in the sentence "She's a bright 
kid who excels academically," an appropriate substitute for "bright" might be 
"smart", whereas an inappropriate one would be "glowing". Automatically 
identifying substitution candidates, and selecting those which best match the 
context, requires intelligent application of lexical-semantic knowledge and 
word sense disambiguation techniques. However, unlike traditional WSD tasks, 
lexical substitution does not mandate the use of any particular sense 
inventory.

The data for the GermEval 2015: LexSub task is described by Cholakov et al. in 
"Lexical substitution dataset for German" (Proc. LREC, 2014). Altogether it 
consists of 2040 sentences from the German Wikipedia, each containing a target 
word and a list of substitutions proposed by human annotators. There are 153 
unique target words, equally distributed across parts of speech (nouns, verbs, 
and adjectives) and three frequency groups. About half of this data (26 nouns, 
26 verbs, and 26 adjectives in 1040 sentence contexts) forms the training set, 
which will be made available to participants in advance. The remainder forms 
the test set, which will be used for the evaluation and published in full only 
after the shared task is completed.

Participants need not rely on any particular language resources, but if they 
wish they can employ the sense-linked lexical-semantic resource UBY and 
JoBimText distributional semantics models. UBY also provides an interface to 
GermaNet. Industrial users will be eligible for a special GermaNet licence, to 
be obtained from Eberhard-Karls Universität Tübingen. Please refer to our web 
pages for information on how to obtain the data sets and resources.

Systems' performance will be measured by comparing their substitutes against 
those selected by the human annotators; for this we will use the "best", "out 
of ten", and "generalized average precision" metrics. The organizers will 
provide a scoring system and the output of some baseline systems.
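
As a rough illustration of how these three metrics behave, the following 
Python sketch scores a single test instance. It assumes the definitions 
commonly used in earlier lexical substitution evaluations ("best" and "out of 
ten" as in the SemEval-2007 English lexical substitution task, and generalized 
average precision over a ranked list); it is not the organizers' scoring 
system, and details such as duplicate handling may differ. The function names 
and the annotator counts for "bright" are hypothetical.

```python
from collections import Counter

def best_score(system, gold):
    """'best': summed annotator frequency of the guesses, divided by the
    number of guesses and by the total number of annotator responses."""
    if not system or not gold:
        return 0.0
    total = sum(gold.values())
    credit = sum(gold.get(s, 0) for s in system)
    return credit / (len(system) * total)

def oot_score(system, gold, limit=10):
    """'out of ten': up to `limit` guesses, with no penalty for guessing more.
    Assumption: duplicate guesses are dropped before scoring."""
    if not gold:
        return 0.0
    total = sum(gold.values())
    guesses = list(dict.fromkeys(system))[:limit]
    return sum(gold.get(s, 0) for s in guesses) / total

def gap_score(ranking, gold):
    """Generalized average precision of a ranked list of substitutes,
    using annotator frequencies as gold weights."""
    numerator, running = 0.0, 0
    for i, s in enumerate(ranking, start=1):
        x = gold.get(s, 0)
        running += x
        if x > 0:                      # only gold substitutes contribute
            numerator += running / i
    denominator, running = 0.0, 0
    for i, y in enumerate(sorted(gold.values(), reverse=True), start=1):
        running += y
        denominator += running / i     # score of the ideal ranking
    return numerator / denominator if denominator else 0.0

# Hypothetical annotator responses for "bright" in the example sentence.
gold = Counter({"smart": 3, "clever": 2, "intelligent": 1})

print(best_score(["smart"], gold))                        # single guess
print(best_score(["smart", "glowing"], gold))             # hedged guess
print(oot_score(["glowing", "clever", "smart"], gold))    # unordered set
print(gap_score(["smart", "clever", "intelligent"], gold))  # ideal ranking
```

In this sketch, "best" rewards a single confident guess, since credit is 
divided by the number of guesses, whereas "out of ten" ignores order and 
"generalized average precision" rewards ranking the annotators' preferred 
substitutes near the top.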


---- Practical information ----

* 23 January 2015: Availability of training data
* 1 July 2015: Availability of test data
* 15 July 2015: Deadline for initial submission of papers and results
* 1 August 2015: Notification of acceptance and shared task results
* 15 August 2015: Deadline for camera-ready papers
* 30 September–2 October 2015: GSCL 2015

Submissions will consist of a file providing the substitutions for each 
instance of the target data and a paper of up to four pages (including 
references) describing the approach and analyzing the performance. Papers 
should follow the GSCL 2015 style guide, and will be reviewed and published in 
an online volume of workshop proceedings. (We may ask participants to 
peer-review other submissions.) Participants are expected to present summaries 
of their systems at the GermEval 2015: LexSub workshop at GSCL 2015.


---- Organizing committee ----

* Sallam Abualhaija, Technische Universität Hamburg-Harburg
* Darina Benikova, LT Group, Technische Universität Darmstadt
* Chris Biemann, LT Group, Technische Universität Darmstadt
* Judith Eckle-Kohler, UKP Lab, Technische Universität Darmstadt
* Iryna Gurevych, UKP Lab, Technische Universität Darmstadt
* Tristan Miller, UKP Lab, Technische Universität Darmstadt

To contact the organizing committee, please post to the GermEval 2015: LexSub 
mailing list at https://groups.google.com/forum/#!forum/germeval-2015-lexsub, 
or for private communication e-mail Tristan Miller: 
miller@ukp.informatik.tu-darmstadt.de.


---- Acknowledgements ----

This shared task is supported by the DFG-funded project "Integrating 
Collaborative and Linguistic Resources for Word Sense Disambiguation and 
Semantic Role Labeling" (InCoRe, GU 798/9-1), the BMBF-funded CLARIN F-AG7, and 
the LOEWE research cluster "Digital Humanities".