4 Apr 2017
Valencia, Spain
CORBON 2017: 2nd Workshop on Coreference Resolution Beyond OntoNotes
to be held at EACL 2017 (Valencia, Spain), April 4, 2017

More information: http://corbon.nlp.ipipan.waw.pl/

Paper submission deadline: January 16, 2017

News!

* A small number of travel awards will be available to students who have papers accepted to CORBON 2017.
* Shared task data have been published - please take part! See http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/
* Submission link available: https://www.softconf.com/eacl2017/corbon

Call for Papers

Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference that people did in MUC, ACE, and OntoNotes, having the impression that coreference is a well-worn task owing in part to the large number of coreference papers reporting results on the MUC/ACE/OntoNotes coreference corpora. This is an unfortunate misconception: the previous shared tasks on coreference resolution have largely focused on entity coreference, which constitutes only one of the many kinds of coreference relations that were discussed in theoretical and computational linguistics in the past few decades. In fact, by focusing on entity coreference resolution, NLP researchers have only scratched the surface of the wealth of interesting problems in coreference resolution.

The first workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016, http://corbon.nlp.ipipan.waw.pl/2016/), which was held in conjunction with NAACL HLT 2016, sought to:

* encourage work on under-investigated coreference resolution tasks as well as coreference resolution in under-investigated languages and
* provide a forum for coreference researchers to discuss and present such work.

The workshop was quite successful in achieving its goals: the majority of the submissions focused on coreference resolution in less-investigated languages, and more than half of the submissions focused on under-investigated coreference tasks. Building on the success of its previous edition, CORBON 2017 will include:

* a special theme on knowledge-rich coreference resolution;
* a shared task on coreference resolution in languages without coreference-annotated data; and
* a panel discussing the future research directions for coreference resolution.

Topics

The workshop welcomes submissions describing both theoretical and applied computational work on coreference resolution, especially for languages other than English, less-researched forms of coreference and new applications of coreference resolution. The submissions are expected to discuss theories, evaluation, limitations, system development and techniques relevant to the workshop topics.

Topics of interest include but are not limited to the following:

* Coreference resolution for less-researched languages (e.g., annotation strategies, resolution modules and formal evaluation)
* Evaluation of the influence of language-specific properties, such as lack of articles, quasi-anaphora, ellipsis or complexity of reflexive pronouns, on coreference resolution
* Representation of coreferential relations other than identity coreference (e.g., bridging references, reference to abstract entities, etc.)
* Investigation of difficult cases of anaphora and coreference and their resolution by resorting to, e.g., discourse-based and pragmatic levels
* Coreference resolution in noisy data (e.g., in speech and social networks)
* New applications of coreference resolution

Since progress in these under-explored coreference tasks is currently limited in part by the scarcity of annotated corpora, papers that describe the creation and annotation of corpora, especially those with less-investigated coreference phenomena and those involving less-researched languages, are particularly welcome. In addition, the program committee members will be asked to give special attention to submissions that echo our special theme on knowledge-rich coreference resolution, which involves the use of sophisticated knowledge sources for coreference resolution.

Shared Task

Previous shared tasks on coreference resolution, e.g., the SemEval 2010 shared task "Coreference Resolution in Multiple Languages" (http://stel.ub.edu/semeval2010-coref/) and the CoNLL 2011 (http://conll.cemantix.org/2011/introduction.html) and 2012 (http://conll.cemantix.org/2012/introduction.html) shared tasks, operated in a setting where a large amount of training data was provided to train coreference resolvers in a fully supervised manner. Our shared task has a different goal: we are primarily interested in a low-resource setting. In particular, we seek to investigate how well one can build a coreference resolver for a language for which there is no coreference-annotated data available for training. With a rising interest in annotation projection, we offer a projection-based task which will facilitate the application of existing coreference resolution algorithms to new languages. We believe that with this exciting setting, the shared task can help promote the development of coreference technologies that are applicable to a larger number of natural languages than is currently possible.

This year we will focus on two languages: German and Russian. To mimic a low-resource setting, no German or Russian coreference-annotated data will be provided.
Rather, to facilitate system development, the shared task participants will be provided two versions of an English-German-Russian parallel corpus: an unlabelled version and a labelled version. The labelled version has the English side of the parallel corpus automatically coreference-labelled using the Berkeley coreference resolver, which was trained on the English OntoNotes corpus.

Participants will compete in two tracks:

1. closed track: projection-based coreference resolution on German and/or Russian. The only coreference-annotated training data that the participants can use is the English OntoNotes corpus. Alternatively, they can use any of the publicly-available coreference resolvers trained on English OntoNotes. They can then use whatever parallel corpus and method they prefer to project the English annotations into German/Russian and subsequently train a new coreference resolver on the projected annotations. Note that they do not have to use the provided English-German-Russian parallel corpus.

2. open track: coreference resolution on German and Russian with no restriction on the kind of coreference-annotated data the participants can use for training. For instance, they can label their own German/Russian coreference data and use it to train a German/Russian coreference resolver, or they can adopt a heuristic-based approach where they employ knowledge of German/Russian to write coreference rules for these languages.

The participants can choose to take part in one or both tracks for one or both languages. The systems will be run on the test data by the participants who are required to send their outputs to the Shared Task Coordinator by December 27th (CET).

Training data (https://github.com/yuliagrishina/CORBON-2017-Shared-Task) as well as several additional resources are already available on the shared task page (http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/).

The evaluation will be done on a manually annotated German-Russian parallel corpus.
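
As a rough illustration of the projection step mentioned for the closed track, the following minimal Python sketch maps English mention spans onto target-language tokens through word alignments. It assumes that word alignments have already been computed by some external aligner; the function name, the data layout and the toy sentence pair are hypothetical and are not taken from the shared task resources. It is only one simple way to project annotations, not a prescribed method.

```python
from collections import defaultdict


def project_mentions(chains, alignment):
    """Project English mention spans onto target-language tokens.

    chains:    dict mapping a chain id to a list of (start, end) token
               spans on the English side (inclusive indices).
    alignment: iterable of (english_index, target_index) pairs
               (assumed to come from an external word aligner).
    Returns a dict mapping chain ids to lists of target-side spans.
    """
    src_to_tgt = defaultdict(list)
    for src, tgt in alignment:
        src_to_tgt[src].append(tgt)

    projected = {}
    for chain_id, spans in chains.items():
        target_spans = []
        for start, end in spans:
            # Collect every target token aligned to a token of the mention.
            aligned = [t for s in range(start, end + 1)
                       for t in src_to_tgt.get(s, [])]
            if aligned:
                # Use the smallest contiguous span covering the aligned tokens.
                target_spans.append((min(aligned), max(aligned)))
        # Keep only chains with at least two mentions (no singletons).
        if len(target_spans) >= 2:
            projected[chain_id] = target_spans
    return projected


# Toy example: "Mary saw her dog" / "Maria sah ihren Hund"
chains = {0: [(0, 0), (2, 2)],   # Mary ... her
          1: [(2, 3)]}           # her dog (a singleton chain)
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
result = project_mentions(chains, alignment)
```

In this toy example only chain 0 survives, because chain 1 has a single mention. A real system would also need to handle unaligned or discontinuous mentions more carefully before training a resolver on the projected annotations.
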
The guidelines used for the annotation of the corpus are largely compatible with the OntoNotes guidelines for English (Version 6.0) in terms of the types of referring expressions that are annotated. The exceptions are that we:

a) handle only NPs and do not annotate verbs that are coreferent with NPs,
b) include appositions in the markable span and do not mark them as a separate relation,
c) annotate pronominal adverbs in German if they co-refer with an NP.

Please check our GitHub repository (https://github.com/yuliagrishina/CORBON-2017-Shared-Task) for the complete guidelines and sample annotations.

As in CoNLL 2012, we will compute a number of existing scoring metrics - MUC, B-CUBED, CEAF and BLANC - and use the unweighted average of the MUC, B-CUBED and CEAF scores (computed by the official CoNLL 2012 scorer, http://conll.github.io/reference-coreference-scorers/) to determine the winning system. We will not evaluate singletons and we kindly ask the participants to exclude them from the submitted data.

Submission instructions

We solicit previously unpublished work, presented either as long or short papers, following the style guidelines for EACL 2017 and produced with the official LaTeX template (http://eacl2017.org/images/site/eacl-2017-template.zip). To be included in the final proceedings, accepted papers have to be made available both as LaTeX sources and PDF.

Long papers should have at most 8 pages of content, not including references. Short papers are limited to 4 pages of content, not including references. There is no constraint on the size of the reference list.

Submissions should be anonymous and not disclose in any way the identity of the author(s). Submissions should be made using the START system (https://www.softconf.com/eacl2017/corbon/).

Important dates

December 19, 2016: Evaluation data released
December 27, 2016: System outputs collected
January 6, 2017: Shared task results announced
January 16, 2017: Workshop paper / System description paper due date
February 11, 2017: Notification of acceptance
February 21, 2017: Camera-ready papers due date
April 4, 2017: Workshop date

Program Committee

Anders Björkelund, University of Stuttgart
Antonio Branco, University of Lisbon
Chen Chen, Apple
Dan Cristea, A. I. Cuza University of Iasi
Pascal Denis, MAGNET, INRIA Lille Nord-Europe
Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai
Yulia Grishina, University of Potsdam
Lars Hellan, Norwegian University of Science and Technology
Veronique Hoste, Ghent University
Yufang Hou, IBM
Ryu Iida, National Institute of Information and Communications Technology (NICT), Kyoto
Ekaterina Lapshinova-Koltunski, Saarland University
Emmanuel Lassalle, Global Systematic Investors LLP, UK
Chen Li, Microsoft
Sebastian Martschat, Heidelberg University
Ruslan Mitkov, University of Wolverhampton
Costanza Navaretta, University of Copenhagen
Anna Nedoluzhko, Charles University in Prague
Michal Novak, Charles University in Prague
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
Constantin Orasan, University of Wolverhampton
Massimo Poesio, University of Essex
Sameer Pradhan, cemantix.org and Boulder Learning Inc.
Sam Wiseman, Harvard University
Manfred Stede, University of Potsdam
Veselin Stoyanov, Facebook
Yannick Versley, Heidelberg University
Amir Zeldes, Georgetown University
Rob Voigt, Stanford University
Desislava Zhekova, Ludwig Maximilian University of Munich
Heike Zinsmeister, University of Hamburg

Workshop Organizers

Maciej Ogrodniczuk, Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences
Vincent Ng, Computer Science Department, The University of Texas at Dallas

Shared Task Coordinator

Yulia Grishina, University of Potsdam

Best regards,
Maciej Ogrodniczuk and Vincent Ng