Logic List Mailing Archive

CORBON 2017: Coreference Resolution Beyond OntoNotes

4 Apr 2017
Valencia, Spain

CORBON 2017: 2nd Workshop on Coreference Resolution Beyond OntoNotes to be
held at EACL 2017 (Valencia, Spain), April 4, 2017

More information: http://corbon.nlp.ipipan.waw.pl/

Paper submission deadline: January 16, 2017

News!

* A small number of travel awards will be available to students who have 
papers accepted to CORBON 2017.

* Shared task data have been published - please take part!
<http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/>

* Submission link available: https://www.softconf.com/eacl2017/corbon

Call for Papers

Many NLP researchers, especially those not working in the area of 
discourse processing, tend to equate coreference resolution with the sort 
of coreference that people did in MUC, ACE, and OntoNotes, having the 
impression that coreference is a well-worn task owing in part to the large 
number of coreference papers reporting results on the MUC/ACE/OntoNotes 
coreference corpora. This is an unfortunate misconception: the previous 
shared tasks on coreference resolution have largely focused on entity 
coreference, which constitutes only one of the many kinds of coreference 
relations that have been discussed in theoretical and computational
linguistics over the past few decades. In fact, by focusing on entity coreference 
resolution, NLP researchers have only scratched the surface of the wealth 
of interesting problems in coreference resolution.



The first workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016
<http://corbon.nlp.ipipan.waw.pl/2016/>), which was held in
conjunction with NAACL HLT 2016, sought to:

* encourage work on under-investigated coreference resolution tasks as 
well as coreference resolution in under-investigated languages and

* provide a forum for coreference researchers to discuss and present such 
work.

The workshop was quite successful in achieving its goals: the 
majority of the submissions focused on coreference resolution in 
less-investigated languages, and more than half of the submissions focused 
on under-investigated coreference tasks.



Building on the success of its previous edition, CORBON 2017 will include:

* a special theme on knowledge-rich coreference resolution;

* a shared task on coreference resolution in languages without 
coreference-annotated data; and

* a panel discussing the future research directions for coreference
resolution.



Topics

The workshop welcomes submissions describing both theoretical and applied 
computational work on coreference resolution, especially for languages 
other than English, less-researched forms of coreference and new 
applications of coreference resolution. The submissions are expected to 
discuss theories, evaluation, limitations, system development and 
techniques relevant to the workshop topics. Topics of interest include but 
are not limited to the following:

* Coreference resolution for less-researched languages (e.g., annotation 
strategies, resolution modules and formal evaluation)

* Evaluation of the influence of language-specific properties, such as
lack of articles, quasi-anaphora, ellipsis or complexity of reflexive
pronouns, on coreference resolution

* Representation of coreferential relations other than identity 
coreference (e.g., bridging references, reference to abstract entities, 
etc.)

* Investigation of difficult cases of anaphora and coreference and their 
resolution by resorting to, e.g., discourse-based and pragmatic levels

* Coreference resolution in noisy data (e.g. in speech and social 
networks)

* New applications of coreference resolution



Since progress in these under-explored coreference tasks is currently 
limited in part by the scarcity of annotated corpora, papers that describe 
the creation and annotation of corpora, especially those with 
less-investigated coreference phenomena and those involving 
less-researched languages, are particularly welcome. In addition, the 
program committee members will be asked to give special attention to 
submissions that echo our special theme on knowledge-rich coreference 
resolution, which involves the use of sophisticated knowledge sources for
coreference resolution.



Shared Task

Previous shared tasks on coreference resolution (e.g., the SemEval 2010
shared task Coreference Resolution in Multiple Languages
<http://stel.ub.edu/semeval2010-coref/>, the CoNLL 2011
<http://conll.cemantix.org/2011/introduction.html> and 2012
<http://conll.cemantix.org/2012/introduction.html> shared tasks) 
operated in a setting where a large amount of training data was provided 
to train coreference resolvers in a fully supervised manner. Our shared 
task has a different goal: we are primarily interested in a low-resource 
setting. In particular, we seek to investigate how well one can build a 
coreference resolver for a language for which there is no 
coreference-annotated data available for training.



With a rising interest in annotation projection, we hereby offer a 
projection-based task which will facilitate the application of existing 
coreference resolution algorithms to new languages. We believe that with 
this exciting setting, the shared task can help promote the development of 
coreference technologies that are applicable to a larger number of natural 
languages than is currently possible.



This year we will focus on two languages: German and Russian. To mimic a 
low-resource setting, no German or Russian coreference-annotated data will 
be provided. Rather, to facilitate system development, the shared task 
participants will be provided with two versions of an
English-German-Russian parallel corpus: an unlabelled version and a
labelled version. The 
labelled version has the English side of the parallel corpus automatically 
coreference-labelled using the Berkeley coreference resolver, which was 
trained on the English OntoNotes corpus.



Participants will compete in two tracks:

1.  closed track: projection-based coreference resolution on German and/or 
Russian. The only coreference-annotated training data that the 
participants can use is the English OntoNotes corpus. Alternatively, they 
can use any of the publicly-available coreference resolvers trained on 
English OntoNotes. They can then use whatever parallel corpus and method 
they prefer to project the English annotations into German/Russian and 
subsequently train a new coreference resolver on the projected 
annotations. Note that they do not have to use the provided 
English-German-Russian parallel corpus.

2.  open track: coreference resolution on German and Russian with no 
restriction on the kind of coreference-annotated data the participants can 
use for training. For instance, they can label their own German/Russian 
coreference data and use it to train a German/Russian coreference 
resolver, or they can adopt a heuristic-based approach where they employ 
knowledge of German/Russian to write coreference rules for these 
languages.
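
As a rough illustration of the projection step in the closed track, the
following Python sketch maps English coreference chains onto the target
side of a sentence pair through word alignments. It is not the organizers'
method: the function names, the inclusive token-offset span format and the
alignment dictionary (English token index to a set of target token indices,
assumed to come from an external word aligner) are all assumptions made
for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# A mention is a pair of inclusive token offsets (start, end). Assumed format.
Span = Tuple[int, int]


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Return the smallest target span covering every token aligned to `span`."""
    targets = [
        tgt
        for src in range(span[0], span[1] + 1)
        for tgt in alignment.get(src, ())
    ]
    if not targets:
        return None  # no aligned token, so the mention cannot be projected
    return (min(targets), max(targets))


def project_chains(
    chains: List[List[Span]], alignment: Dict[int, Set[int]]
) -> List[List[Span]]:
    """Project each chain; keep only chains that still have two or more mentions."""
    projected: List[List[Span]] = []
    for chain in chains:
        target_chain: List[Span] = []
        for span in chain:
            target = project_span(span, alignment)
            if target is not None and target not in target_chain:
                target_chain.append(target)
        if len(target_chain) >= 2:
            projected.append(target_chain)
    return projected


# Hypothetical sentence pair: "The cat sleeps . It purrs ." aligned token by
# token to "Die Katze schläft . Sie schnurrt ." (punctuation left unaligned).
alignment = {0: {0}, 1: {1}, 2: {2}, 4: {4}, 5: {5}}
english_chains = [[(0, 1), (4, 4)]]
german_chains = project_chains(english_chains, alignment)
```

The projected chains could then serve as training data for a German or
Russian resolver. A real system would also need to handle noisy
alignments and mentions whose projection is discontinuous, which this
sketch ignores.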



The participants can choose to take part in one or both tracks for one or 
both languages. The participants will run their systems on the test data
themselves and are required to send their outputs to the Shared Task
Coordinator by December 27th (CET). Training data
<https://github.com/yuliagrishina/CORBON-2017-Shared-Task> as well as
several additional resources are already available on the shared task page
<http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/>.



The evaluation will be done on a manually annotated German-Russian 
parallel corpus. The guidelines used for the annotation of the corpus are 
quite compatible with the OntoNotes guidelines for English (Version 6.0) 
in terms of types of referring expressions that are annotated.



The exceptions are that we:

a)  handle only NPs and do not annotate verbs that are coreferent with 
NPs,

b)  include appositions in the markable span and do not mark them as a 
separate relation,

c)  annotate pronominal adverbs in German if they co-refer with an NP.



Please check our GitHub repository 
<https://github.com/yuliagrishina/CORBON-2017-Shared-Task> for the 
complete guidelines and sample annotations. Similar to CoNLL 2012, we 
will compute a number of existing scoring metrics - MUC, B-CUBED, CEAF and 
BLANC - and use the unweighted average of MUC, B-CUBED and CEAF scores 
(computed by the official CoNLL 2012 scorer
<http://conll.github.io/reference-coreference-scorers/>) to determine the
winning system. We will not 
evaluate singletons and we kindly ask the participants to exclude them 
from the submitted data.
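
The ranking criterion and the singleton rule above can be sketched in a
few lines of Python. This is only an illustration, not the official
scorer: the function names and the dictionary keys are assumptions, and
the per-metric F1 values are assumed to have been produced beforehand by
the official CoNLL 2012 scorer.

```python
from typing import Dict, List, Tuple

# A mention is a pair of inclusive token offsets (start, end). Assumed format.
Span = Tuple[int, int]


def drop_singletons(chains: List[List[Span]]) -> List[List[Span]]:
    """Remove chains with a single mention, since singletons are not evaluated."""
    return [chain for chain in chains if len(chain) > 1]


def ranking_score(f1: Dict[str, float]) -> float:
    """Unweighted average of the MUC, B-CUBED and CEAF F1 scores.

    BLANC is reported as well but does not enter the average.
    """
    return (f1["muc"] + f1["bcubed"] + f1["ceaf"]) / 3.0


# Made-up numbers, for illustration only.
system_chains = [[(0, 1), (4, 4)], [(7, 9)]]
submitted = drop_singletons(system_chains)
score = ranking_score({"muc": 60.0, "bcubed": 50.0, "ceaf": 40.0, "blanc": 10.0})
```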



Submission instructions

We solicit previously unpublished work, presented either as long or short 
papers, following the style guidelines for EACL 2017, produced with the 
official LaTeX template
(http://eacl2017.org/images/site/eacl-2017-template.zip). To be included in 
the final proceedings, accepted papers have to be made available both as 
LaTeX sources and PDF.



Long papers should have at most 8 pages of content, not including 
references. Short papers are limited to 4 pages of content, not including 
references. There is no constraint on the size of the reference list. 
Submissions should be anonymous and not disclose in any way the identity 
of the author(s). Submissions should be made using the START system
(https://www.softconf.com/eacl2017/corbon/).



Important dates

December 19, 2016: Evaluation data released

December 27, 2016: System outputs collected

January 6, 2017: Shared task results announced

January 16, 2017: Workshop paper / System description paper due date

February 11, 2017: Notification of acceptance

February 21, 2017: Camera-ready papers due date

April 4, 2017: Workshop date



Program Committee

Anders Björkelund, University of Stuttgart
Antonio Branco, University of Lisbon
Chen Chen, Apple
Dan Cristea, A. I. Cuza University of Iasi
Pascal Denis, MAGNET, INRIA Lille Nord-Europe
Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai
Yulia Grishina, University of Potsdam
Lars Hellan, Norwegian University of Science and Technology
Veronique Hoste, Ghent University
Yufang Hou, IBM
Ryu Iida, National Institute of Information and Communications Technology
(NICT), Kyoto
Ekaterina Lapshinova-Koltunski, Saarland University
Emmanuel Lassalle, Global Systematic Investors LLP, UK
Chen Li, Microsoft
Sebastian Martschat, Heidelberg University
Ruslan Mitkov, University of Wolverhampton
Costanza Navaretta, University of Copenhagen
Anna Nedoluzhko, Charles University in Prague
Michal Novak, Charles University in Prague
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of
Sciences
Constantin Orasan, University of Wolverhampton
Massimo Poesio, University of Essex
Sameer Pradhan, cemantix.org and Boulder Learning Inc.
Sam Wiseman, Harvard University
Manfred Stede, University of Potsdam
Veselin Stoyanov, Facebook
Yannick Versley, Heidelberg University
Amir Zeldes, Georgetown University
Rob Voigt, Stanford University
Desislava Zhekova, Ludwig Maximilian University of Munich
Heike Zinsmeister, University of Hamburg

Workshop Organizers

Maciej Ogrodniczuk, Linguistic Engineering Group, Institute of Computer
Science, Polish Academy of Sciences

Vincent Ng, Computer Science Department, The University of Texas at Dallas



Shared Task Coordinator

Yulia Grishina, University of Potsdam



Best regards,

Maciej Ogrodniczuk and Vincent Ng
