CORBON 2017: Coreference Resolution Beyond OntoNotes, 4 Apr 2017, Valencia, Spain

CORBON 2017: 2nd Workshop on Coreference Resolution Beyond OntoNotes to be
held at EACL 2017 (Valencia, Spain), April 4, 2017

More information: http://corbon.nlp.ipipan.waw.pl/

New submission deadline: January 23, 2017

Please also note that:

*       A small number of travel awards will be available to students who
have papers accepted to CORBON 2017.

*       You can still take part in the shared task on coreference resolution
for German and Russian (http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/).

*       The submission link is available:
https://www.softconf.com/eacl2017/corbon



Call for Papers

Many NLP researchers, especially those not working in the area of discourse
processing, tend to equate coreference resolution with the kind of
coreference studied in MUC, ACE, and OntoNotes. They have the impression that
coreference is a well-worn task, owing in part to the large number of papers
reporting results on the MUC/ACE/OntoNotes coreference corpora. This is an
unfortunate misconception: previous shared tasks on coreference resolution
have largely focused on entity coreference, which constitutes only one of the
many kinds of coreference relations discussed in theoretical and
computational linguistics over the past few decades. In fact, by focusing on
entity coreference resolution, NLP researchers have only scratched the
surface of the wealth of interesting problems in coreference resolution.



The first workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016,
http://corbon.nlp.ipipan.waw.pl/2016/), held in conjunction with NAACL HLT
2016, sought to:

*       encourage work on under-investigated coreference resolution tasks as
well as coreference resolution in under-investigated languages, and

*       provide a forum for coreference researchers to discuss and present
such work.

The workshop was quite successful in achieving these goals: the majority of
the submissions focused on coreference resolution in less-investigated
languages, and more than half of the submissions focused on
under-investigated coreference tasks.



Building on the success of its previous edition, CORBON 2017 will include:

*       a special theme on knowledge-rich coreference resolution;

*       a shared task on coreference resolution in languages without
coreference-annotated data; and

*       a panel discussing future research directions for coreference
resolution.



Topics

The workshop welcomes submissions describing both theoretical and applied
computational work on coreference resolution, especially for languages other
than English, less-researched forms of coreference and new applications of
coreference resolution. The submissions are expected to discuss theories,
evaluation, limitations, system development and techniques relevant to the
workshop topics. Topics of interest include but are not limited to the
following:

*       Coreference resolution for less-researched languages (e.g.,
annotation strategies, resolution modules and formal evaluation)

*       Evaluation of the influence of language-specific properties, such as
lack of articles, quasi-anaphora, ellipsis, or complexity of reflexive
pronouns, on coreference resolution

*       Representation of coreferential relations other than identity
coreference (e.g., bridging references, reference to abstract entities,
etc.)

*       Investigation of difficult cases of anaphora and coreference and
their resolution by resorting to, e.g., discourse-based and pragmatic levels
of analysis

*       Coreference resolution in noisy data (e.g., in speech and social
networks)

*       New applications of coreference resolution



Since progress in these under-explored coreference tasks is currently
limited in part by the scarcity of annotated corpora, papers that describe
the creation and annotation of corpora are particularly welcome, especially
those covering less-investigated coreference phenomena or less-researched
languages. In addition, program committee members will be asked to give
special attention to submissions that address our special theme of
knowledge-rich coreference resolution, that is, the use of sophisticated
knowledge sources for coreference resolution.



Shared Task

Previous shared tasks on coreference resolution operated in a setting where
a large amount of training data was provided to train coreference resolvers
in a fully supervised manner. Examples include the SemEval 2010 shared task
Coreference Resolution in Multiple Languages
(http://stel.ub.edu/semeval2010-coref/) and the CoNLL 2011
(http://conll.cemantix.org/2011/introduction.html) and CoNLL 2012
(http://conll.cemantix.org/2012/introduction.html) shared tasks. Our shared
task has a different goal: we are primarily interested in a low-resource
setting. In particular, we seek to investigate how well one can build a
coreference resolver for a language for which no coreference-annotated
training data is available.



Given the rising interest in annotation projection, we offer a
projection-based task that facilitates the application of existing
coreference resolution algorithms to new languages. We believe that this
setting can help the shared task promote the development of coreference
technologies applicable to a larger number of natural languages than is
currently possible.



This year we will focus on two languages: German and Russian. To mimic a
low-resource setting, no German or Russian coreference-annotated data will
be provided. Instead, to facilitate system development, shared task
participants will be provided with two versions of an English-German-Russian
parallel corpus: an unlabelled version and a labelled version. In the
labelled version, the English side of the parallel corpus has been
automatically annotated with coreference information by the Berkeley
coreference resolver, trained on the English OntoNotes corpus.



Participants will compete in two tracks:

1.     closed track: projection-based coreference resolution on German
and/or Russian. The only coreference-annotated training data that
participants may use is the English OntoNotes corpus. Alternatively, they
may use any publicly available coreference resolver trained on English
OntoNotes. They can then use whatever parallel corpus and method they prefer
to project the English annotations onto German/Russian and subsequently
train a new coreference resolver on the projected annotations. Note that
they do not have to use the provided English-German-Russian parallel corpus.

2.     open track: coreference resolution on German and Russian with no
restriction on the kind of coreference-annotated data the participants can
use for training. For instance, they can label their own German/Russian
coreference data and use it to train a German/Russian coreference resolver,
or they can adopt a heuristic-based approach where they employ knowledge of
German/Russian to write coreference rules for these languages.
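The closed-track recipe can be read as word-alignment-based annotation projection. The Python sketch below illustrates it under stated assumptions. It assumes that token-level word alignments between an English sentence and its German or Russian translation are already available, for example from an external word aligner, as (English index, target index) pairs. It also assumes that English mentions are (start, end) token spans. The function names and data layout are hypothetical and are not part of the shared task materials.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]             # inclusive (start, end) token indices
Alignment = List[Tuple[int, int]]  # hypothetical (english_idx, target_idx) pairs


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map an English mention to the smallest contiguous target span that
    covers every target token aligned to one of its tokens."""
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None  # unaligned mention: nothing to project
    return (min(targets), max(targets))


def project_clusters(clusters: List[List[Span]], alignment: Alignment) -> List[List[Span]]:
    """Project each English coreference chain onto the target sentence."""
    projected = []
    for cluster in clusters:
        spans: List[Span] = []
        for mention in cluster:
            target = project_span(mention, alignment)
            if target is not None and target not in spans:
                spans.append(target)
        # Keep only chains that still link at least two target mentions.
        if len(spans) >= 2:
            projected.append(spans)
    return projected


# "Mary said she left" -> "Mary sagte , sie sei gegangen"
alignment = [(0, 0), (1, 1), (2, 3), (3, 4), (3, 5)]
print(project_clusters([[(0, 0), (2, 2)]], alignment))  # [[(0, 0), (3, 3)]]
```

Taking the minimal covering span is a simple heuristic. It can over-extend mentions when alignments are noisy, so a real system would likely need additional filtering before training a German or Russian resolver on the projected chains.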



Participants can choose to take part in one or both tracks for one or both
languages. Participants will run their systems on the test data themselves
and must send their outputs to the Shared Task Coordinator by December 27,
2016 (CET). Training data
(https://github.com/yuliagrishina/CORBON-2017-Shared-Task) as well as
several additional resources are already available on the shared task page
(http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/).



The evaluation will be done on a manually annotated German-Russian parallel
corpus. The guidelines used to annotate the corpus are largely compatible
with the OntoNotes guidelines for English (Version 6.0) in terms of the
types of referring expressions that are annotated.



The exceptions are that we:

a)     handle only NPs and do not annotate verbs that are coreferent with
NPs,

b)     include appositions in the markable span and do not mark them as a
separate relation, and

c)     annotate pronominal adverbs in German if they co-refer with an NP.



Please check our GitHub repository
(https://github.com/yuliagrishina/CORBON-2017-Shared-Task) for the complete
guidelines and sample annotations. Similar to CoNLL 2012, we will compute a
number of existing scoring metrics (MUC, B-CUBED, CEAF and BLANC). The
winning system will be determined by the unweighted average of the MUC,
B-CUBED and CEAF F1 scores, computed by the official CoNLL 2012 scorer
(http://conll.github.io/reference-coreference-scorers/). Singletons will not
be evaluated, so we kindly ask participants to exclude them from the
submitted data.
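The following Python sketch illustrates the two submission requirements above. It drops singleton clusters before output and combines three F1 values into the unweighted average used for ranking. It assumes that clusters are represented as lists of (start, end) token spans and that the F1 values themselves come from the official CoNLL 2012 scorer. The helper names are hypothetical, and the sketch does not reimplement any of the metrics.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def remove_singletons(clusters: List[List[Span]]) -> List[List[Span]]:
    """Drop clusters with a single mention, since singletons are not evaluated."""
    return [cluster for cluster in clusters if len(cluster) >= 2]


def ranking_score(muc_f1: float, bcubed_f1: float, ceaf_f1: float) -> float:
    """Unweighted average of the MUC, B-CUBED and CEAF F1 scores."""
    return (muc_f1 + bcubed_f1 + ceaf_f1) / 3


clusters = [[(0, 0), (3, 3)], [(5, 6)]]
print(remove_singletons(clusters))      # [[(0, 0), (3, 3)]]
print(ranking_score(60.0, 50.0, 40.0))  # 50.0
```

BLANC is still reported but, as stated above, does not enter the ranking score.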



Submission instructions

We solicit previously unpublished work, presented either as long or short
papers, following the style guidelines for EACL 2017 and produced with the
official LaTeX template
(http://eacl2017.org/images/site/eacl-2017-template.zip). To be included in
the final proceedings, accepted papers must be made available as both LaTeX
sources and PDF.



Long papers should have at most 8 pages of content, not including
references. Short papers are limited to 4 pages of content, not including
references. There is no constraint on the size of the reference list.
Submissions should be anonymous and must not disclose in any way the
identity of the author(s). Submissions should be made using the START
system (https://www.softconf.com/eacl2017/corbon/).



Important dates

December 19, 2016: Evaluation data released

December 27, 2016: System outputs collected

January 6, 2017: Shared task results announced

January 16, 2017: Workshop paper / System description paper due date

February 11, 2017: Notification of acceptance

February 21, 2017: Camera-ready papers due date

April 4, 2017: Workshop date



Program Committee

Anders Björkelund, University of Stuttgart

Antonio Branco, University of Lisbon

Chen Chen, Apple

Dan Cristea, A. I. Cuza University of Iasi

Pascal Denis, MAGNET, INRIA Lille Nord-Europe

Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai

Yulia Grishina, University of Potsdam

Lars Hellan, Norwegian University of Science and Technology

Veronique Hoste, Ghent University

Yufang Hou, IBM

Ryu Iida, National Institute of Information and Communications Technology
(NICT), Kyoto

Ekaterina Lapshinova-Koltunski, Saarland University

Emmanuel Lassalle, Global Systematic Investors LLP, UK

Chen Li, Microsoft

Sebastian Martschat, Heidelberg University

Ruslan Mitkov, University of Wolverhampton

Costanza Navarretta, University of Copenhagen

Anna Nedoluzhko, Charles University in Prague

Michal Novak, Charles University in Prague

Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of
Sciences

Constantin Orasan, University of Wolverhampton

Massimo Poesio, University of Essex

Sameer Pradhan, cemantix.org and Boulder Learning Inc.

Sam Wiseman, Harvard University

Manfred Stede, University of Potsdam

Veselin Stoyanov, Facebook

Yannick Versley, Heidelberg University

Amir Zeldes, Georgetown University

Rob Voigt, Stanford University

Desislava Zhekova, Ludwig Maximilian University of Munich

Heike Zinsmeister, University of Hamburg



Workshop Organizers

Maciej Ogrodniczuk, Linguistic Engineering Group, Institute of Computer
Science, Polish Academy of Sciences

Vincent Ng, Computer Science Department, The University of Texas at Dallas



Shared Task Coordinator

Yulia Grishina, University of Potsdam



Best regards,

Maciej Ogrodniczuk and Vincent Ng



--
[LOGIC] mailing list
http://www.dvmlg.de/mailingliste.html
Archive: http://www.illc.uva.nl/LogicList/

provided by a collaboration of the DVMLG, the Maths Departments in Bonn and Hamburg, and the ILLC at the Universiteit van Amsterdam