Logic List Mailing Archive

Postdoctoral position in data linking, Toulouse (France)

** Post-doctoral position at IRIT: Data Linking **

* Context: ANR project DACE-DL (DAta-CEntric AI-driven Data Linking) *

Data linking is the scientific challenge of automatically establishing 
typed links between the entities of two or more structured datasets. A 
variety of complex data linking systems exists, evaluated on public 
benchmarks. While they have allowed for the generation of vast amounts of 
linked data in the context of various dedicated projects, such generic 
systems often have limited applicability in many real-world scenarios, 
where data are highly heterogeneous and domain-specific. DACE-DL targets a 
paradigm shift in the data linking field with a data-centric bottom-up 
methodology relying on machine learning and representation learning 
models. We hypothesize that there exists a finite number of identifiable 
and generalisable linking problem types (LPTs) that we need to categorize 
and analyse to provide better linking results.

* Topic: Data collection, consolidation, and modularization of data linking systems *

This research is organized into two main tasks. The first task consists of 
(1) carrying out an in-depth analysis of the quality of the existing data 
linking datasets, identifying erroneous statements and providing a 
high-quality set of datasets by correcting those statements; and (2) 
generating additional links using existing high-precision linking systems 
on the chosen datasets. Data quality metrics such as accuracy, consistency 
and conciseness will be considered.

The aim of the second task is threefold: (1) to provide an inventory of 
publicly available and functional linking tools that are able to deal with 
a large spectrum of data linking problems; (2) to propose a theoretical 
approach for the modularization of these tools into atomic modules that 
are easy to combine in order to build more complex solutions in a linking 
ecosystem; (3) to make the produced modules available to the data linking 
community. To carry out the modularization at scale, we plan to use 
unsupervised ML algorithms, enhanced by a human-in-the-loop approach. The 
objective is to provide a set of correspondences between the modules and 
the LPTs.

Starting period: January 2022, duration of 24 months

* Work environment and salary *

Location: Institut de Recherche en Informatique de Toulouse (IRIT), 
Université Toulouse - Jean Jaurès, Maison de la Recherche, 5 allées 
Antonio Machado, 31058 Toulouse.

Salary between 2200 € and 2700 € gross per month, depending on 
qualifications and situation.

* How to apply *

Applicants are required to have a PhD in Computer Science and a strong 
background in semantic web technologies, ontology matching and data 
linking. Fluency in written and spoken English is also required. A good 
publication record and strong programming skills will be a plus. 
Applications will be accepted until the position is filled. Applicants 
should send a full CV including a complete list of publications, a cover 
letter indicating their research interests, achievements to date and 
vision for the future, as well as either support letters or the names of 
two people who have worked with them.

Contact: Cassia Trojahn (cassia.trojahn@irit.fr) and Olivier Teste (olivier.teste@irit.fr)