TAL Journal: (64-2)
Robustness and limitations of natural language processing models

TAL Journal

TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, http://www.atala.org) since 1959 with the support of CNRS (National Centre for Scientific Research). The journal is now published electronically, with printing on demand.

The journal is open-access: free to submit, free to publish and free to read. Published articles will be available on the ATALA website and on the ACL Anthology.

Manuscripts may be submitted in English or French.

Focus of the issue

Machine learning methods have made it possible to achieve spectacular results on numerous NLP tasks and benchmarks, giving the impression that many NLP problems are "solved" or soon will be. Nevertheless, it remains an open question whether these models are truly effective, or even usable, in uncontrolled environments.

This thematic issue of the TAL journal aims to examine the robustness and limitations of modern NLP models, in particular with respect to the following aspects:

  1. "Non-standard" data: the use of models on data exhibiting linguistic variation (diachronic variation, diatopic variation, variation in word order, code-switching, user-generated content, inconsistent spelling, accidentally noisy data due to pre-processing, incomplete data, presence of specialized domain vocabulary...);
  2. Out-of-domain data: the use of models on data from a different domain than the one seen during training;
  3. Linguistic structures unseen during training: compositional generalization [1], structural generalization [2], or generalization related to gender bias [3], among others.


Relevant topics for this thematic issue include, but are not limited to, the following areas:

  • identification and evaluation of linguistic phenomena that are problematic for neural networks and other NLP systems;
  • analysis and correction of error propagation in NLP systems (cascading error analysis);
  • feedback on the use of NLP systems that have been found to be non-functional on specific types of data;
  • critical analysis of datasets used for learning or evaluation;
  • construction of datasets to evaluate robustness with respect to linguistic variations;
  • data augmentation to improve robustness;
  • out-of-domain adaptation or learning with domains that are underrepresented in the training data;
  • neural architectures or training methods that improve robustness.


All standard NLP tasks can be considered. Work on languages other than French and English is warmly welcomed.

Guest editors

  • Caio Corro (Université Paris-Saclay, CNRS, LISN)
  • Gaël Lejeune (Sorbonne Université, STIH)
  • Vlad Niculae (Language Technology Lab, University of Amsterdam)

Important dates

Paper submission deadline: March 15th, 2024

Notification to the authors after first review: May 2024

Notification to the authors after second review: July 2024

Publication: December 2024

References

[1] COGS: A Compositional Generalization Challenge Based on Semantic Interpretation (Najoung Kim, Tal Linzen), EMNLP 2020 https://aclanthology.org/2020.emnlp-main.731/
[2] Structural generalization is hard for sequence-to-sequence models (Yuekun Yao, Alexander Koller), EMNLP 2022 https://arxiv.org/abs/2210.13050
[3] Evaluating Gender Bias in Machine Translation (Gabriel Stanovsky, Noah A. Smith, Luke Zettlemoyer), ACL 2019 https://aclanthology.org/P19-1164/
