Séminaire "Reproducibilité de la recherche en traitement du signal" le 22 Septembre et le 20 Octobre 2021

Announcement

Séminaire "Reproducibilité de la recherche en traitement du signal"

  • September 22, 2021, 9:00 - 12:30, on Zoom
  • October 20, 2021, 9:00 - 12:30, on Zoom

Speakers

  • Cynthia Liem
  • Alexandre Gramfort
  • Brian McFee
  • Annamaria Mesaros

Call for contributions

The first half-day will include a series of presentations by PhD students wishing to share their experience with putting reproducible research into practice. Each presentation will take place in two stages: first, a five-minute talk in plenary session, in English if possible; then, a more informal discussion in small groups. If you are interested, please send an email to mathieulagrangels2nfr with a brief description of your proposal.

Program

Half-day 1: September 22, 2021

9:00 Cynthia Liem
10:00 Alexandre Gramfort
11:00 Plenary presentations of the PhD students' contributions
11:30 Breakout-room discussions around the PhD students' contributions
12:30 Closing

Half-day 2: October 20, 2021

9:00 Brian McFee
10:00 Annamaria Mesaros
11:00 Round-table discussion
12:30 Closing

Organizers

Mathieu Lagrange, CNRS research scientist, LS2N, UMR 6004
Vincent Lostanlen, CNRS research scientist, LS2N, UMR 6004
Slim Essid, Professor, LTCI, Télécom ParisTech

Context

The process of scientific experimentation is increasingly based on information science. In particular, signal and image processing (SIP) tools have played an essential role in many recent discoveries in physics: the detection of gravitational waves and the observation of black holes, for example. In addition, recent advances in certain digital technologies, such as functional neuroimaging and sound classification, rest on increasingly sophisticated software codebases. However, each of these SIP applications is the result not of a single algorithm, but of the joint work of a specialized research sub-community. Whether in astrophysics or in acoustics, the innovation process remains essentially the same: first, the community develops massive databases, performance metrics, and a common software environment; then, individual research groups compete to improve the state of the art. For example, the renewed growth of deep neural networks during the decade 2010-2020 was made possible by new databases (e.g., ImageNet, AudioSet), official "challenges" (e.g., ILSVRC, DCASE), and numerical libraries (e.g., TensorFlow, PyTorch).

In this context, the reproducibility of experiments is of crucial importance. First, when addressing a new problem, it is useful to begin with a simple approach whose theoretical properties are well understood; such a baseline should be made freely accessible. Second, students gain hands-on experience by inspecting and re-implementing well-established methods in signal processing and, more generally, in information science. Finally, developing software in open-source communities rather than in vertical organizations (silos) has advantages in itself: quicker bug reporting and troubleshooting, up-to-date documentation, and better scheduling of feature requests.
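To make this concrete, here is a minimal sketch of such a freely accessible baseline, using scikit-learn's DummyClassifier as the simple, well-understood approach; the data is a random placeholder, not a real benchmark:

    # Minimal sketch: a trivial, well-understood baseline for a classification
    # task, to be released alongside any more elaborate model.
    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import cross_val_score

    # Placeholder data: 100 examples with 20 features and binary labels.
    rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
    X = rng.normal(size=(100, 20))
    y = rng.integers(0, 2, size=100)

    # Predict the most frequent class: its behavior is fully understood.
    baseline = DummyClassifier(strategy="most_frequent")
    scores = cross_val_score(baseline, X, y, cv=5)
    print(f"Baseline accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")

Any proposed method can then be judged against this published, rerunnable reference point rather than against an unstated status quo.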

However, the need for research reproducibility goes beyond a simple list of good practices such as version control or the use of unit tests. In her work on “trustworthy information systems” (TIS), Cynthia Liem has shown that state-of-the-art deep neural networks for music classification are far from having a “musical ear”: rather, these models tend to exaggerate some imperceptible aspects of music while lacking sensitivity to musically meaningful transformations.
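As a hedged illustration of how such sensitivity checks can be folded into everyday good practice, the following toy unit test (runnable with pytest) verifies that a feature extractor is stable under an imperceptible perturbation; the rms_energy feature and the tolerance are hypothetical choices, not taken from Liem's work:

    # Hypothetical unit test: check that a feature extractor is robust to an
    # imperceptible perturbation of the input signal. Run with pytest.
    import numpy as np

    def rms_energy(signal):
        """Toy feature: root-mean-square energy of a signal."""
        return np.sqrt(np.mean(signal ** 2))

    def test_rms_energy_robust_to_tiny_noise():
        rng = np.random.default_rng(seed=0)
        # One second of a 440 Hz sine at a 22050 Hz sampling rate.
        signal = np.sin(2 * np.pi * 440 * np.arange(22050) / 22050)
        noise = 1e-6 * rng.normal(size=signal.shape)  # imperceptible change
        assert np.isclose(rms_energy(signal),
                          rms_energy(signal + noise), rtol=1e-3)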

The high cost of data acquisition, in the field of neuroscience for example, jeopardizes the reproducibility of numerical experiments. Therefore, in order to boost the adoption of open data, it is necessary to integrate software routines for loading and formatting data alongside transformation and statistical learning tools. This is what Alexandre Gramfort proposed with the scikit-learn library as well as with the Rapid Analytics and Model Prototyping (RAMP) project.
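As an illustration of this principle, here is a minimal sketch that bundles a data-loading routine together with transformation and statistical learning steps into a single scikit-learn pipeline; the digits dataset is an arbitrary stand-in for costlier experimental data:

    # Minimal sketch: data loading, formatting, and statistical learning
    # combined into one reproducible object, in the spirit of scikit-learn.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # The loading routine ships with the library itself.
    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Preprocessing and learning are chained into a single estimator.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(f"Test accuracy: {model.score(X_test, y_test):.2f}")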

In addition, signal processing research often operates on highly structured data: such is the case, for example, of a musical score or a chord progression. To guarantee the reproducibility of music information retrieval systems, this rich structure should be preserved in machine predictions and remain interpretable by humans. The work of Brian McFee on the JSON-Annotated Music Specification (JAMS) format reflects this concern for structure and software interoperability.
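For instance, a minimal sketch of annotating a two-chord progression with the jams Python package might look as follows; the chord values and durations are illustrative:

    # Minimal sketch: store a chord progression as a structured,
    # human-readable JAMS annotation rather than a flat list of labels.
    import jams

    jam = jams.JAMS()
    jam.file_metadata.duration = 8.0  # duration of the track, in seconds

    chords = jams.Annotation(namespace='chord')
    chords.append(time=0.0, duration=4.0, value='C:maj')
    chords.append(time=4.0, duration=4.0, value='G:maj')
    jam.annotations.append(chords)

    jam.save('example.jams')  # JSON on disk, inspectable by humans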

Lastly, the definition of relevant evaluation metrics requires special attention. Indeed, it is on the basis of these metrics that the scientific community concerned decides on its future directions and assesses the relevance of its proposals. Annamaria Mesaros, who has notably been organizing the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge since 2016, has long experience with these questions of evaluating and curating applied research. In particular, she maintains the sed_eval software library, which is now the de facto standard for evaluating the performance of a sound event detector.
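As a closing illustration, here is a minimal sketch of a segment-based evaluation with sed_eval; it assumes that event lists can be passed as plain dictionaries with event_onset, event_offset, and event_label keys, which matches the library's historical input format:

    # Minimal sketch: compare a reference annotation with a system's output
    # using sed_eval's segment-based metrics at a 1-second resolution.
    import sed_eval

    reference = [
        {'event_onset': 0.0, 'event_offset': 4.0, 'event_label': 'speech'},
        {'event_onset': 5.0, 'event_offset': 7.0, 'event_label': 'dog'},
    ]
    estimated = [
        {'event_onset': 0.5, 'event_offset': 4.5, 'event_label': 'speech'},
        {'event_onset': 5.0, 'event_offset': 6.0, 'event_label': 'dog'},
    ]

    metrics = sed_eval.sound_event.SegmentBasedMetrics(
        event_label_list=['speech', 'dog'],
        time_resolution=1.0,
    )
    metrics.evaluate(reference_event_list=reference,
                     estimated_event_list=estimated)

    overall = metrics.results_overall_metrics()
    print(f"Overall F-measure: {overall['f_measure']['f_measure']:.2f}")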