Welcome! This site is the right place for any question you may have about how to make your research more reproducible. This is also the right place to post any information about new techniques, tools, conferences, workshops, training sessions, … related to reproducible research. Enjoy!
The preferred language on this site is English (French is also welcome) and the topics should be of interest to any research scientist, engineer, post-doc, PhD student, … This forum is hosted by Inria, the French national research institute for the digital sciences, so reproducible research topics are likely to be mostly related to digital sciences, but do not restrict yourself. Just ask!
You take notes and you want to be able to find them again. You make calculations on your computer, but your results change from day to day. You analyse data, or you work on a new method that you would like to share easily with your colleagues so that they can use it as well. You conduct experiments and want to discuss your protocol with others to improve it. This is the right place to share best practices and discuss how to improve them.
This forum is intended to:
- Bootstrap beginners to reproducible research by pointing them to the right resources (tutorials, research articles, tools, …)
- Announce seminars, training sessions, workshops, hackathons, … that can help improve your skills
- Have more experienced researchers discuss technical points, publication practices, etc.
To get a first introduction, we suggest you have a look at our general introduction category.
My name is Arnaud Legrand and I’m the leader of the POLARIS Inria team. I’m a CNRS research scientist at the LIG, #grenoble. My research targets the management (mostly from an algorithmic point of view, i.e., scheduling, load balancing, fairness, game theory….) and performance evaluation (in particular through simulation, visualization, statistical analysis, …) of large scale distributed computing infrastructures such as clusters, grids, desktop grids, volunteer computing platforms, clouds,… when used for scientific computing.
I’m one of the main developers (erm…) of the SimGrid project, a simulation toolkit for building simulators of distributed applications (originally designed for scheduling algorithm evaluation purposes) developed in collaboration with Henri Casanova, Martin Quinson and Frédéric Suter. Over the past few years, I have been particularly involved in reproducible research related topics, with a MOOC, a book, webinars, …
I have some skills in journaling with #orgmode #jupyter #rstudio, #experiments, #statistics, …
Hi Arnaud, all,
I’m Benjamin Guedj, I’m an Inria researcher (Modal team, Lille - Nord Europe) now relocated to London to lead and shape the Inria@London initiative. I am a mathematician and my main research interests are in machine learning, deep learning, statistical learning theory and practical implementation of learning algorithms. I am also interested in computational statistics and probability theory. I have contributed a few R packages (not actively maintained) and contributed to several Python libraries.
Looking forward to learning all the perks of reproducibility on this forum!
I’m Alan Schmitt, a researcher in the Celtique team at Inria Rennes. I mostly work on the semantics of programming languages: how to formally define them and use them to prove properties about programs. I spend most of my day using #emacs and #orgmode, to which I’ve contributed a little. I’m very interested in teaching students how to use these tools to help them during their thesis, so that when the time comes to write a paper or a dissertation, they have all the needed material readily available.
I am Nathalie Revol. I have a PhD in applied mathematics, I was an associate professor in applied math (numerical analysis) at the University of Lille for 5 years, and since 2002 I have been a research scientist at Inria. I work in Lyon, at the computer science lab of “ENS de Lyon”.
My research subject is computer arithmetic and more specifically interval arithmetic. Interval arithmetic consists in computing with intervals instead of numbers. Interval computations account for uncertainties on the data, as a single value is replaced by an interval containing the data, and for roundoff errors. This domain of research is related to numerical quality. In other words, the question is: how far is the computed result of a numerical computation from the “exact” result (as if computed with real numbers, without errors)?
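As a rough illustration of the idea of computing with intervals, here is a minimal Python sketch. The `Interval` class and its `contains` method are hypothetical helpers written for this post, not the API of any interval library. It assumes Python 3.9 or later for `math.nextafter` and handles addition only, widening each bound by one float so that roundoff is covered; a real library would use directed rounding and give tighter bounds.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of binary64 floats (hypothetical helper)."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Each endpoint sum is rounded to nearest, so step one float outward
        # on both sides. This is wider than necessary, but for finite inputs
        # the exact sum is guaranteed to lie inside the result.
        return Interval(
            math.nextafter(self.lo + other.lo, -math.inf),
            math.nextafter(self.hi + other.hi, math.inf),
        )

    def contains(self, x) -> bool:
        return self.lo <= x <= self.hi


# The floats nearest to 0.1 and 0.2, each as a zero-width interval.
x = Interval(0.1, 0.1)
y = Interval(0.2, 0.2)
s = x + y

# Exact rational sum of the two stored floats, computed without rounding.
exact = Fraction(0.1) + Fraction(0.2)
print(s, s.contains(exact))
```

The width of `s` is a computed bound on how far the floating-point sum can be from the exact one.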
My interest in reproducibility concentrates on the lack of reproducibility of numerical computations using floating-point arithmetic. Floating-point arithmetic is also called “scientific notation”; it corresponds to the “float” and “double” types in C, and to “binary32” and “binary64” in the IEEE 754 standard. Indeed, even simple floating-point operations such as additions or multiplications are not associative. As an example, the result of a sum of many terms computed on several processors by a parallel computation depends on the order in which the partial sums are executed, that is, it may vary from one execution of the same code to the next. See https://hal.archives-ouvertes.fr/hal-00916931 for more details on the hows and whys. So far, the solution is to enforce reproducible results, but this usually entails the use of slow libraries and is available only for a limited set of numerical routines.
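The order dependence can be seen in a few lines of Python. This is only a small sketch: `naive_sum`, `order_a` and `order_b` are names made up for the illustration, and the two orderings merely stand in for two possible schedules of partial sums in a parallel run. An explicit loop is used instead of the built-in `sum` because recent CPython versions apply compensated summation to floats, which would hide the effect.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation in binary64, one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Non-associativity on three terms.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same four terms, accumulated in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]
print(naive_sum(order_a), naive_sum(order_b))  # 1.0 2.0

# math.fsum tracks exact partial sums, so its result does not depend on the order.
print(math.fsum(order_a), math.fsum(order_b))  # 2.0 2.0
```

`math.fsum` is one example of the trade-off mentioned above: the result is reproducible, but it costs more than a plain loop and covers summation only.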
I am also interested in determining whether, and when, numerical reproducibility is needed. Indeed, some people advocate that the lack of numerical reproducibility is a hint of numerical instabilities and is thus a precious “feature”.
I am Pascal PERNOT, DR@CNRS, from the Laboratoire de Chimie Physique in Orsay.
My main research subject is the characterization of uncertainty in computational chemistry (CC), which is a central issue for the assessment of the replicability of CC predictions. I am also very interested in the problems of numerical reproducibility, and the impact it might have on these predictions.
I am a coauthor, with Arnaud et al., of the book-sprinted “Vers une recherche reproductible : Faire évoluer ses pratiques”.
I’m Christopher Stevens, currently working on a PhD in neuroscience at Neurocentre Magendie (Université de Bordeaux/INSERM) and another PhD in philosophy at SPH (Université Bordeaux-Montaigne).
In neuroscience, my current research subject is confirmation bias. I study this using a mouse model I conceived, a version of which I now plan to test in humans too. However, at heart, I’m more of a philosopher of science than a scientist, and my research in that domain concerns scientific education, specifically from a “pragmatist” point of view.
I’m here looking for fellow researchers who are passionate about the cause of truly open science. I have some projects on this, which I will share with you all “incessamment sous peu” (very soon)!
I am Konrad Hinsen, researcher at the Centre de Biophysique Moléculaire (CNRS Orléans) and at the Synchrotron SOLEIL (Saint Aubin). Starting from a PhD in statistical physics, working on colloids, I moved on to studying biological macromolecules (mainly proteins, a bit of DNA) using computational methods. I have been unhappy with the current state of research on biomolecules for many years: most researchers run badly documented software written by someone else, don’t really know the methods they apply (and in particular don’t know their limits), and are unable to compute what they really need to compute because they cannot modify their software. Reproducibility is almost non-existent in my field, as sharing code and data remains the exception and method descriptions in papers are completely insufficient to understand what was done.
My first attempt to improve this situation was to implement computational methods in Python rather than C or Fortran, aiming at more compact and readable code. My Molecular Modelling Toolkit thus became the first domain-specific scientific library of the Scientific Python ecosystem, which I helped create a few years earlier as a founding member of the Numerical Python project (now NumPy). Unfortunately, this ecosystem has recently turned into a reproducibility nightmare, with breaking changes in central packages being introduced regularly.
My main research project aiming specifically at reproducibility is ActivePapers, an infrastructure for managing code and data dependencies under the specific conditions of high-performance computing. I have also started thinking about the next step after reproducible research, which I call verifiable research. The challenge is to ensure that researchers not only know which computations were done in some published work, but are also able to figure out with reasonable effort what the software does exactly.
In the reproducible research community, my main activities have been
- the organization of the two national (French) workshops on the subject, in 2012 and 2015,
- the MOOC that Arnaud has already mentioned,
- the online journal ReScience, dedicated to replications of computational studies, which I co-founded with Nicolas Rougier in 2015.
I’m Matthieu Simonin from Inria/Rennes. I’m an engineer in the SED (Experimentation and Development Service). Among other things, I’m involved in several projects that require experimental validation of distributed systems. Validating something on such systems requires deploying lots of processes across several machines (say 1 to hundreds) and managing their life-cycles (configuration, start, stop…). Lately I have focused on making such deployments reusable to some extent (different validation campaigns share common needs) and consequently easing the experimenter’s life. One attempt in this direction is EnOSlib.
My experimental workflow uses #python (a lot), #orgmode (a bit) and #nix (more in the future, probably).
I’m Sabrina Granger and I’m a librarian at Bordeaux University in a regional training unit.
I don’t do anything fancy in Python or R, I don’t struggle with dependencies, but I had to dive into the topic of reproducible research because my job is to organize training sessions for researchers from a wide range of disciplines.
I strive to address this never-ending question: “What would a researcher need to learn?”
Therefore, I jumped into the (big) pool of statistics, study design, coding, machine learning and deep learning, containers, and so much more. And I’m also interested in how to teach open science and reproducible research. I organize training sessions, but also meetups, workshops or less usual events such as a book sprint (authors: Loïc Desquilbet, Boris Hejblum, Arnaud Legrand, Pascal Pernot, Nicolas Rougier).
I’m here to learn, to share information and to identify experts who would like to collaborate around these topics.
Aloha, I’m hocquet (Alexandre Hocquet IRL).
I am a former computational chemist academic and now a professeur des universités in history of science at the Université de Lorraine and a member of the laboratory Archives Henri-Poincaré—Philosophie et Recherches sur les Sciences et les Technologies. My focus is on STS, particularly the relationships between software and production of knowledge with works on computational chemistry but also Wikipedia and Football Manager. Methodologically, my work has been relying on the analysis of threaded conversations in web forums or mailing lists.
I share a research project on scientific software with Frédéric Wieber of the same lab; our case study is computational chemistry from the 70s to the 00s.
Hans IJzerman here. I received my PhD in the Netherlands (Utrecht University) and have been working at LIP/PC2S in Grenoble for two years now. My general research focus is on social thermoregulation, but I have become interested in meta-science over the last few years. I have been involved in the various ManyLabs projects, have written on replication and how to make research more reproducible, and have been an associate director and founding member of the Psychological Science Accelerator.
Our website can be found here: www.corelab.io.
Looking forward to discussing things here.
Tru Huynh here. PhD in protein folding and molecular modeling in a previous life. I have switched to IT in Michael Nilges’ research group at the Institut Pasteur, managing our group IT resources. I have also been a centos.org team member. I am interested in the software/hardware solutions for reproducible science.
I’m Marc Tommasi, prof. at the University of Lille and member of the Magnet Team. I’m interested in (statistical) machine learning, graphs, natural language processing, privacy, formal languages. I’m here to learn good practices in reproducible research. I’m an emacs/orgmode user.
This is Mohammad Akhlaghi, a postdoctoral researcher at the Instituto de Astrofísica de Canarias (IAC), in Tenerife, Spain. Before this, I was a CNRS postdoc at the Centre de Recherche Astrophysique de Lyon (CRAL, France), and I got my PhD in the Tohoku University Astronomical institute (Sendai, Japan). I am also the founder and maintainer of GNU Astronomy Utilities.
As a natural scientist, I have considered reproducibility very important from the start of my career (as a PhD student): nature is already a black box we are trying very hard to interpret, and adding artificial black boxes (non-reproducible or non-understandable research) in the way only hampers our progress.
To do my research in a reproducible manner, I have designed a reproducible paper/project template to facilitate starting new projects in a reproducible way; please see these slides for an introduction.
This template was recently awarded an “RDA Adoption Grant” (funded by EU’s Horizon 2020). The template is publicly released, and we have also applied for it to become a GNU package (it is currently under review). It is managed in Git (see the slides for links). Trying the template is very easy, as described in the README.md: it doesn’t need root access and does everything only in the “build” directory you specify (which can easily be removed later). In our team, we are now extensively using this template to manage our research projects; for an example of a recent paper using this template, see zenodo.3408481.
But this is just one among many other implementations, which I have already learnt a lot from. It has its own particular problems that I already know of, and I am sure it has many others that I don’t yet know about. So please don’t hesitate to try it out and share your thoughts/criticisms of it on this forum; I would highly appreciate it.
I look forward to nice discussions on this forum to share my experience with all the others and more importantly learn from the discussions and apply them as best as I can to my own research (and hopefully make it easier for more natural scientists to do reproducible research).
My name is Olivier Pantalé, I am a Full Professor of section CNU-60 (Mechanics). I am a research professor at the Ecole Nationale d’Ingénieurs de Tarbes (ENIT), and I carry out my research work in the Laboratoire Génie de Production of ENIT, in the Mechanics-Materials-Systems (MMS) department. I am co-leader of the research group Metallurgy, Mechanics, Structures and Damage (M2SD).
My teaching activities, in connection with my research activities, are mainly related to Nonlinear Finite Element Numerical Modeling (which I teach at ENIT and ISAE-SUPAERO in Toulouse), and structural analysis. I am also in charge of teaching Scientific Computing for Engineers, in which students are asked to do some work in Python in Jupyter notebooks.
The research axes that I have developed in the LGP for several years are mainly centered around the field of numerical mechanics:
- The numerical simulation of structures subjected to severe dynamic stresses (shocks and impacts) and the numerical simulation of machining and forming processes.
- The identification of dynamic non-linear behavior laws. These activities related to identification cover the fields of experimental tests in fast dynamics, the formulation of nonlinear dynamic behavior laws, the development of inverse identification procedures, the correlation of tests/numerical simulations.
- The development of numerical simulation tools and calculation codes. These activities mainly concern the development in C++ of the explicit dynamics FEM code DynELA, the development of numerical tools for processing experimental and numerical data in Python, and the development of computational algorithms.
My name is Mathieu Acher. I am an Associate Professor at DiverSE (IRISA/Inria research team), University of Rennes 1 and Institut Universitaire de France (IUF).
I am mainly working on software variability: how to design, implement, test, measure, and understand billions^thousands of software variants that can be configured through compile-time options, command-line parameters, feature flags or toggles, configuration files, etc.
For instance, we have addressed the challenge of predicting properties of Linux kernels over variants (15K+ options!) and versions: “Transfer Learning Across Variants and Versions: The Case of Linux Kernel Size”.
Together with other colleagues, we are now studying deep software variability: many layers (hardware, operating system, third-party libraries, versions, workloads, build system, etc.) themselves subject to variability can alter the results of a software system (e.g., a scientific workflow).
Back to reproducible research, it means that using the same data analysis, different software can lead to different results because of (deep) software variability. I aim to understand the significance of the problem in computational science (in many domains, beyond pure software engineering) and what software solutions we can bring to the table for effectively exploring the variant space of scientific experiments. I have written some words about the link between deep software variability and reproducible science, with some evidence of the phenomenon: “Reproducible Science and Deep Software Variability”.
I am thus interested in collaborating with (computational) scientists dealing with software variability issues, especially for actually reproducing and replicating real-world experiments.
I also have the ambition to offer Bachelor/Master students a set of papers that can truly be reproduced (and replicated). If you have any, please share!
I love software and have some skills with #python #jupyter #docker #experiments #statistics #linux #machinelearning… and experience with reproducible/replicable science.