‘Computing Environments for Reproducibility: Capturing the “Whole Tale”’

Brinckman, Adam, Kyle Chard, Niall Gaffney, Mihael Hategan, Matthew B. Jones, Kacper Kowalik, Sivakumar Kulasekaran, et al. 2019. ‘Computing Environments for Reproducibility: Capturing the “Whole Tale”’. Future Generation Computer Systems 94 (May): 854–67. https://doi.org/10.1016/j.future.2017.12.029.

Computational tale = sharing the source code, data and methods along with the computational environment in which inquiry is conducted

« The Whole Tale project is intended to support the lifecycle of data. This means that all parts of the lifecycle, from data ingest or creation through to publication of the resulting scholarly objects such as data, code, workflows, and manuscripts, should be managed within the Whole Tale environment. » (Brinckman et al., 2019, p. 857)

« The final, and perhaps more important, aim of the Whole Tale is to define a model for reproducibility by capturing the data, methods, metadata, and provenance of a particular research activity within the system. […] Having creating a tale, researchers should be able to simply share them with others, publish them to connected repositories, associate a persistent identifier, and link them to publications. […] Tales also contain Intellectual Property metadata […].» (Brinckman et al., 2019, p. 857)

« The Whole Tale architecture consists of a set of microservices (e.g. for data access, persistent identifier creation, etc.) and interoperability softwares that leverages, where possible, existing cyberinfrastructures. » (Brinckman et al., 2019, p. 865)

« […] by sharing a paper as a tale, the narrative is shared together with an on-demand, virtual computer that is preloaded with all the relevant data, methods, software packages, and analysis frontends needed to reproduce, tinker with, or even extend the paper. » (Brinckman et al., 2019, p. 855)

« […] Whole Tale will provide access to a collaborative environment, where scripts and analysis methods can not only be transplanted seamlessly between datasets, but where they can be collaborated on between individuals […]. » (Brinckman et al., 2019, p. 856)


The design principle of this platform seems to be "let's solve a complexity problem by adding a few more layers of complexity". I suspect that the biggest risk to the long-term reproducibility of research done on the Whole Tale platform is a possible lack of maintenance of the platform itself.

A similar idea is expressed in this tweet by Gaël Varoquaux on Apache Arrow.
