"Human dermal fibroblast clonality project"

McCarthy, Davis J., Raghd Rostom, Yuanhua Huang, Daniel J. Kunz, Petr Danecek, Marc Jan Bonder, Tzachi Hagai, et al. 2018. ‘Cardelino: Integrating Whole Exomes and Single-Cell Transcriptomes to Reveal Phenotypic Impact of Somatic Variants’. Preprint. Genomics. https://doi.org/10.1101/413047.

https://davismcc.github.io/fibroblast-clonality/index.html

Data pre-processing

Pre-processing the raw data for this project is complicated and computationally expensive, so this repository does not reproduce it in an automated way. However, we provide the source code for the Snakemake pre-processing workflow in this repository. Docker images providing the computing environment and software used are publicly available, split into one image for command-line bioinformatics tools and another for an R installation with the necessary packages.

If you would like to pre-process the data from raw reads to results as we have, please consult our description of how to run the workflow.

Analyses

Here we present the reproducible results of our analyses. They were generated by rendering the R Markdown documents into webpages available at the links below.

The results presented in the paper were produced with these analyses.

  1. Simulation results.
  2. Overview of lines.
  3. Selection models.
  4. Analysis of clonal prevalences.
  5. Analysis for the example cell line joxm.
  6. Variance components analysis.
  7. Differential expression analysis.
  8. Analysis of effects of somatic variants on cis gene expression.

Data availability

This is a complicated project, and reproducing all of the results presented, especially from raw data, is highly non-trivial. Nevertheless, we have made all data available so that everything is entirely reproducible.

Single-cell RNA-seq data have been deposited in the ArrayExpress database at EMBL-EBI under accession number E-MTAB-7167. Whole-exome sequencing data are available through the HipSci portal. Processed data and large results files are available from Zenodo with DOI 10.5281/zenodo.1403510.
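As a small aid for locating the processed data, the sketch below turns the Zenodo DOI quoted above into a resolvable link. It is not part of the project's own tooling: the function names `doi_url` and `zenodo_record_id` are hypothetical, and the only assumptions are that DOIs resolve through `https://doi.org/` and that a Zenodo DOI ends in the numeric record ID. It does not download anything.

```python
import re

# Zenodo DOIs have the form 10.5281/zenodo.<numeric record ID>.
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def doi_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI (hypothetical helper)."""
    return f"https://doi.org/{doi.strip()}"


def zenodo_record_id(doi: str) -> str:
    """Extract the numeric record ID from a Zenodo DOI (hypothetical helper)."""
    match = ZENODO_DOI.match(doi.strip())
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return match.group(1)


if __name__ == "__main__":
    doi = "10.5281/zenodo.1403510"  # DOI quoted in the text above
    print(doi_url(doi))
    print(zenodo_record_id(doi))
```

Opening the printed URL in a browser should lead to the Zenodo record, from which the files can be downloaded manually.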

To set up the project to reproduce our analyses, first clone the source code repository from GitHub. Next, download all of the reference, metadata and results files and add them to the (cloned) project folder with the following structure: […]

At this level of technological complexity, I wonder if anyone ever tries to reproduce the results!