FAQ - PMBio/scLVM GitHub Wiki

#### How do I install scLVM?
Please have a look at the installation section. In brief, you need Python 2.7 with limix and some other dependencies (scipy, pylab, h5py).
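
A quick way to confirm that these dependencies are importable from the interpreter you plan to use is a small check script. This is an illustrative sketch of ours, not part of scLVM. The module list simply mirrors the packages named in this FAQ, and `check_dependencies` is a hypothetical helper name. The code runs under both Python 2.7 and Python 3.

```python
import sys

# Modules named in this FAQ; "pylab" is provided by matplotlib.
REQUIRED_MODULES = ["limix", "scipy", "numpy", "pylab", "h5py"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in modules:
        try:
            __import__(name)
            status[name] = True
        except Exception:
            # ImportError if the module is missing, or any error raised
            # while an installed module initialises
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python %d.%d" % tuple(sys.version_info[:2]))
    status = check_dependencies()
    for name in REQUIRED_MODULES:
        print("%-6s %s" % (name, "ok" if status[name] else "MISSING"))
```

Any module reported as `MISSING` still needs to be installed before running scLVM.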

#### How do I get started?
Good starting points are either the R vignette (https://github.com/PMBio/scLVM/blob/master/R/tutorials/scLVM_vignette.Rmd) or the IPython notebook tcells_demo.ipynb in the ./tutorials folder. You can view an HTML version of both, or open the notebook with `ipython notebook` and reproduce the results from our paper. The notebook requires an hdf5 data file containing the relevant data (normalised read counts etc.). We generate this file in R; transform_counts_demo.Rmd illustrates how this was done for the T-cell data.

#### Who is behind scLVM?
scLVM was mainly developed at EMBL-EBI, in the groups of Oliver Stegle and John Marioni, and at the Institute of Computational Biology, Helmholtz Zentrum München. The software was written by Florian Buettner, Paolo Casale and Oliver Stegle. For more details, have a look at our accompanying publication.

#### Where can I learn more about the scLVM algorithm?
Check out our recent paper on scLVM:

Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC & Stegle O, 2015. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nature Biotechnology, in press.

#### Is there an R package?
Yes, we provide an R package. However, this is not a native R implementation of scLVM; it calls the relevant Python code from within R. Therefore, to run scLVM from within R, you still need to install Python 2.7 with scipy, h5py, numpy and pylab.
After downloading the scLVM_0.99.1.tar.gz file, install it by typing `R CMD INSTALL scLVM_0.99.1.tar.gz`.

#### The Python version of scLVM requires an hdf5 file. How do I generate it?
Have a look at transform_counts_demo.Rmd. There we illustrate how we normalise raw read counts, test genes for significant variability over technical noise, and retrieve cell-cycle annotated genes. If you are not familiar with hdf5, consider using our R package, which integrates all the preprocessing steps and doesn't require hdf5.
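
To make the normalisation step more concrete, here is a minimal sketch of median-of-ratios size-factor normalisation (the approach used by DESeq), one common way to normalise raw read counts. We are assuming, not claiming, that it matches what transform_counts_demo.Rmd does in every detail. The function names are hypothetical, and the input is a genes x cells count matrix (genes in rows, as in a typical R count table).

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x cells count matrix.

    Only genes with a non-zero count in every cell are used, since a zero
    makes the geometric mean zero. Assumes at least one such gene exists.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # log of each gene's geometric mean across cells
    log_geo_means = log_counts.mean(axis=1)
    # per cell: median ratio of its counts to the gene-wise geometric means
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))


def normalise(counts):
    """Divide each cell (column) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors(counts)
```

For example, if the second cell has exactly twice the counts of the first for every gene, its size factor is twice as large, and after normalisation both columns are identical.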

#### What exactly is stored in the hdf5 file?
scLVM requires the following input:

  • Normalised gene expression data
  • Technical noise (in log space)
  • Gene symbols
  • Heterogeneous genes (boolean vector)
  • Cell cycle genes (vector of indices)

Check out our R demo scripts to see how to generate these data from your raw read counts.
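
The sketch below checks that a set of in-memory arrays has the shapes and types listed above before you write them to hdf5. It does not reproduce the dataset names used in the demo hdf5 file. `check_sclvm_inputs` is a hypothetical helper, and the cells x genes orientation of the expression matrix and the 0-based cell-cycle indices are our assumptions. Indices written from R follow R's 1-based convention, so check which convention your file uses.

```python
import numpy as np


def check_sclvm_inputs(Y, tech_noise, gene_symbols, is_heterogeneous, cc_gene_idx):
    """Return a list of problems (empty if the inputs look consistent).

    Y                : cells x genes matrix of normalised expression (assumed layout)
    tech_noise       : technical noise per gene, in log space
    gene_symbols     : one symbol per gene
    is_heterogeneous : boolean vector over genes
    cc_gene_idx      : integer indices of cell-cycle genes (assumed 0-based)
    """
    Y = np.asarray(Y)
    n_cells, n_genes = Y.shape
    problems = []
    if np.asarray(tech_noise).shape != (n_genes,):
        problems.append("tech_noise must have one entry per gene")
    if len(gene_symbols) != n_genes:
        problems.append("gene_symbols must have one entry per gene")
    het = np.asarray(is_heterogeneous)
    if het.dtype != bool or het.shape != (n_genes,):
        problems.append("is_heterogeneous must be a boolean vector over genes")
    idx = np.asarray(cc_gene_idx)
    if idx.size and (not np.issubdtype(idx.dtype, np.integer)
                     or idx.min() < 0 or idx.max() >= n_genes):
        problems.append("cc_gene_idx must hold valid 0-based gene indices")
    return problems


# Usage: five cells, four genes, two of which are cell-cycle genes
print(check_sclvm_inputs(np.zeros((5, 4)), np.full(4, 0.1), ["A", "B", "C", "D"],
                         np.array([True, False, True, False]), np.array([0, 2])))
# → []
```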

#### scLVM uses spike-ins to estimate the technical noise. What if I don't have spike-ins?
If you don't have spike-ins, you can use a log-linear fit to estimate the relation between mean read count and squared coefficient of variation (CV²). This fit can then be interpreted as baseline variation and used in lieu of the technical noise estimates from spike-ins. For details on this procedure, have a look at the supplement of our paper or the R vignette.
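
One way to read "log-linear fit" is a straight-line fit of log(CV²) against log(mean) across genes, as in the minimal sketch below. This is our interpretation, not necessarily the exact procedure from the paper's supplement. The name `baseline_cv2` is hypothetical, the input is a genes x cells matrix of normalised counts, and converting the fitted CV² into the log-space technical noise that scLVM expects is not shown.

```python
import numpy as np


def baseline_cv2(counts):
    """Fit log(CV^2) = a + b * log(mean) across genes (genes x cells input).

    Returns the coefficients (a, b) and the fitted baseline CV^2 for every
    gene with a positive mean (NaN elsewhere).
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    fit = (mean > 0) & (var > 0)  # logs are undefined for zero mean or variance
    cv2 = var[fit] / mean[fit] ** 2
    # np.polyfit returns the highest-degree coefficient first: [b, a]
    b, a = np.polyfit(np.log(mean[fit]), np.log(cv2), deg=1)
    predicted = np.full(mean.shape, np.nan)
    pos = mean > 0
    predicted[pos] = np.exp(a + b * np.log(mean[pos]))
    return a, b, predicted
```

For Poisson-like genes, whose variance equals their mean, CV² = 1/mean, so the fit should recover a slope b close to -1 and an intercept a close to 0.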

#### The correlation analysis is very slow. Is that normal?
Yes. As scLVM evaluates the pairwise correlations between all genes, this can take some time. However, if you have access to a cluster (or several cores on the machine you work on), it is straightforward to parallelise the analysis. The run_analysisTcell.py script illustrates how this can be done, and run_collect.py collects the results from the individual runs.
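
The general split/collect pattern is sketched below: each job handles one contiguous block of genes and correlates it against all genes, and a final step stitches the blocks together. This is not the interface of run_analysisTcell.py or run_collect.py. The function names are hypothetical, plain Pearson correlation stands in for scLVM's own per-pair computation, and `Y` is assumed to be a cells x genes matrix with no constant genes.

```python
import numpy as np


def gene_range(n_genes, n_parts, part):
    """Half-open range [start, stop) of genes handled by job `part`."""
    bounds = np.linspace(0, n_genes, n_parts + 1).astype(int)
    return bounds[part], bounds[part + 1]


def run_part(Y, n_parts, part):
    """Correlate one chunk of genes against all genes (one job's work)."""
    start, stop = gene_range(Y.shape[1], n_parts, part)
    # standardise each gene; dot products / n_cells are then Pearson correlations
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return start, stop, Z[:, start:stop].T.dot(Z) / Y.shape[0]


def collect(results, n_genes):
    """Assemble the per-job blocks into one genes x genes matrix."""
    corr = np.empty((n_genes, n_genes))
    for start, stop, block in results:
        corr[start:stop] = block
    return corr
```

On a cluster, each part would typically run as a separate job, with the part index passed on the command line and the block saved to disk. Here `collect([run_part(Y, 3, p) for p in range(3)], Y.shape[1])` simply runs the parts serially.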

#### How do I cite scLVM?
Please cite scLVM as follows:

Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, Teichmann SA, Marioni JC & Stegle O, 2015. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nature Biotechnology, in press.