Cellbender - MattHuff/SingleCellDocumentation_112023 GitHub Wiki

Cellbender is used in our pipeline to remove ambient RNA from the raw output of Cell Ranger. In doing so, it also identifies empty droplets (droplets that captured ambient RNA but no cell) and leaves them out of its filtered output, which allows it to supersede DropletQC.

1. Install Cellbender

The official documentation linked at the top of this page lists several ways to install Cellbender. It recommends creating a conda environment and running pip install cellbender inside it, which looks something like this:

(base) $ conda create -n cellbender python=3.7
(base) $ conda activate cellbender
(cellbender) $ pip install cellbender

Alternatively, a Docker image is available and can be pulled and used on machines outside the cluster:

docker pull us.gcr.io/broad-dsde-methods/cellbender:latest

On the Palmetto Cluster, a Singularity image is already available, so we will use it instead of conda for this run.

2. Get output files

You need the raw count matrix in h5 format from Cell Ranger. Make sure it is the raw matrix: the "filtered" matrix has already had empty droplets removed, while Cellbender needs all barcodes, including the empty droplets, to estimate the ambient RNA profile and remove it properly. Using the example data set described on my Cell Ranger page, the file you want is SC3pv3_GEX_Human_PBMC_raw_feature_bc_matrix.h5.

However, your output files may not include sample names: Cell Ranger writes every raw matrix as outs/raw_feature_bc_matrix.h5. In that case, I recommend creating symbolic links to the h5 matrices in the directory where you plan to run Cellbender, with the sample name added to each link's file name:

ln -s ../2_cellranger/run_norris_p0mice_2/outs/raw_feature_bc_matrix.h5 2-XD207-11_raw_feature_bc_matrix.h5
ln -s ../2_cellranger/run_norris_p0mice_3/outs/raw_feature_bc_matrix.h5 3-XD205-2_raw_feature_bc_matrix.h5
ln -s ../2_cellranger/run_norris_p0mice_4/outs/raw_feature_bc_matrix.h5 4-XD207-9_raw_feature_bc_matrix.h5
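With more than a few samples, the same links can be created in one loop. This is a minimal POSIX-shell sketch, assuming the same ../2_cellranger/<run>/outs/ layout as above; the run-to-sample table simply repeats the three example links and should be replaced with your own.

```shell
# Sketch: create sample-named links to Cell Ranger raw matrices in one pass.
# Each row of the here-document is "<Cell Ranger run directory> <sample name>";
# the rows repeat the example links above and are meant to be replaced.
while read -r run sample; do
  src="../2_cellranger/${run}/outs/raw_feature_bc_matrix.h5"
  dest="${sample}_raw_feature_bc_matrix.h5"
  # Warn (but still create the link) if the target is not there yet
  [ -e "$src" ] || echo "warning: $src not found" >&2
  # -f replaces an existing link, so the loop can be re-run safely
  ln -sf "$src" "$dest"
done <<'EOF'
run_norris_p0mice_2 2-XD207-11
run_norris_p0mice_3 3-XD205-2
run_norris_p0mice_4 4-XD207-9
EOF
```

Because the links follow the <sample>_raw_feature_bc_matrix.h5 pattern, the job script can strip that suffix to recover each sample name.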

3. Run Cellbender

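#!/bin/bash

#PBS -N 3_cellbender
#PBS -l select=1:ncpus=16:ngpus=2:mem=200gb
#PBS -l walltime=08:00:00
#PBS -q musc3_gpu
#PBS -j oe

# Alternative setup without Singularity (not used here): conda/mamba environment plus CUDA module
#source ~/.bashrc
#mamba activate cellbender
#module load cuda/12.1.1-gcc/9.5.0

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR" || exit 1

WORKDIR=/zfs/musc3/huffmat/scRNAseq/analysis/3_cellbender
IMAGE=/zfs/musc3/singularity_images/biocm-cellbender_1.0.2.sif

# Match only the raw inputs, so Cellbender's own .h5 outputs are not re-processed on a rerun
for h5 in "$WORKDIR"/*_raw_feature_bc_matrix.h5
do
	# Strip the suffix to recover the sample name, e.g. 2-XD207-11
	BASE=$(basename "$h5" _raw_feature_bc_matrix.h5)

	# --nv exposes the host GPUs inside the container; -B mounts /zfs/musc3 into it
	singularity exec --nv -B /zfs/musc3:/zfs/musc3 --pwd "$WORKDIR" "$IMAGE" \
		cellbender remove-background \
		--cuda \
		--input "$h5" \
		--output "${BASE}_cellbender_feature_bc_matrix.h5" \
		--fpr 0.01 \
		--epochs 150
done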
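Before submitting the job with qsub, it can help to check which files the loop will pick up and which sample name it derives from each. The sketch below does that without running Cellbender; it assumes the <sample>_raw_feature_bc_matrix.h5 naming from step 2, and preview_inputs is a hypothetical helper name.

```shell
# Sketch: list the inputs the job loop will pick up and the sample name
# derived from each, without running Cellbender.
# preview_inputs is a hypothetical helper; the directory defaults to ".".
preview_inputs() {
  dir=${1:-.}
  for h5 in "$dir"/*_raw_feature_bc_matrix.h5; do
    [ -e "$h5" ] || [ -L "$h5" ] || continue   # nothing matched the pattern
    # Same suffix stripping as the job script: 2-XD207-11_raw_... -> 2-XD207-11
    base=$(basename "$h5" _raw_feature_bc_matrix.h5)
    printf '%s\t%s\n' "$base" "$h5"
  done
}
```

Running preview_inputs /zfs/musc3/huffmat/scRNAseq/analysis/3_cellbender should print one line per sample; a missing sample usually means its link is misnamed.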

The official documentation includes the --cuda option, which runs Cellbender on a GPU. It only works on machines with a CUDA-capable GPU, so if you are not working on one, leave it out; otherwise the run is likely to fail. The job script above makes sure GPUs are available through two extra settings:

#PBS -l select=1:ncpus=16:ngpus=2:mem=200gb - ngpus=2 requests GPUs in addition to CPUs and memory.
singularity exec --nv - makes the node's NVIDIA GPUs visible inside the container; without it, --cuda cannot find a GPU.
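If the same script might also run on nodes without a GPU, one option is to add --cuda only when a GPU is detected. This minimal sketch assumes nvidia-smi (installed with the NVIDIA driver) is on the PATH of GPU nodes; cuda_flag is a hypothetical helper name, not part of Cellbender.

```shell
# Sketch: print "--cuda" only when a working GPU is detected, so one
# script can run on both GPU and CPU-only nodes.
# Assumes nvidia-smi is on the PATH of GPU nodes; cuda_flag is hypothetical.
cuda_flag() {
  probe=${1:-nvidia-smi}   # command used to test for a GPU
  if command -v "$probe" >/dev/null 2>&1 && "$probe" >/dev/null 2>&1; then
    echo "--cuda"
  fi
}
```

In the job loop, the --cuda \ line could then be replaced with $(cuda_flag) \, left unquoted so that it expands to nothing on CPU-only nodes.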

When a Cellbender run finishes, the h5 file to use is the "filtered" one (its name ends in _filtered.h5). It excludes empty droplets and keeps only the barcodes Cellbender called as cells, so it is the best file to carry forward.
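With several samples in one job, a quick check that every sample produced a filtered file can save a confusing error later in Seurat. This is a sketch, assuming the inputs are named <sample>_raw_feature_bc_matrix.h5 and that each filtered output starts with the sample name and ends in _filtered.h5; check_filtered is a hypothetical helper.

```shell
# Sketch: report, per sample, whether a Cellbender "filtered" output exists.
# check_filtered is hypothetical; it returns 1 if any sample is missing one.
check_filtered() {
  dir=${1:-.}
  missing=0
  for raw in "$dir"/*_raw_feature_bc_matrix.h5; do
    [ -e "$raw" ] || [ -L "$raw" ] || continue   # no inputs matched the pattern
    sample=$(basename "$raw" _raw_feature_bc_matrix.h5)
    found=""
    # Any file named <sample>_..._filtered.h5 counts as the filtered output
    for f in "$dir/$sample"_*_filtered.h5; do
      if [ -e "$f" ]; then found=$f; fi
    done
    if [ -n "$found" ]; then
      printf 'OK      %s -> %s\n' "$sample" "$(basename "$found")"
    else
      printf 'MISSING %s\n' "$sample"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_filtered in the 3_cellbender directory prints one OK or MISSING line per sample.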

Update 05/06/24 - PyTables

As of this update, the latest version of Cellbender writes h5 files that Seurat's Read10X_h5() function cannot read. This may be fixed in a later version of Seurat, but for now, Cellbender's official documentation recommends running the ptrepack command from PyTables (installed with pip install tables) to rewrite the /matrix group into a Seurat-compatible file.

# Repack each Cellbender "filtered" output into a Seurat-readable copy
for h in *_filtered.h5
do
	BASE=$(basename "$h" .h5)
	echo "Repacking $h -> ${BASE}_seurat.h5"

	ptrepack --complevel 5 "${h}:/matrix" "${BASE}_seurat.h5:/matrix"
done

This step was not necessary for the initial set of files, but it was run on the next set of files.