Cell Ranger - MattHuff/scRNASeq_011224 GitHub Wiki
Cell Ranger aligns scRNASeq reads to a reference genome and counts the number of reads aligned to each gene in that reference. This step is where this documentation diverges most from my previous documentation. Whereas the reads in the earlier documentation contained 10x whitelisted barcodes, these reads do not. As a result, this documentation uses a different command, cellranger multi, to achieve the same result.
mkdir 3_cellranger
cd 3_cellranger
You will need two files from 10x Genomics: the human reference genome and a set of transcriptome probes. Both can be downloaded directly to the server with wget:
wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz
wget https://cf.10xgenomics.com/supp/cell-exp/probeset/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv
However, downloading on the server may take too long even for an interactive session. You may have better success downloading both files to your own computer (paste each link into a browser and the download starts immediately) and then transferring them to the server as described in the previous documentation:
rsync -avzh --remove-source-files -e ssh Downloads/refdata-gex-GRCh38-2020-A.tar.gz Downloads/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv <user_name>@hpcdtn01.rcd.clemson.edu:/zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger/
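The reference arrives as a .tar.gz archive, but the reference line in the config points at the unpacked refdata-gex-GRCh38-2020-A directory, so the archive has to be extracted before running Cell Ranger. A minimal sketch using standard tar, wrapped in a hypothetical helper named extract_reference (not part of Cell Ranger); it assumes, as the config path implies, that the archive unpacks into a directory of the same name:

```shell
#!/bin/bash
# Hypothetical helper: unpack a reference tarball into the directory that holds it.
# -x extract, -z decompress gzip, -f read from the named file, -C change to the destination first.
extract_reference() {
  local tarball="$1"
  local dest
  dest="$(dirname "$tarball")"
  tar -xzf "$tarball" -C "$dest"
}
```

From the 3_cellranger directory, extract_reference refdata-gex-GRCh38-2020-A.tar.gz should leave a refdata-gex-GRCh38-2020-A directory next to the archive; the probe set CSV needs no unpacking.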
Cell Ranger's multi pipeline requires a config CSV file. A template for this file can be generated with the following command:
singularity exec -B /zfs/musc3:/zfs/musc3 --pwd /zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger /zfs/musc3/singularity_images/biocm-cellranger_latest.sif cellranger multi-template
The default template contains many options we don't need; unused options can stay commented out with #. My final config file for the last sample (sample4.csv) looked like this:
# This template shows the possible cellranger multi config CSV options for analyzing Single Cell Gene Expression with Feature Barcode Technology (Antibody Capture, CRISPR Guide Capture, Cell Multiplexing, Antigen Capture), Fixed RNA Profiling, or Single Cell Immune Profiling data.
# These options cannot be used all together - see section descriptions for detail.
# Use 'cellranger multi-template --parameters' to see descriptions of all parameters.
# Please see cellranger multi documentation for details and experimental design-specific examples at https://www.10xgenomics.com/support.
[gene-expression]
reference,/zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger/refdata-gex-GRCh38-2020-A
probe-set,/zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv, # Required, Fixed RNA Profiling only.
# filter-probes,<true|false>, # Optional, Fixed RNA Profiling only.
# r1-length,<int>
# r2-length,<int>
# chemistry,<auto>
# expect-cells,<int>
# force-cells,<int>
# no-secondary,<true|false>
# no-bam,<true|false>
# check-library-compatibility,<true|false>
# target-panel,/path/to/target/panel, # Required, Targeted GEX only.
# no-target-umi-filter,<true|false>, # Optional, Targeted GEX only.
# include-introns,<true|false>
# min-assignment-confidence,<0.9>, # Optional, Cell Multiplexing only.
# cmo-set,/path/to/CMO/reference, # Optional, Cell Multiplexing only.
# barcode-sample-assignment,/path/to/barcode-sample-assignment/csv, # Optional, Cell Multiplexing only.
[libraries]
fastq_id,fastqs,feature_types
LVF006-Septum,/zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger,Gene Expression
# Antibody1,/path/to/fastqs,Antibody Capture
# CRISPR1,path/to/CRISPR_fastqs,CRISPR Guide Capture
# CMO1,/path/to/fastqs,Multiplexing Capture, # Cell Multiplexing only
# VDJ_B1,path/to/vdj_B_fastqs,VDJ-B, # 5' Immune Profiling only
# VDJ_T1,path/to/vdj_T_fastqs,VDJ-T, # 5' Immune Profiling only
# VDJ_T_GD1,path/to/vdj_T_GD_fastqs,VDJ-T-GD, # 5' Immune Profiling only for gamma-delta TCR
# Antigen1,path/to/antigen_capture_fastqs,Antigen Capture #5' Antigen Capture only
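Since each sample needs its own config, it can help to write the files from a script rather than editing the template by hand. A minimal sketch, assuming a hypothetical helper write_multi_config that keeps only the lines actually used above (reference, probe-set and one Gene Expression library) and drops the template comments:

```shell
#!/bin/bash
# Directory holding the reference, the probe set and (by default) the FASTQs, as in the config above.
REF_DIR=/zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger

# Hypothetical helper: write a minimal cellranger multi config for one sample.
#   $1 = fastq_id (FASTQ file-name prefix, e.g. LVF006-Septum)
#   $2 = output CSV (e.g. sample4.csv)
#   $3 = FASTQ directory (optional, defaults to REF_DIR)
write_multi_config() {
  local fastq_id="$1"
  local out_csv="$2"
  local fastq_dir="${3:-$REF_DIR}"
  cat > "$out_csv" <<EOF
[gene-expression]
reference,${REF_DIR}/refdata-gex-GRCh38-2020-A
probe-set,${REF_DIR}/Chromium_Human_Transcriptome_Probe_Set_v1.0.1_GRCh38-2020-A.csv
[libraries]
fastq_id,fastqs,feature_types
${fastq_id},${fastq_dir},Gene Expression
EOF
}
```

For example, write_multi_config LVF006-Septum sample4.csv reproduces the active lines of the config shown above. The other samples would be called the same way with their own FASTQ prefixes and config names.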
As with earlier steps, I submitted a separate qsub job for each sample. Continuing with the final sample:
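#!/bin/bash
#PBS -N 3_cellranger_sample4
#PBS -l select=1:ncpus=16:mem=250gb
#PBS -l walltime=04:00:00
#PBS -j oe
# Start from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Bind /zfs/musc3 into the container, work inside 3_cellranger, and run cellranger multi
# with the sample 4 config; outputs are written to run_Jordan_4/
singularity exec -B /zfs/musc3:/zfs/musc3 --pwd /zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger /zfs/musc3/singularity_images/biocm-cellranger_latest.sif cellranger multi \
--id run_Jordan_4 \
--csv sample4.csv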
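Because the four job scripts differ only in the sample number, they can be generated in a loop instead of copied by hand. A minimal sketch, assuming a hypothetical helper make_pbs_script that follows the naming pattern used above (sampleN.csv, run_Jordan_N) and reuses the same PBS resources; it only writes the script and prints its file name:

```shell
#!/bin/bash
ANALYSIS_DIR=/zfs/musc3/huffmat/Jordan_scRNASeq_011024/analysis/3_cellranger
SIF=/zfs/musc3/singularity_images/biocm-cellranger_latest.sif

# Hypothetical helper: write a PBS script for sample number $1 and print its file name.
make_pbs_script() {
  local n="$1"
  local script="3_cellranger_sample${n}.pbs"
  # Unquoted heredoc: ${n}, ${ANALYSIS_DIR} and ${SIF} are filled in now,
  # while \$PBS_O_WORKDIR is escaped so it is expanded only when the job runs.
  # The cellranger command stays on one line because a trailing backslash
  # inside an unquoted heredoc would be removed as a line continuation.
  cat > "$script" <<EOF
#!/bin/bash
#PBS -N 3_cellranger_sample${n}
#PBS -l select=1:ncpus=16:mem=250gb
#PBS -l walltime=04:00:00
#PBS -j oe
cd "\$PBS_O_WORKDIR"
singularity exec -B /zfs/musc3:/zfs/musc3 --pwd ${ANALYSIS_DIR} ${SIF} cellranger multi --id run_Jordan_${n} --csv sample${n}.csv
EOF
  echo "$script"
}
```

Submitting all four would then look like for n in 1 2 3 4; do qsub "$(make_pbs_script "$n")"; done, provided sample1.csv through sample4.csv already exist in 3_cellranger.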
All output files are stored in a directory named after the --id option. In this case, I have four directories, run_Jordan_1 through run_Jordan_4. You may also name them after the actual sample IDs, as long as you know which sample is which.
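With one output directory per run, a quick loop shows which runs have produced results. A minimal sketch using a hypothetical helper check_runs; treating the presence of an outs/ subdirectory as a sign that a run finished is an assumption, so the job log is still the authoritative check:

```shell
#!/bin/bash
# Hypothetical helper: for each sample number given, report whether run_Jordan_<N>/outs exists.
# Assumption: an outs/ directory indicates a completed run; confirm with the PBS job log.
check_runs() {
  local n dir
  for n in "$@"; do
    dir="run_Jordan_${n}"
    if [ -d "${dir}/outs" ]; then
      echo "${dir}: outs present"
    else
      echo "${dir}: no outs yet"
    fi
  done
}
```

Run from 3_cellranger as check_runs 1 2 3 4.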