Case Study - ZhengmzLab/ScSmOP GitHub Wiki

We prepared testing datasets from different types of single-cell single-molecule omics.

| Case | Omics | Experiment | Dataset | Source |
|------|-------|------------|---------|--------|
| 1. 10× Genomics (V3) Single Cell Gene Expression | scRNA-seq | 10× Genomics (v3) Single Cell Gene Expression | PBMC_5k | 10X Genomics |
| 2. ChIA-Drop | Single-molecule 3D Genomics | ChIA-Drop | ChIA-Drop | GSM3347523 |
| 3. SPRITE | Single-molecule 3D Genomics | SPRITE | SPRITE | GSM3154194 |
| 4. RNA-DNA SPRITE | Single-molecule 3D Genomics | RNA-DNA SPRITE | rdSPRITE | GSM4579992 |
| 5. scSPRITE | Single-cell single-molecule 3D Genomics | scSPRITE | mESC scSPRITE | GSM4669508 |
| 6. 10× Genomics (V1) Single Cell ATAC | scATAC-seq | 10× Genomics (v1) Single Cell ATAC | Human PBMC | 10X Genomics |
| 7. Chromium Single Cell Multiome ATAC + Gene Expression | Single-cell multiome | Chromium Single Cell Multiome ATAC + Gene Expression | Human PBMC | 10X Genomics |
| 8. Visium Spatial Gene Expression | Spatial Transcriptomics | Visium Spatial Gene Expression | Mouse olfactory bulb | 10X Genomics |
| 9. 10× Genomics (V3) Single Cell Gene Expression (parallel with UniverSC) | scRNA-seq | 10× Genomics (v3) Single Cell Gene Expression | Tiny FASTQ of human chr21 | Drop-seq GitHub |
| 10. Drop-seq (parallel with UniverSC) | scRNA-seq | Drop-seq | Tiny FASTQ of human chr21 | Drop-seq GitHub |
| 11. DIY | scRNA-seq | 10× Genomics (v3) Single Cell Gene Expression using the Chromium Single Cell Multiome ATAC + Gene Expression kit to capture 3' end RNA | Human GM12878 and Drosophila S2 cells | Home brewed |

1. 10× Genomics (V3) Single Cell Gene Expression

Project description

This library was prepared from human peripheral blood mononuclear cells (PBMCs). Dataset: 5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v3 chemistry), https://www.10xgenomics.com/resources/datasets/5-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-1-standard-3-0-2

The reference genome is hg38, and the library name is set to PBMC_5k.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/
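
The two install locations mentioned above can be handled with a small helper. This is only a minimal sketch: the function name `find_scsmop_home` and the variable `SCSMOP_HOME` are hypothetical and not part of ScSmOP. It assumes that `scsmop.sh` sits at the top level of the install directory, as in the commands used throughout this case study.

```shell
# Print the first directory under the given base (default: $HOME) that
# contains scsmop.sh, trying ~/ScSmOP first and ~/ScSmOP-0.1.3 second.
find_scsmop_home() {
    base="${1:-$HOME}"
    for candidate in "$base/ScSmOP" "$base/ScSmOP-0.1.3"; do
        if [ -f "$candidate/scsmop.sh" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical usage: store the path once and reuse it in later commands.
SCSMOP_HOME=$(find_scsmop_home) || echo "ScSmOP not found under $HOME" >&2
```

With this in place, `"$SCSMOP_HOME/scsmop.sh"` could stand in for `~/ScSmOP/scsmop.sh` in the commands below.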

Suppose the STAR index has been generated at:

    ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir scRNA_PBMC5K
    (ScSmOP) :~$ cd scRNA_PBMC5K
    (ScSmOP) :~/scRNA_PBMC5K$

Download scRNA-seq FASTQ files from 10x Genomics Datasets

    (ScSmOP) :~/scRNA_PBMC5K$ wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/3.0.2/5k_pbmc_v3/5k_pbmc_v3_fastqs.tar
    (ScSmOP) :~/scRNA_PBMC5K$ tar -xvf 5k_pbmc_v3_fastqs.tar
    (ScSmOP) :~/scRNA_PBMC5K$ cd 5k_pbmc_v3_fastqs
    (ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ 

There are 3 types of reads: Read 1 contains a 16 bp cell barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome; the sample index read I1 is not used.

    (ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ ls
    5k_pbmc_v3_S1_L001_I1_001.fastq.gz  5k_pbmc_v3_S1_L003_I1_001.fastq.gz
    5k_pbmc_v3_S1_L001_R1_001.fastq.gz  5k_pbmc_v3_S1_L003_R1_001.fastq.gz
    5k_pbmc_v3_S1_L001_R2_001.fastq.gz  5k_pbmc_v3_S1_L003_R2_001.fastq.gz
    5k_pbmc_v3_S1_L002_I1_001.fastq.gz  5k_pbmc_v3_S1_L004_I1_001.fastq.gz
    5k_pbmc_v3_S1_L002_R1_001.fastq.gz  5k_pbmc_v3_S1_L004_R1_001.fastq.gz
    5k_pbmc_v3_S1_L002_R2_001.fastq.gz  5k_pbmc_v3_S1_L004_R2_001.fastq.gz
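
The Read 1 layout described above (bases 1-16 are the cell barcode, bases 17-28 are the UMI) can be illustrated with a short sketch. The function name `split_read1` is hypothetical and the example sequence is made up; this is not how ScSmOP itself identifies barcodes, only a way to see the layout.

```shell
# Split one Read 1 sequence into cell barcode (bases 1-16) and UMI (bases 17-28).
# Prints "barcode<TAB>umi"; fails if the sequence is shorter than 28 bp.
split_read1() {
    seq_in="$1"
    if [ "${#seq_in}" -lt 28 ]; then
        echo "Read 1 is shorter than 28 bp" >&2
        return 1
    fi
    barcode=$(printf '%s' "$seq_in" | cut -c1-16)
    umi=$(printf '%s' "$seq_in" | cut -c17-28)
    printf '%s\t%s\n' "$barcode" "$umi"
}

# Made-up 28 bp sequence, for illustration only.
split_read1 "AAACCCAAGAAACACTTTTGGGGCCCCA"
```

To try it on real data, the first sequence line of a FASTQ file can be taken with `zcat 5k_pbmc_v3_S1_L001_R1_001.fastq.gz | awk 'NR % 4 == 2' | head -n 1`.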

Run pipeline scsmop.sh

    (ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n PBMC_5k -1 5k_pbmc_v3_S1_L001_R1_001.fastq.gz,5k_pbmc_v3_S1_L002_R1_001.fastq.gz,5k_pbmc_v3_S1_L003_R1_001.fastq.gz,5k_pbmc_v3_S1_L004_R1_001.fastq.gz -2 5k_pbmc_v3_S1_L001_R2_001.fastq.gz,5k_pbmc_v3_S1_L002_R2_001.fastq.gz,5k_pbmc_v3_S1_L003_R2_001.fastq.gz,5k_pbmc_v3_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10
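
The command above passes the lanes of each read as one comma-separated list. Typing these lists by hand is error-prone, so a small helper can build them. This is a sketch: the function name `join_by_comma` and the variables `R1` and `R2` are hypothetical, and only options already shown in the command above are used.

```shell
# Join all arguments into one comma-separated string, the form that the
# -1 and -2 options take in the command above.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# The glob expands in sorted order (L001, L002, ...), so the R1 and R2
# lists stay lane-matched.
R1=$(join_by_comma 5k_pbmc_v3_S1_L00*_R1_001.fastq.gz)
R2=$(join_by_comma 5k_pbmc_v3_S1_L00*_R2_001.fastq.gz)
echo "$R1"

# The lists can then replace the hand-typed ones:
# ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n PBMC_5k -1 "$R1" -2 "$R2" -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10
```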

Check that the run finished

    (ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done
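
Instead of reading the listing by eye, the marker files can be checked in a loop. This sketch assumes that each `*.done` file shown above marks a completed stage; the function name `check_done` is hypothetical.

```shell
# Report which marker files exist in a directory.
# Usage: check_done DIR MARKER...   Returns non-zero if any marker is missing.
check_done() {
    dir="$1"
    shift
    missing=0
    for marker in "$@"; do
        if [ -e "$dir/$marker" ]; then
            echo "ok       $marker"
        else
            echo "missing  $marker"
            missing=1
        fi
    done
    return "$missing"
}

# Markers listed above for the scRNA-seq run, checked in the current directory.
check_done . BarcodeIdentification.done SequenceAlignment.done QualityAssessment.done \
    || echo "not all stages have finished"
```

For cases that also produce `03.GroupAndRefine`, `GroupAndRefinement.done` would be added to the marker list.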

Get statistics

    (ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ cd 04.QualityAssess
    (ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs/04.QualityAssess$ cat PBMC_5k_final_stat.tsv
    Total_read_pairs 383941607
    Read_pairs_with_full_barcodes 375636262
    Fully_barcode_rate .978368
    Cell_count_at_fastq  1334303
    Cell_count_estimated 4544
    Total_gene_detected 23801
    Mean_gene_per_cell 2538
    Median_UMI_per_cell 8412
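
As a sanity check, the barcode rate can be recomputed from the two read-pair counts. The sketch below assumes the stat file has one whitespace-separated name and value per line, as in the listing above; the function name `barcode_rate` is hypothetical.

```shell
# Recompute the barcode rate as Read_pairs_with_full_barcodes / Total_read_pairs.
barcode_rate() {
    awk '
        $1 == "Total_read_pairs"              { total = $2 }
        $1 == "Read_pairs_with_full_barcodes" { full  = $2 }
        END {
            if (total > 0) printf "%.4f\n", full / total
            else exit 1
        }
    ' "$1"
}

# Demo with the two counts reported above, written to a temporary file.
stat_file=$(mktemp)
printf 'Total_read_pairs 383941607\nRead_pairs_with_full_barcodes 375636262\n' > "$stat_file"
barcode_rate "$stat_file"
rm -f "$stat_file"
```

The demo prints 0.9784, which agrees with the reported Fully_barcode_rate of .978368 after rounding to four decimals.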

2. ChIA-Drop

Project description

This library was prepared from S2 cells with ChIA-Drop (GSM3347523, SRR7722051).

The reference genome is dm3, and the library name is set to CHDP.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the BWA index has been generated at:

    ~/RefGenome/bwa_dm3_index/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir CHDP
    (ScSmOP) :~$ cd CHDP
    (ScSmOP) :~/CHDP$

Download ChIA-Drop FASTQ files from NCBI. Note: preparing the ChIA-Drop data requires sra-tools to be installed first; refer to the sra-tools installation instructions.

    (ScSmOP) :~/CHDP$ prefetch SRR7722051
    (ScSmOP) :~/CHDP$ ln -s SRR7722051/SRR7722051.sra .
    (ScSmOP) :~/CHDP$ fastq-dump --split-files --gzip SRR7722051.sra

There are 2 types of reads: Read 1 contains a 16 bp complex barcode (Chromium barcode) starting at its first base, followed by an 8 bp spacer and a 127 bp genomic fragment; Read 2 contains a 151 bp genomic fragment. The genomic fragments need to be aligned to the reference genome.

    (ScSmOP) :~/CHDP$ ls
    SRR7722051_1.fastq.gz
    SRR7722051_2.fastq.gz
    SRR7722051.sra

Run pipeline scsmop.sh

    (ScSmOP) :~/CHDP$ ~/ScSmOP/scsmop.sh -t chiadrop -n CHDP -1 SRR7722051_1.fastq.gz -2 SRR7722051_2.fastq.gz -b ~/RefGenome/bwa_dm3_index/dm3.fa -s ~/ScSmOP/ChromSize/dm3.size.txt -@ 10

Check that the run finished

    (ScSmOP) :~/CHDP$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    03.GroupAndRefine
    GroupAndRefinement.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/CHDP$ cd 04.QualityAssess
    (ScSmOP) :~/CHDP/04.QualityAssess$ cat CHDP_final_stat.tsv
    Total_read_pairs  71,667,907 
    Read_pairs_with_full_barcodes  66,940,410 
    Fully_barcode_rate 93.4%
    Complex_count_at_fastq  1,897,253 
    Total_fragments  5,860,456 
    Duplicated_fragments  - 
    Duplication_rate  - 
    Uniquely_mapped_reads  54,139,637 
    Refined_complex  3,932,230 
    F = 1  2,706,105 
    F = 2  831,839 
    F = 3  256,067 
    F = 4  81,755 
    F = 5  28,388 
    F = 6  11,400 
    F = 7  5,400 
    F = 8  2,982 
    F = 9  1,812 
    F = 10  1,285 
    F = 11  919 
    F = 12  687 
    F = 13  511 
    F = 14  420 
    F = 15  376 
    F > 15  2,284 
    (ScSmOP) :~/CHDP/04.QualityAssess$ ls
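
The `F = n` rows give the number of complexes containing n fragments, so their sum should equal the Refined_complex count. The sketch below checks this on the values listed above. The function name `sum_f_rows` is hypothetical, and it assumes the row label starts with `F` as a separate whitespace-delimited field, as in the listing.

```shell
# Sum the values of all rows whose first field is "F" ("F = n" and "F > 15"),
# ignoring thousands separators.
sum_f_rows() {
    awk '
        $1 == "F" {
            value = $NF
            gsub(/,/, "", value)
            sum += value
        }
        END { printf "%d\n", sum }
    ' "$1"
}

# Demo with the ChIA-Drop values listed above.
stat_file=$(mktemp)
cat > "$stat_file" <<'EOF'
F = 1  2,706,105
F = 2  831,839
F = 3  256,067
F = 4  81,755
F = 5  28,388
F = 6  11,400
F = 7  5,400
F = 8  2,982
F = 9  1,812
F = 10  1,285
F = 11  919
F = 12  687
F = 13  511
F = 14  420
F = 15  376
F > 15  2,284
EOF
sum_f_rows "$stat_file"
rm -f "$stat_file"
```

The demo prints 3932230, matching the Refined_complex value of 3,932,230 reported above.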


3. SPRITE

Project description

This library was prepared from the human B-lymphoblastoid cell line GM12878 with SPRITE (GSM3154194, SRR7216005).

The reference genome is hg38, and the library name is set to SPRITE.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the BWA index has been generated at:

    ~/RefGenome/bwa_hg38_index/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir SPRITE
    (ScSmOP) :~$ cd SPRITE
    (ScSmOP) :~/SPRITE$

Download SPRITE FASTQ files from NCBI

    (ScSmOP) :~/SPRITE$ prefetch SRR7216005
    (ScSmOP) :~/SPRITE$ ln -s SRR7216005/SRR7216005.sra .
    (ScSmOP) :~/SPRITE$ fastq-dump --split-files --gzip SRR7216005.sra

There are 2 types of reads: Read 1 contains an 8 bp DPM tag starting at its first base, followed by a 142 bp genomic fragment; Read 2 contains a 9-12 bp Y tag, followed by a 15 bp ODD tag, a 15 bp EVEN tag, and another 15 bp ODD tag. All the tags together make up the barcode that labels complexes. The genomic fragment needs to be aligned to the reference genome.

    (ScSmOP) :~/SPRITE$ ls
    SRR7216005_1.fastq.gz
    SRR7216005_2.fastq.gz
    SRR7216005.sra

Run pipeline scsmop.sh

    (ScSmOP) :~/SPRITE$ ~/ScSmOP/scsmop.sh -t sprite -n SPRITE -1 SRR7216005_1.fastq.gz -2 SRR7216005_2.fastq.gz -b ~/RefGenome/bwa_hg38_index/hg38.fa -@ 10

Check that the run finished

    (ScSmOP) :~/SPRITE$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    03.GroupAndRefine
    GroupAndRefinement.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/SPRITE$ cd 04.QualityAssess
    (ScSmOP) :~/SPRITE/04.QualityAssess$ cat SPRITE_final_stat.tsv
    Total_read_pairs 44417751
    Read_pairs_with_full_barcodes 27219050
    Fully_barcode_rate .612796
    Complex_count_at_fastq  7132895
    Uniquely_mapped_reads 14917131
    Total_fragments 12959501
    Duplicated_fragments 3700046
    Duplication_rate .285508
    Total_qualified_fragment 4148651
    F = 1 3082301
    F = 2 556278
    F = 3 198932
    F = 4 97755
    F = 5 56534
    F = 6 35867
    F = 7 24160
    F = 8 17006
    F = 9 12701
    F = 10 9522
    F = 11 7643
    F = 12 6044
    F = 13 4968
    F = 14 4000
    F = 15 3234
    F > 15 31706

4. RNA-DNA SPRITE

Project description

This library was prepared from mESCs with RNA-DNA SPRITE (GSM4579992, SRR11892191).

The reference genome is mm9, and the library name is set to rdSPRITE.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the BWA index has been generated at:

    ~/RefGenome/bwa_mm9_index/

Suppose the STAR index has been generated at:

    ~/RefGenome/refdata-gex-mm9-2020-A-STAR/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir rdSPRITE
    (ScSmOP) :~$ cd rdSPRITE
    (ScSmOP) :~/rdSPRITE$

Download rdSPRITE FASTQ files from NCBI. Note: preparing the rdSPRITE data requires sra-tools to be installed first; refer to the sra-tools installation instructions.

    (ScSmOP) :~/rdSPRITE$ prefetch SRR11892191
    (ScSmOP) :~/rdSPRITE$ ln -s SRR11892191/SRR11892191.sra .
    (ScSmOP) :~/rdSPRITE$ fastq-dump --split-files --gzip SRR11892191.sra

There are 2 types of reads: Read 1 contains a 150 bp genomic fragment; Read 2 contains a 9-12 bp Y tag, followed by a 15 bp ODD tag, a 15 bp EVEN tag, another 15 bp ODD tag, and then a 14 bp DPM tag if the fragment is DNA or a 14 bp RPM tag if the fragment is cDNA. All the tags together make up the barcode that labels complexes. The genomic fragment needs to be aligned to the reference genome.

    (ScSmOP) :~/rdSPRITE$ ls
    SRR11892191_1.fastq.gz
    SRR11892191_2.fastq.gz
    SRR11892191.sra

Run pipeline scsmop.sh

    (ScSmOP) :~/rdSPRITE$ ~/ScSmOP/scsmop.sh -t rdsprite -n rdSPRITE -1 SRR11892191_1.fastq.gz -2 SRR11892191_2.fastq.gz -b ~/RefGenome/bwa_mm9_index/mm9.fa -r ~/RefGenome/refdata-gex-mm9-2020-A-STAR/ -@ 10

Check that the run finished

    (ScSmOP) :~/rdSPRITE$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    03.GroupAndRefine
    GroupAndRefinement.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/rdSPRITE$ cd 04.QualityAssess
    (ScSmOP) :~/rdSPRITE/04.QualityAssess$ cat rdSPRITE_final_stat.tsv
    Total_read_pairs  32,046,207 
    Read_pairs_with_full_barcode  20,548,965 
    DNA_read_pairs_with_full_barcode  17,143,610 
    RNA_read_pairs_with_full_barcode  3,405,355 
    Fully_barcode_rate 64.1%
    Uniquely_mapped_DNA_reads  13,147,325 
    Uniquely_mapped_RNA_reads  1,222,522 
    DNA_fragments  13,147,325 
    RNA_fragments  1,222,522 
    DNA_duplicated_fragments  5,145,732 
    RNA_duplicated_fragments  502,977 
    DNA_duplicate_rate 39.1%
    RNA_duplicate_rate 41.1%
    DNA_complex  4,365,209 
    F = 1  3,725,908 
    F = 2  397,707 
    F = 3  94,890 
    F = 4  39,544 
    F = 5  21,932 
    F = 6  14,349 
    F = 7  10,107 
    F = 8  7,737 
    F = 9  5,958 
    F = 10  4,860 
    F = 11  3,993 
    F = 12  3,293 
    F = 13  2,846 
    F = 14  2,412 
    F = 15  2,121 
    F > 15  27,550 
    RNA_complex  605,053 
    F = 1  569,287 
    F = 2  26,203 
    F = 3  4,394 
    F = 4  1,703 
    F = 5  841 
    F = 6  513 
    F = 7  363 
    F = 8  256 
    F = 9  196 
    F = 10  120 
    F = 11  130 
    F = 12  88 
    F = 13  93 
    F = 14  64 
    F = 15  51 
    F > 15  749 
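
The listing above uses thousands separators, which shell arithmetic cannot read directly. The sketch below strips them and checks that the DNA and RNA barcoded read pairs add up to the reported total. The function name `to_int` is hypothetical; the numbers are copied from the listing.

```shell
# Remove thousands separators and spaces so a value such as "17,143,610"
# can be used in shell arithmetic.
to_int() {
    printf '%s\n' "$1" | tr -d ', '
}

dna=$(to_int "17,143,610")    # DNA_read_pairs_with_full_barcode
rna=$(to_int "3,405,355")     # RNA_read_pairs_with_full_barcode
total=$(to_int "20,548,965")  # Read_pairs_with_full_barcode

if [ $((dna + rna)) -eq "$total" ]; then
    echo "DNA + RNA barcoded read pairs match the total"
else
    echo "mismatch: $((dna + rna)) vs $total"
fi
```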

5. scSPRITE

Project description

This library was prepared from mESCs with scSPRITE (GSM4669508, SRR12212044); only the first 10,000,000 reads are used.

The reference genome is mm9, and the library name is set to scSPRITE.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the STAR index has been generated at:

    ~/RefGenome/mm9_star_index_v2.7.9/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir scSPRITE
    (ScSmOP) :~$ cd scSPRITE
    (ScSmOP) :~/scSPRITE$

Download scSPRITE FASTQ files from NCBI. Note: preparing the scSPRITE data requires sra-tools to be installed first; refer to the sra-tools installation instructions.

    (ScSmOP) :~/scSPRITE$ prefetch SRR12212044
    (ScSmOP) :~/scSPRITE$ ln -s SRR12212044/SRR12212044.sra .
    (ScSmOP) :~/scSPRITE$ fastq-dump --split-files --gzip -X 10000000 SRR12212044.sra

There are 2 types of reads: Read 1 contains a 150 bp genomic fragment; Read 2 contains a 9-12 bp Y tag, followed by a 15 bp ODD tag, a 15 bp EVEN tag, another 15 bp ODD tag, and a 26 bp DPM tag. All the tags together make up the barcode that labels complexes; the last 3 tags (EVEN, ODD, DPM) together label cells. The genomic fragment needs to be aligned to the reference genome.

    (ScSmOP) :~/scSPRITE$ ls
    SRR12212044_1.fastq.gz
    SRR12212044_2.fastq.gz
    SRR12212044.sra

Run pipeline scsmop.sh


    (ScSmOP) :~/scSPRITE$ ~/ScSmOP/scsmop.sh -t scsprite -n scSPRITE -1 SRR12212044_1.fastq.gz -2 SRR12212044_2.fastq.gz -r ~/RefGenome/mm9_star_index_v2.7.9 -@ 10

Check that the run finished

    (ScSmOP) :~/scSPRITE$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    03.GroupAndRefine
    GroupAndRefinement.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/scSPRITE$ cd 04.QualityAssess
    (ScSmOP) :~/scSPRITE/04.QualityAssess$ cat scSPRITE_final_stat.tsv
    Total_read_pairs   10,000,000 
    Read_pairs_with_full_barcodes   4,928,398 
    Fully_barcode_rate  49.3%
    Cell_count_at_fastq   12,753 
    Complex_count_at_fastq   1,318,183 
    Uniquely_mapped_reads   4,132,307 
    Total_fragments   4,132,307 
    Duplicated_fragments   461,112 
    Duplication_rate  11.2%
    Total_qualified_complex   1,181,033 
    F = 1   1,009,839 
    F = 2   78,925 
    F = 3   26,538 
    F = 4   14,417 
    F = 5   9,233 
    F = 6   6,328 
    F = 7   4,737 
    F = 8   3,722 
    F = 9   2,911 
    F = 10   2,420 
    F = 11   1,942 
    F = 12   1,611 
    F = 13   1,411 
    F = 14   1,201 
    F = 15   1,099 
    F > 15   14,699 


6. 10× Genomics (V1) Single Cell ATAC

Project description

The library was prepared following the Chromium Next GEM Single Cell ATAC Reagent Kits v1.1 (User Guide CG000209 Rev A). Dataset: 500 Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v1 chemistry), https://www.10xgenomics.com/resources/datasets/500-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-next-gem-v-1-1-1-1-standard-2-0-0

The reference genome is hg38, and the library name is set to PBMC.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the BWA index has been generated at:

    ~/RefGenome/bwa_hg38_index/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir PBMC
    (ScSmOP) :~$ cd PBMC
    (ScSmOP) :~/PBMC$

Download scATAC-seq FASTQ files from 10x Genomics Datasets

    (ScSmOP) :~/PBMC$ wget https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fastqs.tar
    (ScSmOP) :~/PBMC$ tar -xvf atac_pbmc_500_nextgem_fastqs.tar
    (ScSmOP) :~/PBMC$ cd atac_pbmc_500_nextgem_fastqs
    (ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ 

There are 3 types of reads: Read 1 contains a 50 bp genomic fragment starting at its first base; Read 2 contains a 16 bp cell barcode starting at its first base; Read 3 contains a 49 bp genomic fragment starting at its first base. The genomic fragments need to be aligned to the reference genome.

    (ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ ls
    atac_pbmc_500_nextgem_S1_L001_I1_001.fastq.gz  atac_pbmc_500_nextgem_S1_L002_I1_001.fastq.gz
    atac_pbmc_500_nextgem_S1_L001_R1_001.fastq.gz  atac_pbmc_500_nextgem_S1_L002_R1_001.fastq.gz
    atac_pbmc_500_nextgem_S1_L001_R2_001.fastq.gz  atac_pbmc_500_nextgem_S1_L002_R2_001.fastq.gz
    atac_pbmc_500_nextgem_S1_L001_R3_001.fastq.gz  atac_pbmc_500_nextgem_S1_L002_R3_001.fastq.gz

Run pipeline scsmop.sh

    (ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ ~/ScSmOP/scsmop.sh -t scatac_10x_v1 -n PBMC -1 atac_pbmc_500_nextgem_S1_L001_R1_001.fastq.gz,atac_pbmc_500_nextgem_S1_L002_R1_001.fastq.gz -2 atac_pbmc_500_nextgem_S1_L001_R2_001.fastq.gz,atac_pbmc_500_nextgem_S1_L002_R2_001.fastq.gz -3 atac_pbmc_500_nextgem_S1_L001_R3_001.fastq.gz,atac_pbmc_500_nextgem_S1_L002_R3_001.fastq.gz -b ~/RefGenome/bwa_hg38_index/hg38.fa -s ~/ScSmOP/ChromSize/hg38.size.txt -@ 10

Check that the run finished

    (ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    03.GroupAndRefine
    GroupAndRefinement.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ cd 04.QualityAssess
    (ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs/04.QualityAssess$ cat PBMC_final_stat.tsv
    Total_read_pairs	 33,535,381 
    Read_pairs_with_full_barcodes	 32,892,355 
    Fully_barcode_rate	98.1%
    Cell_count_at_fastq	 249,760 
    Total_fragments	 30,210,513 
    Duplicated_fragments	 19,198,087 
    Duplication_rate	63.5%
    Fragments_overlap_peak	 8,202,392 
    Peak_count	 68,121 
    Cell_count	 484 


7. Chromium Single Cell Multiome ATAC + Gene Expression

Project description

Cryopreserved human peripheral blood mononuclear cells (PBMCs) from a healthy female donor aged 25 were obtained by 10x Genomics from AllCells.

Nuclei were isolated as described in the Demonstrated Protocol- Nuclei Isolation for Single Cell Multiome ATAC + Gene Expression Sequencing (CG000365 Rev A).

Paired ATAC and Gene Expression libraries were generated from the isolated nuclei as described in the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression User Guide (CG000338 Rev A) and sequenced on Illumina Novaseq 6000 v1 Kit (Forward Strand Dual-Index Workflow). https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-arc/2.0.0/pbmc_unsorted_3k/pbmc_unsorted_3k_fastqs.tar

The reference genome is hg38, and the library name is set to PBMC_ARC.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the BWA index has been generated at:

    ~/RefGenome/bwa_hg38_index/

Suppose the STAR index has been generated at:

    ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir PBMC_ARC
    (ScSmOP) :~$ cd PBMC_ARC
    (ScSmOP) :~/PBMC_ARC$

Download multiome (ATAC + gene expression) FASTQ files from 10x Genomics Datasets

    (ScSmOP) :~/PBMC_ARC$ wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-arc/2.0.0/pbmc_unsorted_3k/pbmc_unsorted_3k_fastqs.tar
    (ScSmOP) :~/PBMC_ARC$ tar -xvf pbmc_unsorted_3k_fastqs.tar
    (ScSmOP) :~/PBMC_ARC$ cd pbmc_unsorted_3k_fastqs
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ 

There are 5 types of reads:

ATAC (3 types): Read 1 contains a 50 bp genomic fragment starting at its first base; Read 2 contains a 16 bp cell barcode starting at its first base; Read 3 contains a 49 bp genomic fragment starting at its first base. The genomic fragments need to be aligned to the reference genome.

RNA (2 types): Read 1 contains a 16 bp cell barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome.

    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls atac
    pbmc_unsorted_3k_S3_L001_I1_001.fastq.gz  pbmc_unsorted_3k_S3_L002_R2_001.fastq.gz  pbmc_unsorted_3k_S3_L004_I1_001.fastq.gz
    pbmc_unsorted_3k_S3_L001_R1_001.fastq.gz  pbmc_unsorted_3k_S3_L002_R3_001.fastq.gz  pbmc_unsorted_3k_S3_L004_R1_001.fastq.gz
    pbmc_unsorted_3k_S3_L001_R2_001.fastq.gz  pbmc_unsorted_3k_S3_L003_I1_001.fastq.gz  pbmc_unsorted_3k_S3_L004_R2_001.fastq.gz
    pbmc_unsorted_3k_S3_L001_R3_001.fastq.gz  pbmc_unsorted_3k_S3_L003_R1_001.fastq.gz  pbmc_unsorted_3k_S3_L004_R3_001.fastq.gz
    pbmc_unsorted_3k_S3_L002_I1_001.fastq.gz  pbmc_unsorted_3k_S3_L003_R2_001.fastq.gz
    pbmc_unsorted_3k_S3_L002_R1_001.fastq.gz  pbmc_unsorted_3k_S3_L003_R3_001.fastq.gz
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls gex
    pbmc_unsorted_3k_S01_L003_I1_001.fastq.gz  pbmc_unsorted_3k_S01_L003_R2_001.fastq.gz  pbmc_unsorted_3k_S01_L004_R1_001.fastq.gz
    pbmc_unsorted_3k_S01_L003_I2_001.fastq.gz  pbmc_unsorted_3k_S01_L004_I1_001.fastq.gz  pbmc_unsorted_3k_S01_L004_R2_001.fastq.gz
    pbmc_unsorted_3k_S01_L003_R1_001.fastq.gz  pbmc_unsorted_3k_S01_L004_I2_001.fastq.gz

Run pipeline scsmop.sh

    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ~/ScSmOP/scsmop.sh -t scarc_10x_v1 -n PBMC_ARC -1 atac/pbmc_unsorted_3k_S3_L001_R1_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L002_R1_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L003_R1_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L004_R1_001.fastq.gz -2 atac/pbmc_unsorted_3k_S3_L001_R2_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L002_R2_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L003_R2_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L004_R2_001.fastq.gz -3 atac/pbmc_unsorted_3k_S3_L001_R3_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L002_R3_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L003_R3_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L004_R3_001.fastq.gz -4 gex/pbmc_unsorted_3k_S01_L003_R1_001.fastq.gz,gex/pbmc_unsorted_3k_S01_L004_R1_001.fastq.gz -5 gex/pbmc_unsorted_3k_S01_L003_R2_001.fastq.gz,gex/pbmc_unsorted_3k_S01_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -b ~/RefGenome/bwa_hg38_index/hg38.fa -s ~/ScSmOP/ChromSize/hg38.size.txt -@ 10

Check that the run finished

    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls 
    RNAResult
    ATACResult
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls RNAResult
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls ATACResult
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    03.GroupAndRefine
    GroupAndRefinement.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ cd ATACResult/04.QualityAssess
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs/ATACResult/04.QualityAssess$ cat PBMC_ARC_final_stat.tsv
    Total_read_pairs  82,781,574 
    Read_pairs_with_full_barcodes  80,598,161 
    Fully_barcode_rate 97.4%
    Cell_count_at_fastq  495,494 
    Total_fragments  70,185,372 
    Duplicated_fragments  12,441,889 
    Duplication_rate 17.7%
    Fragments_overlap_peak  56,468,640 
    Peak_count  221,238 
    Cell_count  2,190 
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs/ATACResult/04.QualityAssess$ cd ../../RNAResult/04.QualityAssess
    (ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs/RNAResult/04.QualityAssess$ cat PBMC_ARC_final_stat.tsv
    Total_read_pairs  170,301,081 
    Read_pairs_with_full_barcodes  162,198,018 
    Fully_barcode_rate 95.2%
    Cell_count_at_fastq  420,955 
    Cell_count_estimated  2,963 
    Total_gene_detected  23,746 
    Mean_gene_per_cell  845 
    Median_UMI_per_cell  1,591 

8. Visium Spatial Gene Expression

Project description

10X Genomics obtained fresh frozen mouse olfactory bulb tissue from BioIVT. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols – Tissue Preparation Guide (Demonstrated Protocol CG000240). Tissue sections of 10µm were placed on Visium Gene Expression slides, then fixed and stained following Methanol Fixation, H&E Staining & Imaging for Visium Spatial Protocols (CG000160).

The Visium Gene Expression library was prepared as described in the Visium Spatial Reagent Kits User Guide (CG000239 Rev D). https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_fastqs.tar.

Only the gene expression part was processed; no image processing was performed.

The reference genome is mm10, and the library name is set to Spatial.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the STAR index has been generated at:

    ~/RefGenome/refdata-gex-mm10-2020-A-STAR/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir Spatial
    (ScSmOP) :~$ cd Spatial
    (ScSmOP) :~/Spatial$

Download Visium spatial gene expression FASTQ files from 10x Genomics Datasets

    (ScSmOP) :~/Spatial$ wget https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_fastqs.tar
    (ScSmOP) :~/Spatial$ tar -xvf Visium_Mouse_Olfactory_Bulb_fastqs.tar
    (ScSmOP) :~/Spatial$ cd Visium_Mouse_Olfactory_Bulb_fastqs
    (ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ 

There are 2 types of reads: Read 1 contains a 16 bp spatial barcode (chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome.

    (ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ ls
    Visium_Mouse_Olfactory_Bulb_S1_L001_I1_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L003_I1_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L001_I2_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L003_I2_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L001_R1_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L003_R1_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L001_R2_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L003_R2_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L002_I1_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L004_I1_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L002_I2_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L004_I2_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L002_R1_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L004_R1_001.fastq.gz
    Visium_Mouse_Olfactory_Bulb_S1_L002_R2_001.fastq.gz  Visium_Mouse_Olfactory_Bulb_S1_L004_R2_001.fastq.gz

Run pipeline scsmop.sh

    (ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n Spatial -1 Visium_Mouse_Olfactory_Bulb_S1_L001_R1_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L002_R1_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L003_R1_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L004_R1_001.fastq.gz -2 Visium_Mouse_Olfactory_Bulb_S1_L001_R2_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L002_R2_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L003_R2_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-mm10-2020-A-STAR/ -@ 10 -c ~/ScSmOP/ConfigFiles/10x_spatial-rna_config.json

Check that the run finished

    (ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ cd 04.QualityAssess
    (ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs/04.QualityAssess$ cat Spatial_final_stat.tsv
    Number of Reads 46,878,299
    Reads With Valid Barcodes 45,617,284
    Fully_barcode_rate 97.3%
    Unique Reads in Spot Mapped to Gene 6,315,431
    Estimated Number of Spot 1,049
    Total Gene Detected 14,567
    Median Gene per Spot 1,540
    Median UMI per Spot 3,411

9. 10× Genomics (V3) Single Cell Gene Expression (parallel with UniverSC)

Project description

This library is taken from the UniverSC (Nature Communications) 10× 3.0.0 test data: https://github.com/minoda-lab/universc/tree/master/test/shared/cellranger-tiny-fastq/3.0.0

The reference genome is tinyref, and the library name is set to PBMC_test.

We assume ScSmOP has been installed. If any of the following elements is not available, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in all paths, e.g. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the STAR index has been generated at:

    ~/RefGenome/tinyrefcellrange3/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir PBMC_test
    (ScSmOP) :~$ cd PBMC_test
    (ScSmOP) :~/PBMC_test$

Download scRNA-seq FASTQ files from the UniverSC GitHub repository

    # Download the data from github: https://github.com/minoda-lab/universc/tree/master/test/shared/cellranger-tiny-fastq/3.0.0. 

There are 2 types of reads: Read 1 contains a 16 bp cell barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome.

    (ScSmOP) :~/PBMC_test$ ls
    tinygex_S1_L001_I1_001.fastq.gz
    tinygex_S1_L001_R1_001.fastq.gz
    tinygex_S1_L001_R2_001.fastq.gz
    tinygex_S1_L002_I1_001.fastq.gz
    tinygex_S1_L002_R1_001.fastq.gz
    tinygex_S1_L002_R2_001.fastq.gz

Run pipeline scsmop.sh

    (ScSmOP) :~/PBMC_test$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n PBMC_test -1 tinygex_S1_L001_R1_001.fastq.gz,tinygex_S1_L002_R1_001.fastq.gz -2 tinygex_S1_L001_R2_001.fastq.gz,tinygex_S1_L002_R2_001.fastq.gz -r ~/RefGenome/tinyrefcellrange3/ -@ 10

Check that the run finished

    (ScSmOP) :~/PBMC_test$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done

Get statistics

    (ScSmOP) :~/PBMC_test$ cd 04.QualityAssess
    (ScSmOP) :~/PBMC_test/04.QualityAssess$ cat PBMC_test_final_stat.tsv
    Total_read_pairs 461083
    Read_pairs_with_full_barcodes 437122
    Fully_barcode_rate .948033
    Cell_count_at_fastq  11946
    Cell_count_estimated 1106
    Total_gene_detected 202
    Mean_gene_per_cell 21
    Median_UMI_per_cell 29
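
As a quick consistency check, Fully_barcode_rate should equal Read_pairs_with_full_barcodes divided by Total_read_pairs. A small Python sketch that parses the first lines of the report above and recomputes the rate (the function name `parse_stats` is our own; the numbers are copied from the report):

```python
def parse_stats(text):
    """Parse 'name value' lines (whitespace separated) into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = float(parts[1])
    return stats

report = """\
Total_read_pairs 461083
Read_pairs_with_full_barcodes 437122
Fully_barcode_rate .948033
"""
stats = parse_stats(report)
rate = stats["Read_pairs_with_full_barcodes"] / stats["Total_read_pairs"]
print(f"{rate:.6f}")
```

Rounded to six decimals, the recomputed value agrees with the reported rate.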

10. Drop-seq

Project description

This library is the Drop-seq test dataset from UniverSC (Nature Communications): https://github.com/minoda-lab/universc/tree/master/test/shared/dropseq-test

The reference genome we selected is tinyref, and the library name is set to Dropseq.

We assume ScSmOP has been installed. If any of the following elements are not available, please install them by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3, i.e. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the STAR index has been generated at:

    ~/RefGenome/tinyrefcellrange3/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir Dropseq
    (ScSmOP) :~$ cd Dropseq
    (ScSmOP) :~/Dropseq$

Download the scRNA-seq FASTQ files from the UniverSC GitHub repo

    # Prepare the data as described at https://github.com/minoda-lab/universc/tree/master/test/shared/dropseq-test

There are 2 types of reads: Read 1 contains a 12 bp cell barcode starting from its first base pair, followed by an 8 bp UMI; Read 2 contains a variable-length transcript sequence, which needs to be aligned to the reference genome.

    (ScSmOP) :~/Dropseq$ ls
    SRR1873277_Sample1_R1.fastq.gz
    SRR1873277_Sample1_R2.fastq.gz

Run pipeline scsmop.sh

    (ScSmOP) :~/Dropseq$ ~/ScSmOP/scsmop.sh -t dropseq -n Dropseq -1 SRR1873277_Sample1_R1.fastq.gz -2 SRR1873277_Sample1_R2.fastq.gz -r ~/RefGenome/tinyrefcellrange3/ -@ 10

Check finish

    (ScSmOP) :~/Dropseq$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done

Get statistic

    (ScSmOP) :~/Dropseq$ cd 04.QualityAssess
    (ScSmOP) :~/Dropseq/04.QualityAssess$ cat Dropseq_final_stat.tsv
    Total_read_pairs 29265
    Read_pairs_with_full_barcodes 29265
    Fully_barcode_rate 1.000000
    Cell_count_at_fastq  6828
    Cell_count_estimated 278
    Total_gene_detected 148
    Mean_gene_per_cell 22
    Median_UMI_per_cell 40

11. DIY

Project description

The library was constructed by processing GM12878 cells with the 10× Genomics Single Cell Multiome ATAC + Gene Expression kit and then loading them onto the Chromium platform to amplify RNA. The result is therefore a 10× Genomics Single Cell Gene Expression library whose barcodes come from the Gene Expression part of the 10× Genomics Single Cell Multiome ATAC + Gene Expression kit.

The reference genome we selected is hg38, and the library name is set to SHG023.

We assume ScSmOP has been installed. If any of the following elements are not available, please install them by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP

Suppose the ScSmOP pipeline has been installed at the path below. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3, i.e. ~/ScSmOP-0.1.3/.

    ~/ScSmOP/

Suppose the STAR index has been generated at:

    ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/

Make directory for the project

    (base) :~$ conda activate ScSmOP
    (ScSmOP) :~$ mkdir SHG023
    (ScSmOP) :~$ cd SHG023
    (ScSmOP) :~/SHG023$

Download the scRNA-seq FASTQ files from Google Drive

    # Download the SHG023 FASTQ files from Google Drive into this directory.

Read 1 contains a 16 bp cell barcode starting from its first base pair, followed by a 12 bp UMI, a 34 bp PolyT tail, and an 88 bp transcript sequence; Read 2 contains a 150 bp transcript sequence, which needs to be aligned to the reference genome; the I5 and I7 reads contain the sample index used to distinguish different samples, but this library has only one sample, so they are not used.
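
The four Read 1 segments add up to the full 150 bp read length (16 + 12 + 34 + 88). A minimal Python sketch of this layout, with a hypothetical helper `slice_read` and a made-up sequence, shows how each segment maps onto read positions:

```python
# Read 1 layout as described above: (segment name, length in bp), in order.
READ1_LAYOUT = [("barcode", 16), ("umi", 12), ("polyT", 34), ("transcript", 88)]

def slice_read(seq, layout):
    """Cut seq into consecutive named segments according to layout."""
    segments = {}
    start = 0
    for name, length in layout:
        segments[name] = seq[start:start + length]
        start += length
    return segments

total = sum(length for _, length in READ1_LAYOUT)
print(total)

# Made-up 150 bp read, for illustration only.
example = "A" * 16 + "C" * 12 + "T" * 34 + "G" * 88
segments = slice_read(example, READ1_LAYOUT)
for name, sub in segments.items():
    print(name, len(sub))
```

Only the first two segments (barcode and UMI) are used when the library is treated like a 10× Genomics v3 library.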


Read structure

    (ScSmOP) :~/SHG023$ ls
    SHG203_S1_L004_R1_001.fastq.gz
    SHG203_S1_L004_R2_001.fastq.gz
    SHG203_S1_L004_I1_001.fastq.gz
    SHG203_S1_L004_I2_001.fastq.gz

There are two ways to process such library:

Type 1: Ignore the PolyT tail and the 88 bp transcript sequence in Read 1. The library then has the same read structure as a 10× Genomics (V3) Single Cell Gene Expression library, only with a different barcode whitelist, so it can be processed as a scrna_10x_v3 library.

Type 2: DIY a new configuration file specific for the library.

Type 1

Run pipeline scsmop.sh

    (ScSmOP) :~/SHG023$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n SHG023 -1 SHG203_S1_L004_R1_001.fastq.gz -2 SHG203_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10 -w ~/ScSmOP/BarcodeBucket/737K-arc-v1-scrna.txt

Check finish

    (ScSmOP) :~/SHG023$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done

Get statistic

    (ScSmOP) :~/SHG023$ cd 04.QualityAssess
    (ScSmOP) :~/SHG023/04.QualityAssess$ cat SHG023_final_stat.tsv
    Total_read_pairs 69086417
    Read_pairs_with_full_barcodes 60926690
    Fully_barcode_rate 0.881891
    Cell_count_at_fastq  341342
    Cell_count_estimated 2702
    Total_gene_detected 6207
    Mean_gene_per_cell 3
    Median_UMI_per_cell 2

Type 2

A general DIY procedure for ScSmOP contains 4 steps:

  1. Prepare FASTQ files.
  2. Prepare barcode whitelist files.
  3. Generate a custom configuration file.
  4. Decide the experiment type and replace the default configuration file with the custom configuration file when running scsmop.sh.

DIY-procedure

Edit configuration file

    (ScSmOP) :~/SHG023$ cp ~/ScSmOP/ConfigFiles/OriginalConfigFile.json .
    (ScSmOP) :~/SHG023$ ln -s ~/ScSmOP/BarcodeBucket/737K-arc-v1-scrna.txt .
    (ScSmOP) :~/SHG023$ vi OriginalConfigFile.json

Example procedure of SHG023.

Exp-procedure

Modifying OriginalConfigFile.json

    {
        "barcode chain" : [ {"BC-UMI": "R1:1"}, {"GENOMEA|2": "R2:1"}],
        "identifier" : [ {"CELL":"BC"} ],
        "barcode type" :
        {
            "BC":
            {
                "DENSE": 1,
                "SPACE":0,
                "LAXITY":0,
                "LENGTH":16,
                "MISMATCH":1,
                "WHITE LIST":"737K-arc-v1-scrna.txt"
            },
            "UMI":
            {
                "SPACE": 0,
                "LAXITY": 0,
                "LENGTH": "12",
                "MISMATCH": 0,
                "WHITE LIST":""
            }
        }
    }

Press Esc, type :wq, and press Enter to save the file and exit the editor.
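
Before generating the custom configuration file, the edited JSON can be sanity-checked. The Python sketch below is not part of ScSmOP, and the two rules it applies are our assumptions about the format, inferred only from the example above: every barcode type should have an integer-like LENGTH, and every barcode named under "identifier" should be defined under "barcode type".

```python
import json

# The example configuration from above, embedded as text for illustration.
EXAMPLE = """
{
    "barcode chain": [{"BC-UMI": "R1:1"}, {"GENOMEA|2": "R2:1"}],
    "identifier": [{"CELL": "BC"}],
    "barcode type": {
        "BC": {"DENSE": 1, "SPACE": 0, "LAXITY": 0, "LENGTH": 16,
               "MISMATCH": 1, "WHITE LIST": "737K-arc-v1-scrna.txt"},
        "UMI": {"SPACE": 0, "LAXITY": 0, "LENGTH": "12",
                "MISMATCH": 0, "WHITE LIST": ""}
    }
}
"""

def check_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = []
    types = config.get("barcode type", {})
    for name, spec in types.items():
        # LENGTH appears both as a number and as a quoted string in the example.
        try:
            int(spec["LENGTH"])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{name}: LENGTH is missing or not an integer")
    for entry in config.get("identifier", []):
        for role, name in entry.items():
            if name not in types:
                problems.append(f"identifier {role}: barcode type {name} is not defined")
    return problems

config = json.loads(EXAMPLE)  # raises json.JSONDecodeError on malformed JSON
print(check_config(config))
```

Parsing with `json.loads` also catches plain syntax slips such as a missing comma or brace, which are easy to introduce when editing in vi.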

Generate custom configuration file

    (ScSmOP) :~/SHG023$ ~/ScSmOP/Tools/python3 ~/ScSmOP/PythonScript/GenerateConfigFile.py -i OriginalConfigFile.json -o SHG023
    (ScSmOP) :~/SHG023$ ls
    OriginalConfigFile.json
    737K-arc-v1-scrna.txt
    SHG023_config.json
    SHG203_S1_L004_R1_001.fastq.gz
    SHG203_S1_L004_R2_001.fastq.gz
    SHG203_S1_L004_I1_001.fastq.gz
    SHG203_S1_L004_I2_001.fastq.gz

Run scsmop.sh with custom configuration file

This is still an scRNA-seq library that requires UMI deduplication and gene annotation, so set -t to "scrna_10x_v3" and set -c to the custom configuration file (SHG023_config.json here).

    (ScSmOP) :~/SHG023$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n SHG023 -1 SHG203_S1_L004_R1_001.fastq.gz -2 SHG203_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10 -c SHG023_config.json

Check finish

    (ScSmOP) :~/SHG023$ ls 
    01.BarcodeIden
    BarcodeIdentification.done 
    02.ReadAlign
    SequenceAlignment.done
    04.QualityAssess
    QualityAssessment.done

Get statistic

    (ScSmOP) :~/SHG023$ cd 04.QualityAssess
    (ScSmOP) :~/SHG023/04.QualityAssess$ cat SHG023_final_stat.tsv
    Total_read_pairs 69086417
    Read_pairs_with_full_barcodes 60926690
    Fully_barcode_rate 0.881891
    Cell_count_at_fastq  341342
    Cell_count_estimated 2702
    Total_gene_detected 6207
    Mean_gene_per_cell 3
    Median_UMI_per_cell 2