Case Study - ZhengmzLab/ScSmOP GitHub Wiki
We prepared testing datasets from different types of single-cell single-molecule omics.
Case | Omics | Experiment | Dataset | Source |
---|---|---|---|---|
1. 10× Genomics (V3) Single Cell Gene Expression | scRNA-seq | 10× Genomics (v3) Single Cell Gene Expression | PBMC_5k | 10X Genomics |
2. ChIA-Drop | Single-molecule 3D Genomics | ChIA-Drop | ChIA-Drop | GSM3347523 |
3. SPRITE | Single-molecule 3D Genomics | SPRITE | SPRITE | GSM3154194 |
4. RNA-DNA SPRITE | Single-molecule 3D Genomics | RNA-DNA SPRITE | rdSPRITE | GSM4579992 |
5. scSPRITE | Single-cell single-molecule 3D Genomics | scSPRITE | mESC scSPRITE | GSM4669508 |
6. 10× Genomics (V1) Single Cell ATAC | scATAC-seq | 10× Genomics (v1) Single Cell ATAC | Human PBMC | 10X Genomics |
7. Chromium Single Cell Multiome ATAC + Gene Expression | Single-cell multiome | Chromium Single Cell Multiome ATAC + Gene Expression | Human PBMC | 10X Genomics |
8. Visium Spatial Gene Expression | Spatial Transcriptomics | Visium Spatial Gene Expression | mouse olfactory bulb | 10X Genomics |
9. 10× Genomics (V3) Single Cell Gene Expression (parallel with UniverSC) | scRNA-seq | 10× Genomics (v3) Single Cell Gene Expression | Tiny FASTQ of human chr21 | Drop-seq GitHub |
10. Drop-seq (parallel with UniverSC) | scRNA-seq | Drop-seq | Tiny FASTQ of human chr21 | Drop-seq GitHub |
11. DIY | scRNA-seq | 10× Genomics (v3) Single Cell Gene Expression using the Chromium Single Cell Multiome ATAC + Gene Expression kit to capture 3' end RNA | Human GM12878 and Drosophila S2 cells | In-house |
Project description
This library was prepared from human peripheral blood mononuclear cells (PBMCs). Dataset: 5k Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v3 chemistry), https://www.10xgenomics.com/resources/datasets/5-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-1-standard-3-0-2
The reference genome is hg38, and the library name is set to PBMC_5k.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
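Because the pipeline may live in either ~/ScSmOP/ or ~/ScSmOP-0.1.3/ depending on how it was downloaded, a small helper can pick whichever directory exists. This is a minimal sketch: the function name find_scsmop and its optional search-root argument are illustrative assumptions and are not part of ScSmOP.

```shell
# Hypothetical helper (not part of ScSmOP): print the first install directory
# found under the given root (default: $HOME), trying the unversioned name first.
find_scsmop() {
    root="${1:-$HOME}"
    for name in ScSmOP ScSmOP-0.1.3; do
        if [ -d "$root/$name" ]; then
            printf '%s\n' "$root/$name"
            return 0
        fi
    done
    echo "ScSmOP not found under $root" >&2
    return 1
}
```

It could be used as SCSMOP_HOME=$(find_scsmop), after which "$SCSMOP_HOME/scsmop.sh" stands in for ~/ScSmOP/scsmop.sh in the commands of this page.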
The STAR index is assumed to have been generated at:
~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir scRNA_PBMC5K
(ScSmOP) :~$ cd scRNA_PBMC5K
(ScSmOP) :~/scRNA_PBMC5K$
Download scRNA-seq FASTQ files from 10x Genomics Datasets
(ScSmOP) :~/scRNA_PBMC5K$ wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/3.0.2/5k_pbmc_v3/5k_pbmc_v3_fastqs.tar
(ScSmOP) :~/scRNA_PBMC5K$ tar -xvf 5k_pbmc_v3_fastqs.tar
(ScSmOP) :~/scRNA_PBMC5K$ cd 5k_pbmc_v3_fastqs
(ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$
There are three types of reads: Read 1 contains a 16 bp cell barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome; the sample index read (I1) is not used.
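The Read 1 layout described here (16 bp barcode, then 12 bp UMI) can be illustrated by slicing a sequence by position. This is only a sketch of the layout, not how ScSmOP identifies barcodes internally; the function name split_r1 and the example sequence are made up.

```shell
# Split a 10x v3 Read 1 sequence into its 16 bp cell barcode and 12 bp UMI.
# Positions are 1-based and follow the read layout described in the text.
split_r1() {
    seq="$1"
    barcode=$(printf '%s' "$seq" | cut -c1-16)
    umi=$(printf '%s' "$seq" | cut -c17-28)
    printf '%s\t%s\n' "$barcode" "$umi"
}

# Example with a made-up 28 bp sequence (not taken from the dataset);
# prints the barcode and the UMI separated by a tab.
split_r1 AAACCCAAGAAACACTGGGTTTACGTAC
```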
(ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ ls
5k_pbmc_v3_S1_L001_I1_001.fastq.gz 5k_pbmc_v3_S1_L003_I1_001.fastq.gz
5k_pbmc_v3_S1_L001_R1_001.fastq.gz 5k_pbmc_v3_S1_L003_R1_001.fastq.gz
5k_pbmc_v3_S1_L001_R2_001.fastq.gz 5k_pbmc_v3_S1_L003_R2_001.fastq.gz
5k_pbmc_v3_S1_L002_I1_001.fastq.gz 5k_pbmc_v3_S1_L004_I1_001.fastq.gz
5k_pbmc_v3_S1_L002_R1_001.fastq.gz 5k_pbmc_v3_S1_L004_R1_001.fastq.gz
5k_pbmc_v3_S1_L002_R2_001.fastq.gz 5k_pbmc_v3_S1_L004_R2_001.fastq.gz
Run pipeline scsmop.sh
(ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n PBMC_5k -1 5k_pbmc_v3_S1_L001_R1_001.fastq.gz,5k_pbmc_v3_S1_L002_R1_001.fastq.gz,5k_pbmc_v3_S1_L003_R1_001.fastq.gz,5k_pbmc_v3_S1_L004_R1_001.fastq.gz -2 5k_pbmc_v3_S1_L001_R2_001.fastq.gz,5k_pbmc_v3_S1_L002_R2_001.fastq.gz,5k_pbmc_v3_S1_L003_R2_001.fastq.gz,5k_pbmc_v3_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10
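The -1 and -2 options in the command above each take a comma-separated list with one file per lane. Typing these lists by hand is error-prone, so they could be built from the file names instead. The sketch below assumes the Illumina-style naming seen in the listing (..._R1_001.fastq.gz); join_lanes is a hypothetical helper, not part of ScSmOP.

```shell
# Join all files in a directory matching *_<read>_001.fastq.gz with commas,
# in the lexical (lane) order produced by the shell glob.
join_lanes() {
    dir="$1"
    read_tag="$2"   # e.g. R1 or R2
    list=""
    for f in "$dir"/*_"$read_tag"_001.fastq.gz; do
        [ -e "$f" ] || continue          # skip the unexpanded pattern when nothing matches
        list="${list:+$list,}$f"
    done
    printf '%s\n' "$list"
}
```

With this helper, the file arguments could be written as -1 "$(join_lanes . R1)" -2 "$(join_lanes . R2)", leaving the remaining options unchanged.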
Check that the run finished
(ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
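Each completed stage leaves a .done marker next to its output directory, as in the listing above. Checking for them can be scripted; the following is a minimal sketch in which check_done is a hypothetical helper and the marker names are passed in by the caller.

```shell
# Report which of the expected stage markers are missing from a result directory.
# Returns success only when every marker passed as an argument exists.
check_done() {
    dir="$1"
    shift
    missing=0
    for marker in "$@"; do
        if [ ! -e "$dir/$marker" ]; then
            echo "missing: $marker" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For this case it could be called as check_done . BarcodeIdentification.done SequenceAlignment.done QualityAssessment.done; cases that also group and refine fragments would add GroupAndRefinement.done.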
Get statistics
(ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs$ cd 04.QualityAssess
(ScSmOP) :~/scRNA_PBMC5K/5k_pbmc_v3_fastqs/04.QualityAssess$ cat PBMC_5k_final_stat.tsv
Total_read_pairs 383941607
Read_pairs_with_full_barcodes 375636262
Fully_barcode_rate .978368
Cell_count_at_fastq 1334303
Cell_count_estimated 4544
Total_gene_detected 23801
Mean_gene_per_cell 2538
Median_UMI_per_cell 8412
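Fully_barcode_rate is the ratio of Read_pairs_with_full_barcodes to Total_read_pairs, which can be recomputed from the two counts as a sanity check. The helper name barcode_rate is an assumption made for this sketch.

```shell
# Recompute the fully-barcoded rate from two counts and print it with six decimals.
barcode_rate() {
    awk -v total="$1" -v full="$2" 'BEGIN {
        if (total <= 0) { print "NA"; exit }
        printf "%.6f\n", full / total
    }'
}

# Counts taken from the statistics shown above; prints 0.978368.
barcode_rate 383941607 375636262
```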
Project description
This library was prepared from Drosophila S2 cells with ChIA-Drop (GSM3347523, SRR7722051).
The reference genome is dm3, and the library name is set to CHDP.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The BWA index is assumed to have been generated at:
~/RefGenome/bwa_dm3_index/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir CHDP
(ScSmOP) :~$ cd CHDP
(ScSmOP) :~/CHDP$
Download ChIA-Drop FASTQ files from NCBI
Note: preparing the ChIA-Drop data requires sra-tools to be installed first; refer to the sra-tools installation instructions.
(ScSmOP) :~/CHDP$ prefetch SRR7722051
(ScSmOP) :~/CHDP$ ln -s SRR7722051/SRR7722051.sra .
(ScSmOP) :~/CHDP$ fastq-dump --split-files --gzip SRR7722051.sra
There are two types of reads: Read 1 contains a 16 bp complex barcode (Chromium barcode) starting at its first base, followed by an 8 bp spacer and a 127 bp genomic fragment; Read 2 contains a 151 bp genomic fragment. The genomic fragments need to be aligned to the reference genome.
(ScSmOP) :~/CHDP$ ls
SRR7722051_1.fastq.gz
SRR7722051_2.fastq.gz
SRR7722051.sra
Run pipeline scsmop.sh
(ScSmOP) :~/CHDP$ ~/ScSmOP/scsmop.sh -t chiadrop -n CHDP -1 SRR7722051_1.fastq.gz -2 SRR7722051_2.fastq.gz -b ~/RefGenome/bwa_dm3_index/dm3.fa -s ~/ScSmOP/ChromSize/dm3.size.txt -@ 10
Check that the run finished
(ScSmOP) :~/CHDP$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
03.GroupAndRefine
GroupAndRefinement.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/CHDP$ cd 04.QualityAssess
(ScSmOP) :~/CHDP/04.QualityAssess$ cat CHDP_final_stat.tsv
Total_read_pairs 71,667,907
Read_pairs_with_full_barcodes 66,940,410
Fully_barcode_rate 93.4%
Complex_count_at_fastq 1,897,253
Total_fragments 5,860,456
Duplicated_fragments -
Duplication_rate -
Uniquely_mapped_reads 54,139,637
Refined_complex 3,932,230
F = 1 2,706,105
F = 2 831,839
F = 3 256,067
F = 4 81,755
F = 5 28,388
F = 6 11,400
F = 7 5,400
F = 8 2,982
F = 9 1,812
F = 10 1,285
F = 11 919
F = 12 687
F = 13 511
F = 14 420
F = 15 376
F > 15 2,284
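The F rows give the number of complexes containing exactly that many fragments, so they should add up to the complex total. For the values listed above, the sixteen rows sum to 3,932,230, which equals Refined_complex. The check can be scripted as sketched below. It assumes a whitespace-separated file in which the F rows start with a standalone "F" field and the count is the last field; the actual layout of the .tsv may differ, and sum_f_rows is a hypothetical helper.

```shell
# Sum the per-complex fragment-count rows ("F = n" and "F > n") of a stats file.
# Assumes whitespace-separated fields with the count last; thousands commas are stripped.
sum_f_rows() {
    awk '$1 == "F" { n = $NF; gsub(/,/, "", n); total += n }
         END { printf "%d\n", total }' "$1"
}
```

Calling sum_f_rows CHDP_final_stat.tsv and comparing the result with the Refined_complex row would flag a truncated or inconsistent statistics file.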
(ScSmOP) :~/CHDP/04.QualityAssess$ ls
Project description
This library was prepared from the human B-lymphoblastoid cell line GM12878 with SPRITE (GSM3154194, SRR7216005).
The reference genome is hg38, and the library name is set to SPRITE.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The BWA index is assumed to have been generated at:
~/RefGenome/bwa_hg38_index/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir SPRITE
(ScSmOP) :~$ cd SPRITE
(ScSmOP) :~/SPRITE$
Download SPRITE FASTQ files from NCBI
(ScSmOP) :~/SPRITE$ prefetch SRR7216005
(ScSmOP) :~/SPRITE$ ln -s SRR7216005/SRR7216005.sra .
(ScSmOP) :~/SPRITE$ fastq-dump --split-files --gzip SRR7216005.sra
There are two types of reads: Read 1 contains an 8 bp DPM tag starting at its first base, followed by a 142 bp genomic fragment; Read 2 contains a 9-12 bp Y tag, followed by a 15 bp ODD tag, a 15 bp EVEN tag, and a 15 bp ODD tag. Together, the tags form a barcode that labels complexes. The genomic fragment needs to be aligned to the reference genome.
(ScSmOP) :~/SPRITE$ ls
SRR7216005_1.fastq.gz
SRR7216005_2.fastq.gz
SRR7216005.sra
Run pipeline scsmop.sh
(ScSmOP) :~/SPRITE$ ~/ScSmOP/scsmop.sh -t sprite -n SPRITE -1 SRR7216005_1.fastq.gz -2 SRR7216005_2.fastq.gz -b ~/RefGenome/bwa_hg38_index/hg38.fa -@ 10
Check that the run finished
(ScSmOP) :~/SPRITE$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
03.GroupAndRefine
GroupAndRefinement.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/SPRITE$ cd 04.QualityAssess
(ScSmOP) :~/SPRITE/04.QualityAssess$ cat SPRITE_final_stat.tsv
Total_read_pairs 44417751
Read_pairs_with_full_barcodes 27219050
Fully_barcode_rate .612796
Complex_count_at_fastq 7132895
Uniquely_mapped_reads 14917131
Total_fragments 12959501
Duplicated_fragments 3700046
Duplication_rate .285508
Total_qualified_fragment 4148651
F = 1 3082301
F = 2 556278
F = 3 198932
F = 4 97755
F = 5 56534
F = 6 35867
F = 7 24160
F = 8 17006
F = 9 12701
F = 10 9522
F = 11 7643
F = 12 6044
F = 13 4968
F = 14 4000
F = 15 3234
F > 15 31706
Project description
This library was prepared from mouse embryonic stem cells (mESCs) with RNA-DNA SPRITE (GSM4579992, SRR11892191).
The reference genome is mm9, and the library name is set to rdSPRITE.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The BWA index is assumed to have been generated at:
~/RefGenome/bwa_mm9_index/
The STAR index is assumed to have been generated at:
~/RefGenome/refdata-gex-mm9-2020-A-STAR/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir rdSPRITE
(ScSmOP) :~$ cd rdSPRITE
(ScSmOP) :~/rdSPRITE$
Download rdSPRITE FASTQ files from NCBI
Note: preparing the rdSPRITE data requires sra-tools to be installed first; refer to the sra-tools installation instructions.
(ScSmOP) :~/rdSPRITE$ prefetch SRR11892191
(ScSmOP) :~/rdSPRITE$ ln -s SRR11892191/SRR11892191.sra .
(ScSmOP) :~/rdSPRITE$ fastq-dump --split-files --gzip SRR11892191.sra
There are two types of reads: Read 1 contains a 150 bp genomic fragment; Read 2 contains a 9-12 bp Y tag, followed by a 15 bp ODD tag, a 15 bp EVEN tag, a 15 bp ODD tag, and then a 14 bp DPM tag if the fragment is DNA or a 14 bp RPM tag if the fragment is cDNA. Together, the tags form a barcode that labels complexes. The genomic fragment needs to be aligned to the reference genome.
(ScSmOP) :~/rdSPRITE$ ls
SRR11892191_1.fastq.gz
SRR11892191_2.fastq.gz
SRR11892191.sra
Run pipeline scsmop.sh
(ScSmOP) :~/rdSPRITE$ ~/ScSmOP/scsmop.sh -t rdsprite -n rdSPRITE -1 SRR11892191_1.fastq.gz -2 SRR11892191_2.fastq.gz -b ~/RefGenome/bwa_mm9_index/mm9.fa -r ~/RefGenome/refdata-gex-mm9-2020-A-STAR/ -@ 10
Check that the run finished
(ScSmOP) :~/rdSPRITE$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
03.GroupAndRefine
GroupAndRefinement.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/rdSPRITE$ cd 04.QualityAssess
(ScSmOP) :~/rdSPRITE/04.QualityAssess$ cat rdSPRITE_final_stat.tsv
Total_read_pairs 32,046,207
Read_pairs_with_full_barcode 20,548,965
DNA_read_pairs_with_full_barcode 17,143,610
RNA_read_pairs_with_full_barcode 3,405,355
Fully_barcode_rate 64.1%
Uniquely_mapped_DNA_reads 13,147,325
Uniquely_mapped_RNA_reads 1,222,522
DNA_fragments 13,147,325
RNA_fragments 1,222,522
DNA_duplicated_fragments 5,145,732
RNA_duplicated_fragments 502,977
DNA_duplicate_rate 39.1%
RNA_duplicate_rate 41.1%
DNA_complex 4,365,209
F = 1 3,725,908
F = 2 397,707
F = 3 94,890
F = 4 39,544
F = 5 21,932
F = 6 14,349
F = 7 10,107
F = 8 7,737
F = 9 5,958
F = 10 4,860
F = 11 3,993
F = 12 3,293
F = 13 2,846
F = 14 2,412
F = 15 2,121
F > 15 27,550
RNA_complex 605,053
F = 1 569,287
F = 2 26,203
F = 3 4,394
F = 4 1,703
F = 5 841
F = 6 513
F = 7 363
F = 8 256
F = 9 196
F = 10 120
F = 11 130
F = 12 88
F = 13 93
F = 14 64
F = 15 51
F > 15 749
Project description
This library was prepared from mouse embryonic stem cells (mESCs) with scSPRITE (GSM4669508, SRR12212044); only the first 10,000,000 reads are used.
The reference genome is mm9, and the library name is set to scSPRITE.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The STAR index is assumed to have been generated at:
~/RefGenome/mm9_star_index_v2.7.9/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir scSPRITE
(ScSmOP) :~$ cd scSPRITE
(ScSmOP) :~/scSPRITE$
Download scSPRITE FASTQ files from NCBI
Note: preparing the scSPRITE data requires sra-tools to be installed first; refer to the sra-tools installation instructions.
(ScSmOP) :~/scSPRITE$ prefetch SRR12212044
(ScSmOP) :~/scSPRITE$ ln -s SRR12212044/SRR12212044.sra .
(ScSmOP) :~/scSPRITE$ fastq-dump --split-files --gzip -X 10000000 SRR12212044.sra
There are two types of reads: Read 1 contains a 150 bp genomic fragment; Read 2 contains a 9-12 bp Y tag, followed by a 15 bp ODD tag, a 15 bp EVEN tag, a 15 bp ODD tag, and a 26 bp DPM tag. Together, the tags form a barcode that labels complexes; the last three tags (EVEN, ODD, DPM) together label cells. The genomic fragment needs to be aligned to the reference genome.
(ScSmOP) :~/scSPRITE$ ls
SRR12212044_1.fastq.gz
SRR12212044_2.fastq.gz
SRR12212044.sra
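Since only the first 10,000,000 spots are dumped for this case, it may be worth confirming the number of records in each FASTQ file before running the pipeline. A minimal sketch, assuming standard four-line FASTQ records; count_fastq_records is a hypothetical helper name.

```shell
# Count records in a gzip-compressed FASTQ file (four lines per record).
count_fastq_records() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}
```

Running count_fastq_records SRR12212044_1.fastq.gz and the same on the _2 file should give identical counts for a properly paired dump.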
Run pipeline scsmop.sh
(ScSmOP) :~/scSPRITE$ ~/ScSmOP/scsmop.sh -t scsprite -n scSPRITE -1 SRR12212044_1.fastq.gz -2 SRR12212044_2.fastq.gz -r ~/RefGenome/mm9_star_index_v2.7.9 -@ 10
Check that the run finished
(ScSmOP) :~/scSPRITE$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
03.GroupAndRefine
GroupAndRefinement.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/scSPRITE$ cd 04.QualityAssess
(ScSmOP) :~/scSPRITE/04.QualityAssess$ cat scSPRITE_final_stat.tsv
Total_read_pairs 10,000,000
Read_pairs_with_full_barcodes 4,928,398
Fully_barcode_rate 49.3%
Cell_count_at_fastq 12,753
Complex_count_at_fastq 1,318,183
Uniquely_mapped_reads 4,132,307
Total_fragments 4,132,307
Duplicated_fragments 461,112
Duplication_rate 11.2%
Total_qualified_complex 1,181,033
F = 1 1,009,839
F = 2 78,925
F = 3 26,538
F = 4 14,417
F = 5 9,233
F = 6 6,328
F = 7 4,737
F = 8 3,722
F = 9 2,911
F = 10 2,420
F = 11 1,942
F = 12 1,611
F = 13 1,411
F = 14 1,201
F = 15 1,099
F > 15 14,699
Project description
The library was prepared following the Chromium Next GEM Single Cell ATAC Reagent Kits v1.1 (User Guide CG000209 Rev A). Dataset: 500 Peripheral Blood Mononuclear Cells (PBMCs) from a Healthy Donor (v1 chemistry), https://www.10xgenomics.com/resources/datasets/500-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-next-gem-v-1-1-1-1-standard-2-0-0
The reference genome is hg38, and the library name is set to PBMC.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The BWA index is assumed to have been generated at:
~/RefGenome/bwa_hg38_index/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir PBMC
(ScSmOP) :~$ cd PBMC
(ScSmOP) :~/PBMC$
Download scATAC-seq FASTQ files from 10x Genomics Datasets
(ScSmOP) :~/PBMC$ wget https://cf.10xgenomics.com/samples/cell-atac/2.0.0/atac_pbmc_500_nextgem/atac_pbmc_500_nextgem_fastqs.tar
(ScSmOP) :~/PBMC$ tar -xvf atac_pbmc_500_nextgem_fastqs.tar
(ScSmOP) :~/PBMC$ cd atac_pbmc_500_nextgem_fastqs
(ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$
There are three types of reads: Read 1 contains a 50 bp genomic fragment starting at its first base; Read 2 contains a 16 bp cell barcode starting at its first base; Read 3 contains a 49 bp genomic fragment starting at its first base. The genomic fragments need to be aligned to the reference genome.
(ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ ls
atac_pbmc_500_nextgem_S1_L001_I1_001.fastq.gz atac_pbmc_500_nextgem_S1_L002_I1_001.fastq.gz
atac_pbmc_500_nextgem_S1_L001_R1_001.fastq.gz atac_pbmc_500_nextgem_S1_L002_R1_001.fastq.gz
atac_pbmc_500_nextgem_S1_L001_R2_001.fastq.gz atac_pbmc_500_nextgem_S1_L002_R2_001.fastq.gz
atac_pbmc_500_nextgem_S1_L001_R3_001.fastq.gz atac_pbmc_500_nextgem_S1_L002_R3_001.fastq.gz
Run pipeline scsmop.sh
(ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ ~/ScSmOP/scsmop.sh -t scatac_10x_v1 -n PBMC -1 atac_pbmc_500_nextgem_S1_L001_R1_001.fastq.gz,atac_pbmc_500_nextgem_S1_L002_R1_001.fastq.gz -2 atac_pbmc_500_nextgem_S1_L001_R2_001.fastq.gz,atac_pbmc_500_nextgem_S1_L002_R2_001.fastq.gz -3 atac_pbmc_500_nextgem_S1_L001_R3_001.fastq.gz,atac_pbmc_500_nextgem_S1_L002_R3_001.fastq.gz -b ~/RefGenome/bwa_hg38_index/hg38.fa -s ~/ScSmOP/ChromSize/hg38.size.txt -@ 10
Check that the run finished
(ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
03.GroupAndRefine
GroupAndRefinement.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs$ cd 04.QualityAssess
(ScSmOP) :~/PBMC/atac_pbmc_500_nextgem_fastqs/04.QualityAssess$ cat PBMC_final_stat.tsv
Total_read_pairs 33,535,381
Read_pairs_with_full_barcodes 32,892,355
Fully_barcode_rate 98.1%
Cell_count_at_fastq 249,760
Total_fragments 30,210,513
Duplicated_fragments 19,198,087
Duplication_rate 63.5%
Fragments_overlap_peak 8,202,392
Peak_count 68,121
Cell_count 484
Project description
Cryopreserved human peripheral blood mononuclear cells (PBMCs) from a healthy female donor aged 25 were obtained by 10x Genomics from AllCells.
Nuclei were isolated as described in the Demonstrated Protocol: Nuclei Isolation for Single Cell Multiome ATAC + Gene Expression Sequencing (CG000365 Rev A).
Paired ATAC and Gene Expression libraries were generated from the isolated nuclei as described in the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression User Guide (CG000338 Rev A) and sequenced on Illumina Novaseq 6000 v1 Kit (Forward Strand Dual-Index Workflow). https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-arc/2.0.0/pbmc_unsorted_3k/pbmc_unsorted_3k_fastqs.tar
The reference genome is hg38, and the library name is set to PBMC_ARC.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The BWA index is assumed to have been generated at:
~/RefGenome/bwa_hg38_index/
The STAR index is assumed to have been generated at:
~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir PBMC_ARC
(ScSmOP) :~$ cd PBMC_ARC
(ScSmOP) :~/PBMC_ARC$
Download Multiome (ATAC + Gene Expression) FASTQ files from 10x Genomics Datasets
(ScSmOP) :~/PBMC_ARC$ wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-arc/2.0.0/pbmc_unsorted_3k/pbmc_unsorted_3k_fastqs.tar
(ScSmOP) :~/PBMC_ARC$ tar -xvf pbmc_unsorted_3k_fastqs.tar
(ScSmOP) :~/PBMC_ARC$ cd pbmc_unsorted_3k_fastqs
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$
There are five types of reads:
ATAC: Read 1 contains a 50 bp genomic fragment starting at its first base; Read 2 contains a 16 bp cell barcode starting at its first base; Read 3 contains a 49 bp genomic fragment starting at its first base. The genomic fragments need to be aligned to the reference genome.
RNA: Read 1 contains a 16 bp cell barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome.
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls atac
pbmc_unsorted_3k_S3_L001_I1_001.fastq.gz pbmc_unsorted_3k_S3_L002_R2_001.fastq.gz pbmc_unsorted_3k_S3_L004_I1_001.fastq.gz
pbmc_unsorted_3k_S3_L001_R1_001.fastq.gz pbmc_unsorted_3k_S3_L002_R3_001.fastq.gz pbmc_unsorted_3k_S3_L004_R1_001.fastq.gz
pbmc_unsorted_3k_S3_L001_R2_001.fastq.gz pbmc_unsorted_3k_S3_L003_I1_001.fastq.gz pbmc_unsorted_3k_S3_L004_R2_001.fastq.gz
pbmc_unsorted_3k_S3_L001_R3_001.fastq.gz pbmc_unsorted_3k_S3_L003_R1_001.fastq.gz pbmc_unsorted_3k_S3_L004_R3_001.fastq.gz
pbmc_unsorted_3k_S3_L002_I1_001.fastq.gz pbmc_unsorted_3k_S3_L003_R2_001.fastq.gz
pbmc_unsorted_3k_S3_L002_R1_001.fastq.gz pbmc_unsorted_3k_S3_L003_R3_001.fastq.gz
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls gex
pbmc_unsorted_3k_S01_L003_I1_001.fastq.gz pbmc_unsorted_3k_S01_L003_R2_001.fastq.gz pbmc_unsorted_3k_S01_L004_R1_001.fastq.gz
pbmc_unsorted_3k_S01_L003_I2_001.fastq.gz pbmc_unsorted_3k_S01_L004_I1_001.fastq.gz pbmc_unsorted_3k_S01_L004_R2_001.fastq.gz
pbmc_unsorted_3k_S01_L003_R1_001.fastq.gz pbmc_unsorted_3k_S01_L004_I2_001.fastq.gz
Run pipeline scsmop.sh
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ~/ScSmOP/scsmop.sh -t scarc_10x_v1 -n PBMC_ARC -1 atac/pbmc_unsorted_3k_S3_L001_R1_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L002_R1_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L003_R1_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L004_R1_001.fastq.gz -2 atac/pbmc_unsorted_3k_S3_L001_R2_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L002_R2_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L003_R2_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L004_R2_001.fastq.gz -3 atac/pbmc_unsorted_3k_S3_L001_R3_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L002_R3_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L003_R3_001.fastq.gz,atac/pbmc_unsorted_3k_S3_L004_R3_001.fastq.gz -4 gex/pbmc_unsorted_3k_S01_L003_R1_001.fastq.gz,gex/pbmc_unsorted_3k_S01_L004_R1_001.fastq.gz -5 gex/pbmc_unsorted_3k_S01_L003_R2_001.fastq.gz,gex/pbmc_unsorted_3k_S01_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -b ~/RefGenome/bwa_hg38_index/hg38.fa -s ~/ScSmOP/ChromSize/hg38.size.txt -@ 10
Check that the run finished
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls
RNAResult
ATACResult
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls RNAResult
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ ls ATACResult
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
03.GroupAndRefine
GroupAndRefinement.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs$ cd ATACResult/04.QualityAssess
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs/ATACResult/04.QualityAssess$ cat PBMC_ARC_final_stat.tsv
Total_read_pairs 82,781,574
Read_pairs_with_full_barcodes 80,598,161
Fully_barcode_rate 97.4%
Cell_count_at_fastq 495,494
Total_fragments 70,185,372
Duplicated_fragments 12,441,889
Duplication_rate 17.7%
Fragments_overlap_peak 56,468,640
Peak_count 221,238
Cell_count 2,190
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs/ATACResult/04.QualityAssess$ cd ../../RNAResult/04.QualityAssess
(ScSmOP) :~/PBMC_ARC/pbmc_unsorted_3k_fastqs/RNAResult/04.QualityAssess$ cat PBMC_ARC_final_stat.tsv
Total_read_pairs 170,301,081
Read_pairs_with_full_barcodes 162,198,018
Fully_barcode_rate 95.2%
Cell_count_at_fastq 420,955
Cell_count_estimated 2,963
Total_gene_detected 23,746
Mean_gene_per_cell 845
Median_UMI_per_cell 1,591
Project description
10X Genomics obtained fresh frozen mouse olfactory bulb tissue from BioIVT. The tissue was embedded and cryosectioned as described in Visium Spatial Protocols – Tissue Preparation Guide (Demonstrated Protocol CG000240). Tissue sections of 10µm were placed on Visium Gene Expression slides, then fixed and stained following Methanol Fixation, H&E Staining & Imaging for Visium Spatial Protocols (CG000160).
The Visium Gene Expression library was prepared as described in the Visium Spatial Reagent Kits User Guide (CG000239 Rev D). FASTQ files: https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_fastqs.tar
Only the gene expression data were processed; no image processing was performed.
The reference genome is mm10, and the library name is set to Spatial.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The STAR index is assumed to have been generated at:
~/RefGenome/refdata-gex-mm10-2020-A-STAR/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir Spatial
(ScSmOP) :~$ cd Spatial
(ScSmOP) :~/Spatial$
Download Visium Spatial Gene Expression FASTQ files from 10x Genomics Datasets
(ScSmOP) :~/Spatial$ wget https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_fastqs.tar
(ScSmOP) :~/Spatial$ tar -xvf Visium_Mouse_Olfactory_Bulb_fastqs.tar
(ScSmOP) :~/Spatial$ cd Visium_Mouse_Olfactory_Bulb_fastqs
(ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$
There are two types of reads: Read 1 contains a 16 bp spatial barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome.
(ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ ls
Visium_Mouse_Olfactory_Bulb_S1_L001_I1_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L003_I1_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L001_I2_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L003_I2_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L001_R1_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L003_R1_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L001_R2_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L003_R2_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L002_I1_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L004_I1_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L002_I2_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L004_I2_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L002_R1_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L004_R1_001.fastq.gz
Visium_Mouse_Olfactory_Bulb_S1_L002_R2_001.fastq.gz Visium_Mouse_Olfactory_Bulb_S1_L004_R2_001.fastq.gz
Run pipeline scsmop.sh
(ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n Spatial -1 Visium_Mouse_Olfactory_Bulb_S1_L001_R1_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L002_R1_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L003_R1_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L004_R1_001.fastq.gz -2 Visium_Mouse_Olfactory_Bulb_S1_L001_R2_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L002_R2_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L003_R2_001.fastq.gz,Visium_Mouse_Olfactory_Bulb_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-mm10-2020-A-STAR/ -@ 10 -c ~/ScSmOP/ConfigFiles/10x_spatial-rna_config.json
Check that the run finished
(ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs$ cd 04.QualityAssess
(ScSmOP) :~/Spatial/Visium_Mouse_Olfactory_Bulb_fastqs/04.QualityAssess$ cat Spatial_final_stat.tsv
Number of Reads 46,878,299
Reads With Valid Barcodes 45,617,284
Fully_barcode_rate 97.3%
Unique Reads in Spot Mapped to Gene 6,315,431
Estimated Number of Spot 1,049
Total Gene Detected 14,567
Median Gene per Spot 1,540
Median UMI per Spot 3,411
Project description
This library is the 10× 3.0.0 test dataset from UniverSC (Nature Communications): https://github.com/minoda-lab/universc/tree/master/test/shared/cellranger-tiny-fastq/3.0.0
The reference genome is tinyref, and the library name is set to PBMC_test.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The STAR index is assumed to have been generated at:
~/RefGenome/tinyrefcellrange3/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir PBMC_test
(ScSmOP) :~$ cd PBMC_test
(ScSmOP) :~/PBMC_test$
Download scRNA-seq FASTQ files from the UniverSC GitHub repository
# Download the data from GitHub: https://github.com/minoda-lab/universc/tree/master/test/shared/cellranger-tiny-fastq/3.0.0
There are two types of reads: Read 1 contains a 16 bp cell barcode (Chromium barcode) starting at its first base, followed by a 12 bp UMI; Read 2 contains a 91 bp transcript sequence, which needs to be aligned to the reference genome.
(ScSmOP) :~/PBMC_test$ ls
tinygex_S1_L001_I1_001.fastq.gz
tinygex_S1_L001_R1_001.fastq.gz
tinygex_S1_L001_R2_001.fastq.gz
tinygex_S1_L002_I1_001.fastq.gz
tinygex_S1_L002_R1_001.fastq.gz
tinygex_S1_L002_R2_001.fastq.gz
Run pipeline scsmop.sh
(ScSmOP) :~/PBMC_test$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n PBMC_test -1 tinygex_S1_L001_R1_001.fastq.gz,tinygex_S1_L002_R1_001.fastq.gz -2 tinygex_S1_L001_R2_001.fastq.gz,tinygex_S1_L002_R2_001.fastq.gz -r ~/RefGenome/tinyrefcellrange3/ -@ 10
Check that the run finished
(ScSmOP) :~/PBMC_test$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/PBMC_test$ cd 04.QualityAssess
(ScSmOP) :~/PBMC_test/04.QualityAssess$ cat PBMC_test_final_stat.tsv
Total_read_pairs 461083
Read_pairs_with_full_barcodes 437122
Fully_barcode_rate .948033
Cell_count_at_fastq 11946
Cell_count_estimated 1106
Total_gene_detected 202
Mean_gene_per_cell 21
Median_UMI_per_cell 29
Project description
This library is the Drop-seq test dataset from UniverSC (Nature Communications): https://github.com/minoda-lab/universc/tree/master/test/shared/dropseq-test
The reference genome is tinyref, and the library name is set to Dropseq.
The steps below assume the following are available; if any is missing, install it by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
The ScSmOP pipeline is assumed to be installed at:
~/ScSmOP/
If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3 in the paths, i.e. ~/ScSmOP-0.1.3/
The STAR index is assumed to have been generated at:
~/RefGenome/tinyrefcellrange3/
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir Dropseq
(ScSmOP) :~$ cd Dropseq
(ScSmOP) :~/Dropseq$
Download scRNA-seq FASTQ files from the UniverSC GitHub repository
# prepare the data as described at https://github.com/minoda-lab/universc/tree/master/test/shared/dropseq-test.
There are two types of reads: Read 1 contains a 12 bp cell barcode starting at its first base, followed by an 8 bp UMI; Read 2 contains a transcript sequence of variable length, which needs to be aligned to the reference genome.
(ScSmOP) :~/Dropseq$ ls
SRR1873277_Sample1_R1.fastq.gz
SRR1873277_Sample1_R2.fastq.gz
Run pipeline scsmop.sh
(ScSmOP) :~/Dropseq$ ~/ScSmOP/scsmop.sh -t dropseq -n Dropseq -1 SRR1873277_Sample1_R1.fastq.gz -2 SRR1873277_Sample1_R2.fastq.gz -r ~/RefGenome/tinyrefcellrange3/ -@ 10
Check that the run finished
(ScSmOP) :~/Dropseq$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
Get statistics
(ScSmOP) :~/Dropseq$ cd 04.QualityAssess
(ScSmOP) :~/Dropseq/04.QualityAssess$ cat Dropseq_final_stat.tsv
Total_read_pairs 29265
Read_pairs_with_full_barcodes 29265
Fully_barcode_rate 1.000000
Cell_count_at_fastq 6828
Cell_count_estimated 278
Total_gene_detected 148
Mean_gene_per_cell 22
Median_UMI_per_cell 40
Project description
The library was constructed by processing GM12878 cells with the 10× Genomics Single Cell Multiome ATAC + Gene Expression kit; the cells were then loaded onto the Chromium platform to amplify RNA. The library is therefore a 10× Genomics Single Cell Gene Expression library, but its barcodes come from the Gene Expression part of the 10× Genomics Single Cell Multiome ATAC + Gene Expression kit.
The reference genome is hg38, and the library name is set to SHG023.
This assumes ScSmOP has been installed. If any of the following elements are not available, install them by following the ScSmOP GitHub README: https://github.com/ZhengmzLab/ScSmOP
Suppose the ScSmOP pipeline has been installed at ~/ScSmOP/. If you downloaded ScSmOP through wget, replace ScSmOP with ScSmOP-0.1.3, i.e. ~/ScSmOP-0.1.3/.
Suppose the STAR index has been generated at ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/.
Make directory for the project
(base) :~$ conda activate ScSmOP
(ScSmOP) :~$ mkdir SHG023
(ScSmOP) :~$ cd SHG023
(ScSmOP) :~/SHG023$
Download scRNA-seq FASTQ files from Google Drive
# prepare the data as described at https://github.com/minoda-lab/universc/tree/master/test/shared/dropseq-test.
Read 1 contains a 16 bp cell barcode starting at its first base pair, followed by a 12 bp UMI, a 34 bp PolyT tail, and an 88 bp transcript; Read 2 contains a 150 bp transcript that needs to be aligned to the reference genome; I5 and I7 contain the sample index used to distinguish samples, but this library has only one sample, so they are not used.
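The Read 1 segment lengths add up to 150 bp (16 + 12 + 34 + 88). As a quick check of where each segment starts and ends, the following sketch prints 1-based coordinates for consecutive segments. `segment_coords` and the `name:length` argument format are illustrative choices, not ScSmOP conventions.

```shell
# Print 1-based start-end coordinates of consecutive read segments.
# Each argument is a name:length pair, given in read order.
# segment_coords is a hypothetical helper, not part of ScSmOP.
segment_coords() {
    start=1
    for seg in "$@"; do
        name=${seg%%:*}     # text before the colon
        len=${seg##*:}      # text after the colon
        end=$((start + len - 1))
        printf '%s\t%d-%d\n' "$name" "$start" "$end"
        start=$((end + 1))
    done
}

segment_coords barcode:16 UMI:12 polyT:34 transcript:88
# barcode     1-16
# UMI         17-28
# polyT       29-62
# transcript  63-150
```

The first 28 bp therefore carry everything a 10× v3 style barcode and UMI layout needs, and the remaining 122 bp of Read 1 are the part that can be ignored.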
(ScSmOP) :~/SHG023$ ls
SHG203_S1_L004_R1_001.fastq.gz
SHG203_S1_L004_R2_001.fastq.gz
SHG203_S1_L004_I1_001.fastq.gz
SHG203_S1_L004_I2_001.fastq.gz
There are two ways to process such a library:
Type 1: Ignore the PolyT tail and the 88 bp transcript in Read 1. The library then has the same read structure as 10× Genomics (V3) Single Cell Gene Expression but a different barcode whitelist, so it can be processed as a scrna_10x_v3 library.
Type 2: DIY a new configuration file specific to the library.
Run pipeline scsmop.sh
(ScSmOP) :~/SHG023$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n SHG023 -1 SHG203_S1_L004_R1_001.fastq.gz -2 SHG203_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10 -w ~/ScSmOP/BarcodeBucket/737K-arc-v1-scrna.txt
Check finish
(ScSmOP) :~/SHG023$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
Get statistic
(ScSmOP) :~/SHG023$ cd 04.QualityAssess
(ScSmOP) :~/SHG023/04.QualityAssess$ cat SHG023_final_stat.tsv
Total_read_pairs 69086417
Read_pairs_with_full_barcodes 60926690
Fully_barcode_rate 0.881891
Cell_count_at_fastq 341342
Cell_count_estimated 2702
Total_gene_detected 6207
Mean_gene_per_cell 3
Median_UMI_per_cell 2
A general DIY procedure with ScSmOP contains 4 steps:
- Prepare FASTQ files.
- Prepare barcode whitelist files.
- Generate custom configuration file.
- Decide the experiment type and replace the default configuration file with the custom configuration file when running scsmop.sh.
Edit configuration file
(ScSmOP) :~/SHG023$ cp ~/ScSmOP/ConfigFiles/OriginalConfigFile.json .
(ScSmOP) :~/SHG023$ ln -s ~/ScSmOP/BarcodeBucket/737K-arc-v1-scrna.txt .
(ScSmOP) :~/SHG023$ vi OriginalConfigFile.json
Example procedure of SHG023.
Modifying OriginalConfigFile.json
{
    "barcode chain" : [ {"BC-UMI": "R1:1"}, {"GENOMEA|2": "R2:1"} ],
    "identifier" : [ {"CELL": "BC"} ],
    "barcode type" :
    {
        "BC":
        {
            "DENSE": 1,
            "SPACE": 0,
            "LAXITY": 0,
            "LENGTH": 16,
            "MISMATCH": 1,
            "WHITE LIST": "737K-arc-v1-scrna.txt"
        },
        "UMI":
        {
            "SPACE": 0,
            "LAXITY": 0,
            "LENGTH": "12",
            "MISMATCH": 0,
            "WHITE LIST": ""
        }
    }
}
Press Esc -> shift + : -> wq -> enter on your keyboard to save the file and exit the editor.
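Before generating the custom configuration file, it can help to confirm that the edited file is still valid JSON and that the whitelist barcodes match the "LENGTH" given for "BC" (16). The sketch below assumes the whitelist holds one barcode per line and that `python3` is on the PATH; `check_json` and `check_whitelist_length` are hypothetical helper names, not ScSmOP tools.

```shell
# Return 0 if the file parses as JSON (uses Python's standard json.tool module).
check_json() {
    python3 -m json.tool "$1" > /dev/null
}

# Return 0 only if every non-empty line of the whitelist has the expected length.
# $1 = whitelist file (assumed: one barcode per line), $2 = expected length in bp.
check_whitelist_length() {
    awk -v n="$2" 'NF && length($1) != n { bad = 1 } END { exit bad + 0 }' "$1"
}

# Usage (from ~/SHG023):
#   check_json OriginalConfigFile.json && echo "JSON OK"
#   check_whitelist_length 737K-arc-v1-scrna.txt 16 && echo "whitelist OK"
```

A trailing comma or a missing quote left behind in vi is the typical mistake the first check catches.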
Generate custom configuration file
(ScSmOP) :~/SHG023$ ~/ScSmOP/Tools/python3 ~/ScSmOP/PythonScript/GenerateConfigFile.py -i OriginalConfigFile.json -o SHG023
(ScSmOP) :~/SHG023$ ls
OriginalConfigFile.json
737K-arc-v1-scrna.txt
SHG023_config.json
SHG203_S1_L004_R1_001.fastq.gz
SHG203_S1_L004_R2_001.fastq.gz
SHG203_S1_L004_I1_001.fastq.gz
SHG203_S1_L004_I2_001.fastq.gz
Run scsmop.sh with custom configuration file
This is still a scRNA-seq library that requires UMI deduplication and gene annotation, so set -t to "scrna_10x_v3" and set -c to the custom configuration file (here SHG023_config.json).
(ScSmOP) :~/SHG023$ ~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n SHG023 -1 SHG203_S1_L004_R1_001.fastq.gz -2 SHG203_S1_L004_R2_001.fastq.gz -r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10 -c SHG023_config.json
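The Type 1 and Type 2 runs differ only in the last option: `-w` with a barcode whitelist, or `-c` with a custom configuration file. The sketch below only prints the two command lines so they can be compared side by side. It does not run the pipeline, and `print_scsmop_cmd` is a hypothetical helper.

```shell
# Print (do not run) the scsmop.sh command for the SHG023 library.
# $1 = the option that differs between the two approaches: -w or -c
# $2 = the file passed to that option
print_scsmop_cmd() {
    case $1 in
        -w|-c) ;;
        *) echo "usage: print_scsmop_cmd -w|-c FILE" >&2; return 1 ;;
    esac
    # Paths are kept as literal text (no tilde expansion) because this only prints
    echo "~/ScSmOP/scsmop.sh -t scrna_10x_v3 -n SHG023" \
         "-1 SHG203_S1_L004_R1_001.fastq.gz -2 SHG203_S1_L004_R2_001.fastq.gz" \
         "-r ~/RefGenome/refdata-gex-GRCh38-2020-A-STAR/ -@ 10 $1 $2"
}

print_scsmop_cmd -w '~/ScSmOP/BarcodeBucket/737K-arc-v1-scrna.txt'   # Type 1
print_scsmop_cmd -c SHG023_config.json                               # Type 2
```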
Check finish
(ScSmOP) :~/SHG023$ ls
01.BarcodeIden
BarcodeIdentification.done
02.ReadAlign
SequenceAlignment.done
04.QualityAssess
QualityAssessment.done
Get statistic
(ScSmOP) :~/SHG023$ cd 04.QualityAssess
(ScSmOP) :~/SHG023/04.QualityAssess$ cat SHG023_final_stat.tsv
Total_read_pairs 69086417
Read_pairs_with_full_barcodes 60926690
Fully_barcode_rate 0.881891
Cell_count_at_fastq 341342
Cell_count_estimated 2702
Total_gene_detected 6207
Mean_gene_per_cell 3
Median_UMI_per_cell 2