SRA queries - ababaian/serratus GitHub Wiki

Serratus SRA Search Queries

These are the SRA search queries used to generate the sraRunInfo.csv files which have been analyzed in Serratus. This is a de-facto manifest of the total search space.

Total Unique Runs: 7 675 502

s3://lovelywater2/sra/
├ v201210/               # Query sets from major version v210225 and prior
├ v220113/               # Query sets from major version v210225
└ v230116_SraRunInfo.csv # master query CSV for v230116                          ***

Major Version v230110

Date Accessed: 2022 12 07.

Results from each query below were de-duplicated and added to the complete manifest of SraRunInfo processed in prior versions.
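
The de-duplication step described above can be sketched as follows. This is a minimal illustration, not the script Serratus used: it assumes each SraRunInfo CSV identifies a run by a `Run` column, and the function names and accessions are hypothetical placeholders.

```python
import csv
import io


def read_run_info(text):
    """Parse SraRunInfo-style CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_run_info(manifest_rows, query_rows, key="Run"):
    """Append query rows whose run accession is not already in the manifest."""
    seen = {row[key] for row in manifest_rows}
    merged = list(manifest_rows)
    for row in query_rows:
        run = row.get(key, "")
        # Skip blank rows, repeated header rows, and runs already recorded.
        if not run or run == key or run in seen:
            continue
        seen.add(run)
        merged.append(row)
    return merged


# Placeholder data: a prior manifest and one new query result.
prior = read_run_info("Run,spots\nSRR0000001,10\nSRR0000002,20\n")
new = read_run_info("Run,spots\nSRR0000002,20\nSRR0000003,30\nSRR0000003,30\n")

manifest = merge_run_info(prior, new)
print([row["Run"] for row in manifest])
```

Keeping the prior manifest first means that a run already processed in an earlier version retains its original row, and only new accessions are appended.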

Mammalian

((("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties])) AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date]) 

Vertebrate

("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])

Invertebrate

("Metazoa"[Organism] NOT "Vertebrata"[Organism]) ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date]) 

Eukaryotes

("Eukaryota"[Organism] NOT "Metazoa"[Organism]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])

Metagenomic/Metatranscriptomic

"METAGENOME" OR "METATRANSCRIPTOME" OR "metatranscriptomic"[Filter] OR "metagenomic"[Filter] NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])

Human (non-controlled)

"Homo sapiens"[Organism] AND ("type_rnaseq"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2021/01/01"[Publication Date] : "2023/01/01"[Publication Date])

Mouse

"Mus musculus"[Organism] AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties] AND ("2021/01/01"[Publication Date] : "2023/01/01"[Publication Date])

Virome

"VIRAL METAGENOME" OR "VIROME" OR "VIROMIC" OR "VIRAL METAGENOMICS" NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]

Prokaryotes

("type_rnaseq"[Filter] OR "RNASEQ") NOT "Eukaryota"[Organism] AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2021/01/01"[Publication Date] : "2023/01/01"[Publication Date])
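
Most queries in this version are an earlier query restricted to a publication date window. The sketch below shows one way to build such a term and a matching NCBI E-utilities `esearch` URL for checking the hit count. It is an illustration only: the helper names are hypothetical, the base query is wrapped in parentheses as in the Mammalian query above, and no request is actually sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def with_publication_dates(base_query, start, end):
    """Restrict an SRA query to a publication date range (YYYY/MM/DD)."""
    return (
        f'({base_query}) AND '
        f'("{start}"[Publication Date] : "{end}"[Publication Date])'
    )


def esearch_count_url(term):
    """Build an esearch URL that asks only for the number of matching records."""
    params = {"db": "sra", "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


term = with_publication_dates('"Mus musculus"[Organism]', "2021/01/01", "2023/01/01")
url = esearch_count_url(term)
print(term)
print(url)
```

Counts returned by such a request change as SRA grows, so they will not reproduce the figures recorded on this page for earlier access dates.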

Minor Version v220113

Quality control update 1: duplicates from v210225 were removed.

Date Accessed: 2022 01 13

  • Mammalian 196,870
("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]
  • Vertebrate 138,073
("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
  • Invertebrate 244,249
("Metazoa"[Organism] NOT "Vertebrata"[Organism]) ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
  • Eukaryotes 468,867
("Eukaryota"[Organism] NOT "Metazoa"[Organism]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] 
  • Metagenomic/Metatranscriptomic 751,512
"METAGENOME" OR "METATRANSCRIPTOME" OR "metatranscriptomic"[Filter] OR "metagenomic"[Filter] NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]

Major Version v210225

Total Unique: 5 780 800

Human (non-controlled data)

  • File: hu_SraRunInfo.csv

Search Term:

"Homo sapiens"[Organism] AND ("type_rnaseq"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/12/30

Results: 789 931

AND

  • File: hu_meta_SraRunInfo.csv

Search Term:

"Homo sapiens"[Organism] AND ("metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/12/30

Results: 47 763

Mouse

  • File: mu_SraRunInfo.csv

Search Term:

"Mus musculus"[Organism] AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]

SRA Accessed: 2020/12/30

Results: 1 058 559

Mammalian

  • File: mamm_SraRunInfo.csv

Search Term:

("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]

SRA Accessed: 2020/12/30

Results: 126 382

Vertebrate

  • File: vert_SraRunInfo.csv

Search Term:

("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/12/30

Results: 114 078

Invertebrate

  • File: inv_SraRunInfo.csv

Search Term:

("Metazoa"[Organism] NOT "Vertebrata"[Organism]) ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/12/31

Results: 184 729

Eukarya

  • File: euk_SraRunInfo.csv

Search Term:

("Eukaryota"[Organism] NOT "Metazoa"[Organism]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2021/01/01

Results: 367 924

Prokarya / Other

  • File: pro_SraRunInfo.csv

Search Term:

("type_rnaseq"[Filter] OR "RNASEQ" OR "metagenomic"[Filter] OR "METAGENOME" OR "metatranscriptomic"[Filter] OR "METATRANSCRIPTOME" OR "ENVIRONMENTAL") NOT "Eukaryota"[Organism] AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2021/01/20

Results: 2 672 802

Virome

  • File: viro_SraRunInfo.csv

Search Term:

"VIRAL METAGENOME" OR "VIROME" OR "VIROMIC" OR "VIRAL RNA" OR "VIRAL METAGENOMICS" NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]

SRA Accessed: 2020/12/31

Results: 52 072

Metagenomic

  • File: meta_SraRunInfo.csv

Search Term:

"METAGENOMIC" NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]

SRA Accessed: 2020/12/31

Results: 566 826

Bat (with WGS)

  • File: bat_SraRunInfo.csv

Search Term:

"bat" OR txid9397[Organism:exp] AND "platform illumina"[Properties]

SRA Accessed: 2021/01/01

Results: 14 103

WGS (mammalian, non-mouse)

  • File: wgs_SraRunInfo.csv

Search Term:

("Mammalia"[Organism] NOT "Mus musculus"[orgn]) AND ("type_exome"[Filter] OR "type_genome"[Filter]) NOT "METAGENOME" AND "platform illumina"[Properties] AND cluster_public[prop]

SRA Accessed: 2021/01/21

Results: 393 278
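
As a worked check on the v210225 figures, the per-file result counts listed above can be summed and compared with the stated total. The sketch assumes that "Total Unique" is the number of runs left after de-duplicating these twelve files, which the page implies but does not state; the dictionary keys are just the file-name prefixes.

```python
# Result counts per SraRunInfo file, copied from the v210225 entries above.
results = {
    "hu": 789_931,
    "hu_meta": 47_763,
    "mu": 1_058_559,
    "mamm": 126_382,
    "vert": 114_078,
    "inv": 184_729,
    "euk": 367_924,
    "pro": 2_672_802,
    "viro": 52_072,
    "meta": 566_826,
    "bat": 14_103,
    "wgs": 393_278,
}

total_unique = 5_780_800  # "Total Unique" stated for v210225

raw_total = sum(results.values())
overlap = raw_total - total_unique

print(f"sum of query results: {raw_total:,}")
print(f"runs returned by more than one query (at least): {overlap:,}")
```

The raw sum exceeds the unique total because the queries overlap; for example, the bat and WGS searches both fall within Mammalia.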

Major Version v201210

These SraRunInfo files are deprecated; they were used in the initial 'nucleotide' search. They can be accessed in the bucket file history for archival purposes.

Human (non-controlled/dbGaP data)

  • File: hu_SraRunInfo.csv

Search Term:

"txid9606"[Organism:exp] AND ("type_rnaseq"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/05/30

Results: 672 656

  • File: hu_meta_SraRunInfo.csv

Search Term:

"txid9606"[Organism:exp] AND ("metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/05/30

Results: 36 103

Mouse

  • File: mu_SraRunInfo.csv

Search Term:

("Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]

SRA Accessed: 2020/06/06

Results: 890 746

Mammalian

  • File: mamm_SraRunInfo.csv

Search Term:

("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]

SRA Accessed: 2020/06/06

Results: 100 798

Vertebrate

  • File: vert_SraRunInfo.csv

Search Term:

("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/05/25

Results: 94 908

Invertebrate

  • File: inv_SraRunInfo.csv

Search Term:

("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter])  NOT "Vertebrata"[Organism] AND cluster_public[prop] AND "platform illumina"[Properties]

SRA Accessed: 2020/06/12

Results: 2 193 740

Bat (update + wgs)

  • File: bat_SraRunInfo.csv

Search Term:

txid9397[Organism:exp] AND "platform illumina"[Properties]

SRA Accessed: 2020/06/20

Results: 2 823

Virome

  • File: viro_SraRunInfo.csv

Search Term:

"virome" OR "viral metagenome" OR "viral metagenomics" AND cluster_public[prop]

SRA Accessed: 2020/06/12

Results: 22 251

Metagenomes

  • File: meta_SraRunInfo.csv

Search Term:

txid256318[Organism:noexp]

SRA Accessed: 2020/07/12

Results: 163 659

Single-Cell RNA-seq

Note: This is a 'negative' filter. This dataset was not analyzed; it is used to separate single-cell RNA-seq from other RNA-seq libraries in downstream analysis.

  • File: scRNA_SraRunInfo.csv

Search Term:

"single-cell" OR "single cell" OR "scRNA-seq" OR "scRNAseq" OR "scRNAseq"

SRA Accessed: 2020/06/22

Results: 770 828
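
The 'negative' filter described in the note for this set could be applied downstream roughly as follows. This is a sketch under assumptions: run accessions are compared as plain strings, the function name is hypothetical, and the accessions are placeholders rather than real runs.

```python
def split_single_cell(run_ids, single_cell_ids):
    """Partition run accessions by membership in the single-cell query results."""
    single_cell_set = set(single_cell_ids)
    single_cell = [run for run in run_ids if run in single_cell_set]
    other = [run for run in run_ids if run not in single_cell_set]
    return single_cell, other


# Placeholder accessions: runs from an analyzed set, and runs returned
# by the single-cell search term.
analyzed_runs = ["SRR0000001", "SRR0000002", "SRR0000003", "SRR0000004"]
sc_query_runs = ["SRR0000002", "SRR0000004", "SRR0000009"]

single_cell, other = split_single_cell(analyzed_runs, sc_query_runs)
print("single-cell:", single_cell)
print("other RNA-seq:", other)
```

Runs that appear only in the single-cell query results (here `SRR0000009`) are ignored, which matches the note that the set itself was not analyzed.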