SRA queries - ababaian/serratus GitHub Wiki
Serratus SRA Search Queries
These are the SRA search queries used to generate the sraRunInfo.csv
files which have been analyzed in Serratus. This is a de-facto manifest of the total search space.
Total Unique Runs: 7 675 502
s3://lovelywater2/sra/
├ v201210/ # Query sets from major version v210225 and prior
├ v220113/ # Query sets from major version v210225
└ v230116_SraRunInfo.csv # master query CSV for v230116 ***
v230110
Major Version
Date Accessed: 2022 12 07
Results from each query below were de-duplicated and added to the complete manifest of SraRunInfo processed in prior versions.
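The de-duplication step can be sketched roughly as follows. This is a minimal illustration, not the script used for Serratus: it assumes each SraRunInfo CSV has a `Run` column holding the run accession, and the function names and accessions below are hypothetical placeholders.

```python
import csv
import io


def read_run_info(text):
    """Parse SraRunInfo CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_run_info(manifest_rows, query_rows, key="Run"):
    """Append rows whose run accession is not already in the manifest."""
    seen = {row[key] for row in manifest_rows}
    merged = list(manifest_rows)
    for row in query_rows:
        run = row.get(key, "")
        # Skip blank lines, repeated header lines, and runs already present.
        if not run or run == key or run in seen:
            continue
        seen.add(run)
        merged.append(row)
    return merged


# Placeholder accessions, for illustration only.
prior = read_run_info("Run,Platform\nSRR0000001,ILLUMINA\nSRR0000002,ILLUMINA\n")
new = read_run_info("Run,Platform\nSRR0000002,ILLUMINA\nSRR0000003,ILLUMINA\n")
manifest = merge_run_info(prior, new)
print([row["Run"] for row in manifest])
# ['SRR0000001', 'SRR0000002', 'SRR0000003']
```

Keying on the accession alone keeps the first copy of a run seen, so a run matched by several queries is counted once in the manifest.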
Mammalian
((("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties])) AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Vertebrate
("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Invertebrate
("Metazoa"[Organism] NOT "Vertebrata"[Organism]) ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Eukaryotes
("Eukaryota"[Organism] NOT "Metazoa"[Organism]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Metagenomic/Metatranscriptomic
"METAGENOME" OR "METATRANSCRIPTOME" OR "metatranscriptomic"[Filter] OR "metagenomic"[Filter] NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop] AND ("2022/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Human (non-controlled)
"Homo sapiens"[Organism] AND ("type_rnaseq"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2021/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Mouse
"Mus musculus"[Organism] AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties] AND ("2021/01/01"[Publication Date] : "2023/01/01"[Publication Date])
Virome
"VIRAL METAGENOME" OR "VIROME" OR "VIROMIC" OR "VIRAL METAGENOMICS" NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]
Prokaryotes
("type_rnaseq"[Filter] OR "RNASEQ") NOT "Eukaryota"[Organism] AND cluster_public[prop] AND "platform illumina"[Properties] AND ("2021/01/01"[Publication Date] : "2023/01/01"[Publication Date])
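Most queries in this section are an earlier version's query with a publication-date window appended. A small helper along these lines could build them. It is a sketch only: the function name is hypothetical, and wrapping the base query in parentheses is a choice made here so that a top-level OR in the base query cannot change how the date filter applies.

```python
def with_publication_window(base_query, start, end):
    """Restrict an Entrez query to a publication-date range (YYYY/MM/DD)."""
    window = f'("{start}"[Publication Date] : "{end}"[Publication Date])'
    # Parenthesize the base query so the window applies to all of it.
    return f"({base_query}) AND {window}"


# Base query copied from the Mouse entry, without its date window.
mouse = (
    '"Mus musculus"[Organism] AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] '
    'OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]'
)
print(with_publication_window(mouse, "2021/01/01", "2023/01/01"))
```

The printed string is the Mouse query above with one extra pair of parentheses around the part before the date window.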
v220113
Minor Version
Quality Control update 1; duplicates from v210225 were removed
Date Accessed: 2022 01 13
- Mammalian
196,870
("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]
- Vertebrate
138,073
("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
- Invertebrate
244,249
("Metazoa"[Organism] NOT "Vertebrata"[Organism]) ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
- Eukaryotes
468,867
("Eukaryota"[Organism] NOT "Metazoa"[Organism]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
- Metagenomic/Metatranscriptomic
751,512
"METAGENOME" OR "METATRANSCRIPTOME" OR "metatranscriptomic"[Filter] OR "metagenomic"[Filter] NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]
v210225
Major Version
Total Unique: 5 780 800
Human (non-controlled data)
- File:
hu_SraRunInfo.csv
Search Term:
"Homo sapiens"[Organism] AND ("type_rnaseq"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/12/30
Results: 789 931
AND
- File:
hu_meta_SraRunInfo.csv
Search Term:
"Homo sapiens"[Organism] AND ("metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/12/30
Results: 47 763
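Each entry pairs a search term with the number of results on the access date. This page does not say how the counts were obtained. One possible way to re-check a count today is the NCBI E-utilities `esearch` endpoint with `rettype=count`, sketched below. The helper name is hypothetical, and the code only builds the request URL without sending it; a current count will generally differ from the archived one because SRA keeps growing.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term, db="sra"):
    """Build an esearch URL that asks only for the number of matching records."""
    params = {"db": db, "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


# Search term copied from the hu_meta_SraRunInfo.csv entry.
hu_meta = (
    '"Homo sapiens"[Organism] AND ("metagenomic"[Filter] OR "metatranscriptomic"[Filter]) '
    'AND cluster_public[prop] AND "platform illumina"[Properties]'
)
url = count_url(hu_meta)
print(url)

# Decoding the query string recovers the original term unchanged.
decoded = parse_qs(urlsplit(url).query)
```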
Mouse
- File:
mu_SraRunInfo.csv
Search Term:
"Mus musculus"[Organism] AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]
SRA Accessed: 2020/12/30
Results: 1 058 559
Mammalian
- File:
mamm_SraRunInfo.csv
Search Term:
("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]
SRA Accessed: 2020/12/30
Results: 126 382
Vertebrate
- File:
vert_SraRunInfo.csv
Search Term:
("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/12/30
Results: 114 078
Invertebrate
- File:
inv_SraRunInfo.csv
Search Term:
("Metazoa"[Organism] NOT "Vertebrata"[Organism]) ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/12/31
Results: 184 729
Eukarya
- File:
euk_SraRunInfo.csv
Search Term:
("Eukaryota"[Organism] NOT "Metazoa"[Organism]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2021/01/01
Results: 367 924
Prokarya / Other
- File:
pro_SraRunInfo.csv
Search Term:
("type_rnaseq"[Filter] OR "RNASEQ" OR "metagenomic"[Filter] OR "METAGENOME" OR "metatranscriptomic"[Filter] OR "METATRANSCRIPTOME" OR "ENVIRONMENTAL") NOT "Eukaryota"[Organism] AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2021/01/20
Results: 2 672 802
Virome
- File:
viro_SraRunInfo.csv
Search Term:
"VIRAL METAGENOME" OR "VIROME" OR "VIROMIC" OR "VIRAL RNA" OR "VIRAL METAGENOMICS" NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]
SRA Accessed: 2020/12/31
Results: 52 072
Metagenomic
- File:
meta_SraRunInfo.csv
Search Term:
"METAGENOMIC" NOT amplicon[All Fields] AND "platform illumina"[Properties] AND cluster_public[prop]
SRA Accessed: 2020/12/31
Results: 566 826
Bat (with WGS)
- File:
bat_SraRunInfo.csv
Search Term:
"bat" OR txid9397[Organism:exp] AND "platform illumina"[Properties]
SRA Accessed: 2021/01/01
Results: 14 103
WGS (mammalian, non-mouse)
- File:
wgs_SraRunInfo.csv
Search Term:
("Mammalia"[Organism] NOT "Mus musculus"[orgn]) AND ("type_exome"[Filter] OR "type_genome"[Filter]) NOT "METAGENOME" AND "platform illumina"[Properties] AND cluster_public[prop]
SRA Accessed: 2021/01/21
Results: 393 278
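The per-query result counts for v210225 overlap, which is why the manifest is de-duplicated. The arithmetic below sums the counts listed above and compares them with 5 780 800, on the assumption that the figure in this section's header is the number of unique runs. The dictionary keys are the file-name prefixes; the helper name is hypothetical.

```python
def parse_count(text):
    """Parse counts written with space or comma thousands separators."""
    return int(text.replace(" ", "").replace(",", ""))


# Result counts as listed for each v210225 SraRunInfo file.
results = {
    "hu": "789 931",
    "hu_meta": "47 763",
    "mu": "1 058 559",
    "mamm": "126 382",
    "vert": "114 078",
    "inv": "184 729",
    "euk": "367 924",
    "pro": "2 672 802",
    "viro": "52 072",
    "meta": "566 826",
    "bat": "14 103",
    "wgs": "393 278",
}

total_hits = sum(parse_count(value) for value in results.values())
total_unique = parse_count("5 780 800")  # assumed to be the unique-run total

print(total_hits)                 # 6388447
print(total_hits - total_unique)  # 607647 hits shared between queries
```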
v201210
Major Version
These SraRunInfo files are deprecated; they were used in the initial 'nucleotide' search. They can be accessed in the bucket file history for archival purposes.
Human (non-controlled/dbGAP data)
- File:
hu_SraRunInfo.csv
Search Term:
"txid9606"[Organism:exp] AND ("type_rnaseq"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/05/30
Results: 672 656
- File:
hu_meta_SraRunInfo.csv
Search Term:
"txid9606"[Organism:exp] AND ("metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/05/30
Results: 36 103
Mouse
- File:
mu_SraRunInfo.csv
Search Term:
("Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]
SRA Accessed: 2020/06/06
Results: 890 746
Mammalian
- File:
mamm_SraRunInfo.csv
Search Term:
("Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND "platform illumina"[Properties]
SRA Accessed: 2020/06/06
Results: 100 798
Vertebrate
- File:
vert_SraRunInfo.csv
Search Term:
("Vertebrata"[Organism] NOT "Mammalia"[Organism] NOT "Homo sapiens"[Organism] NOT "Mus musculus"[orgn]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/05/25
Results: 94 908
Invertebrate
- File:
inv_SraRunInfo.csv
Search Term:
("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) NOT "Vertebrata"[Organism] AND cluster_public[prop] AND "platform illumina"[Properties]
SRA Accessed: 2020/06/12
Results: 2 193 740
Bat (update + wgs)
- File: bat_SraRunInfo.csv
Search Term:
txid9397[Organism:exp] AND "platform illumina"[Properties]
SRA Accessed: 2020/06/20
Results: 2 823
Virome
- File:
viro_SraRunInfo.csv
Search Term:
"virome" OR "viral metagenome" OR "viral metagenomics" AND cluster_public[prop]
SRA Accessed: 2020/06/12
Results: 22 251
Metagenomes
- File: meta_SraRunInfo.csv
Search Term:
txid256318[Organism:noexp]
SRA Accessed: 2020/07/12
Results: 163 659
Single-Cell RNA-seq
Note: This is a 'negative' filter. This data-set was not analyzed; it is used to separate single-cell RNA-seq from other RNA-seq libraries in downstream analysis.
- File:
scRNA_SraRunInfo.csv
Search Term:
"single-cell" OR "single cell" OR "scRNA-seq" OR "scRNAseq" OR "scRNAseq"
SRA Accessed: 2020/06/22
Results: 770 828
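Using the single-cell list as a negative filter amounts to a set-membership split. The sketch below shows the idea; it assumes run accessions have already been read from the relevant SraRunInfo files, and the function name and accessions are hypothetical placeholders.

```python
def split_single_cell(rna_seq_runs, single_cell_runs):
    """Partition runs into (single-cell, other) using the negative filter set."""
    single_cell = set(single_cell_runs)
    flagged = [run for run in rna_seq_runs if run in single_cell]
    other = [run for run in rna_seq_runs if run not in single_cell]
    return flagged, other


# Placeholder accessions, for illustration only.
rna_seq = ["SRR0000001", "SRR0000002", "SRR0000003"]
sc_filter = ["SRR0000002", "SRR0000009"]

flagged, other = split_single_cell(rna_seq, sc_filter)
print(flagged)  # ['SRR0000002']
print(other)    # ['SRR0000001', 'SRR0000003']
```

Runs that appear only in the filter list (here `SRR0000009`) are ignored, which matches the note that the filter set itself was not analyzed.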