Filtering out Low Abundance OTUs / Sequence Variants - meyermicrobiolab/Meyer_Lab_Resources GitHub Wiki
[Removing Samples][#removing-samples] Filtering out Low-Abundance
-
Keep only OTUs that appear more than 1 time (more than 1 read) in at least half of the samples; all other OTUs are removed
-
filterlow <- genefilter_sample(ps, filterfun_sample(function(x) x > 1), A = 0.5 * nsamples(ps))
ps1 <- prune_taxa(filterlow, ps)
ntaxa(ps1)
-
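To make the prevalence rule concrete, here is a minimal base R sketch of the same logic on a made-up count matrix (taxa as rows, samples as columns). The matrix, the OTU names, and the variable names are all hypothetical, and the sketch assumes that `genefilter_sample` marks a taxon TRUE when it passes the test in at least `A` samples.

```r
# Toy count matrix with made-up numbers: taxa as rows, samples as columns
counts <- matrix(
  c(5, 0, 3, 2,   # otu_a: more than 1 read in 3 of 4 samples
    1, 1, 0, 1,   # otu_b: never more than 1 read
    0, 9, 0, 4,   # otu_c: more than 1 read in exactly 2 of 4 samples
    0, 0, 7, 0),  # otu_d: more than 1 read in 1 of 4 samples
  nrow = 4, byrow = TRUE,
  dimnames = list(c("otu_a", "otu_b", "otu_c", "otu_d"),
                  c("S1", "S2", "S3", "S4"))
)

min_samples <- 0.5 * ncol(counts)   # plays the role of A = 0.5 * nsamples(ps)
n_passing <- rowSums(counts > 1)    # samples in which each taxon has > 1 read
keep <- n_passing >= min_samples    # assumed rule: pass in at least A samples
filtered <- counts[keep, , drop = FALSE]
rownames(filtered)
```

In this toy table only `otu_a` and `otu_c` survive, which shows that a taxon sitting exactly at half of the samples is kept under this reading.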
-
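Remove samples by sample name (!= means exclude samples matching that name; == would keep samples matching it. Exclusions are chained with & because a kept sample must differ from every listed name; a single == test can only match one name)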
-
ps2 = subset_samples(ps, sample_names(ps) != "F10" & sample_names(ps) != "G2" & sample_names(ps) != "J10" & sample_names(ps) != "K3" & sample_names(ps) != "K8" & sample_names(ps) != "K4")
nsamples(ps2)
-
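The long chain of != tests can be written more compactly with `%in%`. The sketch below checks on a plain character vector that both forms select the same samples; the full list of sample names is made up for illustration. With a phyloseq object, the resulting logical vector could be passed to `prune_samples`, which accepts either sample names or a logical vector of samples to keep.

```r
# Hypothetical sample names; only the six names in `drop` come from the example above
all_samples <- c("F10", "F11", "G2", "J10", "K3", "K4", "K8", "L1")
drop <- c("F10", "G2", "J10", "K3", "K8", "K4")

# Compact form: TRUE for every sample that is NOT in the drop list
keep <- !(all_samples %in% drop)

# Chained form, as in the subset_samples call
chained <- all_samples != "F10" & all_samples != "G2" & all_samples != "J10" &
  all_samples != "K3" & all_samples != "K8" & all_samples != "K4"

identical(keep, chained)
all_samples[keep]

# With a phyloseq object `ps`, the same idea would be:
# ps2 <- prune_samples(!(sample_names(ps) %in% drop), ps)
```

Keeping the names in one vector also makes it easier to reuse or extend the drop list later.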
-
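If you want to keep MULTIPLE groups of samples, join the == tests with "or" (|) instead of "and" (&); with &, a sample would have to match every value at once, so nothing would be kept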
-
ps10raG = subset_samples(ps10ra, Coral == "MC2" | Coral == "OF2")
-
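The difference between & and | is easy to check on a small metadata table. This is a sketch with made-up sample names and a few made-up Coral values ("MC3", "PA1"); only "MC2" and "OF2" come from the command above.

```r
# Hypothetical metadata: one row per sample, with a Coral column
meta <- data.frame(
  Coral = c("MC2", "OF2", "MC3", "OF2", "PA1"),
  row.names = c("S1", "S2", "S3", "S4", "S5")
)

with_and <- meta$Coral == "MC2" & meta$Coral == "OF2"  # no value can equal both
with_or  <- meta$Coral == "MC2" | meta$Coral == "OF2"  # matches either value
with_in  <- meta$Coral %in% c("MC2", "OF2")            # same result as with_or

sum(with_and)              # number of samples kept with &
rownames(meta)[with_or]    # samples kept with |
identical(with_or, with_in)
```

Because `subset_samples` evaluates its condition against the sample metadata, a condition such as `Coral %in% c("MC2", "OF2")` should work there as well and scales better when many values are kept.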
-
Remove samples by metadata ("Coral" is a column name in my metadata; the command EXCLUDES samples matching "Diploria labyrinthiformis" and samples matching "Dichocoenia stokesi")
-
ps_MO = subset_samples(ps_nopink, Coral != "Diploria labyrinthiformis" & Coral != "Dichocoenia stokesi")
nsamples(ps_MO)
-
-
Filter out low abundance OTUs; only OTUs with a mean relative abundance greater than 10^-5 (0.001%) are kept
-
Note: if you do this, the OTU table now holds relative abundances rather than read counts, so it can't be used in CoDaSeq
ps2 <- transform_sample_counts(ps, function(OTU) OTU / sum(OTU))
ps2f <- filter_taxa(ps2, function(x) mean(x) > 1e-5, TRUE)
ntaxa(ps2f)
-
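One way around the relative abundance problem is to use the proportions only to decide which taxa pass, and then apply that decision to the original counts. The base R sketch below shows the idea on a made-up count matrix; all names and numbers are hypothetical. With phyloseq objects, the analogous step would presumably be `prune_taxa(taxa_names(ps2f), ps)`, which subsets the count-based object to the taxa that survived the filter.

```r
# Toy counts with made-up numbers; each sample totals 1,000,000 reads
counts <- rbind(
  otu_a = c(900000, 850000),
  otu_b = c(99990, 149999),
  otu_c = c(10, 1)            # relative abundance 1e-5 and 1e-6, mean 5.5e-6
)
colnames(counts) <- c("S1", "S2")

# Convert each sample (column) to proportions, as transform_sample_counts does
rel <- sweep(counts, 2, colSums(counts), "/")

# Decide which taxa pass using the proportions...
keep <- rowMeans(rel) > 1e-5

# ...but subset the ORIGINAL counts, so the result is still a count table
counts_filtered <- counts[keep, , drop = FALSE]
counts_filtered
```

The filtered table still contains whole read counts, which is the form a count-based tool expects.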
-
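Filtering used by Bian et al. (mSphere): filter out low abundance OTUs; only OTUs with greater than 0.1% relative abundance in any sample that also occurred in at least 20% of samples are kept. As written, the code keeps an OTU only if it exceeds 0.1% relative abundance in at least 20% of samples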
-
ps2 <- transform_sample_counts(ps, function(OTU) OTU / sum(OTU))
filterlow <- genefilter_sample(ps2, filterfun_sample(function(x) x > 1e-3), A = 0.2 * nsamples(ps2))
ps3 <- prune_taxa(filterlow, ps2)
-
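The description of this filter can be read in two ways, and they do not always agree. The sketch below compares them on a made-up table of relative abundances (other taxa are omitted, so the columns do not sum to 1). It is not taken from the paper: `as_written` mirrors the `genefilter_sample` call, assuming a taxon passes when it exceeds the cutoff in at least `A` samples, while `alt_reading` is just one literal interpretation of "above 0.1% in any sample and present in at least 20% of samples".

```r
# Made-up relative abundances: 3 taxa across 10 samples
rel <- rbind(
  otu_a = c(rep(0.01, 3), rep(0, 7)),           # above 0.1% in 3 samples
  otu_b = c(0.005, rep(0.0005, 3), rep(0, 6)),  # above 0.1% once, present in 4 samples
  otu_c = rep(0.0005, 10)                       # present everywhere, never above 0.1%
)
colnames(rel) <- paste0("S", 1:10)

min_samples <- 0.2 * ncol(rel)   # plays the role of A = 0.2 * nsamples(ps2)

# Reading 1: above 0.1% in at least 20% of samples (what the code does)
as_written <- rowSums(rel > 1e-3) >= min_samples

# Reading 2: above 0.1% in any sample AND detected (> 0) in at least 20% of samples
alt_reading <- apply(rel > 1e-3, 1, any) & rowSums(rel > 0) >= min_samples

names(which(as_written))
names(which(alt_reading))
```

Here `otu_b` is dropped by the first reading but kept by the second, so it is worth deciding which rule is intended before comparing results across projects.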
-
Keep only taxa with a mean read count across all samples >10, filtering out the rest (this is what worked well on my project)
-
ntaxa(ps)
ps10 <- filter_taxa(ps, function(x) mean(x) > 10, TRUE)
ntaxa(ps10)
-
-
Keep only taxa with a mean read count across all samples >5, filtering out the rest
-
ps5 <- filter_taxa(ps, function(x) mean(x) > 5, TRUE)
ntaxa(ps5)
get_taxa_unique(ps5, "Phylum")
get_taxa_unique(ps5, "Order")
-
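Before settling on a cutoff, it can help to see how many taxa would survive at several thresholds. This is a base R sketch on a made-up count matrix; the names and numbers are hypothetical, and `rowMeans` assumes taxa are rows.

```r
# Toy counts with made-up numbers: taxa as rows, samples as columns
counts <- rbind(
  otu_a = c(40, 0, 20, 0),   # mean 15, but absent from half the samples
  otu_b = c(8, 6, 7, 7),     # mean 7
  otu_c = c(0, 12, 0, 0),    # mean 3
  otu_d = c(1, 0, 2, 1)      # mean 1
)
colnames(counts) <- paste0("S", 1:4)

taxon_means <- rowMeans(counts)
thresholds <- c(1, 5, 10)

# Number of taxa whose mean read count is strictly greater than each threshold
n_kept <- sapply(thresholds, function(t) sum(taxon_means > t))
names(n_kept) <- paste0("mean>", thresholds)
n_kept
```

Note that `otu_a` passes the >10 cutoff even though it is missing from half of the samples, because a mean can be carried by a few high-count samples; a prevalence filter behaves differently in that case.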
-
Taxa are now filtered with phyloseq; next, export the OTU table, taxa table, and metadata from the phyloseq object for input to CoDaSeq
-
otu = as(otu_table(ps5), "matrix")
taxon = as(tax_table(ps5), "matrix")
metadata = as(sample_data(ps5), "matrix")
write.table(otu, "filtered_otu_table_DiseaseOutbreak_gg_nochloromito.txt", sep = "\t", col.names = NA)
write.table(taxon, "filtered_taxa_table_DiseaseOutbreak_gg_nochloromito.txt", sep = "\t", col.names = NA)
write.table(metadata, "filtered_metadata.txt", sep = "\t", col.names = NA)
-
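A quick way to confirm the exported files are usable is to read one back and compare it with what was written. The sketch below does this round trip with a made-up matrix and a temporary file, using the same `sep = "\t", col.names = NA` settings as the export commands; the matrix and file are hypothetical stand-ins for the real tables.

```r
# Made-up OTU matrix standing in for as(otu_table(ps5), "matrix")
otu <- matrix(c(12, 0, 7,
                30, 5, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("otu_a", "otu_b"), c("S1", "S2", "S3")))

# Write with the same settings as the export step, but to a temporary file
out_file <- tempfile(fileext = ".txt")
write.table(otu, out_file, sep = "\t", col.names = NA)

# Read back: first column holds the row names, header holds the sample names
otu_back <- as.matrix(read.table(out_file, sep = "\t", header = TRUE,
                                 row.names = 1, check.names = FALSE))

# Compare values and labels
all.equal(otu, otu_back, check.attributes = FALSE)
identical(dimnames(otu_back)[[1]], rownames(otu))
```

It is also worth checking the orientation before export: if `taxa_are_rows(ps5)` is FALSE, the matrix written out has samples as rows and taxa as columns.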