Filtering out Low Abundance OTUS Sequence Variants - meyermicrobiolab/Meyer_Lab_Resources GitHub Wiki


Removing samples

If you need to remove samples, do that first, then filter out low-abundance OTUs.

  • Remove OTUs that do not have more than 1 read in at least half of the samples (A is the minimum number of samples that must pass the x>1 test)

    •     filterlow <- genefilter_sample(ps, filterfun_sample(function(x) x>1),A=0.5*nsamples(ps))
          ps1<-prune_taxa(filterlow,ps)
          ntaxa(ps1)
  • Remove samples by sample name (!= excludes the named sample; == keeps matching samples instead, but each == comparison matches only one name)

    •     ps2 = subset_samples(ps, sample_names(ps) != "F10" & sample_names(ps) != "G2" & sample_names(ps) != "J10" &
          sample_names(ps) != "K3" & sample_names(ps) != "K8" & sample_names(ps) != "K4")
          nsamples(ps2)
  • If you want to keep samples matching any of several values, join the == comparisons with "or" (|) instead of "and" (&)

    •     ps10raG = subset_samples(ps10ra, Coral == "MC2" | Coral == "OF2")
  • Remove samples by metadata ("Coral" is a column name in my metadata; the command excludes samples matching "Diploria...." and samples matching "Dichocoenia...")

    •     ps_MO = subset_samples(ps_nopink, Coral != "Diploria labyrinthiformis" & Coral != "Dichocoenia stokesi" )
          nsamples(ps_MO)
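
When the list of samples to drop or keep grows, chaining != or == comparisons gets long. The following is a minimal sketch of an alternative using %in%, with prune_samples() for sample names and subset_samples() for metadata. The toy object, its sample names, and the drop vector are made up for illustration and are not from the lab's data.

```r
library(phyloseq)

# Toy count table (taxa as rows) and metadata; all values are hypothetical.
counts <- matrix(
  c(10, 0, 3, 7,
     5, 2, 0, 1,
     0, 8, 4, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("F10", "G2", "A1", "B2"))
)
meta <- data.frame(Coral = c("MC2", "OF2", "MC2", "OF2"),
                   row.names = colnames(counts))
ps_toy <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Drop samples by name: keep every sample whose name is NOT in `drop`.
drop <- c("F10", "G2")
ps_dropped <- prune_samples(!(sample_names(ps_toy) %in% drop), ps_toy)
sample_names(ps_dropped)

# Keep samples by metadata: %in% stands in for a chain of == joined with |.
ps_kept <- subset_samples(ps_toy, Coral %in% c("MC2"))
nsamples(ps_kept)
```

Note that prune_samples() only removes samples; taxa that end up with zero reads in the remaining samples stay in the object until you filter them.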

Filtering out Low Abundance

  • Filter out low-abundance OTUs; only OTUs with a mean relative abundance greater than 10^-5 (0.001%) are kept

    •     ps2 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
          ps2f <- filter_taxa(ps2, function(x) mean(x) > 1e-5, TRUE)
          ntaxa(ps2f)
      Note: after this step the OTU table holds relative abundances instead of counts, so it can't be used in CoDaSeq.
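  • Filtering used by Bian et al. (mSphere): filter out low-abundance OTUs; only OTUs with greater than 0.1% relative abundance in any sample that also occurred in at least 20% of samples are kept. As written, the code below keeps OTUs that exceed 0.1% relative abundance in at least 20% of samples, and the result (ps3) is a relative-abundance table.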

    •     ps2 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
          filterlow <- genefilter_sample(ps2, filterfun_sample(function(x) x> 1e-3),A=0.2*nsamples(ps2))
          ps3<-prune_taxa(filterlow,ps2)
  • Keep only taxa with a mean read count across all samples greater than 10; taxa at or below that are removed (this is what worked well on my project)

    •     ntaxa(ps)
          ps10<-filter_taxa(ps, function(x) mean(x) >10, TRUE)
          ntaxa(ps10)
  • Keep only taxa with a mean read count across all samples greater than 5; taxa at or below that are removed

    •     ps5<-filter_taxa(ps, function(x) mean(x) >5, TRUE)
          ntaxa(ps5)
          get_taxa_unique(ps5, "Phylum")
          get_taxa_unique(ps5, "Order")
  • After filtering taxa with phyloseq, export the OTU table, taxa table, and metadata from the phyloseq object for input to CoDaSeq

    •     otu = as(otu_table(ps5), "matrix")
          taxon = as(tax_table(ps5), "matrix")
          metadata = as(sample_data(ps5), "matrix")
          write.table(otu,"filtered_otu_table_DiseaseOutbreak_gg_nochloromito.txt",sep="\t",col.names=NA)
          write.table(taxon,"filtered_taxa_table_DiseaseOutbreak_gg_nochloromito.txt",sep="\t",col.names=NA)
          write.table(metadata,"filtered_metadata.txt",sep="\t",col.names=NA)
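
The relative-abundance filter earlier in this list leaves a relative-abundance table, which is why its output can't go into CoDaSeq. One possible workaround, sketched here under the assumption that you only need the relative abundances to decide which taxa pass, is to compute the keep/drop vector on the transformed object and then prune the original count object with it. The toy counts and the object names (ps_toy, ps_rel, ps_counts_f) are hypothetical.

```r
library(phyloseq)

# Toy counts (taxa as rows); OTU2 is rare enough to fail the 1e-5 cutoff.
counts <- matrix(
  c(100000, 150000, 120000,
         1,      0,      0,
    100000,  50000,  80000),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)
meta <- data.frame(Coral = c("MC2", "OF2", "MC2"),
                   row.names = colnames(counts))
ps_toy <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), sample_data(meta))

# Relative abundances are used only to decide which taxa pass the filter.
ps_rel <- transform_sample_counts(ps_toy, function(x) x / sum(x))

# prune = FALSE returns a logical vector (one entry per taxon) instead of
# a pruned object.
keep <- filter_taxa(ps_rel, function(x) mean(x) > 1e-5, prune = FALSE)

# Apply that vector to the ORIGINAL object, so the counts are preserved.
ps_counts_f <- prune_taxa(keep, ps_toy)
taxa_names(ps_counts_f)
otu_table(ps_counts_f)
```

The same pattern applies to the genefilter_sample() version: it already returns a logical vector, so passing it to prune_taxa() with the count object (ps) instead of the transformed one (ps2) keeps the counts.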
      

Now you can import the filtered OTU table, taxa table, and updated metadata files into the CoDaSeq pipeline for analysis.
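
Before moving on, it can help to check that an exported table reads back unchanged. This is a small base R sketch of that round trip, using a made-up matrix and a temporary file. How the CoDaSeq pipeline script itself reads its input is not shown on this page, so the read.table() arguments here are an assumption that matches the write.table() calls above.

```r
# Toy OTU table; the values and names are hypothetical.
otu <- matrix(
  c(12, 0, 7,
     3, 9, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2"), c("S1", "S2", "S3"))
)

# Same arguments as the export step: tab-separated, and col.names = NA
# writes an empty header cell above the row-name column.
out_file <- tempfile(fileext = ".txt")
write.table(otu, out_file, sep = "\t", col.names = NA)

# row.names = 1 turns the first column back into row names;
# check.names = FALSE stops R from rewriting the sample names.
otu_in <- as.matrix(
  read.table(out_file, header = TRUE, sep = "\t",
             row.names = 1, check.names = FALSE)
)

# Confirm that values and labels survived the round trip.
all(otu_in == otu)
identical(rownames(otu_in), rownames(otu))
identical(colnames(otu_in), colnames(otu))
```

check.names = FALSE matters when sample names start with a digit or contain characters such as "-", because read.table() would otherwise alter them and they would no longer match the metadata.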
