TaxaPlot.R Tutorial - sciencesharon/MicrobialSeq GitHub Wiki

TaxaPlot.R Tutorial

This tutorial can follow usage of Merged_Abundance_Processing.R or can be used with any relative abundance table in the format of Samples as columns and Taxa as rows.

Example Input Files

If starting from Merged_Abundance_Processing.R:

If starting from a relative abundance table:

Step 1: Process Merged Abundance File with Merged_Abundance_Processing.R

1. Read the merged abundance file into R

merged <- read.table("/path/to/file/merged_abundance_example.txt", header = FALSE, sep = "\t")

2. Set an output directory

output_dir <- ("/path/to/file/output")

3. Utilize merged_abundance_processing.R script

relabun_list <- merged_abundance_processing(merged, output_dir)

Step 2: Load metadata

1. Read metadata into R

met <- read.csv("/path/to/file/metadata_example.csv")

met <- tibble::column_to_rownames(met, var = "X")

Step 3: Subset merged abundance output (optional)

1. Subset the metadata and make the metadata groups by which to facet_wrap

  • change this to however you want to subset for your example data

samples <- met %>% filter(str_detect(SampleID, regex("\\bSample", ignore_case = TRUE)))

2. Extract sample ID's for the metadata group

subset_samples <- samples$SampleID

3. Subset the merged abundance output using subset_by_colnames_and_save (if working from merged abundance table)

new_dir <- ("/path/to/file/output/subset")

subset_relabun_list <- subset_by_colnames_and_save(relabun_list, subset_samples, prefix = "Subset", output_dir = new_dir)

Step 4: Subset most abundant taxa using mostAbundant

Genus_otu_most = mostAbundant(Subset_Genus_RelAbun, N = 15, items = NULL, others = TRUE, rescale = TRUE)

Species_otu_most = mostAbundant(Subset_Species_RelAbun, N = 15, items = NULL, others = TRUE, rescale = TRUE)

Step 5: Create metagroups for plots

1. Set metagroups by which to facet_wrap the plots

sample_metagroup <- list('All_Samples' = subset_samples)

2. If you have multiple groups to define, define the samples in each group, then the metagroup

five_samples <- subset(samples, Timepoint == "5_month")

five_samples <- five_samples$SampleID

two_samples <- subset(samples, Timepoint == "2_week")

two_samples <- two_samples$SampleID

time_samples <- (c(two_samples, five_samples))

timepoint_metagroup <- list('5_month' = five_samples, '2_week' = two_samples)

3. If your metagroups are a subset, make sure to subset the relative abundance as well

subset_genus <- Genus_otu_most[,subset_samples]

subset_species <- Species_otu_most[,subset_samples]

timepoint_genus <- Genus_otu_most[,time_samples]

timepoint_species <- Species_otu_most[,time_samples]

Step 5: Set colors and labels for plot

color_taxa_gen <- color_taxa(Genus_otu_most)

color_taxa_spec <- color_taxa(Species_otu_most)

label_fill_gen <- rownames(Genus_otu_most)

label_fill_spec <- rownames(Species_otu_most)

Step 6: Create and save taxa plot

1. Create plot

  • change height and width of the pdf as needed
  • data: relative abundance table
  • label_y: y label
  • color: colors for taxa
  • label_fill: labels for taxa
  • base_size: text size
  • nrow & ncol: number of columns and rows for facet_wrap
  • metadata_groups: groups by which to facet_wrap

pdf("/path/to/file/status.genus.pdf", width = 8, height = 5) #change the height and width of the pdf as you desire

taxa_plot(data = timepoint_genus, label_y = "Relative Abundance", color = color_taxa_gen, label_fill = label_fill_gen , base_size = 12, nrow = 1,ncol = 2,metadata_groups = timepoint_metagroup)

dev.off()

2. Save the data used in the plot

write.csv(timepoint_genus, "/path/to/file/timepoint.genus.csv")

Example Output Files

status.genus.pdf

timepoint.genus.csv

Plotting more taxa and setting unique colors

Step 1: Subset species by top 30 most abundant using mostAbundant

Species_otu_most = mostAbundant(Subset_Species_RelAbun, N = 30, items = NULL, others = TRUE, rescale = TRUE)

Step 2: Convert rownames into column

Species_otu_most <- Species_otu_most %>% rownames_to_column(var = "Species")

Step 3: Create custom order of taxa as I'd like them to appear in the plot

custom_order <- c("Other", "Bifidobacterium bifidum", "Bifidobacterium breve", "Bifidobacterium dentium", "Bifidobacterium longum", "Bifidobacterium pseudocatenulatum", "Klebsiella michiganensis", "Klebsiella oxytoca", "Klebsiella pneumoniae", "Klebsiella variicola", "Escherichia coli", "Veillonella parvula", "Veillonella ratti", "Veillonella rogosae", "Bacteroides fragilis", "Bacteroides ovatus", "Citrobacter freundii", "Citrobacter sp RHBSTW 00671", "Phocaeicola dorei", "Phocaeicola vulgatus", "Streptococcus salivarius", "Parabacteroides distasonis", "Parabacteroides merdae", "Staphylococcus epidermidis", "Erysipelatoclostridium ramosum", "Enterococcus faecalis", "Clostridium butyricum", "Hungatella hathewayi", "Megasphaera sp MJR8396C", "Ruminococcus gnavus", "Ruminococcus torques")

Step 4: Reorder the taxa by custom order

Species_otu_most <- Species_otu_most[match(custom_order, Species_otu_most$Species), ]

Step 5: Reset the rownames as taxa

rownames(Species_otu_most) = NULL

Species_otu_most <- Species_otu_most %>% column_to_rownames(var = "Species")

Step 6: Set custom colors for taxa list

color_taxa_spec <- c("Other" = "#bdbcbc", "Bifidobacterium bifidum" = "#85A1EF", "Bifidobacterium breve" = "#B7D8E8", "Bifidobacterium dentium" = "#7EC8E3", "Bifidobacterium longum" = "#0F52BA", "Bifidobacterium pseudocatenulatum" = "#00FFFF", "Klebsiella michiganensis" = "#b2d689", "Klebsiella oxytoca" = "#B4C424", "Klebsiella pneumoniae" = "#7CFC00", "Klebsiella variicola" = "#DFFF00", "Escherichia coli" = '#FFEA00', "Veillonella parvula" = "#f7766d", "Veillonella ratti" = "#b54e4b", "Veillonella rogosae"= "#732728", "Bacteroides fragilis" = "#196619", "Bacteroides ovatus" = "#123D12", "Citrobacter freundii" = "#D4FBE7", "Citrobacter sp RHBSTW 00671"= "#75D7A2", "Phocaeicola dorei" = "#0000FF", "Phocaeicola vulgatus" = "#00008B", "Streptococcus salivarius" = "#f5bcbc", "Parabacteroides distasonis" = "#AB6666", "Parabacteroides merdae" = "#F5DEB3", "Staphylococcus epidermidis"= "#f59a40", "Erysipelatoclostridium ramosum"= "#c8b1d5", "Enterococcus faecalis" = "#D22B2B", "Clostridium butyricum" = "#FF7518", "Hungatella hathewayi"= "#FF10F0", "Megasphaera sp MJR8396C"= "#9F2B68", "Ruminococcus gnavus" ="#5D3FD3", "Ruminococcus torques" = "#800080")

Step 7: Set labels for plot

label_fill_spec <- rownames(Species_otu_most)

Step 8: Subset relative abundance table as needed

timepoint_species <- Species_otu_most[,time_samples]

Step 9: Create and save the plot

pdf("/path/to/file/timepoint.species_extended.pdf", width = 12, height = 5) #change the height and width as needed

taxa_plot(timepoint_species, label_y = "Relative Abundance", color = color_taxa_spec, label_fill = label_fill_spec, base_size = 12, nrow = 1, ncol = 2, metadata_groups = timepoint_metagroup)

dev.off()

Step 10: Save the data used to make the plot

write.csv(timepoint_species, "/path/to/file/timepoint.species_extended.csv")

Example Output

timepoint.species_extended.pdf

timepoint.species_extended.csv

⚠️ **GitHub.com Fallback** ⚠️