6. Data visualisation - VascoElbrecht/JAMP GitHub Wiki

Sequences_lost()

This function is used between modules to show how many sequences are discarded in each processing step (in red). When plotting relative sequence proportions, the input files of the respective module are used as the reference. If you want to plot the relative abundance of reads with respect to the raw sequence data instead, set rel=F and divide Reads_in and Reads_out by the initial number of raw reads.

  • Reads_in is the number of reads in each sample before the respective module is applied. If you only want to plot the reads remaining after the module was applied, set this to NA.
  • Reads_out is the number of sequences remaining after the module was applied.
  • Set rel=T if you want to plot relative proportions in percent.
  • The figure title can be defined with main and omitted by setting main="".
  • If a filename is given in out, the plot will be saved as a PDF under that file name; otherwise it will be plotted in R.
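
The scaling against the raw data described above can be sketched as follows. The read counts are invented for illustration, and the guarded call assumes that JAMP is installed and exports Sequences_lost() with the argument names listed in the bullets; treat it as a sketch rather than a reference call.

```r
# Hypothetical per-sample read counts (invented numbers, for illustration only)
raw_reads <- c(S1 = 100000, S2 = 80000)  # raw reads per sample
reads_in  <- c(S1 = 90000,  S2 = 60000)  # reads entering the module
reads_out <- c(S1 = 81000,  S2 = 48000)  # reads remaining after the module

# Express both counts as a fraction of the raw reads of each sample
reads_in_rel  <- reads_in / raw_reads
reads_out_rel <- reads_out / raw_reads

# Assumption: JAMP is installed and exports Sequences_lost().
# rel = FALSE (same as rel=F) so the pre-scaled values are not rescaled again.
if (requireNamespace("JAMP", quietly = TRUE)) {
  JAMP::Sequences_lost(Reads_in = reads_in_rel, Reads_out = reads_out_rel,
                       rel = FALSE, main = "Reads relative to raw data")
}
```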

Length_distribution()

Function for plotting the length distribution of any fasta or fastq file. Does not work on wrapped fasta files!

  • sequFile is the path to the fasta or fastq file. The sequence format should be detected automatically, but can also be specified with fastq=T.
  • col is a vector of colors used for the read abundances.
  • maxL=600 sets the maximum sequence length shown in the plot. It depends on the number of cycles / read length of the sequencer used. Usually 500 or 600 is used for Illumina sequencing.
  • If a filename is given in out, the plot will be saved as a PDF under that file name; otherwise it will be plotted in R.
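
Because wrapped fasta files are not supported, it can help to check a file before plotting. The helper below is a hypothetical base R sketch and not part of JAMP: it treats a file as wrapped when it contains more sequence lines than header lines.

```r
# Hypothetical helper (not part of JAMP): TRUE if any sequence spans several lines
is_wrapped_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]             # ignore empty lines
  n_headers <- sum(startsWith(lines, ">"))  # one header per sequence
  n_seq_lines <- length(lines) - n_headers  # all remaining lines hold sequence
  n_seq_lines > n_headers
}

# Two small example files written to temporary paths
unwrapped <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTACGTAC", ">seq2", "ACGTAC"), unwrapped)

wrapped <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ACGTA", "CGTAC", ">seq2", "ACGTAC"), wrapped)

is_wrapped_fasta(unwrapped)
is_wrapped_fasta(wrapped)
```

A file flagged as wrapped would need to be unwrapped (one line per sequence) before it is passed to Length_distribution().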

OTU_heatmap()

This function can be used to read the JAMP OTU tables and visualize them as heatmaps. Instead of a .csv file you can also use a data.frame directly, as long as the OTU IDs are present as row.names() and all values in the table are numeric. The function will convert the read counts to relative abundance and highlight reads with 100-0.001% abundance with a color gradient. The color gradient is based on a log10 scale.

  • Heatmaps can be saved as PDF files if the name is given in out. If no name is given the heatmap is returned as a plot within R.
  • If abundance=T, the absolute read counts for each OTU and sample are plotted. They can also be converted to relative abundance using rel=T. Text can be omitted for entries where zero reads were detected with plot0=F. By default, no read counts are plotted.
  • The gradient color can be customised using col=c("blue3", "white"). Change e.g. the "Purple" to "Orange" (figure above), or use a light gray to differentiate more strongly between zero and low-abundance reads, e.g. c("Red", "gray95").
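
To make the data.frame requirements and the log10 scaling more concrete, here is a minimal base R sketch. The OTU table is invented, and the per-sample normalisation and the mapping onto a 0 to 1 gradient position are assumptions about how such a scale can be computed, not JAMP's actual implementation.

```r
# Hypothetical OTU table: OTU IDs as row names, numeric read counts per sample
otu <- data.frame(S1 = c(900, 90, 10, 0),
                  S2 = c(50, 0, 49950, 0),
                  row.names = c("OTU_1", "OTU_2", "OTU_3", "OTU_4"))

# Requirement from the text above: all values in the table are numeric
stopifnot(all(vapply(otu, is.numeric, logical(1))))

# Relative abundance in percent (assumption: normalised within each sample)
rel_pct <- sweep(as.matrix(otu), 2, colSums(otu), "/") * 100

# Position on a log10 gradient spanning 0.001% (0) to 100% (1)
pos <- (log10(rel_pct) - log10(0.001)) / (log10(100) - log10(0.001))
pos[which(pos < 0)] <- 0   # values below 0.001% sit at the low end
pos[rel_pct == 0] <- NA    # zero reads get no gradient color

round(pos, 2)
```

On this scale each tenfold drop in relative abundance moves a cell by the same step along the gradient, which keeps low-abundance OTUs visible next to dominant ones.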

Denoise_barplot()

Plot the distribution of haplotypes within each OTU as a barplot. This plot is generated automatically as part of the Denoise() function (in the _stats folder). However, you can also use the Denoise_barplot() function and the haplotype table e.g. E_haplo_table.csv to further customize the plot.

  • Use table to import a standard haplotype table csv or directly supply a data.frame with OTU names in the first column.
  • By default, samples in which no haplotypes of the respective OTU were detected are plotted as white cells in the barplot, showing in how many samples the OTU is missing. If you would like to omit these empty entries and only plot the relative proportions, set emptyOTUs=F (the default is T). If turned off, no axis labels will be plotted, as the barplots then no longer indicate the actual distribution of OTUs.
  • Specify the name of the PDF to save in out, e.g. out="MyPlot.pdf". The dimensions of the PDF can be adjusted with height=6 and width=7. If out is left on NA the plot is returned within R.
  • To control the number of plots per row and column, use mfrow=c(5, 40).
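
The options above can be combined in a single call, sketched below. The argument names and values are taken from the bullets; the guarded call assumes that JAMP is installed and exports Denoise_barplot(), and that E_haplo_table.csv exists in the working directory.

```r
# Arguments as described in the list above
plot_args <- list(
  table = "E_haplo_table.csv",  # haplotype table written by Denoise()
  emptyOTUs = FALSE,            # same as emptyOTUs=F: omit the white cells
  out = "MyPlot.pdf",           # save as PDF instead of plotting in R
  height = 6, width = 7,        # PDF dimensions
  mfrow = c(5, 40)              # layout of the plot grid
)

# Assumption: JAMP is installed and the haplotype table is present
if (requireNamespace("JAMP", quietly = TRUE) && file.exists(plot_args$table)) {
  do.call(JAMP::Denoise_barplot, plot_args)
}
```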