Brief description of scripts - LangilleLab/microbiome_helper GitHub Wiki

biom_to_stamp.py: STAMP has built-in BIOM conversion, but depending on the source of the BIOM file, STAMP can run into slight format problems. Specifically, this script handles PICRUSt BIOM output (both KOs and KEGG Pathways) and QIIME 16S OTU tables (with the 'taxonomy' metadata field).
chimera_filter.pl: Wraps VSEARCH (which implements the UCHIME reference-based algorithm) to filter out chimeric reads from a directory of reads in FASTA format.
chimera_filter_usearch61.pl: Wraps USEARCH v6.1 (which implements the UCHIME reference-based algorithm) to filter out chimeric reads from a directory of reads in FASTA format.
concat_lanes.pl: Concatenates FASTQs or FASTAs from multiple lanes together.
concat_paired_end.pl: Concatenates paired-end FASTQs or FASTAs.
convert_dada2_out.R: Rscript to convert DADA2 R-objects to BIOM and FASTA files.
count_fastq.pl: Takes a path to FASTQ files and returns a table with the number of reads and the average read length per FASTQ.
create_qiime_map.pl: Creates a QIIME mapping file from a list of input filenames, which can then be used with the script "add_qiime_labels.py".
dada2_chimera_taxa.R: Rscript to run chimera checking and taxonomy assignment with DADA2.
dada2_filter.R: Rscript to filter amplicon reads with DADA2.
dada2_inference.R: Rscript to run DADA2's main inference algorithm.
filter_fastq.pl: Simple script to filter out reads based on a length cut-off.
fix_ITS2_spf.py: Fixes ITS2 annotations so that they are compatible with STAMP.
humann_to_stamp.pl: Converts HUMAnN1 output to STAMP format (currently removes extra rows and renames sample IDs so they match those in the original file).
merge_logfiles.R: Rscript to merge logfiles that all share the same format, with samples in the first column (used in our DADA2 workflow).
metaphlan2_stamp_to_biom.py: Script to convert MetaPhlAn2 output from STAMP to BIOM format.
metaphlan_to_stamp.pl: Converts a merged MetaPhlAn output file that was created using all taxonomic ranks to a STAMP profile file.
parse_cutadapt_logs.py: Script to parse cutadapt logfiles and output a single table of read counts.
plot_metagenome_contributions.R: Rscript that can make useful plots for interpreting the output of PICRUSt's metagenome_contributions.py script.
qiime2_fastq_lengths.py: Creates summary table of number of reads in FASTQ files within QIIME2 QZA files.
read_filter.pl: Wraps several read filtering commands together (using FASTX Toolkit and BBMap) to run on a directory of FASTQ files.
remove_low_confidence_otus.py: Filters a BIOM table to remove low confidence OTUs that result from MiSeq run-to-run bleed-through (based on 0.1% as reported by Illumina).
run_contaminant_filter.pl: Wraps the Bowtie2 program to filter out human reads from metagenomic data (faster than DeconSeq).
run_deconseq.pl: Wraps the DeconSeq program to filter out human reads from metagenomic data.
run_fastq_to_fasta.pl: Wraps the fastq_to_fasta command from the FASTX Toolkit to allow the use of multiple threads.
run_kraken.pl: Wrapper to run Kraken for taxonomic classification of metagenomic reads, along with the post-processing tools kraken-translate and kraken-mpa-report.
run_metaphlan.pl: Wraps the MetaPhlAn package, handles running multiple samples at once, and handles paired-end data more cleanly. It runs each sample in parallel and merges the results into a single output file. It also accepts both gzipped and non-gzipped files.
run_metaphlan2.pl: Wraps the MetaPhlAn2 package using the same method as for MetaPhlAn.
run_pear.pl: Makes running the PEAR program easier on many samples, by automatically identifying the paired-end files to merge together.
run_pre_humann.pl: Runs a similarity search against the KEGG database using the search tool DIAMOND with multiple threads. The output is then fed into HUMAnN1.
run_sra_to_fastq.pl: Converts SRA files to FASTQ format.
run_trimmomatic.pl: Wrapper for Trimmomatic to filter reads by length and quality.
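
To make the kind of per-file summary described for count_fastq.pl more concrete, here is a minimal Python sketch that counts reads and computes the average read length for each FASTQ file. It is not the Perl script's actual implementation. The function names `fastq_stats` and `summarize`, the output column names, and the assumption of standard 4-line FASTQ records (no wrapped sequence lines) are all illustrative choices made here.

```python
import gzip
from pathlib import Path


def fastq_stats(path):
    """Return (read_count, mean_read_length) for one FASTQ file.

    Assumption: standard 4-line FASTQ records, so the sequence is
    always the second line of each record. Files ending in ".gz"
    are read as gzip-compressed text.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    total_bases = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Lines 1, 5, 9, ... (0-based) hold the sequences.
            if line_number % 4 == 1:
                n_reads += 1
                total_bases += len(line.strip())
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length


def summarize(paths):
    """Build tab-separated rows: file name, read count, mean read length."""
    rows = ["file\treads\tmean_length"]
    for path in paths:
        n_reads, mean_length = fastq_stats(path)
        rows.append(f"{Path(path).name}\t{n_reads}\t{mean_length:.2f}")
    return rows
```

Calling `print("\n".join(summarize(["sample1.fastq", "sample2.fastq.gz"])))` (hypothetical file names) would print one row per file under a header line.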