Testing your own isolates against our model - Gardner-BinfLab/invasive_salmonella GitHub Wiki

There are two ways to run the model on your own isolates, implemented in the fastq_pipeline and assembly_pipeline directories. The fastq pipeline is recommended, because the results of the assembly pipeline can depend on which assembly method was used.

Each directory contains all of the code needed to run its pipeline. The pipelines require the following software:

  • HMMER3
  • bowtie2
  • bcftools
  • samtools
  • fastaq (only required for the assembly pipeline; can be installed with pip3 install pyfastaq)
  • BBMap (can be installed with Bioconda)
  • R version 4.1 or later
  • R package randomForest
  • R package ROCR (can be installed with conda install -c bioconda r-rocr)
  • R package DMwR (can be installed in R with install.packages("remotes"); remotes::install_github("cran/DMwR"))
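Before running either pipeline, it can save time to confirm that everything in the list above is installed. The script below is a minimal sketch of such a check, not part of the repository. The executable names are assumptions: hmmsearch stands in for HMMER3 and bbmap.sh for BBMap, and the pipeline scripts may call other programs from those packages.

```shell
#!/usr/bin/env bash
# Pre-flight dependency check (illustrative sketch, not part of the repo).
# ASSUMPTION: hmmsearch represents HMMER3 and bbmap.sh represents BBMap;
# the pipelines may also call other executables from these packages.

# Print each argument that is not found on PATH, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

missing=$(missing_commands hmmsearch bowtie2 bcftools samtools bbmap.sh Rscript)
if [ -n "$missing" ]; then
  printf 'Missing required tools:\n%s\n' "$missing" >&2
else
  echo "All required command-line tools found."
fi

# fastaq is only needed for the assembly pipeline.
if [ -n "$(missing_commands fastaq)" ]; then
  echo "Note: fastaq not found (needed only for assembly_pipeline)." >&2
fi

# If R is installed, check its version and the three R packages.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e '
    if (getRversion() < "4.1.0") cat("R >= 4.1 required, found", format(getRversion()), "\n")
    pkgs <- c("randomForest", "ROCR", "DMwR")
    ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    if (any(!ok)) cat("Missing R packages:", pkgs[!ok], "\n") else cat("R packages found.\n")
  '
fi
```

The check only reports problems and never exits early, so it can be run repeatedly while installing dependencies.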

The fastq pipeline expects each sample's paired-end reads to be in two files ending in _1.fastq and _2.fastq (for example, sampleID1_1.fastq and sampleID1_2.fastq).
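If your reads follow a different naming scheme, one option is to create symbolic links with the expected names instead of renaming or copying the files. The function below is a hypothetical helper sketched under the assumption that reads are uncompressed and named in the Illumina style <prefix>_R1_001.fastq / <prefix>_R2_001.fastq; adjust the patterns to match your files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link Illumina-style read names to the _1/_2 convention.
# ASSUMPTION: reads are uncompressed and named <prefix>_R1_001.fastq and
# <prefix>_R2_001.fastq; change the suffixes below for other schemes.
link_to_expected_names() {
  local dir=$1 r1 r2 prefix
  for r1 in "$dir"/*_R1_001.fastq; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    prefix=${r1%_R1_001.fastq}
    r2=${prefix}_R2_001.fastq
    if [ ! -e "$r2" ]; then
      echo "warning: no R2 file for $r1" >&2
      continue
    fi
    # Never overwrite existing files that already use the expected names.
    if [ -e "${prefix}_1.fastq" ] || [ -e "${prefix}_2.fastq" ]; then
      echo "skipping $prefix: _1/_2 files already exist" >&2
      continue
    fi
    # A bare file name as the target makes the link relative to its own
    # directory, so the links stay valid if the whole directory is moved.
    ln -s "$(basename "$r1")" "${prefix}_1.fastq"
    ln -s "$(basename "$r2")" "${prefix}_2.fastq"
  done
}
```

For example, running link_to_expected_names path_to_samples would make path_to_samples/sampleID1_S1_L001_R1_001.fastq available as path_to_samples/sampleID1_S1_L001_1.fastq, so path_to_samples/sampleID1_S1_L001 becomes that sample's prefix.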

To run the pipeline on multiple samples, create a file that lists the path to each sample's read files, one sample per line, without the _1.fastq/_2.fastq suffixes. Then loop over it:

while read -r i; do ./run_invasiveness_index.sh "$i"; done < samples.txt

For example, samples.txt might contain:

path_to_samples/sampleID1
path_to_samples/sampleID2
path_to_samples/sampleID3
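For larger batches, samples.txt can be generated from a directory of reads, and the loop can be made to continue past failed samples. The helpers below are a sketch, not part of the repository: list_sample_prefixes, run_all and the PIPELINE variable are hypothetical names, and PIPELINE simply defaults to the run_invasiveness_index.sh script used above.

```shell
#!/usr/bin/env bash
# Hypothetical batch helpers (not part of the repository).
# PIPELINE defaults to the script used on this page; override it if needed.
PIPELINE=${PIPELINE:-./run_invasiveness_index.sh}

# Print the prefix of every <prefix>_1.fastq in a directory that also has a
# matching <prefix>_2.fastq, one per line, in the format samples.txt expects.
list_sample_prefixes() {
  local dir=$1 r1 prefix
  for r1 in "$dir"/*_1.fastq; do
    [ -e "$r1" ] || continue                  # the glob matched nothing
    prefix=${r1%_1.fastq}
    if [ -e "${prefix}_2.fastq" ]; then
      printf '%s\n' "$prefix"
    else
      echo "warning: skipping $r1 (no matching _2.fastq)" >&2
    fi
  done
}

# Run the pipeline once per line of a sample list, keep going after a
# failure, report the failed samples, and return non-zero if any failed.
run_all() {
  local list=$1 prefix failed=0
  # "|| [ -n ... ]" also processes a last line that has no trailing newline.
  while IFS= read -r prefix || [ -n "$prefix" ]; do
    [ -n "$prefix" ] || continue              # skip blank lines
    # </dev/null stops the pipeline from reading the rest of the list on stdin.
    if ! "$PIPELINE" "$prefix" </dev/null; then
      echo "failed: $prefix" >&2
      failed=$((failed + 1))
    fi
  done < "$list"
  [ "$failed" -eq 0 ]
}
```

Typical use would be:

list_sample_prefixes path_to_samples > samples.txt
run_all samples.txt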