deltaBS across a population of bacteria - Gardner-BinfLab/deltaBS GitHub Wiki

To run deltaBS for a large-scale comparison:

  • Assemble your genomes
  • Annotate them with Prokka (or equivalent) and put protein-coding sequences in .faa format in a folder called 'annotations'
  • Identify orthologous groups with Roary or Panaroo to produce a gene_presence_absence.csv file. Panaroo is recommended, because Roary can incorrectly split orthologs, which introduces artifacts into the analysis.
  • Run hmmsearch on all assemblies:

For each assembly $INFILE: hmmsearch --domtblout search/${INFILE}.search models.hmm annotations/${INFILE}.faa
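The per-assembly command above can be wrapped in a loop. The following is a minimal sketch, not part of deltaBS itself: the function name run_searches is hypothetical, and it assumes every assembly has a file named annotations/<assembly>.faa and that models.hmm is the HMM database you intend to score against. It creates the output directory first, since hmmsearch writes to the path given to --domtblout and is not expected to create missing directories.

```shell
#!/usr/bin/env bash
# Sketch: run hmmsearch once per assembly, one domain table per genome.
# run_searches is a hypothetical helper name, not part of deltaBS.
# Arguments (all optional): annotation dir, output dir, HMM file.
run_searches() {
    local annot_dir="${1:-annotations}"
    local out_dir="${2:-search}"
    local models="${3:-models.hmm}"
    local faa infile

    mkdir -p "$out_dir"                      # make sure the output folder exists

    for faa in "$annot_dir"/*.faa; do
        [ -e "$faa" ] || continue            # skip if the glob matched nothing
        infile="$(basename "$faa" .faa)"     # assembly name without the extension
        # --domtblout writes the per-domain table; the regular report is discarded
        hmmsearch --domtblout "$out_dir/${infile}.search" "$models" "$faa" > /dev/null
    done
}
```

Calling run_searches with no arguments uses the folder and file names from the steps above (annotations, search, models.hmm).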

Then parse the results: ./parse_bitscores.pl gene_presence_absence.csv search

This will give you a tab-delimited file of bitscores for analysis. You can use this as direct input for a machine learning algorithm. The script also prints a file indicating the HMM that was used to score each orthologous group, in case you need to check that the model is comparing the orthologous group to an appropriate set of sequences.

The signal you will be looking for in this file is a gene that has a change in distribution of scores associated with a change in lifestyle:

[Figure: deltaBS table]
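As a rough first pass at finding such genes, you could compare the mean bitscore of each orthologous group between lifestyle classes. The sketch below is illustrative only and rests on several assumptions that the text above does not confirm: the function name mean_by_lifestyle and the lifestyle file are hypothetical; the bitscore table is assumed to have one row per orthologous group, one column per genome, and a header row of genome names (transpose first if your table is laid out the other way); and missing scores are assumed to be empty or NA. A difference in means is only a crude indicator of a shift in distribution, so treat the output as a list of candidates rather than a statistical test.

```shell
#!/usr/bin/env bash
# Sketch: mean bitscore per orthologous group and lifestyle class.
# $1 = lifestyle file, two tab-separated columns: genome, label (hypothetical file)
# $2 = bitscore table, assumed layout: header of genome names, one row per group
mean_by_lifestyle() {
    awk -F'\t' '
        NR == FNR { label[$1] = $2; next }          # first file: genome -> lifestyle
        FNR == 1  {                                 # table header: column -> lifestyle
            for (i = 2; i <= NF; i++) col[i] = label[$i]
            next
        }
        {
            split("", sum); split("", n)            # reset accumulators for this row
            for (i = 2; i <= NF; i++) {
                # ignore missing scores and genomes without a lifestyle label
                if ($i == "" || $i == "NA" || col[i] == "") continue
                sum[col[i]] += $i
                n[col[i]]++
            }
            # output columns: group, lifestyle, number of genomes, mean bitscore
            for (l in n) printf "%s\t%s\t%d\t%.2f\n", $1, l, n[l], sum[l] / n[l]
        }
    ' "$1" "$2"
}
```

For example, mean_by_lifestyle lifestyle.tsv bitscores.tsv | sort prints one line per group and lifestyle; groups whose means differ markedly between classes are the ones worth inspecting more closely.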