HyPhy Analyses - a-lud/nf-pipelines GitHub Wiki
In addition to having a pipeline that runs HyPhy, I've also implemented a pipeline that runs HyPhy-analyses.
These are pipelines built around the core models implemented in HyPhy to address specific evolutionary questions.
More information about the methods can be found here.
This pipeline does not do any post-processing of the results. By default, HyPhy generates a markdown-formatted results log along with a JSON results file, so the results can be aggregated and processed in R/Python.
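As a starting point for that aggregation, here is a minimal Python sketch. It assumes the JSON result files have been collected into a single directory and end in `.json`; the function names `load_results` and `summarise_keys` are hypothetical and not part of the pipeline. It deliberately reports only the top-level keys of each file, because the exact layout of the JSON depends on the batch file that produced it.

```python
import json
from pathlib import Path


def load_results(results_dir):
    """Load every '*.json' file in results_dir into a dict keyed by file stem."""
    results = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as handle:
            results[path.stem] = json.load(handle)
    return results


def summarise_keys(results):
    """Map each result name to the sorted list of its top-level JSON keys."""
    return {name: sorted(data) for name, data in results.items()}
```

Inspecting the keys first makes it easier to decide which fields are worth pulling into a table before writing any analysis-specific code.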
To run the analysis scripts, the hyphy-analyses GitHub repository needs to be installed, along with the development branch of the standard HyPhy tool. These can be installed using the following commands.
# Install hyphy-analyses which contains the pipelines
git clone https://github.com/veg/hyphy-analyses.git hyphy-analyses
# Install the development branch of hyphy
git clone https://github.com/veg/hyphy.git hyphy-develop
cd hyphy-develop
git checkout develop
cmake ./
make -j MP
The current version of the HyPhy pipeline has the following arguments:
--msa string Directory path to MSA files. Extension must be '.fa' or '.fasta'.
--tree string File path to phylogenetic tree with 'test' branches marked using '{}' notation. E.g. (sample1, sample2{Test});.
--testLabel string What is the branch label ('{}') listed in the tree file.
--batchFile string Which 'HyPhy analysis' batch file to run. Options: BUSTED-PH.bf.
--hyphyDev string Absolute directory path to HyPhy 'develop' directory. This has to be installed via git!
--hyphyAnalysis string Absolute directory path to 'hyphy-analyses' directory. This has to be installed via git!
The arguments are explained in more detail below.
The --msa argument expects a directory path, which points to a directory containing multiple sequence alignments (MSA). The pipeline will match all files in the directory that are MSAs (they must have the extension *.{fa,fasta}) and bank them for processing.
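The matching rule can be checked before launching a run with a short Python sketch like the one below. This is an illustration of the `*.{fa,fasta}` rule only, not the pipeline's own code; the function name `find_msa_files` is hypothetical.

```python
from pathlib import Path

# Extensions accepted for MSA files, mirroring the *.{fa,fasta} pattern.
MSA_EXTENSIONS = {".fa", ".fasta"}


def find_msa_files(msa_dir):
    """Return the sorted regular files in msa_dir that end in .fa or .fasta."""
    return sorted(
        path
        for path in Path(msa_dir).iterdir()
        if path.is_file() and path.suffix in MSA_EXTENSIONS
    )
```

An empty return value is a quick sign that the alignments use a different extension (for example `.fas` or `.aln`) and would need renaming first.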
Nearly all selection testing tools expect a phylogenetic tree file. As such, the --tree argument expects a filepath to a tree file in Newick format. Some selection tests support 'marked branches', which can be specified using curly braces, as in the following example:
(sample_1, (sample_2, sample_3{test}));
Here, sample_3 is marked as the test branch. Branch labels can also be specified at internal nodes.
The --testLabel argument directly relates to the tree labels mentioned above. This argument expects a string that matches the labels the user has used in their Newick tree. If the --testLabel string differs from the branch labels in the file passed to --tree, the program will error.
For example, using the tree from above, the following arguments will work:
# Arguments
--tree /path/to/tree_above.nwk
--testLabel 'test'
While this will cause an error:
# Arguments
--tree /path/to/tree_above.nwk
--testLabel 'Foreground'
There is no Foreground string in the example tree, meaning HyPhy will not be able to find the test species.
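A mismatch like this can be caught before the pipeline starts. The following Python sketch extracts every `{}` label from a Newick string and checks the requested label against them. It assumes labels must match exactly, including case; the function names `branch_labels` and `check_test_label` are hypothetical and not part of the pipeline or HyPhy.

```python
import re


def branch_labels(newick):
    """Return the set of labels written in curly braces in a Newick string."""
    return set(re.findall(r"\{([^{}]+)\}", newick))


def check_test_label(newick, test_label):
    """Raise ValueError if test_label is not one of the tree's branch labels."""
    labels = branch_labels(newick)
    if test_label not in labels:
        found = ", ".join(sorted(labels)) or "none"
        raise ValueError(
            f"Label '{test_label}' not found in tree (labels present: {found})"
        )


tree = "(sample_1, (sample_2, sample_3{test}));"
check_test_label(tree, "test")  # passes silently
```

Calling `check_test_label(tree, "Foreground")` on the same tree raises a `ValueError` that lists the labels actually present.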
The --batchFile argument expects the name of the batch-file to use. The batch-files are found in the hyphy-analyses directory that was downloaded above. Each file ends with the *.bf extension. So far I've only implemented support for the BUSTED-PH.bf file.
The --hyphyDev argument expects the absolute directory path to the installed HyPhy development repository (see above). The hyphy-analyses batch-files will only work with the development branch of the HyPhy software.
Similarly, the --hyphyAnalysis argument requires the absolute directory path to the hyphy-analyses directory that was installed above. The pipeline searches this directory to obtain the requested batch-file.
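How the pipeline performs this search is not described here, so the following Python sketch is only an assumption about one reasonable approach: a recursive search of the hyphy-analyses directory for a file with the requested name. The function name `find_batch_file` is hypothetical. It can be used to confirm that the values passed to --hyphyAnalysis and --batchFile are consistent before starting a run.

```python
from pathlib import Path


def find_batch_file(analysis_dir, batch_name):
    """Return the first file named batch_name found anywhere under analysis_dir."""
    matches = sorted(Path(analysis_dir).rglob(batch_name))
    if not matches:
        raise FileNotFoundError(
            f"No batch file named '{batch_name}' under '{analysis_dir}'"
        )
    return matches[0]
```

For example, `find_batch_file("/path/to/hyphy-analyses", "BUSTED-PH.bf")` either returns the full path to the batch-file or raises a `FileNotFoundError` naming the directory that was searched.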