HyPhy Analyses - a-lud/nf-pipelines GitHub Wiki

Introduction

In addition to the pipeline that runs HyPhy, I've also implemented a pipeline that runs HyPhy-analyses. These are pipelines built around the core models implemented in HyPhy to test specific evolutionary hypotheses. More information about the methods can be found in the hyphy-analyses repository (https://github.com/veg/hyphy-analyses).

This pipeline does not post-process the results. By default, HyPhy generates a Markdown-formatted results log along with a JSON results file, so the results can be aggregated and processed in R or Python.
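As a rough illustration of that aggregation step, the sketch below collects one value from every JSON file in a results directory. The function name `collect_results` is hypothetical, and the key names `test results` and `p-value` are assumptions about the JSON layout; check them against a real output file from your batch file before relying on them.

```python
import json
from pathlib import Path


def collect_results(results_dir, section="test results", field="p-value"):
    """Return {file stem: value} for each JSON file containing section/field.

    'section' and 'field' are ASSUMED key names, not confirmed by the
    pipeline documentation; adjust them to match your JSON output.
    """
    rows = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as handle:
            data = json.load(handle)
        # Skip files that do not have the expected structure.
        section_data = data.get(section) if isinstance(data, dict) else None
        if isinstance(section_data, dict) and field in section_data:
            rows[path.stem] = section_data[field]
    return rows
```

Calling `collect_results("/path/to/results")` returns a plain dictionary keyed by file name (without the `.json` extension), which can then be written to a table or passed to a data-frame library.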

Set up requirements

To run the analysis scripts, the hyphy-analyses GitHub repository needs to be installed, along with the development branch of the standard HyPhy tool. These can be installed using the following commands.

# Install hyphy-analyses which contains the pipelines
git clone https://github.com/veg/hyphy-analyses.git hyphy-analyses

# Install the development branch of hyphy
git clone https://github.com/veg/hyphy.git hyphy-develop
cd hyphy-develop
git checkout develop
cmake ./
make -j MP

Arguments

The current version of the HyPhy pipeline has the following arguments:

--msa string                 Directory path to MSA files. Extension must be '.fa' or '.fasta'.
--tree string                File path to phylogenetic tree with 'test' branches marked using '{}' notation. E.g. (sample1, sample2{Test});.
--testLabel string           What is the branch label ('{}') listed in the tree file.
--batchFile string           Which 'HyPhy analysis' batch file to run. Options: BUSTED-PH.bf.
--hyphyDev string            Absolute directory path to HyPhy 'develop' directory. This has to be installed via git!
--hyphyAnalysis string       Absolute directory path to 'hyphy-analyses' directory. This has to be installed via git!

The arguments are explained in more detail below.

Argument overview

MSA

The --msa argument expects a path to a directory containing multiple sequence alignments (MSAs). The pipeline matches every file in that directory with the extension *.{fa,fasta} and queues it for processing.
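To preview which files would be picked up, the extension rule above can be mimicked with a small script. This is a sketch of the matching rule only, not the pipeline's own code; `find_msa_files` is a hypothetical helper, and it assumes the match is case-sensitive and limited to the top level of the directory.

```python
from pathlib import Path

# Extensions named in the --msa description.
MSA_EXTENSIONS = {".fa", ".fasta"}


def find_msa_files(msa_dir):
    """Return a sorted list of files in msa_dir ending in .fa or .fasta."""
    return sorted(
        path
        for path in Path(msa_dir).iterdir()
        if path.is_file() and path.suffix in MSA_EXTENSIONS
    )
```

Files with other extensions, such as `.fas` or `.fa.gz`, are not returned by this sketch and would need to be renamed or decompressed to meet the stated extension requirement.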

Tree

Nearly all selection testing tools expect a phylogenetic tree file. As such, the --tree argument expects a file path to a tree file in Newick format. Some selection tests support 'marked branches', which are specified using curly braces as in the following example:

(sample_1, (sample_2, sample_3{test}));

Here, sample_3 is marked as the test branch. Branch labels can also be applied to internal nodes.

Test label

The --testLabel argument relates directly to the tree labels mentioned above. It expects a string that matches the branch label the user has used in their Newick tree. If the --testLabel string differs from the branch labels in the file passed to --tree, the program will exit with an error.

For example, using the tree from above, the following arguments will work:

# Arguments
--tree /path/to/tree_above.nwk
--testLabel 'test'

While this will cause an error:

# Arguments
--tree /path/to/tree_above.nwk
--testLabel 'Foreground'

There is no Foreground label in the example tree, so HyPhy will not be able to find the test branches.
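A label mismatch like this can be caught before launching the pipeline. The sketch below extracts every '{}' label from a Newick string and checks the requested label against them. The helper names `tree_labels` and `check_test_label` are hypothetical and not part of the pipeline or HyPhy.

```python
import re


def tree_labels(newick):
    """Return the set of labels written as {label} in a Newick string."""
    return set(re.findall(r"\{([^{}]+)\}", newick))


def check_test_label(newick, test_label):
    """Raise ValueError if test_label is not used anywhere in the tree."""
    labels = tree_labels(newick)
    if test_label not in labels:
        raise ValueError(
            f"Label '{test_label}' not found in tree; "
            f"labels present: {sorted(labels)}"
        )
    return labels
```

With the example tree, `check_test_label("(sample_1, (sample_2, sample_3{test}));", "test")` passes, while the same call with `"Foreground"` raises a `ValueError`. The comparison is exact, so `Test` and `test` are treated as different labels.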

Batch file

The --batchFile argument expects the name of the batch-file to use. The batch-files are found in the hyphy-analyses directory that was downloaded above. Each file ends with the *.bf extension. So far I've only implemented support for the BUSTED-PH.bf file.

HyPhy dev

This argument (--hyphyDev) expects the directory path to the installed HyPhy development repository (see above). The hyphy-analyses batch-files will only work with the development branch of the HyPhy software.

HyPhy Analysis

Similarly, the --hyphyAnalysis argument requires the directory path to the hyphy-analyses directory that was cloned above. The pipeline searches this directory to obtain the requested batch file.
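To confirm that the requested batch file is present in the cloned repository before running anything, a check along the following lines may help. This is a sketch that assumes a recursive search by file name; how the pipeline itself locates the file is not documented here, and `find_batch_file` is a hypothetical helper.

```python
from pathlib import Path


def find_batch_file(hyphy_analysis_dir, batch_file="BUSTED-PH.bf"):
    """Return the first path under hyphy_analysis_dir named batch_file."""
    root = Path(hyphy_analysis_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    # Search all subdirectories, since the batch file may not sit at the top level.
    matches = sorted(root.rglob(batch_file))
    if not matches:
        raise FileNotFoundError(f"{batch_file} not found under {root}")
    return matches[0]
```

Passing the same directory you intend to give to --hyphyAnalysis shows whether the clone is complete and which copy of the batch file would be found first.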
