Dependencies and Install - CDCgov/phoenix GitHub Wiki
-
Install Nextflow (>=21.10.3,<24.01.0). There are several options for install if you do not already have it on your system:
-
Install into a conda environment, which requires a version of Anaconda to be installed on your system:
mamba create -n nextflow -c bioconda nextflow=21.10.6
-
If you prefer to use curl or wget for install, see the Nextflow Documentation.
-
Install Docker or Singularity >=3.8.0 for full pipeline reproducibility.
-
Download the kraken database that is required for the kraken2 subworkflow of PHoeNIx.
-
For PHoeNIx <=1.1.1 you will need to download the public Standard-8 version kraken2 database created on May 17, 2021 from Ben Langmead's github page. The download link is https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz.
-
For PHoeNIx >=2.0.0 you will need to download the public Standard-8 version kraken2 database created on or after March 14th, 2023 from Ben Langmead's github page. You CANNOT use an older version of the public kraken databases on Ben Langmead's github page. We thank @BenLangmead and @jenniferlu717 for taking the time to include an extra file in public kraken databases created after March 14th, 2023 to allow them to work in PHoeNIx!
-
(optional) If you installed nextflow via a conda environment, activate the nextflow environment with:
conda activate nextflow
-
Run PHoeNIx on a test sample loaded with the package with a single command:
nextflow run cdcgov/phoenix -r v1.0.0 -profile <singularity/docker/custom>,test -entry PHOENIX --kraken2db $PATH_TO_DB
Note that this command clones (downloads) the repo to ~/.nextflow/assets/cdcgov/phoenix. See below for how to clone and have the software downloaded to a different location.
> * The pipeline comes with config profiles called `docker` and `singularity` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
> * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
-
Start running your own analysis with a samplesheet!
nextflow run cdcgov/phoenix -r v1.0.0 -profile <singularity/docker/custom> -entry PHOENIX --input <path_to_samplesheet.csv> --kraken2db $PATH_TO_DB
1. Install Nextflow (>=21.10.3).
There are several options for install if you do not already have it on your system:
-
Use curl or wget for install; see the Nextflow Documentation.
-
A good way to install Nextflow is with conda or mamba. Mamba is much faster, so we recommend it. This will require installation of Anaconda first. A short tutorial on Anaconda and its setup can be found [here](https://jvhagey.github.io/Tutorials/mydoc_Installation.html).
If you need mamba installed and you already have Anaconda on your system, run:
conda install -c conda-forge mamba
To install Nextflow run:
mamba create -n nextflow -c conda-forge -c bioconda nextflow=21.10.6
Then you can activate the environment with:
conda activate nextflow
- You will run PHoeNIx from inside this environment!
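A quick way to confirm that the installed Nextflow falls inside the supported range is a small version check. The sketch below is only an illustration: the helper names version_ge and nextflow_version_ok are hypothetical, it assumes GNU sort (for sort -V), and it assumes that nextflow -version prints the version as three dot-separated numbers. The range 21.10.3 to 24.01.0 (exclusive) is the one given at the top of this page.

```shell
# Sketch only: helper names are hypothetical, not part of Nextflow or PHoeNIx.

# version_ge A B -> succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# nextflow_version_ok V -> succeeds when 21.10.3 <= V < 24.01.0.
nextflow_version_ok() {
    version_ge "$1" "21.10.3" && ! version_ge "$1" "24.01.0"
}

# If Nextflow is on the PATH, extract its version number and report the result.
# Assumption: the first X.Y.Z pattern in the output is the Nextflow version.
if command -v nextflow >/dev/null 2>&1; then
    installed="$(nextflow -version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if [ -n "$installed" ] && nextflow_version_ok "$installed"; then
        echo "Nextflow $installed is in the supported range"
    else
        echo "Nextflow version '${installed}' is outside the supported range or could not be parsed"
    fi
fi
```

Run it after conda activate nextflow so that the nextflow on your PATH is the one from the conda environment.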
Configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile. You can chain multiple config profiles in a comma-separated string.
- The pipeline comes with config profiles called docker and singularity, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
- Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- If you are using singularity and are persistently observing issues downloading Singularity images directly due to timeout or network issues, you can use the --singularity_pull_docker_container parameter to pull and convert the Docker image instead. Alternatively, you can use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
  - To add NXF_SINGULARITY_CACHEDIR to your bash profile, do the following:
    - Open your ~/.bash_profile by running nano ~/.bash_profile, or use another text editor of your choice.
    - Inside the ~/.bash_profile add the following lines:
      export NXF_SINGULARITY_CACHEDIR=/$PATH/Singularity_Containers
      export PATH
    - Here $PATH is a placeholder for the full path to where you want to store the folder; replace it with that path rather than relying on the shell's own PATH variable. You can name the Singularity_Containers folder whatever you want. You will need to restart your terminal or run source ~/.bash_profile to allow nextflow to see the new path.
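The manual profile edit described above can also be scripted. The following is a minimal sketch, not an official PHoeNIx or Nextflow tool: add_cache_dir_export is a hypothetical helper, and the profile file and cache folder in the usage line are placeholders you would choose yourself. It creates the cache folder and appends the export line only if it is not already present, so running it twice does not duplicate the entry.

```shell
# Sketch only: add_cache_dir_export is a hypothetical helper name.
# Usage: add_cache_dir_export PROFILE_FILE CACHE_DIR
add_cache_dir_export() {
    profile_file="$1"
    cache_dir="$2"
    export_line="export NXF_SINGULARITY_CACHEDIR=${cache_dir}"

    # Make sure the cache folder and the profile file exist.
    mkdir -p "$cache_dir" || return 1
    touch "$profile_file" || return 1

    # grep -qxF: quiet, whole-line, fixed-string match; append only when missing.
    if ! grep -qxF "$export_line" "$profile_file"; then
        printf '%s\n' "$export_line" >> "$profile_file"
    fi
}

# Example call (commented out so that nothing is modified by accident):
# add_cache_dir_export "$HOME/.bash_profile" "$HOME/Singularity_Containers"
```

After running the helper, restart your terminal or run source ~/.bash_profile, as noted above.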
Install Docker or Singularity
-
For PHoeNIx <=1.1.1 you will need to download the public Standard-8 version kraken2 database created on May 17, 2021 from Ben Langmead's github page. The download link is https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz.
-
For PHoeNIx >=2.0.0 you will need to download the public Standard-8 version kraken2 database created on or after March 14th, 2023 from Ben Langmead's github page. You CANNOT use an older version of the public kraken databases on Ben Langmead's github page. We thank @BenLangmead and @jenniferlu717 for taking the time to include an extra file in public kraken databases created after March 14th, 2023 to allow them to work in PHoeNIx!
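As an illustration of the download step, here is a rough sketch that fetches a database archive and unpacks it into its own folder. The function names db_dir_from_url and fetch_kraken_db and the destination folder are hypothetical. The URL is the May 17, 2021 link given above, so substitute a newer link for PHoeNIx >=2.0.0. The sketch extracts into a dedicated folder on the assumption that the archive has no top-level directory. Whether --kraken2db expects the extracted folder or the archive itself is not stated here, so check the PHoeNIx documentation for your version.

```shell
# Sketch only: function names and destination folder are hypothetical.
KRAKEN_DB_URL="https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz"

# db_dir_from_url URL -> prints the archive name without its .tar.gz suffix.
db_dir_from_url() {
    basename "$1" .tar.gz
}

# fetch_kraken_db URL DEST_ROOT -> downloads the archive into DEST_ROOT,
# extracts it into DEST_ROOT/<archive name>, and prints that folder.
fetch_kraken_db() {
    url="$1"
    dest_root="$2"
    archive="$(basename "$url")"
    db_dir="${dest_root}/$(db_dir_from_url "$url")"

    mkdir -p "$db_dir" &&
        curl --fail --location --output "${dest_root}/${archive}" "$url" &&
        tar -xzf "${dest_root}/${archive}" -C "$db_dir" &&
        printf '%s\n' "$db_dir"
}

# Example call (commented out because the download is several gigabytes):
# PATH_TO_DB="$(fetch_kraken_db "$KRAKEN_DB_URL" "$HOME/databases")"
```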
There are two options for running PHoeNIx; they differ in where the pipeline is installed:
-
Install the latest version by cloning the PHoeNIx GitHub repo into a folder of your choosing:
cd $PATH_TO_INSTALL
git clone https://github.com/CDCgov/phoenix
If you want to run a particular version, you can download it using the -b argument like this:
git clone -b v1.0.0 https://github.com/CDCgov/phoenix
Then you can run it (make sure you activate your conda environment first, if that is how nextflow is installed!):
nextflow run $PATH_TO_INSTALL/phoenix/main.nf -entry PHOENIX -profile <singularity/docker/custom> --input <path_to_samplesheet.csv> --kraken2db $PATH_TO_DB
-
Alternatively, run PHoeNIx directly (it will be downloaded to ~/.nextflow/assets/cdcgov/phoenix):
nextflow run cdcgov/phoenix -r v1.0.0 -entry PHOENIX -profile <singularity/docker/custom> --input <path_to_samplesheet.csv> --kraken2db $PATH_TO_DB
Running PHoeNIx this way means Nextflow pulls the version specified with -r from GitHub and installs it into ~/.nextflow/assets/cdcgov/phoenix.
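If you switch between the two options, a small wrapper can pick the right pipeline argument for you. This is only a sketch: phoenix_target is a hypothetical helper, and it assumes that a local clone lives at <install dir>/phoenix/main.nf, as in the clone commands above. It prints the path to the local main.nf when the clone exists, and otherwise prints the remote name with the requested -r version.

```shell
# Sketch only: phoenix_target is a hypothetical helper name.
# Usage: phoenix_target INSTALL_DIR VERSION
# Prints the arguments that should follow `nextflow run`.
phoenix_target() {
    install_dir="$1"
    version="$2"
    if [ -f "${install_dir}/phoenix/main.nf" ]; then
        # Local clone found: run it from disk.
        printf '%s\n' "${install_dir}/phoenix/main.nf"
    else
        # No local clone: let Nextflow pull the given version from GitHub.
        printf '%s\n' "cdcgov/phoenix -r ${version}"
    fi
}

# Example call (commented out; the unquoted $(...) is deliberate so that
# "cdcgov/phoenix -r v1.0.0" splits into separate arguments, which also
# means the install path must not contain spaces):
# nextflow run $(phoenix_target "$PATH_TO_INSTALL" v1.0.0) -entry PHOENIX \
#     -profile singularity --input samplesheet.csv --kraken2db "$PATH_TO_DB"
```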
To test that the pipeline was installed and configured correctly, run either:
nextflow run phoenix/main.nf -profile test,<singularity/docker/custom> -entry PHOENIX --kraken2db $PATH_TO_DB
or
nextflow run cdcgov/phoenix -r v1.0.0 -profile test,<singularity/docker/custom> -entry PHOENIX --kraken2db $PATH_TO_DB
This command will run the pipeline on preloaded data. If all goes well you should see some output that looks like this:
As you can see from the screenshot, this takes ~18 minutes to run 🐢. This is because the test is limited to 2 CPUs. If you want to speed it up into 🐇 mode, open phoenix/conf/test.config, increase the max_cpus parameter, and save the file before running. Notice that some steps are not run in this test, specifically some SPADES_WF stats steps. These only run if a sample fails SPAdes in a way where contigs are created but not scaffolds, so this behavior is normal.
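If you would rather not edit the file by hand, the max_cpus change can be scripted. The sketch below rests on an assumption about the file layout: it expects test.config to contain a line of the form max_cpus = 2 (any amount of whitespace around the equals sign), which is the usual nf-core style but is not confirmed here. set_max_cpus is a hypothetical helper name. Open the file first to check that the line looks as expected.

```shell
# Sketch only: set_max_cpus is a hypothetical helper, and the
# "max_cpus = <number>" line format is an assumption about test.config.
# Usage: set_max_cpus CONFIG_FILE CPUS
set_max_cpus() {
    config_file="$1"
    cpus="$2"
    tmp_file="$(mktemp)" || return 1

    # Replace only the number after "max_cpus =", keeping the indentation.
    sed "s/^\([[:space:]]*max_cpus[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1${cpus}/" \
        "$config_file" > "$tmp_file" || return 1

    # Write back through cat so the original file keeps its permissions.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}

# Example call (commented out so that no file is changed by accident):
# set_max_cpus phoenix/conf/test.config 8
```

Pick a CPU count that your machine actually has available.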