Install - Oshlack/MINTIE GitHub Wiki
The easiest way to install MINTIE is to first install Conda.
Due to MINTIE's large number of dependencies, we recommend using mamba for installation as it will make everything faster. You can install it via:
conda install -c conda-forge mamba
Now install MINTIE:
mamba create -c conda-forge -c bioconda -n mintie mintie
conda activate mintie
This will create a new conda environment with MINTIE and its dependencies. This is recommended to avoid conflicts with other tools and their packages, unless you plan on running on a cluster, in which case, install the tool into the base environment.
If you can't use mamba (it requires installing into conda base, which you may not have permission to do), you can install MINTIE the regular conda way:
conda create -c conda-forge -c bioconda -n mintie mintie
conda activate mintie
You can also just install like this if you don't care about virtual environments:
conda install mintie
You can then run the MINTIE wrapper script like so:
mintie -h
Set up references for hg38 automatically like so:
mintie -r
If you would like to run MINTIE on a cluster, make sure you've installed MINTIE into the base conda environment. MINTIE uses bpipe to run its pipeline logic, which has built-in support for multiple resource managers. Please change the executor option in the bpipe.config file to the resource manager you are using, as indicated in bpipe's documentation. You can find your bpipe.config under $MINTIEDIR. To find this directory, run:
mintie -h
If you are using SLURM, please add the following option to your bpipe.config:
useLegacyTorqueJobPolling=true
Additionally, please ensure that the module for conda containing your MINTIE install is specified in your bpipe.config. For example:
modules="anaconda3"
One very important parameter to set correctly in the bpipe.config file is the concurrency option. This determines the number of processes the pipeline spawns at any one time. Note that this includes the processes requested by each job, i.e. a single job requiring 8 procs will count as 8 towards the concurrency cap. Be careful not to set this too high, as running the pipeline with many cases and controls could overwhelm your server.
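Putting the options above together, a cluster bpipe.config might look like the following sketch. The executor value slurm and the concurrency value of 16 are placeholders chosen for illustration, not recommendations. The file is written to a scratch directory so that your real bpipe.config under $MINTIEDIR is left untouched.

```shell
# Write an illustrative bpipe.config to a scratch directory rather than
# overwriting the real one under $MINTIEDIR.
example_dir=$(mktemp -d)
cat > "$example_dir/bpipe.config" <<'EOF'
// Resource manager to submit jobs to (assumed here: SLURM)
executor="slurm"
// Needed when using SLURM, per the note above
useLegacyTorqueJobPolling=true
// Module that provides the conda install containing MINTIE
modules="anaconda3"
// Cap on concurrently requested processes (16 is a placeholder value)
concurrency=16
EOF

# Show the result so it can be compared against the real file.
cat "$example_dir/bpipe.config"
```

Compare the printed file with your own bpipe.config and copy across only the options that apply to your cluster.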
You can also install MINTIE in the old, labour-intensive manual way if you don't wish to use conda.
Before downloading and running the installation script, please ensure you have R v3.2+ and Python 3.7+ installed. We recommend that Python 3 is installed via Anaconda.
Download the latest release, then run:
tar -xvzf MINTIE-v0.3.0.tar.gz
cd MINTIE-v0.3.0
chmod u+x install_linux64.sh
./install_linux64.sh
./setup_references_hg38.sh
Some software may need to be installed manually if the automated installer fails.
If the installation has succeeded, the generated tools.groovy file will look similar to this:
// Path to tools used by the MINTIE pipeline
bpipe="<basedir>/MINTIE/tools/bin/bpipe"
fastuniq="<basedir>/MINTIE/tools/bin/fastuniq"
dedupe="<basedir>/MINTIE/tools/bin/dedupe"
trimmomatic="<basedir>/MINTIE/tools/bin/trimmomatic"
fasta_formatter="<basedir>/MINTIE/tools/bin/fasta_formatter"
samtools="<basedir>/MINTIE/tools/bin/samtools"
bedtools="<basedir>/MINTIE/tools/bin/bedtools"
soapdenovotrans="<basedir>/MINTIE/tools/bin/soapdenovotrans"
salmon="<basedir>/MINTIE/tools/bin/salmon"
hisat="<basedir>/MINTIE/tools/bin/hisat"
gmap="<basedir>/MINTIE/tools/bin/gmap"
R="/usr/bin/R"
python="python"
If you experience issues running the tool, please ensure all variables have a correct path. Note that gmap_build is expected to be under the same directory as gmap.
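One way to check these variables is to loop over tools.groovy and test that each value resolves to an executable. This is a minimal sketch and not part of MINTIE: the function name check_tools is hypothetical, and it assumes one name="value" assignment per line, as in the example above.

```shell
# check_tools FILE: print every assignment in FILE whose value cannot be
# resolved to an executable, either as a path or as a command on PATH.
# Returns 1 if anything is missing, 0 otherwise.
check_tools() {
    missing=0
    while IFS='=' read -r name value; do
        # Skip blank lines and // comment lines.
        case "$name" in
            ''|//*) continue ;;
        esac
        # Remove the surrounding double quotes from the value.
        value=${value#\"}
        value=${value%\"}
        if ! command -v "$value" >/dev/null 2>&1; then
            echo "MISSING: $name -> $value"
            missing=1
        fi
    done < "$1"
    return "$missing"
}
```

For example, running check_tools tools.groovy from the MINTIE base directory prints one line per variable that needs fixing, and prints nothing if every tool was found.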
You can quickly set up the references required if you're running hg38:
./setup_references_hg38.sh
If you are running a different reference, you will have to manually create a references.groovy file in the MINTIE base directory with the following variables:
// Path to references used by the MINTIE pipeline
gmap_refdir="<basedir>/MINTIE/ref/"
genome_fasta="<basedir>/MINTIE/ref/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa"
trans_fasta="<basedir>/MINTIE/ref/Homo_sapiens.GRCh38.cdna.all.fa"
tx_annotation="<basedir>/MINTIE/ref/chess2.2.gtf"
ann_info="<basedir>/MINTIE/ref/chess2.2.info"
gmap_refdir="<basedir>/MINTIE/ref"
gmap_genome="gmap_genome"
tx2gene="<basedir>/MINTIE/ref/tx2gene.txt"
Make sure all paths are correct in this file. Aside from the downloaded references, MINTIE needs a specific exon reference, so run the following script in the MINTIE directory (it is run automatically by ./setup_references_hg38.sh):
python util/make_exon_reference.py ref/chess2.2.gtf
A transcript-to-gene reference is also required to estimate variant allele frequencies (VAFs) of the novel variant contigs. This is also generated automatically by ./setup_references_hg38.sh, but can be created manually as follows:
python util/make_tx2gene_lookup.py ref/chess2.2.gtf > ref/tx2gene.txt
Note: The transcript IDs in the tx2gene.txt file must match the IDs used in the transcript reference fasta file. As an alternative to generating the lookup from the GTF file, a fasta file may also be used:
python util/make_tx2gene_lookup.py ref/Homo_sapiens.GRCh38.cdna.all.fa > ref/tx2gene.txt
Note that the fasta file records will have to contain the 'gene_symbol' property like so:
>ENST00000448914.1 cdna:known chromosome:GRCh38:14:22449113:22449125:1 gene:ENSG00000228985.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD3 description:T cell receptor delta diversity 3 [Source:HGNC Symbol;Acc:HGNC:12256]
If not running the reference script, you will have to build a gmap reference of the genome fasta like so:
tools/bin/gmap_build -s chrom -k 15 -d gmap_genome -D ref ref/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa
NOTE: if you want to change the GTF reference, in addition to regenerating the info and tx2gene files, you will have to delete the reference caches. These are in the base directory from which you run the pipeline, and are named in the form <name>.pickle.
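Before deleting anything, it can help to list the cache files first. The sketch below assumes only that the caches end in .pickle, as stated above; the function name list_reference_caches is hypothetical. Other .pickle files in the same directory would also be listed, so review the output before removing anything.

```shell
# list_reference_caches DIR: print every *.pickle file directly under DIR.
list_reference_caches() {
    for f in "$1"/*.pickle; do
        # If nothing matches, the glob stays literal, so test for existence.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

Run list_reference_caches . from the directory where you launch the pipeline, then delete the listed files once you have confirmed they are MINTIE reference caches.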
A reference transcriptome fasta will be generated automatically from the CHESS 2.2 reference GTF and hg38_p8.fasta. To generate this file from your GTF reference and genome of choice, run (you will need gffread):
gffread chess2.2.gtf -g hg38.fa -w chess2.2.fa
Ensure that the genome fasta and your GTF's chromosome names match exactly. We recommend you remove any alternative contigs from both your GTF and fasta.
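A quick way to compare chromosome names is to extract them from both files and list the ones that appear in only one. This is a sketch using standard Unix tools; the function name seqname_mismatches is hypothetical. It assumes a tab-separated GTF with the sequence name in column 1, and takes the first word of each fasta header as the sequence name.

```shell
# seqname_mismatches GTF FASTA: list sequence names found in only one file.
# Names only in the GTF are printed flush left; names only in the fasta
# are printed with a leading tab (standard comm output).
seqname_mismatches() {
    gtf_names=$(mktemp)
    fa_names=$(mktemp)
    # Column 1 of every non-comment GTF line is the sequence name.
    grep -v '^#' "$1" | cut -f1 | sort -u > "$gtf_names"
    # The fasta sequence name is the first word after '>'.
    grep '^>' "$2" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$fa_names"
    # comm -3 suppresses names common to both sorted lists.
    comm -3 "$gtf_names" "$fa_names"
    rm -f "$gtf_names" "$fa_names"
}
```

For example, seqname_mismatches chess2.2.gtf hg38.fa prints nothing when the two sets of names match exactly.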
NOTE: due to one instance of hard-coding, if you aren't using hg38, make sure you turn off splice motif checking with splice_motif_mismatch=4.
MINTIE uses bpipe to run its pipeline logic, which has built-in support for multiple resource managers. Please change the executor option in the bpipe.config file to the resource manager you are using, as indicated in bpipe's documentation.
Additionally, please ensure that modules for java 1.8, bedtools and bedops are specified in your bpipe.config. For example:
modules="java bedtools bedops"
If your tools.groovy also expects any specific tools to be in the path, make sure that these are specified here as well.
MINTIE can also be run using other assemblers such as rnaSPAdes and Trinity. Currently, this is only supported if you install manually. If you would like to install rnaSPAdes via the install script, change the commands line in install_linux64.sh to:
commands="bpipe fastuniq dedupe trimmomatic fasta_formatter samtools bedtools rnaspades salmon hisat gmap"
If you would like to run Trinity, change the line to:
commands="bpipe fastuniq dedupe trimmomatic fasta_formatter samtools bedtools jellyfish bowtie2 Trinity salmon hisat gmap"
Now execute the install_linux64.sh file.
Alternatively, you may wish to run with a pre-constructed assembly, using any other assembler of choice that produces a fasta file. To do this, set the assemblyFasta parameter in the params.txt file to the desired assembly. You can also set it on the command line when running MINTIE:
<MINTIE_PATH>/tools/bin/bpipe run @<MINTIE_PATH>/params.txt -p assemblyFasta=<assembly_path> <MINTIE_PATH>/MINTIE.groovy $cases $controls
Samtools installation may fail if you do not have the GNU curses library and/or the Zlib compression library installed. Please refer to the Samtools installation documentation for full installation instructions. If you are in a cluster environment and samtools is loaded via a module, ensure that samtools is listed in the modules in your bpipe.config and that you have set samtools in your tools.groovy as:
samtools="samtools"
Python's requirements are installed via the installation script by running:
pip install -r requirements.txt
If this step fails, the Python packages will have to be installed manually. Numpy and Pandas are best installed using Anaconda. For the other dependencies, please refer to each package's respective documentation for troubleshooting:
- pysam (requires dependencies from htslib to be installed)
- pybedtools
- biopython
- intervaltree
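To see which Python packages still need attention, you can try importing each one in turn. The sketch below is not part of MINTIE and the function name missing_modules is hypothetical. It takes the interpreter as its first argument so that it tests the same Python that tools.groovy points to.

```shell
# missing_modules PYTHON MODULE...: print each module that PYTHON fails to
# import. Prints nothing when every import succeeds.
missing_modules() {
    py=$1
    shift
    for mod in "$@"; do
        if ! "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "$mod"
        fi
    done
}
```

For example: missing_modules python numpy pandas pysam pybedtools Bio intervaltree. Note that biopython is imported under the name Bio.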
Similarly, the R requirements are installed by running:
Rscript install_R_dependencies.R
Refer to the respective tools' documentation for troubleshooting: