Home - PaleovirologyLab/hi-fever GitHub Wiki
Welcome to the hi-fever wiki! HI-FEVER is a Nextflow pipeline for finding endogenous viral elements (EVEs) in host genomes. It aims to address common issues in paleovirology, including cross-matches between host proteins and EVEs, the computational burden of EVE searches, and incompatibility between software packages or platforms. We provide HI-FEVER as an accessible and informative workflow for any EVE-discovery project.
Features
- Protein-to-DNA based search allows detection of divergent and ancient EVEs
- Designed to function with millions of input query proteins
- Reconstructs the predicted EVE protein based on its closest modern match
- Harnesses parallelisation to optimise compute resources
- Scales from laptop to cluster
- Conda and Docker compatible
- Linux, Windows and macOS compatible
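To illustrate why a protein-to-DNA search can recover divergent EVEs, the sketch below translates a DNA fragment in all six reading frames, which is the general idea behind comparing protein queries against nucleotide sequence. It is a minimal illustration using only the Python standard library and the standard genetic code; the function names are hypothetical and this is not the code HI-FEVER itself runs (the pipeline delegates searching to the tools listed in the Acknowledgements).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their translated peptides."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frame_translations("ATGGCCTAA").items():
        print(frame, peptide)
```

Because synonymous changes in DNA leave the encoded amino acids unchanged, similarity at the protein level tends to remain detectable after the nucleotide sequence has diverged considerably, which is why translated searches suit ancient insertions.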
HI-FEVER provides a variety of output information about candidate EVEs, suited to many downstream purposes. Outputs include:
- Genomic coordinates of candidate EVEs
- Closest matches in the reciprocal databases, including full taxonomical information
- Predicted EVE protein sequences and cDNA (frameshift and premature STOP codon aware), with extension beyond original hit
- Extracted nucleotide sequence of each candidate EVE and flanking host genome sequence
- Metadata & statistics of the genome assemblies screened
Acknowledgements
HI-FEVER relies on the following libraries and programs, each of which is distributed under its own license:
- Biopython (https://biopython.org/)
- Seqtk (https://github.com/lh3/seqtk)
- DIAMOND (https://github.com/bbuchfink/diamond)
- BBMap (https://github.com/BioInfoTools/BBMap)
- BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
- Entrez (https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html)
- MMseqs2 (https://github.com/soedinglab/MMseqs2)
- Nextflow (https://www.nextflow.io/)
- Python3 (https://www.python.org/)
- Wise2 (https://www.ebi.ac.uk/~birney/wise2/)
- SeqKit (https://bioinf.shenwei.me/seqkit/)
- Bedtools (https://bedtools.readthedocs.io/en/latest/index.html)