phmmer compares a query protein sequence against a database of protein sequences. The database is stored on the server at /ifs/data/glab/uniref90/uniref90.fasta
Run the command phmmer -o output.txt query_protein.fasta /path/to/database, which takes a FASTA file containing the query amino acid sequence and writes a HMMER text file of homologs ranked by E-value.
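The -o file is laid out for reading rather than for parsing. If later steps need the hit names programmatically, phmmer can also write a whitespace-delimited table with its --tblout option. The following is a minimal sketch, assuming the search was additionally run with --tblout hits.tbl (a hypothetical file name, not part of the command above). parse_tblout is an illustrative helper, not one of the scripts named in this protocol.

```python
def parse_tblout(lines):
    """Return (target_name, full_sequence_evalue) pairs from phmmer --tblout lines.

    Assumes the standard HMMER3 per-target table: comment lines start with '#',
    fields are separated by whitespace, the target name is the first field and
    the full-sequence E-value is the fifth.
    """
    hits = []
    for line in lines:
        # Skip blank lines and the '#' header/footer lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hits.append((fields[0], float(fields[4])))
    return hits
```

Passing an open file handle works, for example parse_tblout(open("hits.tbl")), because the function only iterates over lines.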
B. Filter homologs by Enzyme Commission (EC) number. This makes sure all of your homologs catalyze the same reaction.
The phmmer output contains only accession numbers, not EC numbers or sequences, so you will need to map each accession number to its EC number.
Start by making a copy of the phmmer output text file and adding a column of EC numbers, obtained by looking up each accession number in the UniRef database. [accession_to_ec.py]
Then filter the original phmmer output text file so that it includes only accession numbers that map to the EC number of your protein. [filter_phmmer_ec.py]
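The two steps above can be sketched as follows. This is not the content of accession_to_ec.py or filter_phmmer_ec.py, only an illustration of the logic under stated assumptions: hit names look like UniRef90_<accession>, and acc_to_ec is a hypothetical dictionary from accession to a set of EC numbers, however that lookup table is obtained. All function names are made up for this example.

```python
def strip_uniref_prefix(target):
    """Drop a leading 'UniRef90_' if present; other names are returned unchanged."""
    prefix = "UniRef90_"
    return target[len(prefix):] if target.startswith(prefix) else target


def annotate_with_ec(targets, acc_to_ec):
    """Pair each hit name with its set of EC numbers (empty set if unmapped).

    acc_to_ec is an assumed mapping: accession -> set of EC number strings.
    """
    return [(t, acc_to_ec.get(strip_uniref_prefix(t), set())) for t in targets]


def filter_by_ec(annotated, wanted_ec):
    """Keep only the hit names whose EC set contains wanted_ec."""
    return [t for t, ecs in annotated if wanted_ec in ecs]
```

For example, filter_by_ec(annotate_with_ec(hit_names, acc_to_ec), "2.7.1.11") would keep only hits annotated as 6-phosphofructokinase. Hits with no EC annotation are dropped by this filter.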
C. Convert the phmmer text file to a FASTA file with sequences
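One possible way to do this conversion is to stream through the database FASTA once and keep the records whose identifier is in the filtered hit list. The sketch below assumes that the first whitespace-delimited token of each FASTA header matches the hit name in the phmmer file. The function names are hypothetical and this is not necessarily how the step is done in practice.

```python
def iter_fasta(lines):
    """Yield (header_without_'>', sequence) for each record in FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def select_records(lines, wanted_ids):
    """Yield only the records whose first header token is in wanted_ids."""
    for header, seq in iter_fasta(lines):
        parts = header.split()
        if parts and parts[0] in wanted_ids:
            yield header, seq


def write_fasta(records, handle, width=60):
    """Write records to an open text handle, wrapping sequences at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Because iter_fasta reads line by line, the full database never has to fit in memory. wanted_ids should be a set so that each membership check is fast.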
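Command-line instructions for running MAFFT, including the options for each alignment algorithm, are available on the MAFFT website. L-INS-i tends to be faster than E-INS-i.
Example using the L-INS-i pairwise method (--localpair) and all 128 server threads: mafft --thread 128 --localpair pfk.fasta > pfk.aln
The MAFFT manual gives the full L-INS-i strategy as --localpair --maxiterate 1000; without --maxiterate, no iterative refinement is run.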
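If the alignment is launched from a script rather than typed by hand, the same command can be assembled in Python. This is a minimal sketch: build_mafft_cmd and run_mafft are hypothetical helpers, the defaults mirror the example command above, and max_iterate maps to MAFFT's --maxiterate option, which is left off by default here to match that command.

```python
import subprocess


def build_mafft_cmd(fasta_path, threads=128, max_iterate=0):
    """Return the mafft argument list for a --localpair alignment."""
    cmd = ["mafft", "--thread", str(threads), "--localpair"]
    if max_iterate > 0:
        # Optional iterative refinement cycles.
        cmd += ["--maxiterate", str(max_iterate)]
    cmd.append(fasta_path)
    return cmd


def run_mafft(fasta_path, aln_path, **kwargs):
    """Run mafft and write its standard output (the alignment) to aln_path."""
    with open(aln_path, "w") as out:
        subprocess.run(build_mafft_cmd(fasta_path, **kwargs), stdout=out, check=True)
```

Calling run_mafft("pfk.fasta", "pfk.aln") reproduces the shell example. check=True makes the script stop with an error if mafft exits with a non-zero status.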