BGIGPD/BestPractices4Pathogenomics GitHub Wiki

Homologous Virus Sequence Search and Multiple Sequence Alignment

Purpose of Homologous Virus Sequence Search

  • Study the evolutionary relationships of viruses.

Homologous Virus Sequence Search Tool

  • Use NCBI BLAST to search for homologous virus sequences.
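After a BLAST search, the hits worth downloading can be picked from the tabular report. This is a minimal sketch. It assumes the results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), for example with `blastp -remote -db nr -query query.faa -outfmt 6 -out hits.tsv`, where `query.faa` and `hits.tsv` are placeholder names. The function name `select_hits` and the identity and E-value cut-offs are hypothetical, so choose thresholds that suit your study.

```bash
#!/usr/bin/env bash
# Sketch: pick homologous hits from a BLAST tabular report (-outfmt 6 or 7).
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

# select_hits FILE MIN_PIDENT MAX_EVALUE
# Prints each subject ID (column 2) once, keeping only hits with
# percent identity >= MIN_PIDENT and E-value <= MAX_EVALUE.
select_hits() {
  awk -F '\t' -v min_id="$2" -v max_e="$3" '
    /^#/ { next }        # skip comment lines (as written by -outfmt 7)
    NF < 12 { next }     # skip short or malformed lines
    ($3 + 0) >= (min_id + 0) && ($11 + 0) <= (max_e + 0) && !seen[$2]++ { print $2 }
  ' "$1"
}

# Example (placeholder file names):
#   select_hits hits.tsv 70 1e-10 > hit_ids.txt
```

The resulting ID list can then be used to fetch the corresponding sequences in the download step.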

Downloading Viral Sequences

  • Download the sequences to your local computer, or download them directly to the server with wget or ncbi-datasets-cli.
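This is a minimal sketch of downloading straight to the server with `wget` and checking the archive before using it. The Virus-Host DB URL is an assumption based on the file name used in the processing step (`virushostdb.formatted.cds.faa.gz`), so confirm the current path on the Virus-Host DB website. `fetch_gz` and `is_valid_gzip` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: download a gzipped FASTA to the current directory and verify it.

# Assumed location of the Virus-Host DB file; check the website before use.
VHDB_URL="https://www.genome.jp/ftp/db/virushostdb/virushostdb.formatted.cds.faa.gz"

# is_valid_gzip FILE: succeed only if FILE is non-empty and gzip can read it to the end.
is_valid_gzip() {
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# fetch_gz URL: download with wget (-c resumes an interrupted transfer),
# then stop if the archive is truncated or not gzip at all.
fetch_gz() {
  local url="$1"
  local out
  out=$(basename "$url")
  wget -c "$url" || return 1
  if ! is_valid_gzip "$out"; then
    echo "corrupt or incomplete download: $out" >&2
    return 1
  fi
}

# Usage on the server:
#   fetch_gz "$VHDB_URL"
# Alternative with ncbi-datasets-cli (downloads a zip archive):
#   datasets download virus genome taxon sars-cov-2 --filename sars-cov-2.zip
```

Checking with `gzip -t` catches interrupted transfers early, before a truncated file silently produces an incomplete FASTA.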

Searching for Viral Sequences

Processing Viral Sequences

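  • Decompress and filter the viral sequences:
```bash
# Decompress the sequence file (-c writes to stdout and keeps the .gz file)
gunzip -c virushostdb.formatted.cds.faa.gz > virushostdb.formatted.cds.faa
# Keep records whose full header (-n) matches the regular expression (-r) given by -p.
# The pattern is the species name (SARS-related coronaviruses), not a
# SARS-CoV-2-specific name, so check the headers of the output.
seqkit grep -n -r -p "Severe acute respiratory syndrome-related coronavirus" virushostdb.formatted.cds.faa > SARS-CoV-2.faa
```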
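If seqkit is not installed, the same header-based filter can be approximated in awk, which is also handy for checking what the pattern actually matched. This is a swapped-in sketch, not part of the original workflow. `fasta_grep_header` and `count_records` are hypothetical helper names. awk uses POSIX extended regular expressions while seqkit uses Go regular expressions; a plain-text pattern like the species name above behaves the same in both.

```bash
#!/usr/bin/env bash
# Sketch: awk stand-in for `seqkit grep -n -r -p PATTERN FILE`.

# fasta_grep_header PATTERN FILE
# Prints every record whose header (without the leading '>') matches the
# extended regular expression PATTERN; sequence lines follow their header.
fasta_grep_header() {
  PAT="$1" awk '
    /^>/ { keep = (substr($0, 2) ~ ENVIRON["PAT"]) }   # decide once per header
    keep { print }                                     # header and its sequence lines
  ' "$2"
}

# count_records [FILE]: number of FASTA records (header lines); reads stdin if no FILE.
count_records() {
  grep -c '^>' "${1:--}"
}

# Usage, e.g. to compare with the seqkit result:
#   fasta_grep_header "Severe acute respiratory syndrome-related coronavirus" \
#       virushostdb.formatted.cds.faa | count_records
#   count_records SARS-CoV-2.faa
```

If the two counts differ, the pattern or the header format is worth a closer look before aligning.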

Multiple Sequence Alignment

Install MAFFT using conda:

```bash
conda install -c bioconda mafft
```
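Once the conda environment containing MAFFT is activated, a short guard at the top of a pipeline script can stop early if a required tool is missing from PATH. This is a minimal sketch. `require_tools` is a hypothetical helper, and the tool list mirrors the commands used on this page, including seqkit from the filtering step.

```bash
#!/usr/bin/env bash
# Sketch: fail fast if a required command-line tool is not on PATH.

# require_tools NAME...: prints each missing tool to stderr and returns 1
# if any is missing, 0 if all are found.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool (install it and activate its conda environment)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage at the top of a pipeline script (tools used on this page):
#   require_tools wget seqkit mafft || exit 1
# Record the installed MAFFT version alongside your results:
#   mafft --version
```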

Run MAFFT for sequence alignment:

```bash
# --auto lets MAFFT choose an alignment strategy based on the size of the input
mafft --auto SARS-CoV-2.faa > SARS-CoV-2.aln.faa
```
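In a valid multiple sequence alignment every sequence has the same length (gaps included), so a quick check of the MAFFT output catches truncated or mixed-up files. This is a minimal sketch. `alignment_lengths` and `is_aligned` are hypothetical helper names, and the check handles sequences wrapped over several lines.

```bash
#!/usr/bin/env bash
# Sketch: check that all sequences in an aligned FASTA file have the same length.

# alignment_lengths FILE: prints the distinct sequence lengths found in FILE,
# one per line in ascending order; a well-formed alignment prints exactly one number.
alignment_lengths() {
  awk '
    /^>/ { if (n++) lens[len] = 1; len = 0; next }   # store the previous record length
    { gsub(/[ \t\r]/, ""); len += length($0) }       # ignore stray whitespace
    END { if (n) lens[len] = 1; for (l in lens) print l }
  ' "$1" | sort -n
}

# is_aligned FILE: succeed only if every record has the same length.
is_aligned() {
  [ "$(alignment_lengths "$1" | grep -c .)" -eq 1 ]
}

# Usage:
#   is_aligned SARS-CoV-2.aln.faa || echo "unequal lengths: check the MAFFT output" >&2
```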