Tips on downloading genomes from NCBI - 18liedan/genomics_memo GitHub Wiki

Last updated: April 9th, 2025

Download NCBI genomes via the command line

schematic diagram:

https://www.ncbi.nlm.nih.gov/datasets/docs/v2/datasets_schema_taxonomy.svg

  1. NCBI datasets: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/command-line-tools/download-and-install/
conda create -n ncbi_datasets
conda activate ncbi_datasets
conda install -c conda-forge ncbi-datasets-cli

For example, to download Agano:

datasets download genome accession GCA_030162315.1
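
A minimal sketch of the download-and-unpack step, assuming `unzip` is installed. `datasets` writes a zip archive (`ncbi_dataset.zip` unless `--filename` is given), and the sequences sit under `ncbi_dataset/data/<accession>/` inside it. The function name `fetch_genome` and the one-directory-per-accession layout are illustrative choices, not part of the tool.

```shell
# Sketch: download one assembly and unpack it into a directory named after it.
# fetch_genome is a hypothetical helper name; the output layout is an assumption.
fetch_genome() {
    local acc="${1:-}"
    if [ -z "$acc" ]; then
        echo "usage: fetch_genome <assembly accession>" >&2
        return 1
    fi
    local zip="${acc}.zip"
    # --include genome keeps the package to the genomic FASTA only
    datasets download genome accession "$acc" --include genome --filename "$zip" || return 1
    # unpack quietly, overwriting files left over from an earlier attempt
    unzip -q -o "$zip" -d "$acc" || return 1
    # list what arrived; the FASTA is under ncbi_dataset/data/<accession>/
    ls "$acc/ncbi_dataset/data/$acc/"
}
```

For the example above this would be called as `fetch_genome GCA_030162315.1`.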
  2. SRA toolkit
wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.2.0/sratoolkit.3.2.0-centos_linux64.tar.gz # CentOS build; on Ubuntu, switch to the Ubuntu build
tar xzvf sratoolkit.3.2.0-centos_linux64.tar.gz
export PATH=$PATH:/lustre7/home/liedan/sratoolkit.3.2.0-centos_linux64/bin # adjust to wherever you extracted the tarball
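
Since `export PATH=...` only affects the current shell session, it can help to confirm that the tools are actually visible before starting a long download. A small sketch, where `check_tools` is a hypothetical helper name:

```shell
# Sketch: report which of the given commands can be found on PATH.
# check_tools is a hypothetical helper, not part of either toolkit.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # non-zero exit status if anything was missing
    return "$missing"
}
```

Usage would be `check_tools datasets fasterq-dump`. If a tool is reported missing in a new terminal, the `export PATH` line probably needs to be re-run or added to your shell startup file.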

Command for downloading:

# for a single SRA run, set ${accession} to its accession and do the following:
fasterq-dump --split-files -e 16 -p "${accession}"

# -e: number of threads, -p: show a progress bar

fasterq-dump is slow and needs a lot of temporary disk space while it runs, so it is best to run it while you still have plenty of free storage. Launch it with nohup so that the download keeps going if your terminal session is interrupted.
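
For more than one run, the same command can be wrapped in a loop and started under nohup. The sketch below assumes a plain text file with one accession per line. The names `download_all`, `download_sra.sh` and `accessions.txt` are hypothetical. The `-t` flag points fasterq-dump's temporary files at a directory of your choice, which is useful if your scratch space is on a different disk.

```shell
#!/usr/bin/env bash
# Sketch: run fasterq-dump for every accession listed in a text file.
# download_all and its argument order are illustrative, not an official interface.
download_all() {
    local list="$1"          # file with one accession per line
    local threads="${2:-16}" # passed to -e
    local tmpdir="${3:-.}"   # passed to -t (where temporary files go)
    local acc
    # the "|| [ -n ... ]" part keeps a last line that lacks a trailing newline
    while read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue   # skip blank lines
        fasterq-dump --split-files -e "$threads" -p -t "$tmpdir" "$acc" \
            || echo "FAILED: $acc" >&2   # log the failure and move on
    done < "$list"
}

# Only run when arguments are given, so sourcing the file is harmless.
if [ "$#" -gt 0 ]; then
    download_all "$@"
fi
```

Saved as, say, `download_sra.sh`, this could be launched with `nohup bash download_sra.sh accessions.txt 16 /path/to/scratch > download.log 2>&1 &` and checked on later with `tail -f download.log`. Failed accessions show up in the log as `FAILED:` lines.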