Tips on downloading genomes from NCBI - 18liedan/genomics_memo GitHub Wiki
Last updated: April 9th, 2025
Download NCBI genomes via the command line
schematic diagram:
https://www.ncbi.nlm.nih.gov/datasets/docs/v2/datasets_schema_taxonomy.svg
- NCBI datasets: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/command-line-tools/download-and-install/
conda create -n ncbi_datasets
conda activate ncbi_datasets
conda install -c conda-forge ncbi-datasets-cli
For example, to download the Agano assembly (saved as ncbi_dataset.zip in the current directory by default):
datasets download genome accession GCA_030162315.1
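If you need more than one assembly, the single-accession command above can be wrapped in a loop. This is a sketch under stated assumptions, not an official recipe: it reads accessions from a hypothetical text file (one per line), passes `--include genome` so only the genomic FASTA is requested, and uses `--filename` so each archive gets its own name instead of the shared default `ncbi_dataset.zip`. The `run` helper, `fetch_genomes` and the `DRY_RUN` variable are hypothetical names; with `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Download the genome FASTA for every assembly accession listed in a text file
# (one per line; blank lines and lines starting with '#' are skipped), then
# unzip each archive into a directory named after the accession.
# Hypothetical conveniences: the list file, the per-accession layout and the
# DRY_RUN switch (DRY_RUN=1 prints the commands instead of running them).

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"          # dry run: show the command only
  else
    "$@"               # real run: execute it
  fi
}

fetch_genomes() {
  local list_file=$1 acc
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r acc || [ -n "$acc" ]; do
    case $acc in ''|\#*) continue ;; esac
    # </dev/null keeps the commands from consuming the accession list on stdin.
    run datasets download genome accession "$acc" \
      --include genome --filename "${acc}.zip" < /dev/null
    run unzip -o -q "${acc}.zip" -d "$acc" < /dev/null
  done < "$list_file"
}
```

After sourcing it, `DRY_RUN=1 fetch_genomes accessions.txt` shows what would run and `fetch_genomes accessions.txt` performs the downloads. The FASTA usually ends up under `<accession>/ncbi_dataset/data/<accession>/`; check the README inside the archive if the layout differs.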
- SRA Toolkit
wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.2.0/sratoolkit.3.2.0-centos_linux64.tar.gz  # CentOS build; on Ubuntu, download the Ubuntu build instead
tar xzvf sratoolkit.3.2.0-centos_linux64.tar.gz
export PATH=$PATH:/lustre7/home/liedan/sratoolkit.3.2.0-centos_linux64/bin  # change to the directory where you extracted the toolkit
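The comment on the wget line refers to picking the tarball that matches your Linux distribution. A minimal sketch that automates that choice, assuming the Ubuntu build is named `sratoolkit.3.2.0-ubuntu64.tar.gz` (check the directory listing at the FTP URL above before relying on it) and that each tarball unpacks into a directory with the same base name, as the CentOS one does. `sratk_tarball`, `install_sratk` and the default install prefix `$HOME` are hypothetical.

```shell
#!/usr/bin/env bash
# Choose the SRA Toolkit 3.2.0 tarball for this distribution, download and
# unpack it, and put its bin/ directory on PATH.
# Assumption: the "ubuntu64" file name; verify it on the NCBI download page.
SRATK_VERSION=3.2.0
SRATK_BASE_URL="https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/${SRATK_VERSION}"

sratk_tarball() {
  # $1: distribution ID, e.g. the ID field from /etc/os-release.
  case $1 in
    ubuntu) echo "sratoolkit.${SRATK_VERSION}-ubuntu64.tar.gz" ;;
    *)      echo "sratoolkit.${SRATK_VERSION}-centos_linux64.tar.gz" ;;
  esac
}

install_sratk() {
  local prefix=${1:-$HOME} id tarball
  # /etc/os-release defines ID=ubuntu, ID=centos, ... on most Linux systems;
  # if it is missing, id stays empty and the CentOS build is chosen.
  id=$(. /etc/os-release 2>/dev/null && echo "$ID")
  tarball=$(sratk_tarball "$id")
  (cd "$prefix" && wget "${SRATK_BASE_URL}/${tarball}" && tar xzf "$tarball") || return 1
  # Assumes the archive unpacks to a directory named like the tarball.
  export PATH="$PATH:${prefix}/${tarball%.tar.gz}/bin"
}
```

`install_sratk` only changes `PATH` for the current shell; add the same `export PATH=...` line to `~/.bashrc` to keep it across logins.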
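Command for downloading:
# for a single SRA run (store its run accession in the variable accession first):
fasterq-dump --split-files "${accession}" -e 16 -p
# --split-files: write each read of a spot (e.g. paired-end mates) to its own file; -e: number of threads; -p: show a progress bar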
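When there are several runs, the single-run command can be repeated over a list. A sketch, assuming a hypothetical text file with one run accession per line; `dump_runs`, its default output directory `fastq` and the `DRY_RUN` switch are illustrative names, not part of the SRA Toolkit. `-O` (`--outdir`) is a fasterq-dump option that sets the output directory.

```shell
#!/usr/bin/env bash
# Run fasterq-dump for every SRA run accession listed in a text file
# (one per line). Hypothetical: the list file, the output directory and the
# DRY_RUN switch (DRY_RUN=1 prints each command instead of running it).
dump_runs() {
  local list_file=$1 outdir=${2:-fastq} accession
  local -a cmd
  [ "${DRY_RUN:-0}" = "1" ] || mkdir -p "$outdir"
  while IFS= read -r accession || [ -n "$accession" ]; do
    [ -z "$accession" ] && continue
    # Same options as the single-run command, plus -O to collect the output.
    cmd=(fasterq-dump --split-files "$accession" -e 16 -p -O "$outdir")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      # </dev/null keeps fasterq-dump from reading the accession list on stdin.
      "${cmd[@]}" < /dev/null || echo "failed: $accession" >&2
    fi
  done < "$list_file"
}
```

A failed run is reported on stderr and the loop moves on to the next accession, so one bad accession does not stop the whole batch.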
Downloading with fasterq-dump takes a long time and needs a lot of temporary disk space while it runs, so start it while your storage still has plenty of free space, and launch it with nohup so that a logout or dropped connection does not interrupt it.
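A minimal sketch of that advice: check free space first, keep fasterq-dump's temporary files on the same disk with `-t` (by default they go to the current directory), and launch the job with `nohup` so it keeps running after you log out. The 100 GiB default threshold (`MIN_FREE_GB`) is an arbitrary placeholder, not an NCBI recommendation; size it for your runs. `free_gb`, `start_dump`, the `tmp/` directory and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Check free space, then start fasterq-dump in the background with nohup.
# Hypothetical: MIN_FREE_GB (default 100) and the tmp/ and log file names.

free_gb() {
  # Free space in GiB on the filesystem holding $1 (POSIX df, 1024-byte blocks).
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

start_dump() {
  local accession=$1 workdir=${2:-.} min_free_gb=${MIN_FREE_GB:-100} avail
  avail=$(free_gb "$workdir")
  if [ -z "$avail" ] || [ "$avail" -lt "$min_free_gb" ]; then
    echo "not enough free space in ${workdir} (${avail:-unknown} GiB free, need ${min_free_gb})" >&2
    return 1
  fi
  mkdir -p "$workdir/tmp"
  # -t keeps the temporary files on the filesystem we just checked;
  # -p is dropped because the output goes to a log file, not a terminal.
  nohup fasterq-dump --split-files "$accession" -e 16 \
    -O "$workdir" -t "$workdir/tmp" > "$workdir/${accession}.log" 2>&1 &
  echo "started PID $! (log: $workdir/${accession}.log)"
}
```

For example, `start_dump "$accession" /path/to/big/disk` starts the job, and `tail -f` on the printed log file shows its progress.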