Project_B - ncbi/workshop-asm-ngs-2022 GitHub Wiki
There is more than one way to retrieve SRA data, and depending on needs you may want to use one of the methods below.
By default, if you provide the toolkit an accession, it will find the "closest" copy of the record to you and retrieve it.
fasterq-dump --split-3 --fasta --skip-technical --seq-defline '>$ac.$si_$ri' -f SRR12379805
First we will find all the places the data exists.
wget -q "https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?acc=SRR15515762&accept-alternate-locations=yes" -O - | jq -r '.result[].files[].locations[] | [.service, .region, .link] | @tsv' | column -t
wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR15515762/SRR15515762 -O SRR15515762
fasterq-dump --split-3 --fasta --seq-defline '>$ac.$si_$ri' -f ./SRR15515762
efetch -id NC_045512 -db nuccore -format fasta > NC_045512.2.fa
vim illumina.sh
Press i
to begin editing the file, then, after pasting the code, esc
, then ctrl+:
, then x
to save and exit vim.
#!/bin/bash -eu
acc=$1
ref=$(realpath $2)
shift 2
fastas=$@
minimap2 -a -x sr $ref $fastas 2>minimap2.log | samtools view -Sb - | samtools sort - > $acc.bam
samtools depth $acc.bam > $acc.depth
Then enter chmod u+x illumina.sh
vim cov_plot.sh
Press i
to begin editing the file, then, after pasting the code, esc
, then ctrl+:
, then x
to save and exit vim.
#!/bin/bash -eu
depth=$1
width=79
height=24
reverse=noreverse
terminal=dumb
plotwith=dots
tmp=$(mktemp)
cat $depth | cut -f 2,3 > $tmp
ext=$(mktemp)
awk 'NR > 1 { if ($2>10 && n>10 && ($2+$2<n || n+n<$2)) { print line; ; } } { line=$0; n=$2 }' $tmp > $ext
VAR=$(cat <<- EOM
set term $terminal $width $height;
set xlabel 'Position';
set ylabel 'Coverage (bases)';
set xrange [] $reverse;
set style line 1 lc rgb '#8B1A0E' pt 0 ps 1 lt 1 lw 2;
set style line 2 lc rgb '#5E9C36' pt 0 ps 1 lt 1 lw 2;
set tics nomirror;
plot '$tmp' with lp ls 1 title '$depth';
EOM
)
gnuplot -p -e "$VAR"; # 2>/dev/null
rm $tmp $ext
Then enter chmod u+x cov_plot.sh
Does the coverage look good for both, should we exclude some records from downstream analysis due to low coverage?
./illumina.sh SRR12379805 NC_045512.2.fa SRR12379805_*.fasta
./cov_plot.sh SRR12379805.depth