FastQ Screen - NBISweden/workshop-genome_assembly GitHub Wiki
FastQ Screen: Screen reads against a set of known reference sequences (potential contaminants, frequently sequenced organisms).
Notes:
- Needs research time to find the correct references to include.
- Dependencies: BWA
Command:
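#!/usr/bin/env bash
# Run as a SLURM array job. FILES is zero-indexed, so submit with --array=0-(N-1).
set -euo pipefail
PATH="$PATH:/path/to/fastqscreen"
JOB=$SLURM_ARRAY_TASK_ID
DATA_DIR=/path/to/reads
# One array task per read pair, selected by its R1 file.
FILES=( "$DATA_DIR"/*_R1.fastq.gz )
FASTQ="${FILES[$JOB]}"
# Derive the R2 file name from R1; BWA is selected explicitly to match the config.
fastq_screen --aligner bwa "$FASTQ" "${FASTQ/_R1./_R2.}"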
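Because the script indexes FILES with the array task ID, the array range has to run from 0 to one less than the number of R1 files. A minimal sketch of how that range could be computed is shown here; the function name `array_range` and the script name `run_fastq_screen.sh` are hypothetical and not part of the workshop material.

```shell
#!/usr/bin/env bash
# Print the zero-based SLURM array range that covers every *_R1.fastq.gz
# file in the given directory, e.g. "0-2" for three files.
array_range() {
    local data_dir=$1
    local count=0
    local f
    for f in "$data_dir"/*_R1.fastq.gz; do
        # An unmatched glob stays literal, so only count paths that exist.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "no *_R1.fastq.gz files in $data_dir" >&2
        return 1
    fi
    echo "0-$((count - 1))"
}

# Example submission (script name is an assumption):
# sbatch --array="$(array_range /path/to/reads)" run_fastq_screen.sh
```

Computing the range from the same glob the job script uses keeps the two in step when files are added or removed.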
Setup: fastqscreen.conf
BWA /sw/apps/bioinfo/bwa/0.7.17/milou/bin/bwa
THREADS 8
DATABASE Human References/GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz
##
## E. coli - sequence available from EMBL accession U00096.2
DATABASE Ecoli References/Ecoli/U00096.fasta.gz
##
## PhiX - sequence available from RefSeq accession NC_001422.1
DATABASE PhiX References/PhiX/NC_001422.fasta.gz
##
## Adapters - sequence derived from the FastQC contaminants file found at: www.bioinformatics.babraham.ac.uk/projects/fastqc
DATABASE Adapters References/Contaminants/contaminant_list.fasta.gz
##
## Vectors - sequence taken from the UniVec database
## http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
DATABASE Vectors References/Vectors/UniVec.gz
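Each DATABASE entry needs a BWA index before FastQ Screen can map against it. The sketch below assumes that the third column of each DATABASE line doubles as the BWA index prefix (which is what `bwa index` produces when run on the file without `-p`), and it reports entries whose `.bwt` file is missing. The function name `missing_indices` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "name<TAB>path" for every DATABASE entry in a FastQ Screen config
# whose BWA index file (<path>.bwt) does not exist yet.
# Assumption: the DATABASE path is also the BWA index prefix.
missing_indices() {
    local conf=$1
    local keyword name path rest
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while read -r keyword name path rest || [ -n "$keyword" ]; do
        # Skip comments, blank lines, and the BWA / THREADS settings.
        [ "$keyword" = "DATABASE" ] || continue
        if [ ! -e "$path.bwt" ]; then
            printf '%s\t%s\n' "$name" "$path"
        fi
    done < "$conf"
}

# Example: build whatever is missing (run from the directory that the
# relative References/ paths refer to):
# missing_indices fastqscreen.conf | while read -r name path; do
#     bwa index "$path"
# done
```

Indexing the human genome takes considerably longer than the small references, so it may be worth running that one as its own job.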