FastQ Screen - NBISweden/workshop-genome_assembly GitHub Wiki

FastQ Screen: screens reads against a set of known reference sequences (potential contaminants, frequently sequenced organisms) and reports the proportion of reads that map to each.

Notes:

  • Choosing the right references to include takes some research time.
  • Dependencies: BWA
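
Since BWA is the aligner, each reference needs a BWA index before it can be screened against. The following is a minimal sketch of that preparation step, not part of the original workflow: `index_reference` is a hypothetical helper name, and it assumes the default behaviour of `bwa index`, which writes the index files next to the FASTA using the FASTA path as the prefix.

```shell
#!/usr/bin/env bash
# Sketch: build a BWA index for one reference FASTA unless one already exists.
# index_reference is a hypothetical helper, not part of FastQ Screen or BWA.
index_reference() {
    local ref=$1
    # bwa index writes <ref>.amb, .ann, .bwt, .pac and .sa; use .bwt as the marker.
    if [ -f "$ref.bwt" ]; then
        echo "skip $ref (index exists)"
    else
        bwa index "$ref"
    fi
}

# Example call, using one of the reference paths from the config:
#   index_reference References/PhiX/NC_001422.fasta.gz
```

With the default prefix, the path given to `bwa index` is the same path that goes in the corresponding DATABASE entry.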

Command:

#!/usr/bin/env bash

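# Make the fastq_screen executable available.
PATH="$PATH:/path/to/fastqscreen"
# Index of this array task. Bash arrays are zero-based, so submit with --array=0-(N-1).
JOB=$SLURM_ARRAY_TASK_ID

DATA_DIR=/path/to/reads
# One entry per read pair, keyed on the forward (R1) file.
FILES=( "$DATA_DIR"/*_R1.fastq.gz )

FASTQ="${FILES[$JOB]}"
# Screen the forward reads together with the matching reverse (R2) file.
fastq_screen "$FASTQ" "${FASTQ/_R1./_R2.}"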
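
The script picks its input with `${FILES[$JOB]}`, so the SLURM array indices have to run from 0 to one less than the number of R1 files. A minimal sketch for deriving that range is shown here; `array_range` and the job script name `fastq_screen_job.sh` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print a zero-based SLURM array range covering every *_R1.fastq.gz
# file in a directory. array_range is a hypothetical helper.
array_range() {
    local data_dir=$1
    local files=( "$data_dir"/*_R1.fastq.gz )
    # Without nullglob an unmatched pattern stays literal, so test the first entry.
    if [[ ! -e "${files[0]}" ]]; then
        echo "no *_R1.fastq.gz files in $data_dir" >&2
        return 1
    fi
    echo "0-$(( ${#files[@]} - 1 ))"
}

# Hypothetical submission (fastq_screen_job.sh is an assumed file name):
#   sbatch --array="$(array_range /path/to/reads)" fastq_screen_job.sh
```

Both the helper and the job script expand the same glob, so the indices line up as long as the directory contents do not change between submission and execution.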

Setup: fastqscreen.conf

BWA /sw/apps/bioinfo/bwa/0.7.17/milou/bin/bwa
THREADS		8
DATABASE	Human	References/GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz
##
## Ecoli - sequence available from EMBL accession U00096.2
DATABASE	Ecoli	References/Ecoli/U00096.fasta.gz
##
## PhiX - sequence available from Refseq accession NC_001422.1
DATABASE	PhiX	References/PhiX/NC_001422.fasta.gz
##
## Adapters - sequence derived from the FastQC contaminants file found at: www.bioinformatics.babraham.ac.uk/projects/fastqc
DATABASE	Adapters	References/Contaminants/contaminant_list.fasta.gz
##
## Vector - Sequence taken from the UniVec database
## http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
DATABASE	Vectors		References/Vectors/UniVec.gz
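
Before submitting jobs it can help to confirm that every DATABASE entry points at a complete BWA index. The sketch below is one way to do that, under the assumption that each index was built with `bwa index` using the configured path as the prefix. `missing_indexes` is a hypothetical helper, and it does not handle paths containing spaces.

```shell
#!/usr/bin/env bash
# Sketch: list DATABASE entries in a FastQ Screen config whose BWA index
# files are missing. missing_indexes is a hypothetical helper.
missing_indexes() {
    local conf=$1 name prefix ext
    # DATABASE lines read: DATABASE <name> <index prefix>. Lines starting
    # with '#' are comments and are ignored by the awk filter.
    awk '$1 == "DATABASE" { print $2, $3 }' "$conf" |
    while read -r name prefix; do
        # bwa index produces these five files for each reference.
        for ext in amb ann bwt pac sa; do
            [ -f "$prefix.$ext" ] || echo "$name: missing $prefix.$ext"
        done
    done
}

# Example call, run from the directory the relative paths refer to:
#   missing_indexes fastqscreen.conf
```

No output means every configured prefix has all five index files.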