Sunflower_RNAseq (Workflow) - emilyyaklich/Sunflower_RNAseq Wiki

Download data and organize repository

Forked and cloned the Sunflower_RNAseq repository to: /home/ely67071/

Main directory where analysis will be stored: ${sunflower_data} = /scratch/ely67071/sunflower_data/

  • Raw RNA-seq data copied to: ${sunflower_data}/raw_rna_seq/
    • The raw_rna_seq directory is divided into 3 subdirectories, each containing 4 runs. Each run contains ~384 paired-end sequences.

Download sunflower genome data from https://www.heliagene.org/ICSG/ (requires login info):

  • Ha412HOv2.0-20181130: ${sunflower_data}/genomes/Ha412HOv2.0-20181130/
  • H. argophyllus: ${sunflower_data}/genomes/H_argophyllus/
  • H. annuus: ${sunflower_data}/genomes/H_annus/
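
The directory layout described above could be created up front with something like the following. This is a minimal sketch, not a script from the repository: the function name make_layout is hypothetical, and the directory names are taken from the paths listed above.

```shell
#!/bin/bash
# Hypothetical helper: create the data layout under a base directory.
# Directory names mirror the paths listed in this section.
make_layout() {
    local base="$1"
    mkdir -p "$base/raw_rna_seq" \
             "$base/genomes/Ha412HOv2.0-20181130" \
             "$base/genomes/H_argophyllus" \
             "$base/genomes/H_annus"
}

# Example call (not run automatically):
#   make_layout /scratch/ely67071/sunflower_data
```

Because mkdir -p is used, re-running the helper on an existing layout leaves it unchanged.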

Quality Assessment

Results located in: ${sunflower_data}/rna_fastqc/rna_fastqc_pre_adapter_trimming/

  • Directory structure is the same as in raw_rna_seq: 3 subdirectories, 4 runs per subdirectory, etc.

Troubleshooting

  • Ran out of memory with 1GB (slurm-14110929.out). Switched to 5GB, but only 55% of the memory was used during the job, so 3GB is probably enough.
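
The 3GB estimate follows from simple arithmetic on the numbers above: 55% of a 5GB request is 2.75GB of peak use. A small sketch of that calculation is shown below. The function name peak_gb is hypothetical, and this is not part of the repository's scripts.

```shell
#!/bin/bash
# Hypothetical helper: estimate peak memory (GB) from the requested
# memory (GB) and the percentage of it that the job actually used.
peak_gb() {
    awk -v req="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", req * pct / 100 }'
}

peak_gb 5 55   # prints 2.75, so a 3GB request leaves a little headroom
```

In the job script this would correspond to requesting memory with a line such as #SBATCH --mem=3G. On clusters that install the Slurm contrib tool seff, running seff on the job ID reports memory efficiency for a finished job; whether it is available here is an assumption.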

Adapter Trimming

Troubleshooting

  • Email notification (--mail-user + --mail-type) did not work for Slurm array jobs.
    • My workaround was to add a line to Trimm.sh that writes the Slurm JOBID of each job in the array to a text file (defined in Config).
    • After all the adapter trimming jobs have finished, run grab_exit_codes.sh, which extracts the exit code for each job that was run and notes any that are non-zero.
  • Some jobs ran out of memory with 10GB, so switched to 13GB for all jobs.
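
The job-ID logging and exit-code check described above can be sketched roughly as follows. This is not the actual content of Trimm.sh or grab_exit_codes.sh. The function names and the job_ids.txt file name are hypothetical, and the sketch assumes that Slurm accounting is queryable with sacct and that the ID file holds one job ID per line.

```shell
#!/bin/bash
# Hypothetical helper for the array job script: append this task's job ID
# to a log file. SLURM_JOB_ID is set by Slurm inside each running task.
log_job_id() {
    local id_file="$1"
    echo "${SLURM_JOB_ID}" >> "$id_file"
}

# Hypothetical helper for the follow-up check: read "JobID|ExitCode" lines
# on stdin and print every job whose exit code is not 0:0.
# Returns non-zero if any failure was found.
report_failures() {
    awk -F'|' '$2 != "0:0" { print $1 " exited with " $2; bad = 1 }
               END { exit bad + 0 }'
}

# Usage once all array tasks have finished (requires Slurm accounting):
#   sacct -n -X -P --format=JobID,ExitCode -j "$(paste -sd, job_ids.txt)" \
#       | report_failures
```

sacct reports ExitCode as exitcode:signal, so 0:0 is treated as the only clean result. The -P flag gives pipe-delimited output, -n drops the header, and -X restricts the listing to the job allocations rather than individual steps.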