Quality_Assessment - MorrellLAB/sequence_handling GitHub Wiki

Basic Usage

Before beginning sequence_handling, make sure that your FastQ samples have been merged (if individual samples are split across multiple files) and renamed. It will be much harder to merge and/or rename files later in the pipeline.
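As a rough illustration of the merging step, the sketch below concatenates the pieces of one sample into a single file. It assumes the pieces are gzipped FastQ files (concatenated gzip streams form a valid gzip file, so no decompression is needed); the function name `merge_fastq` and the file names in the usage comment are hypothetical and not part of sequence_handling.

```shell
# Merge several FastQ pieces into one file (hypothetical helper).
# $1 is the merged output file; the remaining arguments are the pieces,
# given in the order they should appear in the output.
merge_fastq() {
    out="$1"
    shift
    cat "$@" > "${out}"
}

# Example usage with hypothetical lane-split files:
#   merge_fastq merged/SampleA_R1.fastq.gz raw/SampleA_L001_R1.fastq.gz raw/SampleA_L002_R1.fastq.gz
```

For paired-end data, merge the forward and reverse pieces in the same order so that read pairs stay in sync.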

The Quality_Assessment handler runs FastQC on a list of FastQ, SAM, or BAM samples. FastQC can process any type of FastQ encoding and can handle sample inputs that are gzipped or bzipped. Running Quality_Assessment will produce an HTML document for each sample and a summary file for all samples containing metrics on sequence quality, sequence length distribution, sequence duplication levels, adapter content, and other quality statistics. For more information on these metrics, see the FastQC documentation.

To run Quality_Assessment, all common and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Quality_Assessment can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Quality_Assessment Config

Where Config is the full file path to the configuration file.
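Because Config must be given as a full file path, it can help to resolve and check the path before submitting. The following is a minimal sketch; the helper name `resolve_config` and the path in the usage comment are hypothetical and not part of sequence_handling.

```shell
# Print the absolute path of a configuration file, or fail if it is unreadable.
# (Hypothetical helper, shown only to illustrate passing a full path.)
resolve_config() {
    config="$1"
    if [ ! -r "${config}" ]; then
        echo "Config file not found or unreadable: ${config}" >&2
        return 1
    fi
    # Resolve the containing directory, then re-attach the file name.
    echo "$(cd "$(dirname "${config}")" && pwd -P)/$(basename "${config}")"
}

# Example usage, run from the directory containing sequence_handling:
#   ./sequence_handling Quality_Assessment "$(resolve_config my_project/Config)"
```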

Handler-Specific Variables

The following is a list of variables that need to be defined within Config. In addition to these handler-specific variables, all common variables must be defined.

| Variable | Function |
| --- | --- |
| QA_QSUB | QSub settings for batch submission. Recommended settings are "mem=1gb,nodes=1:ppn=4,walltime=6:00:00". |
| QA_SAMPLES | The list of FastQ, SAM, or BAM samples to be processed, which can be generated using sample_list_generator.sh. This should be a plain text file with one file path per line. |
| TARGET | The size of the region that was sequenced in base pairs. For whole-genome sequencing, this is the genome size. For exome capture, this is the size of the capture region. If you do not have this information, put "NA". |
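The sketch below shows how these three variables might look in Config, assuming Config uses shell-style NAME="value" assignments. It also shows one way to build a sample list by hand with `find` as an alternative to sample_list_generator.sh. The helper name `make_sample_list`, the `*.fastq.gz` pattern, and all paths are hypothetical.

```shell
# Write one file path per line for every gzipped FastQ file under a directory.
# $1 is the directory to search, $2 is the list file to create.
# Pass an absolute directory so that the listed paths are absolute too.
make_sample_list() {
    find "$1" -type f -name '*.fastq.gz' | sort > "$2"
}

# Handler-specific variables as they might appear in Config.
QA_QSUB="mem=1gb,nodes=1:ppn=4,walltime=6:00:00"    # recommended settings
QA_SAMPLES="/path/to/project/raw_samples.txt"       # hypothetical list file
TARGET="NA"                                         # or the sequenced region size in base pairs
```

Setting TARGET to "NA" follows the guidance for cases where the size of the sequenced region is unknown.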

Output

Quality_Assessment uses FastQC to produce an HTML file and a zip file for each sample in your raw sample list. To view the HTML files, open them in a web browser.

After Quality_Assessment has completed, a tab-delimited text file and a PNG file of plots will also be generated, summarizing the quality statistics for each sample. The full file paths to these files will be

${OUT_DIR}/Quality_Assessment/${PROJECT}_quality_summary.txt
${OUT_DIR}/Quality_Assessment/${PROJECT}_quality_plots.png

where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
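As a quick check after a run, the two summary paths can be rebuilt from the same values and tested for existence. This is only a sketch; the function name `check_qa_outputs` and the values in the usage comment are hypothetical.

```shell
# Report whether the two Quality_Assessment summary files exist and are non-empty.
# $1 is the value of OUT_DIR and $2 is the value of PROJECT from Config.
check_qa_outputs() {
    out_dir="$1"
    project="$2"
    status=0
    for f in \
        "${out_dir}/Quality_Assessment/${project}_quality_summary.txt" \
        "${out_dir}/Quality_Assessment/${project}_quality_plots.png"
    do
        if [ -s "${f}" ]; then
            echo "found: ${f}"
        else
            echo "missing or empty: ${f}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Example usage with hypothetical values:
#   check_qa_outputs /path/to/output my_project
```

The function returns a non-zero status if either file is missing, so it can be used in a script to stop before the next handler is submitted.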

Dependencies

Quality_Assessment requires FastQC, Riss-util, PBS, and GNU Parallel. All of these are available through MSI (the Minnesota Supercomputing Institute). If you are not on MSI, download and install them separately or modify the script to work with your tools. Check the dependencies page to ensure that you are using the required version of each dependency.

Next: Adapter_Trimming