Preprocessing Pipelines
Author(s): Bryan Hawickhorst
Overview
DataSeq provides automated preprocessing pipelines for three types of genomic data: RNA-seq, ATAC-seq, and WGBS (whole-genome bisulfite sequencing).
Each pipeline cleans raw sequencing data and prepares it for downstream analysis by following standardized steps.
Available Pipelines
RNA-seq Pipeline
- Quality control (QC) on raw reads
- Adapter trimming based on QC results
- Alignment to a reference genome
- Generation of read count tables
ATAC-seq Pipeline
- Quality control (QC) on raw reads
- Adapter trimming
- Alignment to a reference genome
- Peak calling to identify transposase-accessible chromatin regions
WGBS Pipeline
- Quality control (QC) on bisulfite-treated reads
- Adapter trimming
- Alignment to a bisulfite-converted reference genome
- Methylation calling
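The three step lists above can be summarized as a small lookup table. This is only an illustrative sketch: the names `DataType`, `PIPELINE_STEPS`, and `steps_for`, as well as the step identifiers, are hypothetical and are not taken from the DataSeq codebase.

```python
from enum import Enum
from typing import Dict, Tuple


class DataType(Enum):
    """The three data types listed on this page."""
    RNA_SEQ = "RNA-seq"
    ATAC_SEQ = "ATAC-seq"
    WGBS = "WGBS"


# Ordered steps per pipeline, mirroring the bullet lists above.
# The step identifiers are made up for illustration.
PIPELINE_STEPS: Dict[DataType, Tuple[str, ...]] = {
    DataType.RNA_SEQ: ("qc", "adapter_trimming", "alignment", "read_counting"),
    DataType.ATAC_SEQ: ("qc", "adapter_trimming", "alignment", "peak_calling"),
    DataType.WGBS: (
        "qc",
        "adapter_trimming",
        "bisulfite_alignment",
        "methylation_calling",
    ),
}


def steps_for(data_type_name: str) -> Tuple[str, ...]:
    """Return the ordered steps for a data type given by its display name."""
    # DataType(...) raises ValueError for names that are not supported.
    return PIPELINE_STEPS[DataType(data_type_name)]
```

Written this way, the table makes it easy to see that every pipeline starts with QC and adapter trimming, and that the pipelines differ only in how they align reads and in their final, data-type-specific step.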
User Parameters
When configuring a preprocessing job, users provide values for parameters that describe their dataset and the processing settings they want.
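This page does not list the parameters themselves, so the sketch below only illustrates how a set of job parameters might be represented and checked before submission. Every field name here (`reference_genome`, `min_base_quality`) and the `validate` helper are assumptions made for illustration, not DataSeq's actual parameter set.

```python
from dataclasses import dataclass
from typing import List

# Data types named on this page.
SUPPORTED_DATA_TYPES = ("RNA-seq", "ATAC-seq", "WGBS")


@dataclass(frozen=True)
class JobParameters:
    """Hypothetical parameters for one preprocessing job."""
    data_type: str
    reference_genome: str        # hypothetical field, e.g. a genome build name
    min_base_quality: int = 20   # hypothetical trimming threshold


def validate(params: JobParameters) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if params.data_type not in SUPPORTED_DATA_TYPES:
        problems.append(f"unsupported data type: {params.data_type!r}")
    if not params.reference_genome.strip():
        problems.append("reference genome must not be empty")
    if params.min_base_quality < 0:
        problems.append("minimum base quality must be non-negative")
    return problems
```

Returning a list of problems instead of raising on the first one lets a form report every invalid field at once.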
Quality Control (QC)
Quality control is performed as a separate first step for all uploaded samples.
- After QC, a MultiQC report is generated summarizing data quality.
- Users can review the report to decide whether to adjust parameters for subsequent steps, especially trimming.
- QC processing does not require manual intervention after submission.
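As a rough sketch of what a QC step of this kind can look like, the helpers below build command lines for FastQC and MultiQC, the two report tools this page names under Outputs. How DataSeq actually invokes its QC scripts is not documented here, so the function names, file names, and directory layout are assumptions. The code only constructs the commands and does not run them.

```python
from pathlib import Path
from typing import List, Sequence


def fastqc_command(fastq_files: Sequence[Path], out_dir: Path) -> List[str]:
    """Build a FastQC command that writes per-sample reports to out_dir."""
    return ["fastqc", "--outdir", str(out_dir), *(str(f) for f in fastq_files)]


def multiqc_command(qc_dir: Path) -> List[str]:
    """Build a MultiQC command that summarizes the reports found in qc_dir."""
    return ["multiqc", str(qc_dir), "--outdir", str(qc_dir)]


# Hypothetical sample names and output directory.
samples = [Path("sample_1.fastq.gz"), Path("sample_2.fastq.gz")]
qc_dir = Path("qc_reports")

per_sample = fastqc_command(samples, qc_dir)
summary = multiqc_command(qc_dir)
```

If both tools are installed, each list can be passed to `subprocess.run(cmd, check=True)`. FastQC expects the output directory to exist already, so create it first with `qc_dir.mkdir(parents=True, exist_ok=True)`.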
How Preprocessing Works
- After users upload files, choose a data type, and set parameters on the Pre-Processing page, DataSeq runs the pipeline automatically.
- Processing is managed by a secure high-performance computing cluster.
- Users can monitor job progress through the Logging page.
- Processed files are available for download once jobs complete.
Outputs
Each pipeline produces:
- Cleaned sequencing reads
- Quality control reports (e.g., FastQC, MultiQC)
- Processed data files (e.g., read count matrices, peak files, methylation calls)
These outputs can be downloaded through the Downloads page.
Notes
- QC uses the same scripts across all data types.
- Preprocessing steps (e.g., trimming, alignment) are customized based on data type and user-provided parameters.
- All backend workflow management is handled automatically and does not require user action.
- Detailed logs are available for troubleshooting if a job fails.