1.6 SpliceScape: Splicing Analysis with MAJIQ - labbces/SpliceScape GitHub Wiki

Running in parallel with SGSeq, the pipeline also uses **MAJIQ (Modeling Alternative Junction Inclusion Quantification)** to identify and quantify splicing events. Following recent benchmarks, MAJIQ was included for its high precision, which complements SGSeq's high recall.

The MAJIQ analysis is divided into two separate Nextflow processes:

  1. MAJIQ_SETTING: This process prepares the necessary configuration, builds the splice graph, and quantifies the events (PSI calculation).
  2. MAJIQ_RUN: This process takes the quantified results and categorizes them into specific, easy-to-interpret event types.

Process 7: MAJIQ_SETTING

This multi-step process orchestrates the initial stages of the MAJIQ workflow.

Step 1.1: Creating the Settings File

Before running MAJIQ, a specific .ini configuration file is required for each sample. This is handled by the majiq_settings_file_creator.py script.

Script Details:

  • Function: Generates a .ini file that tells MAJIQ where to find the BAM file and the reference genome assembly.

  • Arguments:
      ◦ --output_dic: The directory where the .ini file will be saved.
      ◦ --species, --sra: Names used for creating a unique filename.
      ◦ --bam_dir: Path to the directory containing the BAM file.
      ◦ --assembly: Path to the genome assembly directory.
      ◦ --output_star: The filename prefix used by STAR during mapping.
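
For orientation, here is a minimal sketch of what a settings-file creator with these arguments could look like. It is not the pipeline's actual majiq_settings_file_creator.py. The [info] and [experiments] sections and their keys follow our understanding of the MAJIQ v2 settings format, and the <species>_<sra>.ini filename and the experiment entry (STAR's sorted BAM name, <prefix>Aligned.sortedByCoord.out, without the .bam extension) are assumptions. The helper names build_settings, write_settings and main are hypothetical.

```python
import argparse
import configparser
from pathlib import Path


def build_settings(sra: str, bam_dir: str, assembly: str, output_star: str) -> configparser.ConfigParser:
    """Assemble a MAJIQ-style settings file in memory (section and key names are assumptions)."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep accession case; configparser lowercases keys by default
    config["info"] = {
        "bamdirs": bam_dir,  # directory holding the sample's BAM file
        "genome": assembly,  # genome assembly, as passed via --assembly
    }
    # STAR names its sorted BAM "<prefix>Aligned.sortedByCoord.out.bam";
    # the experiment entry is assumed to be that name without ".bam".
    config["experiments"] = {sra: f"{output_star}Aligned.sortedByCoord.out"}
    return config


def write_settings(output_dic: str, species: str, sra: str, bam_dir: str,
                   assembly: str, output_star: str) -> Path:
    """Write <species>_<sra>.ini (hypothetical naming scheme) and return its path."""
    out_dir = Path(output_dic)
    out_dir.mkdir(parents=True, exist_ok=True)
    ini_path = out_dir / f"{species}_{sra}.ini"
    config = build_settings(sra, bam_dir, assembly, output_star)
    with ini_path.open("w") as handle:
        config.write(handle, space_around_delimiters=False)  # writes key=value
    return ini_path


def main(argv: list[str]) -> Path:
    """Parse the documented command-line arguments and write the settings file."""
    parser = argparse.ArgumentParser(description="Create a per-sample MAJIQ settings file.")
    for name in ("--output_dic", "--species", "--sra", "--bam_dir", "--assembly", "--output_star"):
        parser.add_argument(name, required=True)
    args = parser.parse_args(argv)
    # argparse dest names match write_settings' parameters one-to-one
    return write_settings(**vars(args))
```

In a standalone script, main(sys.argv[1:]) would be called from the entry point; it returns the path of the .ini file so a wrapper can log or pass it on.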

Step 1.2: Building the Splice Graph (majiq build)

This command is the core of MAJIQ's detection step. It parses the genome annotation (.gff3) and the read alignments (.bam) to build a "splice graph" for each gene. The graph combines the exons and splice junctions from the annotation with novel ones supported by reads in the sample.
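
To illustrate how annotation and read evidence are merged, here is a toy model in Python. It is a simplification, not MAJIQ's data structure: junctions are plain (donor, acceptor) coordinate pairs, annotated junctions are always kept, and a novel junction enters the graph only if it has at least min_reads spliced reads. The SpliceGraph class, build_splice_graph and the default threshold of 3 are hypothetical; MAJIQ applies its own configurable filters.

```python
from dataclasses import dataclass, field

# Toy model: a junction is a (donor coordinate, acceptor coordinate) pair within one gene.
Junction = tuple[int, int]


@dataclass
class SpliceGraph:
    gene_id: str
    # junction -> True if it comes from the annotation, False if novel (reads only)
    junctions: dict[Junction, bool] = field(default_factory=dict)

    def novel(self) -> list[Junction]:
        """Return the read-supported junctions absent from the annotation, sorted."""
        return sorted(j for j, annotated in self.junctions.items() if not annotated)


def build_splice_graph(gene_id: str, annotated: list[Junction],
                       read_counts: dict[Junction, int], min_reads: int = 3) -> SpliceGraph:
    """Merge annotated junctions with novel junctions that have enough spliced reads."""
    graph = SpliceGraph(gene_id)
    for junction in annotated:
        graph.junctions[junction] = True  # annotated junctions are kept regardless of coverage
    for junction, count in read_counts.items():
        if junction not in graph.junctions and count >= min_reads:
            graph.junctions[junction] = False  # novel junction with enough read support
    return graph
```

With annotated junctions (100, 200) and (300, 400), a novel (100, 400) junction seen in 5 reads is added, while one seen in a single read is dropped.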

Step 1.3: Quantifying Events (majiq psi)

This command takes the .majiq file produced by the build step, which holds the sample's read coverage over the splice graph, and quantifies the relative inclusion of each splice junction. For each event, it calculates the Percent Spliced-In (PSI) value, which represents the proportion of transcripts that include a specific alternative exon or junction.
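
For intuition, the simplest PSI estimate is a read ratio: the reads supporting one junction divided by all reads supporting the junctions of the same event. The sketch below computes that ratio. MAJIQ itself fits a Bayesian model and reports an expected PSI together with its uncertainty, so its values will not generally equal this naive estimate. The function name naive_psi is hypothetical.

```python
def naive_psi(junction_reads: dict[str, int]) -> dict[str, float]:
    """Fraction of an event's reads that use each junction (simple ratio, not MAJIQ's model)."""
    total = sum(junction_reads.values())
    if total == 0:
        raise ValueError("no reads support this event; PSI is undefined")
    # each junction's share of the event's reads; the shares sum to 1
    return {junction: reads / total for junction, reads in junction_reads.items()}
```

For an event where exon 1 splices to exon 2 with 30 reads and skips to exon 3 with 10 reads, naive_psi({"exon1->exon2": 30, "exon1->exon3": 10}) returns 0.75 for the inclusion junction and 0.25 for the skipping junction.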

  • Inputs & Outputs (MAJIQ_SETTING Process)

| Type | Description |
|---|---|
| Input | The tuple from the mapping step containing the BAM file path, its index, and the SRA accession. |
| Output | A tuple containing the paths to all intermediate MAJIQ files (.ini, .psi.tsv, .psi.voila, splicegraph.sql) and the SRA accession, ready for the next process. |
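
The two MAJIQ commands of this process could be assembled as below. This is a hedged sketch: the argument layout (GFF3 as a positional argument, -c for the settings file, -j for threads, -o for the output directory, -n for the psi output name) reflects our reading of the MAJIQ v2 command line and should be checked against majiq build --help and majiq psi --help. The build and psi directory names, the <bam_name>.majiq filename, and the helper majiq_setting_commands are hypothetical.

```python
def majiq_setting_commands(gff3: str, ini: str, sra: str, bam_name: str,
                           threads: int = 4) -> list[list[str]]:
    """Return argv lists for the build and psi steps of one sample (flags are assumptions)."""
    build_dir = "build"  # hypothetical directory for the .majiq file and splicegraph.sql
    psi_dir = "psi"      # hypothetical directory for <sra>.psi.tsv and <sra>.psi.voila
    build = ["majiq", "build", gff3, "-c", ini, "-j", str(threads), "-o", build_dir]
    # majiq build is assumed to name its per-sample output after the BAM entry in the .ini
    majiq_file = f"{build_dir}/{bam_name}.majiq"
    psi = ["majiq", "psi", majiq_file, "-j", str(threads), "-o", psi_dir, "-n", sra]
    return [build, psi]
```

Each argv list could be executed in order with subprocess.run(argv, check=True), so a failing step stops the process instead of passing incomplete files on to MAJIQ_RUN.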

Process 8: MAJIQ_RUN

This final process takes the quantified results from majiq psi and makes them more interpretable using the voila tool.

Step 2.1: Categorizing Events (voila modulize)

The voila modulize command analyzes the PSI values and the splice graph to classify local splicing variations into canonical event types, such as:

  • Cassette exons (skipped exons)
  • Alternative 5' or 3' splice sites
  • Mutually exclusive exons
  • And others

It outputs the results into separate .tsv files for each event type, making it easy to find and analyze specific kinds of splicing variations.
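
A minimal sketch of how the voila modulize call could be assembled from the MAJIQ_SETTING outputs. It assumes the splice graph database and the .psi.voila file(s) are passed as positional arguments and that -d sets the output directory; this matches our understanding of the voila command line but is not taken from the pipeline code. The helper modulize_command and the modulized directory name are hypothetical.

```python
def modulize_command(splicegraph: str, voila_files: list[str],
                     out_dir: str = "modulized") -> list[str]:
    """Build the argv for voila modulize (argument order and -d flag are assumptions)."""
    if not voila_files:
        raise ValueError("voila modulize needs at least one .voila file")
    # splice graph first, then the PSI .voila file(s); -d names the output directory
    return ["voila", "modulize", splicegraph, *voila_files, "-d", out_dir]
```

Keeping the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.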

  • Inputs & Outputs (MAJIQ_RUN Process)

| Type | Description |
|---|---|
| Input | The tuple of MAJIQ files generated by the MAJIQ_SETTING process. |
| Output | A directory containing multiple .tsv files, each named after a specific splicing event type (e.g., cassette.tsv, alt5prime.tsv). These are published as symbolic links to the final MAJIQ_results directory. |
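
Once the files are published, a quick per-type count is a handy sanity check. The sketch below assumes each .tsv may start with metadata lines beginning with # followed by a single header row; that layout and the helper names count_event_rows and summarize_results are assumptions rather than part of the pipeline. Any other .tsv file in the directory would be counted as well.

```python
from pathlib import Path


def count_event_rows(tsv_text: str) -> int:
    """Count data rows, skipping '#' metadata lines, blank lines and the header row (assumed layout)."""
    rows = [line for line in tsv_text.splitlines()
            if line.strip() and not line.startswith("#")]
    return max(len(rows) - 1, 0)  # the first non-comment line is treated as the header


def summarize_results(results_dir: str) -> dict[str, int]:
    """Map each event type (file stem, e.g. 'cassette') to its number of rows."""
    summary = {}
    for tsv in sorted(Path(results_dir).glob("*.tsv")):
        # read_text follows symbolic links, so the published links are read transparently
        summary[tsv.stem] = count_event_rows(tsv.read_text())
    return summary
```

Pointing summarize_results at a sample's results directory returns a dictionary keyed by event type, which makes empty or unexpectedly large categories easy to spot.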