TUSCO quick start - ConesaLab/SQANTI3 GitHub Wiki

TUSCO Quick Start Guide (SQANTI3 QC)

Overview

TUSCO (Transcriptome Universal Single‑isoform Control) provides a compact, curated set of single‑isoform genes to benchmark transcriptome reconstruction. When enabled in SQANTI3 QC, it produces a self‑contained HTML report summarizing detection performance and per‑gene visualizations (Gviz‑based, IGV‑like tracks). This guide specifies inputs, execution, outputs, and interpretation to support rigorous and reproducible use.


Inputs and Prerequisites

  • Isoform annotations: provide a transcript GTF via --isoforms for optimal plotting.
    • If starting from FASTA/FASTQ, SQANTI3 generates a corrected GTF internally, which is then used by TUSCO.
  • Reference resources: --refGTF (gene annotation) and --refFasta (genome assembly) are required.
  • Reporting: ensure --report is not skipped; TUSCO runs alongside the standard SQANTI3 QC report.

Basic Usage

# Set --tusco to human or mouse; replace <prefix> and <outdir> with your own values.
python sqanti3_qc.py \
  --isoforms your_transcripts.gtf \
  --refGTF reference.gtf \
  --refFasta genome.fasta \
  --tusco human \
  --report html \
  -o <prefix> -d <outdir>

Key flag:

  • --tusco: selects the bundled species panel (human or mouse).
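
For scripted runs, the same command can be assembled as an argument list. This is a minimal sketch: the helper name build_tusco_command and the sample prefix and output directory are hypothetical, and only the flags shown in this guide are used.

```python
from typing import List


def build_tusco_command(isoforms: str, ref_gtf: str, ref_fasta: str,
                        species: str, prefix: str, outdir: str,
                        script: str = "sqanti3_qc.py") -> List[str]:
    """Assemble the SQANTI3 QC argument list with TUSCO enabled.

    Hypothetical helper: it only mirrors the flags listed under Basic Usage.
    """
    if species not in ("human", "mouse"):
        raise ValueError("--tusco accepts 'human' or 'mouse'")
    return [
        "python", script,
        "--isoforms", isoforms,
        "--refGTF", ref_gtf,
        "--refFasta", ref_fasta,
        "--tusco", species,
        "--report", "html",
        "-o", prefix,
        "-d", outdir,
    ]


# Illustrative values only; substitute your own files and names.
cmd = build_tusco_command("your_transcripts.gtf", "reference.gtf",
                          "genome.fasta", "human", "sample1", "qc_out")
print(" ".join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when file paths contain spaces.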

Outputs

  • <outdir>/<prefix>_TUSCO_report.html: self‑contained HTML benchmarking report.
  • <outdir>/logs/tusco_report.log: Rscript log for the TUSCO report generation.
  • <outdir>/igv_plots/: one PNG per TUSCO gene with Gviz tracks and SQANTI3 category coloring.

The TUSCO report is optional. If it fails, the main QC pipeline continues. Consult tusco_report.log for diagnostics.
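
Because a failed TUSCO report does not stop the main pipeline, it can be worth checking for the expected files after a run. The sketch below derives the three paths listed above from the output directory and prefix; the function names are hypothetical and not part of SQANTI3.

```python
from pathlib import Path
from typing import Dict, List


def tusco_output_paths(outdir: str, prefix: str) -> Dict[str, Path]:
    """Return the TUSCO output locations described in this guide."""
    base = Path(outdir)
    return {
        "report": base / f"{prefix}_TUSCO_report.html",
        "log": base / "logs" / "tusco_report.log",
        "plots": base / "igv_plots",
    }


def missing_tusco_outputs(outdir: str, prefix: str) -> List[str]:
    """List which of the expected outputs do not exist on disk."""
    paths = tusco_output_paths(outdir, prefix)
    return [name for name, path in paths.items() if not path.exists()]


# Example call with illustrative names; prints the outputs not found.
print(missing_tusco_outputs("qc_out", "sample1"))
```

If "report" is missing but "log" is present, the log is the first place to look.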


Bundled Reference Panels

The repository ships with human and mouse TUSCO panels as TSV files:

  • Human: src/utilities/report_qc/tusco_human.tsv
  • Mouse: src/utilities/report_qc/tusco_mouse.tsv

Additional details:

  • Current gene counts: human = 50, mouse = 32.
  • Assembly mapping for plots: human → hg38; mouse → mm10.

Note:

  • Only the TSV file is required for TUSCO operation. Files are named tusco_<species>.tsv.

Tissue‑Specific Panels

Curated tissue‑specific TUSCO references for human and mouse are available at https://tusco.uv.es.

To use a tissue‑specific panel:

  1. Obtain the TSV for your tissue and species.
  2. Replace the bundled TSV in src/utilities/report_qc/ for that species, keeping filenames unchanged:
    • Human: tusco_human.tsv
    • Mouse: tusco_mouse.tsv
  3. Run SQANTI3 QC with --tusco human or --tusco mouse.
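
The replacement in step 2 can be scripted so that the bundled panel is preserved. This is a sketch under stated assumptions: install_tissue_panel and the ".bundled.bak" backup suffix are hypothetical, and sqanti3_root is assumed to be the top of a SQANTI3 checkout containing src/utilities/report_qc/.

```python
import shutil
from pathlib import Path


def install_tissue_panel(custom_tsv: str, sqanti3_root: str, species: str) -> Path:
    """Copy a tissue-specific TSV over the bundled panel, keeping the filename.

    The bundled file is backed up once (hypothetical '.bundled.bak' suffix)
    so the default panel can be restored later.
    """
    if species not in ("human", "mouse"):
        raise ValueError("species must be 'human' or 'mouse'")
    panel_dir = Path(sqanti3_root) / "src" / "utilities" / "report_qc"
    target = panel_dir / f"tusco_{species}.tsv"
    backup = target.with_name(target.name + ".bundled.bak")
    # Back up only the first time, so repeated swaps never overwrite the original.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(custom_tsv, target)
    return target
```

Restoring the default panel is then a matter of copying the backup over tusco_<species>.tsv.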

Report Contents and Interpretation

TUSCO summarizes detection performance across its single‑isoform gene set and provides per‑gene visualizations. The report presents six primary metrics, each expressed so that higher values are better. TP, FP, and FN are counts within the TUSCO set; the subscript nr denotes non‑redundant (gene‑level) counting, and r denotes redundant (transcript‑level) counting.

Metrics (canonical definitions):

| Metric | Symbol | Formula | Orientation |
|---|---|---|---|
| Sensitivity | Sn | TP / (TP + FN) | Higher is better |
| Non‑redundant Precision | nrPre | TP_nr / (TP_nr + FP_nr) | Higher is better |
| Redundant Precision | rPre | TP_r / (TP_r + FP_r) | Higher is better |
| Positive Detection Rate | PDR | (TP + FP)_genes / (TP + FN)_genes | Higher is better |
| 1 − False Discovery Rate | 1−FDR | 1 − FP / (TP + FP) | Higher is better |
| Inverse Redundancy | 1/red | 1 / Redundancy | Higher is better |

Notes:

  • The HTML dashboard labels follow the symbols above (Sn, nrPre, rPre, 1−FDR, PDR, 1/red). Redundancy itself can be inspected in detailed tables/plots; 1/red is shown to align directionality.
  • Exact implementation follows the SQANTI3 TUSCO report; the symbols/formulas above are provided for interpretation.
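
To make the formulas concrete, the sketch below evaluates the six metrics from a set of counts. It is an interpretation aid, not the SQANTI3 implementation: the function and argument names are hypothetical, redundancy is taken as a precomputed input because this guide does not define it, and the example counts are invented for illustration.

```python
import math
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def tusco_metrics(*, tp: int, fp: int, fn: int,
                  tp_nr: int, fp_nr: int,
                  tp_r: int, fp_r: int,
                  tp_genes: int, fp_genes: int, fn_genes: int,
                  redundancy: float) -> Dict[str, float]:
    """Evaluate the six metrics from the table, keyed by dashboard symbol."""
    return {
        "Sn": _ratio(tp, tp + fn),
        "nrPre": _ratio(tp_nr, tp_nr + fp_nr),
        "rPre": _ratio(tp_r, tp_r + fp_r),
        "PDR": _ratio(tp_genes + fp_genes, tp_genes + fn_genes),
        "1-FDR": 1 - _ratio(fp, tp + fp),
        "1/red": _ratio(1, redundancy),  # redundancy supplied by the caller
    }


# Invented counts, for illustration only.
example = tusco_metrics(tp=40, fp=10, fn=10,
                        tp_nr=40, fp_nr=10,
                        tp_r=45, fp_r=15,
                        tp_genes=40, fp_genes=5, fn_genes=10,
                        redundancy=1.25)
for symbol, value in example.items():
    print(f"{symbol}: {value:.2f}")
```

Returning NaN for an empty denominator keeps an undefined metric distinguishable from a metric that is genuinely zero.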

Classification categories used in the report:

  • TP (True Positive): reference match (RM) or qualifying mono‑exon Full Splice Match (FSM).
  • PTP (Partial True Positive): FSM or Incomplete Splice Match (ISM) that is not counted as an RM.
  • FP (False Positive): non‑canonical categories within the TUSCO set (e.g., NIC, NNC, genic, antisense, fusion, intergenic).
  • FN (False Negative): TUSCO genes with no detected transcript.
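
The category rules above can be read as a small decision procedure. The sketch below is a simplified illustration, not the report's code: the names are hypothetical, the counts_as_tp flag stands in for the "reference match or qualifying mono‑exon FSM" test (whose exact criteria are not given here), and the FP set contains only the example categories listed above.

```python
from typing import Iterable, List

# Example FP categories named in this guide; the real list may be longer.
FP_CATEGORIES = {"NIC", "NNC", "genic", "antisense", "fusion", "intergenic"}


def tusco_label(category: str, counts_as_tp: bool = False) -> str:
    """Assign TP, PTP, or FP to one transcript within the TUSCO set.

    counts_as_tp is assumed to be decided upstream (reference match or
    qualifying mono-exon FSM); this sketch does not implement that test.
    """
    if counts_as_tp:
        return "TP"
    if category in ("FSM", "ISM"):
        return "PTP"
    if category in FP_CATEGORIES:
        return "FP"
    return "other"  # category not covered by the examples in this guide


def false_negative_genes(panel_genes: Iterable[str],
                         detected_genes: Iterable[str]) -> List[str]:
    """FN: panel genes for which no transcript was detected."""
    return sorted(set(panel_genes) - set(detected_genes))


# Placeholder gene identifiers, for illustration only.
print(tusco_label("FSM", counts_as_tp=True))
print(tusco_label("ISM"))
print(tusco_label("NNC"))
print(false_negative_genes(["geneA", "geneB", "geneC"], ["geneA"]))
```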

Per‑gene plots:

  • Sample transcripts are colored by SQANTI3 structural category (e.g., FSM, ISM, NIC, NNC).

For definitions of SQANTI3 categories, see: https://github.com/ConesaLab/SQANTI3/wiki/SQANTI3-isoform-classification:-categories-and-subcategories


Extending Beyond Human/Mouse

Support for additional species requires code changes to recognize a new --tusco value and to define the corresponding genome assembly for Gviz plotting.

  • Practical workaround: rename your custom TSV to tusco_human.tsv, then run with --tusco human (plots will use hg38 conventions).

Scope and Limitations

  • TUSCO evaluates detection and correctness within a curated set of single‑isoform genes; it does not substitute for comprehensive transcriptome‑wide evaluation.
  • Metrics are panel‑relative and reflect performance on well‑behaved genes; interpret them alongside broader QC and task‑specific assessments.

Troubleshooting and Reproducibility

  • Missing or empty report: check <outdir>/logs/tusco_report.log for Rscript errors.
  • No or sparse plots: provide --isoforms as a GTF if possible; confirm --refGTF/--refFasta match the expected assemblies (human → hg38, mouse → mm10).
  • Panel mismatch: ensure the .tsv was correctly replaced when switching to a tissue‑specific set.
  • Environment: the report uses R with Gviz‑based plotting; ensure required R packages are available if you are running outside the standard setup.
  • Reproducibility checklist: record SQANTI3 commit/version, panel file checksums (tusco_<species>.tsv), command lines, and the tusco_report.log. Use fixed RNG seeds where applicable.
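
Panel checksums for the reproducibility checklist can be recorded with the Python standard library. A minimal sketch, assuming SHA‑256 is an acceptable digest; the function name is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (path taken from the Bundled Reference Panels section):
# print(sha256_of("src/utilities/report_qc/tusco_human.tsv"))
```

Storing the digest next to the command line and the SQANTI3 commit makes it possible to confirm later which panel, bundled or tissue‑specific, produced a given report.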

Citation

If TUSCO is used in analyses or figures, please cite SQANTI3 and the TUSCO resource as appropriate. Include software versions/commits, panel provenance (species, date, and filename), and commands sufficient to reproduce the report.
