TUSCO quick start - ConesaLab/SQANTI3 GitHub Wiki
TUSCO (Transcriptome Universal Single‑isoform Control) provides a compact, curated set of single‑isoform genes to benchmark transcriptome reconstruction. When enabled in SQANTI3 QC, it produces a self‑contained HTML report summarizing detection performance and per‑gene visualizations (Gviz‑based, IGV‑like tracks). This guide specifies inputs, execution, outputs, and interpretation to support rigorous and reproducible use.
- Isoform annotations: provide a transcript GTF via `--isoforms` for optimal plotting.
  - If starting from FASTA/FASTQ, SQANTI3 generates a corrected GTF internally, which is then used by TUSCO.
- Reference resources: `--refGTF` (gene annotation) and `--refFasta` (genome assembly) are required.
- Reporting: ensure `--report` is not skipped; TUSCO runs alongside the standard SQANTI3 QC report.
```bash
# Use --tusco mouse to select the mouse panel instead.
python sqanti3_qc.py \
  --isoforms your_transcripts.gtf \
  --refGTF reference.gtf \
  --refFasta genome.fasta \
  --tusco human \
  --report html \
  -o <prefix> -d <outdir>
```

Key flag:
- `--tusco`: selects the bundled species panel (`human` or `mouse`).
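
For scripted runs, the invocation can be assembled programmatically. The following is a minimal sketch, not part of SQANTI3: `build_tusco_command` is a hypothetical helper, the sample name and output directory are placeholders, and only flags already shown on this page are used.

```python
import shlex
import sys

# The two bundled panels named on this page.
SUPPORTED_PANELS = ("human", "mouse")


def build_tusco_command(isoforms, ref_gtf, ref_fasta, species, prefix, outdir,
                        script="sqanti3_qc.py"):
    """Return the argv list for a SQANTI3 QC run with TUSCO enabled.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    if species not in SUPPORTED_PANELS:
        raise ValueError(
            f"--tusco expects one of {SUPPORTED_PANELS}, got {species!r}"
        )
    return [
        sys.executable, script,
        "--isoforms", isoforms,
        "--refGTF", ref_gtf,
        "--refFasta", ref_fasta,
        "--tusco", species,
        "--report", "html",
        "-o", prefix,
        "-d", outdir,
    ]


# Placeholder inputs; replace with real paths.
cmd = build_tusco_command("your_transcripts.gtf", "reference.gtf",
                          "genome.fasta", "human", "sample1", "qc_out")
print(shlex.join(cmd))  # shell-quoted form, handy for logging the exact command
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the SQANTI3 repository root; keeping the printed command alongside the results also helps with reproducibility.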

Outputs:
- `<outdir>/<prefix>_TUSCO_report.html`: self‑contained HTML benchmarking report.
- `<outdir>/logs/tusco_report.log`: Rscript log for the TUSCO report generation.
- `<outdir>/igv_plots/`: one PNG per TUSCO gene with Gviz tracks and SQANTI3 category coloring.
The TUSCO report is optional. If it fails, the main QC pipeline continues. Consult `tusco_report.log` for diagnostics.
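
Because a failed TUSCO report does not stop the main pipeline, it can be worth checking the outputs explicitly after a run. The sketch below is an illustration under the output layout listed above; `tusco_output_status` is a hypothetical helper, and `qc_out` and `sample1` are placeholder values for `<outdir>` and `<prefix>`.

```python
from pathlib import Path


def tusco_output_status(outdir, prefix, log_tail=20):
    """Summarize whether the TUSCO outputs exist (hypothetical helper)."""
    outdir = Path(outdir)
    report = outdir / f"{prefix}_TUSCO_report.html"
    log = outdir / "logs" / "tusco_report.log"
    plots_dir = outdir / "igv_plots"

    # Keep only the last lines of the Rscript log, where errors usually appear.
    tail = []
    if log.is_file():
        tail = log.read_text(errors="replace").splitlines()[-log_tail:]

    return {
        "report_ok": report.is_file() and report.stat().st_size > 0,
        "log_tail": tail,
        "n_plots": len(list(plots_dir.glob("*.png"))) if plots_dir.is_dir() else 0,
    }


status = tusco_output_status("qc_out", "sample1")
if not status["report_ok"]:
    print("TUSCO report missing or empty; last log lines:")
    print("\n".join(status["log_tail"]) or "(no log found)")
```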
The repository ships with human and mouse TUSCO panels as TSV files:
- Human: `src/utilities/report_qc/tusco_human.tsv`
- Mouse: `src/utilities/report_qc/tusco_mouse.tsv`
Additional details:
- Current gene counts: human = 50, mouse = 32.
- Assembly mapping for plots: human → `hg38`; mouse → `mm10`.
Note:
- Only the TSV file is required for TUSCO operation. Files are named `tusco_<species>.tsv`.
Curated tissue‑specific TUSCO references for human and mouse are available at https://tusco.uv.es.
To use a tissue‑specific panel:
- Obtain the TSV for your tissue and species.
- Replace the bundled TSV in `src/utilities/report_qc/` for that species, keeping filenames unchanged:
  - Human: `tusco_human.tsv`
  - Mouse: `tusco_mouse.tsv`
- Run SQANTI3 QC with `--tusco human` or `--tusco mouse`.
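
The replacement step can be scripted so that the bundled panel is preserved before it is overwritten. This is a sketch only: `install_panel` and the `.bundled.bak` backup suffix are assumptions made for illustration, not SQANTI3 features.

```python
import shutil
from pathlib import Path

# Panel location relative to the SQANTI3 repository root, as given above.
PANEL_DIR = Path("src/utilities/report_qc")


def install_panel(custom_tsv, species, repo_root="."):
    """Copy a tissue-specific TSV over the bundled panel (hypothetical helper).

    The bundled file is backed up once, under an assumed '.bundled.bak' suffix,
    so repeated installs never overwrite the original backup.
    """
    if species not in ("human", "mouse"):
        raise ValueError(f"unsupported species: {species!r}")
    target = Path(repo_root) / PANEL_DIR / f"tusco_{species}.tsv"
    backup = target.with_name(target.name + ".bundled.bak")
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(custom_tsv, target)
    return target, backup
```

For example, `install_panel("my_tissue_panel.tsv", "human", repo_root="/path/to/SQANTI3")` (both paths are placeholders) would leave the tissue-specific panel at `tusco_human.tsv`; copying the backup back restores the bundled panel.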
TUSCO summarizes detection performance across its single‑isoform gene set and provides per‑gene visualizations. The report presents six primary metrics, each oriented so that higher is better. TP, FP, and FN are counts within the TUSCO set; `nr` denotes non‑redundant (gene‑level) counting and `r` denotes redundant (transcript‑level) counting.
Metrics (canonical definitions):
| Metric | Symbol | Formula | Orientation |
|---|---|---|---|
| Sensitivity | Sn | TP / (TP + FN) | Higher is better |
| Non‑redundant Precision | nrPre | TP_nr / (TP_nr + FP_nr) | Higher is better |
| Redundant Precision | rPre | TP_r / (TP_r + FP_r) | Higher is better |
| Positive Detection Rate | PDR | (TP + FP)_genes / (TP + FN)_genes | Higher is better |
| 1 − False Discovery Rate | 1−FDR | 1 − FP / (TP + FP) | Higher is better |
| Inverse Redundancy | 1/red | 1 / Redundancy | Higher is better |
Notes:
- The HTML dashboard labels follow the symbols above (Sn, nrPre, rPre, 1−FDR, PDR, 1/red). Redundancy itself can be inspected in detailed tables/plots; 1/red is shown to align directionality.
- Exact implementation follows the SQANTI3 TUSCO report; the symbols/formulas above are provided for interpretation.
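
To make the formulas in the table concrete, the sketch below evaluates them from raw counts. It is an interpretation aid, not the SQANTI3 implementation: the function and argument names are hypothetical, redundancy is taken as a precomputed input because its definition is not given here, and the counts in the example are invented purely for illustration.

```python
import math


def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def tusco_metrics(*, tp, fp, fn, tp_nr, fp_nr, tp_r, fp_r,
                  genes_detected, genes_expected, redundancy):
    """Evaluate the six table formulas from counts (hypothetical helper).

    genes_detected stands for (TP + FP)_genes and genes_expected for
    (TP + FN)_genes in the PDR row of the table.
    """
    return {
        "Sn": ratio(tp, tp + fn),
        "nrPre": ratio(tp_nr, tp_nr + fp_nr),
        "rPre": ratio(tp_r, tp_r + fp_r),
        "PDR": ratio(genes_detected, genes_expected),
        "1-FDR": 1 - ratio(fp, tp + fp),
        "1/red": ratio(1, redundancy),
    }


# Invented counts, for illustration only.
example = tusco_metrics(tp=40, fp=10, fn=10, tp_nr=40, fp_nr=5,
                        tp_r=44, fp_r=11, genes_detected=45,
                        genes_expected=50, redundancy=1.25)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Returning NaN for an empty denominator keeps an undefined metric (for example, precision when nothing was detected) distinguishable from a genuine zero.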
Classification categories used in the report:
- TP (True Positive): reference match (RM) or qualifying mono‑exon Full Splice Match (FSM).
- PTP (Partial True Positive): FSM or Incomplete Splice Match (ISM) not counted as an RM.
- FP (False Positive): non‑canonical categories within the TUSCO set (e.g., NIC, NNC, genic, antisense, fusion, intergenic).
- FN (False Negative): TUSCO genes with no detected transcript.
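
The category rules above can be read as a small decision procedure. The sketch below is one possible reading, not the report's actual logic: the function names are hypothetical, the two boolean flags stand in for the RM and "qualifying mono‑exon FSM" criteria that this page does not spell out, and the category strings are assumed to follow SQANTI3's classification output.

```python
from collections import Counter

# Assumed spellings of SQANTI3 structural categories.
SPLICE_MATCH = {"full-splice_match", "incomplete-splice_match"}
FP_CATEGORIES = {"novel_in_catalog", "novel_not_in_catalog", "genic",
                 "antisense", "fusion", "intergenic"}


def tusco_label(structural_category, is_reference_match=False,
                qualifying_mono_exon_fsm=False):
    """Map one transcript to TP / PTP / FP (hypothetical helper)."""
    if is_reference_match:
        return "TP"
    if structural_category == "full-splice_match" and qualifying_mono_exon_fsm:
        return "TP"
    if structural_category in SPLICE_MATCH:
        return "PTP"  # FSM/ISM that did not qualify as a reference match
    if structural_category in FP_CATEGORIES:
        return "FP"
    return "other"  # categories not covered by the list above


def summarize(transcripts, panel_genes):
    """Count labels over (gene_id, label) pairs; FN = panel genes never seen."""
    transcripts = list(transcripts)
    counts = Counter(label for _, label in transcripts)
    detected = {gene for gene, _ in transcripts}
    counts["FN"] = len(set(panel_genes) - detected)
    return counts


# Toy input: three transcripts over a three-gene panel.
calls = [
    ("G1", tusco_label("full-splice_match", is_reference_match=True)),
    ("G2", tusco_label("incomplete-splice_match")),
    ("G2", tusco_label("novel_in_catalog")),
]
print(summarize(calls, ["G1", "G2", "G3"]))
```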
Per‑gene plots:
- Sample transcripts are colored by SQANTI3 structural category (e.g., FSM, ISM, NIC, NNC).
For definitions of SQANTI3 categories, see: https://github.com/ConesaLab/SQANTI3/wiki/SQANTI3-isoform-classification:-categories-and-subcategories
Support for additional species requires code changes to recognize a new --tusco value and to define the corresponding genome assembly for Gviz plotting.
- Practical workaround: rename your custom TSV to `tusco_human.tsv`, then run with `--tusco human` (plots will use `hg38` conventions).
- TUSCO evaluates detection and correctness within a curated set of single‑isoform genes; it does not substitute for comprehensive transcriptome‑wide evaluation.
- Metrics are panel‑relative and reflect the behavior on well‑behaved genes; interpret alongside broader QC and task‑specific assessments.
- Missing or empty report: check `<outdir>/logs/tusco_report.log` for Rscript errors.
- No or sparse plots: provide `--isoforms` as a GTF if possible; confirm `--refGTF`/`--refFasta` match the expected assemblies (human → `hg38`, mouse → `mm10`).
- Panel mismatch: ensure the `.tsv` was correctly replaced when switching to a tissue‑specific set.
- Environment: the report uses R with Gviz‑based plotting; ensure required R packages are available if you are running outside the standard setup.
- Reproducibility checklist: record SQANTI3 commit/version, panel file checksums (`tusco_<species>.tsv`), command lines, and the `tusco_report.log`. Use fixed RNG seeds where applicable.
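
One way to capture the checklist items is a small JSON manifest stored next to the report. This is a sketch under assumptions: `sha256sum` and `run_manifest` are hypothetical helpers, the field names are arbitrary, and the version string and command are supplied by the caller rather than discovered automatically.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_manifest(panel_tsv, command, sqanti3_version, log_path=None):
    """Collect provenance for one TUSCO run (hypothetical helper)."""
    manifest = {
        "sqanti3_version": sqanti3_version,  # commit hash or release tag
        "command": command,                  # exact command line used
        "panel_file": Path(panel_tsv).name,
        "panel_sha256": sha256sum(panel_tsv),
    }
    # The log is optional because the TUSCO report itself is optional.
    if log_path is not None and Path(log_path).is_file():
        manifest["log_sha256"] = sha256sum(log_path)
    return manifest


def write_manifest(manifest, out_path):
    """Write the manifest as indented JSON with stable key order."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Recording the panel checksum matters most when a tissue-specific TSV has replaced the bundled file, since the filename alone no longer identifies the panel.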
- TUSCO‑novel (Novel Isoform Stress Test) — methodology to stress‑test novel discovery; uses the same TSV panel and TUSCO metrics.
If TUSCO is used in analyses or figures, please cite SQANTI3 and the TUSCO resource as appropriate. Include software versions/commits, panel provenance (species, date, and filename), and commands sufficient to reproduce the report.