Home - glarue/intronIC GitHub Wiki

Version 2.7 — multi-species default classifier with mode-separation two-pass recalibration and continuous per-intron discount
intronIC is a bioinformatics tool for extracting and classifying intron sequences as U12-type (minor) or U2-type (major) using a support vector machine trained on position-weight matrix scores.
- Two-pass RBF SVM classifier on a 6D feature set: first-pass cluster-aware ensemble (`v4_aug`) produces candidate weights; second-pass mode-separation ensemble (`v5_modesep_aug`) re-scores eligible introns after per-species recalibration. Both ensembles are 126 models (3 seeds × 42 sub-models). Probability scores (0-100%).
- Per-species mode-separation z-scoring pins the U2 mode to z=0 and the U12 mode to z=1 in every species, so a single trained boundary works across eukaryotic diversity (plant recall: AmbTri 90% → 100%, OrySat 94% → 100%)
- Three-check gate (n_eff floor + μ_U12 location prior + Fisher-discriminant KDE valley depth) protects against the failure modes of per-species recalibration; U12-absent species fall back to first-pass scores cleanly
- Continuous per-intron discount (v2.7+) writes an `adjusted_score` column that penalizes SVM overcalls relative to motif log-LR; preserves IPA-validated TPs while suppressing long-tail false positives
- Species-specific background correction as an inference-time robustness layer for out-of-distribution species composition (still on by default; see Technical Details)
- Pretrained models for immediate cross-species analysis
- Streaming mode keeps peak memory roughly half of in-memory mode (~5.3 GB on full human at `-p 5`); both modes produce bit-identical results since v2.4
- Parallel processing for improved performance
- YAML configuration for reproducible analysis pipelines
- Comprehensive metadata including phase, position, parent gene/transcript
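
The mode-separation z-scoring listed above can be pictured as an affine rescaling that sends one reference point to 0 and another to 1. The snippet below is a minimal sketch of that idea only, not intronIC's actual code: the function name `mode_separation_zscore`, the arguments `mu_u2` and `mu_u12`, and the example numbers are all hypothetical, and it assumes the two mode locations have already been estimated for the species (how intronIC estimates them, and when the three-check gate skips recalibration, is not shown here).

```python
import numpy as np

def mode_separation_zscore(scores, mu_u2, mu_u12):
    """Affinely rescale scores so the U2 mode maps to 0 and the U12 mode to 1.

    mu_u2 and mu_u12 are assumed to be pre-estimated mode locations
    (hypothetical inputs for this illustration).
    """
    scores = np.asarray(scores, dtype=float)
    gap = mu_u12 - mu_u2
    if gap == 0:
        raise ValueError("U2 and U12 modes coincide; rescaling is undefined")
    return (scores - mu_u2) / gap

# Two made-up species whose raw scores live on different scales.
species_a = mode_separation_zscore([-4.0, -1.0, 2.0], mu_u2=-4.0, mu_u12=2.0)
species_b = mode_separation_zscore([10.0, 15.0, 20.0], mu_u2=10.0, mu_u12=20.0)
```

After rescaling, both made-up species place their U2 mode at 0 and their U12 mode at 1, which is why a single decision boundary in z-space can in principle be shared across species.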
Quick Start: Installation and first run | Overview: Algorithm and classification basics | Output Files: Understanding results | Full Usage Info: Complete CLI reference
- Overview — Classification approach and basic usage
- Quick start — Installation and testing
- Technical Details — Algorithm and data flow
- Data filtering notes — Filtering criteria and tags
- Output files — File formats and columns
- Training data and PWMs — Reference data and matrices
- Example usage — Common use cases
- Full usage info — Complete argument reference
- About — Background and motivation
If you use intronIC in your research, please cite:
Moyer DC, Larue GE, Hershberger CE, Roy SW, Padgett RA. (2020) Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research 48(13):7066–7078. doi:10.1093/nar/gkaa464