Home - glarue/intronIC GitHub Wiki


intronIC (intron Interrogator and Classifier)

Version 2.7 — multi-species default classifier with mode-separation two-pass recalibration and continuous per-intron discount

intronIC is a bioinformatics tool for extracting and classifying intron sequences as U12-type (minor) or U2-type (major) using a support vector machine trained on position-weight matrix scores.
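
As a rough illustration of what a position-weight matrix score can look like, the sketch below scores a short motif as the log2 ratio of its probability under a U12-type matrix versus a U2-type matrix. The function name `pwm_log_ratio`, the two-position matrices, and their probabilities are all hypothetical toy values chosen for readability. They are not intronIC's matrices or its API, and the log-ratio form itself is an assumption about how such scores are built.

```python
import math

def pwm_log_ratio(seq, pwm_u12, pwm_u2):
    """Sum of per-position log2(P_U12(base) / P_U2(base)) for one motif.

    Positive values mean the motif looks more U12-like, negative values
    more U2-like. Each PWM is a list of {base: probability} dicts.
    """
    if not (len(seq) == len(pwm_u12) == len(pwm_u2)):
        raise ValueError("sequence and both PWMs must have the same length")
    return sum(
        math.log2(p12[base] / p2[base])
        for base, p12, p2 in zip(seq, pwm_u12, pwm_u2)
    )

# Hypothetical two-position matrices (toy numbers, not real splice-site data).
TOY_U12 = [
    {"A": 0.5, "C": 0.125, "G": 0.125, "T": 0.25},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
TOY_U2 = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.125, "C": 0.5, "G": 0.125, "T": 0.25},
]

u12_like = pwm_log_ratio("AA", TOY_U12, TOY_U2)  # favoured by the U12 matrix
u2_like = pwm_log_ratio("CC", TOY_U12, TOY_U2)   # favoured by the U2 matrix
```

In this toy setup, scores like these (one per motif region) would be the kind of features a classifier is trained on.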


Key Features

  • Two-pass RBF SVM classifier on a 6D feature set: a first-pass cluster-aware ensemble (v4_aug) produces candidate weights, and a second-pass mode-separation ensemble (v5_modesep_aug) re-scores eligible introns after per-species recalibration. Each ensemble comprises 126 models (3 seeds × 42 sub-models). Output is a probability score (0-100%).
  • Per-species mode-separation z-scoring pins the U2 mode to z=0 and the U12 mode to z=1 in every species, so a single trained boundary works across eukaryotic diversity (plant recall: AmbTri 90% → 100%, OrySat 94% → 100%)
  • Three-check gate (n_eff floor + μ_U12 location prior + Fisher-discriminant KDE valley depth) protects against the failure modes of per-species recalibration; U12-absent species fall back to first-pass scores cleanly
  • Continuous per-intron discount (v2.7+) writes an adjusted_score column that penalizes SVM overcalls relative to motif log-LR; it preserves IPA-validated true positives while suppressing long-tail false positives
  • Species-specific background correction as an inference-time robustness layer for out-of-distribution species composition (still on by default; see Technical Details)
  • Pretrained models for immediate cross-species analysis
  • Streaming mode keeps peak memory at roughly half that of in-memory mode (~5.3 GB on full human at -p 5); both modes have produced bit-identical results since v2.4
  • Parallel processing for improved performance
  • YAML configuration for reproducible analysis pipelines
  • Comprehensive metadata including phase, position, parent gene/transcript
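
The per-species mode-separation step listed above can be read as an affine rescaling: subtract the species' U2 mode and divide by the distance between its U12 and U2 modes, so that the U2 mode lands at 0 and the U12 mode at 1. The sketch below shows only that arithmetic. The function name `mode_separation_rescale`, the raw score values, and the two species are hypothetical, and how intronIC actually estimates the modes is not covered here (they are simply passed in).

```python
def mode_separation_rescale(scores, mu_u2, mu_u12):
    """Affine map that sends mu_u2 to 0 and mu_u12 to 1.

    `mu_u2` and `mu_u12` are assumed to be already-estimated per-species
    mode locations; estimating them is outside this sketch.
    """
    span = mu_u12 - mu_u2
    if span <= 0:
        raise ValueError("the U12 mode must lie above the U2 mode")
    return [(s - mu_u2) / span for s in scores]

# Two hypothetical species whose raw score scales differ.
species_a = mode_separation_rescale([-10.0, 30.0, 10.0], mu_u2=-10.0, mu_u12=30.0)
species_b = mode_separation_rescale([-4.0, 12.0, 4.0], mu_u2=-4.0, mu_u12=12.0)
```

After rescaling, both hypothetical species share the same coordinate system, which is why one decision boundary trained in that space can be applied to each of them.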

Quick Links

  • Quick Start: Installation and first run
  • Overview: Algorithm and classification basics
  • Output Files: Understanding results
  • Full Usage Info: Complete CLI reference


Documentation


Citation

If you use intronIC in your research, please cite:

Moyer DC, Larue GE, Hershberger CE, Roy SW, Padgett RA. (2020) Comprehensive database and evolutionary dynamics of U12-type introns. Nucleic Acids Research 48(13):7066–7078. doi:10.1093/nar/gkaa464
