Corpus versioning
Purpose: explain how the corpus evolved and which set is the official frozen one.
- corpus_v1 is the official, frozen set used for the paper baseline.
- It contains 60 scenarios across 7 families.
- Initial corpus generation focused on broad thematic coverage.
- Iterative diversification rounds reduced high pairwise redundancy.
- Final freeze keeps a practical trade-off: publishable baseline with declared limitations.
If future revisions modify scenario content or composition, they should be published under a new corpus label (for example, corpus_v2), leaving corpus_v1 unchanged for reproducibility.
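One way to check that a frozen set really has stayed unchanged is to record a checksum for every file at freeze time and compare against it later. The sketch below is illustrative only and is not part of the repository: the function names (build_manifest, diff_manifests, is_unchanged) and the assumption that the corpus lives in a single directory tree are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(corpus_dir):
    """Map each file path (relative to corpus_dir) to its SHA-256 digest."""
    root = Path(corpus_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(frozen, current):
    """Return sorted lists of added, removed, and modified relative paths."""
    added = sorted(set(current) - set(frozen))
    removed = sorted(set(frozen) - set(current))
    modified = sorted(
        p for p in set(frozen) & set(current) if frozen[p] != current[p]
    )
    return added, removed, modified


def is_unchanged(frozen, current):
    """True when the current tree matches the frozen manifest exactly."""
    return diff_manifests(frozen, current) == ([], [], [])
```

Under this scheme, the manifest built when corpus_v1 was frozen would be stored next to the corpus. Any later non-empty diff would indicate that the change belongs under a new label such as corpus_v2.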