Naming Conventions - HenrikBengtsson/aroma.seq GitHub Wiki

Naming Conventions

Directory structures

All annotation data should be located in the annotationData/ directory structure. File and directory names are case sensitive and should be exactly as given. The general structure is:

annotationData/organisms/<organism>/<assembly>/<source>/<release>/

where the <source>/<release>/ levels are optional. This structure allows the Aroma Framework to locate certain annotation data files automatically.
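
As an illustration only, the layout above can be sketched as a small path builder. The function name `annotation_dir` is hypothetical and not part of the Aroma Framework, the example values "Ensembl" and "79" are made up, and it is an assumption here that a <release> is only meaningful together with a <source>, since it is nested beneath it.

```python
from typing import Optional


def annotation_dir(organism: str, assembly: str,
                   source: Optional[str] = None,
                   release: Optional[str] = None) -> str:
    """Hypothetical helper: compose an annotationData/ directory path.

    <source> and <release> are optional levels. Assumption: a release
    cannot be given without a source, because it is nested under it.
    """
    if release is not None and source is None:
        raise ValueError("a <release> requires a <source>")
    parts = ["annotationData", "organisms", organism, assembly]
    # Append the optional levels only when they are supplied.
    parts += [p for p in (source, release) if p is not None]
    return "/".join(parts) + "/"


# Without the optional levels:
print(annotation_dir("Homo_sapiens", "GRCh38,hg38"))
# With illustrative (made-up) source and release values:
print(annotation_dir("Homo_sapiens", "GRCh38,hg38", "Ensembl", "79"))
```

Because names are case sensitive, the sketch passes every component through unchanged rather than normalizing it.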

Details:

  • annotationData/ - root directory of all annotation data files that the Aroma Framework uses.
  • annotationData/organisms/ - directory containing subdirectories for each organism for which annotation data is available.
  • annotationData/organisms/<organism>/ - directory specific to an organism <organism>, e.g. annotationData/organisms/Homo_sapiens/ and annotationData/organisms/Mus_musculus/. The format of <organism> should follow the Ensembl organism name, where (i) only the first term is capitalized and all other terms are in lower case, and (ii) every space is replaced by an underscore (_).
  • annotationData/organisms/<organism>/<assembly>/ - directory specific to a genome assembly and organism, e.g. annotationData/organisms/Homo_sapiens/GRCh38,hg38/. The format of <assembly> should use the Genome Reference Consortium (GRC) label (e.g. GRCh38), optionally followed by comma-separated tags (e.g. the UCSC name hg38, giving GRCh38,hg38).
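
The formatting rules for <organism> and <assembly> can be sketched as two small helpers. Both function names are hypothetical and not part of the Aroma Framework. Joining tags with commas is inferred from the GRCh38,hg38 example above rather than from a stated rule.

```python
def organism_dir_name(name: str) -> str:
    """Hypothetical helper: format an organism name as a directory name.

    Rule (i): only the first term is capitalized, the rest are lower case.
    Rule (ii): spaces are replaced by underscores.
    """
    words = name.split()
    if not words:
        raise ValueError("organism name must not be empty")
    # str.capitalize() upper-cases the first letter and lower-cases the rest.
    return "_".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def assembly_dir_name(label: str, *tags: str) -> str:
    """Hypothetical helper: an assembly label followed by optional tags.

    Assumption: tags are appended with commas, as in 'GRCh38,hg38'.
    """
    return ",".join([label, *tags])


print(organism_dir_name("homo sapiens"))     # Homo_sapiens
print(organism_dir_name("Mus Musculus"))     # Mus_musculus
print(assembly_dir_name("GRCh38", "hg38"))   # GRCh38,hg38
print(assembly_dir_name("GRCh38"))           # GRCh38
```

The assembly label itself is left untouched, since labels such as GRCh38 use mixed case and directory names are case sensitive.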