Naming and style - core-unit-bioinformatics/knowledge-base GitHub Wiki
author | date | tags |
---|---|---|
PE | 2025-02-13 | cubi, policy, naming, convention, rule, standard |
PE | 2024-05-09 | cubi, internal, convention, rule, policy, standard |
PE | 2022-09-16 | cubi, internal, convention, rule, policy, standard |
The following guidelines can be broken if really necessary, and discussed if perceived as unnecessary or misguided.
For Python and closely related code (such as Snakemake modules),
the CUBI follows the respective PEP8 style guide.
Realizing this requirement is usually possible via code-formatting tools such
as black and snakefmt, but it is advisable to recognize badly formatted code
when you see it. The following excerpt from the PEP8 style guide also provides
a reasonable view on the subject:
> [...] code is read much more often than it is written. [...]
>
> A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.
>
> However, know when to be inconsistent – sometimes style guide recommendations just aren’t applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!
>
> In particular: do not break backwards compatibility just to comply with this PEP!
The CUBI maintains three types of code repositories (TODO: see dev guidelines for details):
- workflows
- projects
- other
Naming workflow repositories: `workflow-[system]-[short-desc-name]`
- `[system]`: the respective (eco-)system the workflow is designed for, e.g., snakemake, nextflow, common workflow language (CWL), workflow definition language etc.
  - snakemake: `smk`
  - nextflow: `nxf`
  - common workflow language: `cwl`
  - workflow definition language: `wdl`
  - reminder: the `system` part must be put in the `pyproject.toml` file under the section `[cubi.workflow.template]` with the key `system = "system"`.
- `[short-desc-name]`: a short descriptive name using only the characters `a-z` and `-` (minus). Numbers `0-9` may be used if necessary and reasonable.
  - reminder: the `short-desc-name` part of the name must be put in the `pyproject.toml` file under section `[cubi.workflow]` with the key `name = "short-desc-name"`.
- example: `workflow-smk-longread-variant-calling`
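The workflow naming rule above can be checked mechanically. The following is a minimal sketch, not an official CUBI tool: the function name `parse_workflow_repo_name` is hypothetical, only the four system abbreviations listed above are accepted (the guideline says "etc.", so more may exist), and the pattern additionally assumes that the short name neither starts nor ends with a minus.

```python
import re

# System abbreviations listed in the guideline above.
KNOWN_SYSTEMS = ("smk", "nxf", "cwl", "wdl")

# Assumption: minus signs only separate non-empty parts of the short name;
# the guideline itself only lists the allowed characters.
WORKFLOW_NAME = re.compile(
    r"workflow-(?P<system>" + "|".join(KNOWN_SYSTEMS) + r")"
    r"-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_workflow_repo_name(repo_name):
    """Return (system, short-desc-name), or None if the convention is not met."""
    match = WORKFLOW_NAME.fullmatch(repo_name)
    if match is None:
        return None
    return match.group("system"), match.group("name")


print(parse_workflow_repo_name("workflow-smk-longread-variant-calling"))
# → ('smk', 'longread-variant-calling')
print(parse_workflow_repo_name("workflow-Snakemake-LongRead"))
# → None
```

The two returned parts correspond to the `system` and `name` values that the reminders above ask for in the `pyproject.toml` file.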
Naming project repositories: `project-[type*]-[short-desc-name]`
- `[type*]`: the type of the project (production, development etc.)
  - DEPRECATED: as of 2025, the `type` part in the repository name must be omitted
    - project repositories created before 2025 DO NOT have to be updated / renamed
  - development: `dev`; the project was started to build a new workflow for the CUBI catalogue. The respective project repository may thus document the development while it is still in progress, or organize (preprocess) test data.
  - production (run): `run`; the project was started to process data with an existing workflow, and thus contains sample information (e.g., phenotypical annotation), or some specific routines for (meta-)data preprocessing.
  - benchmark: `bmk`; the project was started to evaluate performance aspects of an existing pipeline, e.g., as part of round robin tests in a consortium.
  - reminder: the `type` part must be put in the `pyproject.toml` file under the section `[cubi.project]` with the key `type = "type"`.
    - in projects started in 2025 and later, the `type` attribute must be omitted from the `pyproject.toml` file
- `[short-desc-name]`: a short descriptive name using only the characters `a-z0-9` and `-` (minus).
  - Obviously, the short name of the project should not duplicate the short name of the executed pipeline(s), but be a reference to the project context.
  - reminder: the `short-desc-name` part of the name must be put in the `pyproject.toml` file under section `[cubi.project]` with the key `name = "short-desc-name"`.
- example:
  - up to 2024: `project-TYPE-xyz-cohort`
  - 2025 and later: `project-xyz-cohort`
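Because both the pre-2025 and the current project naming schemes remain valid, a check has to accept either form. The sketch below illustrates one way to do this; `classify_project_repo_name` is a hypothetical helper, and the assumption that the short name neither starts nor ends with a minus is not stated in the guideline.

```python
import re

# Assumption: minus signs only separate non-empty parts of the short name.
SHORT_NAME = r"[a-z0-9]+(?:-[a-z0-9]+)*"

# Up to 2024: project-[type]-[short-desc-name], with type in dev / run / bmk.
LEGACY_PROJECT = re.compile(r"project-(?P<type>dev|run|bmk)-(?P<name>" + SHORT_NAME + r")")
# 2025 and later: project-[short-desc-name].
CURRENT_PROJECT = re.compile(r"project-(?P<name>" + SHORT_NAME + r")")


def classify_project_repo_name(repo_name):
    """Return 'legacy', 'current', or None if neither scheme matches."""
    if LEGACY_PROJECT.fullmatch(repo_name):
        return "legacy"
    if CURRENT_PROJECT.fullmatch(repo_name):
        return "current"
    return None


print(classify_project_repo_name("project-run-xyz-cohort"))  # → legacy
print(classify_project_repo_name("project-xyz-cohort"))      # → current
print(classify_project_repo_name("project-XYZ_cohort"))      # → None
```

Note that the name alone cannot distinguish a legacy repository from a new one whose short name happens to start with `dev-`, `run-` or `bmk-`; this sketch reports such names as legacy, so the creation date of the repository remains the deciding information.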
Naming other repositories: use your best judgement or ask your colleagues for feedback
The following rules primarily --- but not exclusively --- apply to repositories that exist to realize a concerted and orderly code design and implementation effort, i.e. workflow and (software) tool repositories. For CUBI project repositories in particular, please refer to the respective guideline document on how to structure a CUBI project repository.
In line with current common practice, the following branch naming conventions should be used in repositories:
- base (release) branch: `main`
- central merge (development) branch: `dev`
- feature development branches
  - commonly only exist in workflow or tool repositories
  - restricted to characters `a-z0-9` and `-` (minus)
  - naming policy: `feat-[short-desc-name]`
- issue fixing branches
  - restricted to characters `a-z0-9` and `-` (minus)
  - naming policy: `issueNN-[short-desc-name]` where `NN` is the issue number (usually on github)
- analysis branches
  - commonly only exist in project repositories
  - restricted to characters `a-z0-9` and `-` (minus)
  - recommended: `analysis-[short-desc-name]`
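The branch naming conventions above translate into a small set of patterns. This is a sketch under stated assumptions, not an enforced policy: `branch_kind` and the category labels are hypothetical, `NN` is read as "one or more digits", and the short name is assumed not to start or end with a minus.

```python
import re

# Assumption: minus signs only separate non-empty parts of the short name.
SHORT_NAME = r"[a-z0-9]+(?:-[a-z0-9]+)*"

# One pattern per branch category named in the list above.
BRANCH_PATTERNS = {
    "release": re.compile(r"main"),
    "development": re.compile(r"dev"),
    "feature": re.compile(r"feat-" + SHORT_NAME),
    # Assumption: NN stands for an issue number with any number of digits.
    "issue": re.compile(r"issue[0-9]+-" + SHORT_NAME),
    "analysis": re.compile(r"analysis-" + SHORT_NAME),
}


def branch_kind(branch_name):
    """Return the category of a branch name, or None if no convention matches."""
    for kind, pattern in BRANCH_PATTERNS.items():
        if pattern.fullmatch(branch_name):
            return kind
    return None


for name in ("main", "feat-add-qc-report", "issue12-fix-typo", "Feature/NewStuff"):
    print(name, "->", branch_kind(name))
```

A check like this could, for example, run in a local Git hook, but whether and where to enforce it is a project decision.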
File names should be formed using only these characters:
- `A-Z`: uppercase should be limited to names or IDs (such as sample names)
- `a-z`: as-is
- `0-9`: as-is
- `-`: minus, preferably used as "within-context" separator
  - example: context is specifying a date, i.e. `2022-09-16`
- `_`: underscore, preferably used as "between-context" separator
  - example: `SAMPLE1_dataA` and `SAMPLE1_dataB`, i.e. context one is the sample ID, and context two is the data type.
- `.`: dot, preferably used to indicate file format changes
  - example: `.vcf` to `.vcf.gz` to `.vcf.gz.tbi`
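To illustrate how the three separators divide a file name, the following sketch first checks the allowed character set and then splits a name into its contexts and format suffixes. Both function names and the example file name `SAMPLE1_dataA_2022-09-16.vcf.gz` are made up for illustration, and the splitting assumes the "preferred" separator roles are followed strictly.

```python
import re

# The characters permitted by the list above: A-Z, a-z, 0-9, minus, underscore, dot.
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def has_only_allowed_characters(file_name):
    """True if the file name consists solely of the permitted characters."""
    return ALLOWED.fullmatch(file_name) is not None


def split_contexts(file_name):
    """Split a file name into its "between-context" parts and its format suffixes.

    Assumes dots only mark file formats and underscores only separate contexts.
    """
    stem, *suffixes = file_name.split(".")
    return stem.split("_"), suffixes


print(has_only_allowed_characters("SAMPLE1_dataA_2022-09-16.vcf.gz"))  # → True
print(has_only_allowed_characters("my file (final).txt"))              # → False
print(split_contexts("SAMPLE1_dataA_2022-09-16.vcf.gz"))
# → (['SAMPLE1', 'dataA', '2022-09-16'], ['vcf', 'gz'])
```

The minus inside `2022-09-16` is left untouched by the split, which is the point of reserving it as the "within-context" separator.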
It is probably a universal truth that there is no single file naming scheme "to rule them all". Hence, think before (re-)naming files, but accept that you cannot find a perfect solution (note that it says "preferably" in the above guidelines).
Final remark: using whitespace or special characters in ordinary file names means that you have reached the antipode of perfection.