Pipelines: Best Practices - QuantGen/HPCC GitHub Wiki
- Split the pipeline into components: one component per task (this is
especially important when multiple people work on the project)
- How to organize?
- How to implement?
- A pipeline will likely contain different types of scripts (e.g., written
in bash or R, or using external software such as PLINK)
- I like to make sure that they have a similar interface, i.e., command
name + arguments (in R, this can be done with the optparse package)
- This approach will also work well with Slurm
- Each component should be minimal
- Minimize output messages
- Put functions that do the heavy lifting into separate files, or even
better (especially with C/C++ functions), an R package
- The usual software development best practices apply (see Code Complete by Steve McConnell)
- Use a shared coding standard
- Don't be too clever
- Document your code
- Use descriptive variable names (avoid 'x', 'tmp', ...)
- Differentiate between data (what you start with), code, and output
- Mirror the code directory layout in the output directory so that it is
clear which script produced which file
- Use version control (e.g., git)
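
To make the points about a uniform interface and about matching code and output directories more concrete, here is a minimal bash sketch of a single component. Everything in it is an assumption made for illustration: the script name `filter_samples.sh`, the `code/01_filter` and `output/01_filter` layout, the option names, and the filtering task itself are hypothetical and not taken from an existing pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical component: code/01_filter/filter_samples.sh
# Interface (command name + arguments):
#   filter_samples.sh --input FILE --output-dir DIR [--min-count N]

# Print a short usage message to stderr (keeps stdout quiet).
usage() {
    echo "Usage: filter_samples.sh --input FILE --output-dir DIR [--min-count N]" >&2
}

# Parse "--name value" pairs into the variables input, output_dir, min_count.
# Returns 1 on unknown options, missing values, or missing required arguments.
parse_args() {
    input=""
    output_dir=""
    min_count=1
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --input|--output-dir|--min-count)
                # Every option takes exactly one value.
                if [ "$#" -lt 2 ]; then usage; return 1; fi
                case "$1" in
                    --input)      input="$2" ;;
                    --output-dir) output_dir="$2" ;;
                    --min-count)  min_count="$2" ;;
                esac
                shift 2
                ;;
            *)
                usage
                return 1
                ;;
        esac
    done
    if [ -z "$input" ] || [ -z "$output_dir" ]; then
        usage
        return 1
    fi
}

# Map a script path to its output directory (assumed convention):
# code/<step>/<script> writes to output/<step>.
output_dir_for() {
    step=$(basename "$(dirname "$1")")
    echo "output/$step"
}

# The single task of this component: keep rows of a whitespace-separated
# file whose second column is at least min_count.
main() {
    parse_args "$@" || return 1
    mkdir -p "$output_dir"
    awk -v min="$min_count" '$2 >= min' "$input" > "$output_dir/filtered.txt"
}

# In a real script, the last line would be:
#   main "$@"
```

With `main "$@"` enabled, the component could be run as `code/01_filter/filter_samples.sh --input data/counts.txt --output-dir output/01_filter --min-count 3` (the paths are placeholders). Because the whole call is one command line, it should also be possible to submit it unchanged to Slurm, for example through `sbatch --wrap`. An R component could expose the same kind of options via optparse, so that all steps are invoked in the same way regardless of language.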