Contributing - FranckLejzerowicz/metagenomix GitHub Wiki

🚧

Overview

metagenomix is a versatile and modular project that allows anyone to add new shotgun metagenomic analysis capabilities, mainly by adding software that is not yet available in the softwares list for use in one's pipeline configuration.

Adding new software is straightforward, since the per-software code structure can be reproduced and tested easily using the templates and associated tutorials provided on this page.

Basic architecture

For a software, it is handy to write at least two functions in one of the relevant code files.

The input is always a dictionary, but its content can differ slightly depending on whether the software is meant to be used upstream or downstream of the pivotal co-assembly pooling step. The first function, the loop function (see data structures), should therefore know whether it is iterating over inputs per sample or per co-assembly group (most cases), or per other units for holistic softwares.
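As a minimal sketch of the loop function idea above: the dictionary layout and the names `loop_over_units`, `make_cmd` and `mytool` are hypothetical and are not taken from the metagenomix code. The point is only that the dictionary keys can serve as the iteration unit, whether they are samples or co-assembly groups.

```python
from typing import Callable, Dict, List


def make_cmd(unit: str, fastqs: List[str]) -> str:
    """Hypothetical command builder: one command line per unit."""
    return f"mytool --input {' '.join(fastqs)} --output out/{unit}"


def loop_over_units(
    inputs: Dict[str, List[str]],
    cmd_builder: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Build one command per key of `inputs`.

    The keys are assumed to be sample names (upstream of pooling) or
    co-assembly group names (downstream of pooling).
    """
    return {unit: cmd_builder(unit, files) for unit, files in inputs.items()}


# Same loop, two different iteration units (file names are made up).
per_sample = {"sampleA": ["A_R1.fastq", "A_R2.fastq"]}
per_group = {"group1": ["pooled_R1.fastq", "pooled_R2.fastq"]}

sample_cmds = loop_over_units(per_sample, make_cmd)
group_cmds = loop_over_units(per_group, make_cmd)
```

Keeping the unit as the dictionary key means the same loop body can serve both sides of the pooling step. The actual input structure is described in the data structures resource.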

Code files

The metagenomix project includes a folder named "softwares", where different Python (.py) code files were created to contain the code for different groups of softwares. These groups were defined based on the type of analysis that these softwares are supposed to perform. You may not agree with this rather loosely defined categorisation (it does not match the categories of the current softwares list): it will be possible to redistribute the code later on.

Tutorial

🚧

Templates

Code

Main

The name of the main function for a software must exactly match the name of the software as it is invoked in the pipeline configuration file. For example, the bowtie2 software has a main function called bowtie2(), located in the alignment.py code file. As explained in the code files resource, all the software functions are called successfully because their names match the names parsed from the pipeline configuration file (as part of the commands collection core mechanism).
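The name-matching rule above can be illustrated with a small sketch. This is not the metagenomix dispatch code: `call_software`, the `params` argument and the namespace standing in for alignment.py are assumptions made for the example. It only shows why a function named differently from the configured software would never be found.

```python
from types import SimpleNamespace


def bowtie2(params: dict) -> str:
    """Stand-in for a main software function (the body is hypothetical)."""
    return f"bowtie2 called with {params}"


# Hypothetical stand-in for a code file such as alignment.py.
alignment = SimpleNamespace(bowtie2=bowtie2)


def call_software(name: str, params: dict) -> str:
    """Look up the function whose name equals the configured software name."""
    func = getattr(alignment, name, None)
    if func is None:
        raise ValueError(f"No function named '{name}' found")
    return func(params)


# `name` would come from the pipeline configuration file.
result = call_software("bowtie2", {"threads": 4})
```

With a lookup of this kind, a typo in either the function name or the configuration entry surfaces as an immediate error rather than a silently skipped step.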

There can be exceptions to this for complex softwares that are themselves pipelines with sub-commands (such as metawrap, which must be called on the command line as e.g. metawrap binning [options] or metawrap bin_refinement [options]). In this case, it is possible to take advantage of the ability of metagenomix to either handle specific sub-commands (each treated like a different software call, which creates separate jobs) or run all the sub-commands available for this main software (as a routine, which creates one job containing all sub-commands). Please see this tutorial on modular softwares to add such complex tools.
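The two behaviours described above (separate jobs per sub-command versus one job for the whole routine) can be sketched as follows. The function `plan_jobs` and the `SUBCOMMANDS` table are hypothetical and do not reflect how metagenomix implements this; the option placeholder `[options]` is kept as written in the text.

```python
from typing import Dict, List, Optional

# Hypothetical sub-command table; only `binning` and `bin_refinement`
# are named in the text, and the options are left as a placeholder.
SUBCOMMANDS: Dict[str, str] = {
    "binning": "metawrap binning [options]",
    "bin_refinement": "metawrap bin_refinement [options]",
}


def plan_jobs(subcommands: Optional[List[str]] = None) -> List[List[str]]:
    """Return a list of jobs, each job being a list of command lines.

    - Specific sub-commands given: one separate job per sub-command.
    - None given: a single job chaining all sub-commands as a routine.
    """
    if subcommands is not None:
        return [[SUBCOMMANDS[name]] for name in subcommands]
    return [list(SUBCOMMANDS.values())]


separate_jobs = plan_jobs(["binning", "bin_refinement"])  # two jobs
routine_job = plan_jobs()  # one job holding both commands
```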

Getters

get_<software>()

Command

<software>_cmd()

Parameters

Unittests

Notes

Do's

Don'ts

Final note

It is always nice to learn from each other at a patient, accepting and constructive pace. Hence, please do not hesitate to be creative and to ask questions, which I hope we will discuss with clarity, open-mindedness and friendliness. Thanks for your interest!
