Home - ampinzonv/BB3 GitHub Wiki

The aim of this project is to develop a Bioinformatics library for common Computational Biology tasks using the BASH programming language. It is led by Andrés M. Pinzón, full-time professor at the Bioinformatics and Systems Biology Laboratory, Institute for Genetics, National University of Colombia, and was developed as part of his 2022-2023 sabbatical leave.

Why BioBash?

This library has been around in our laboratory for several years as a collection of routines for common bioinformatics tasks, such as handling FASTA headers and FASTQ files or manipulating gene lists.

After years of using this first version of BioBash, it was clear that it was really, really useful for our everyday tasks, but at the same time it lacked features and scope. Moreover, it was written in pure BASH, and I found myself reinventing the wheel... in BASH! So, given that there are hundreds of useful and optimized bioinformatics tools, why create something like BioBash? I can think of at least two good answers:

Not everything has already been done in computational biology, and there is still room for improvement.
We can take advantage of a whole universe of computational biology tools and make them even more accessible in a common bioinformatics environment: BASH.

A matter of consistency and efficiency

One of the main aims of BioBash is to provide a consistent interface for common analyses in the field without reinventing the wheel (re-using as much code as possible and interfacing with existing tools), within an environment familiar to computational biologists (i.e. BASH). On one hand, BioBash is a wrapper for several pre-existing bioinformatics tools, such as clustalw, seqtk, BWA, Bowtie and NCBI-BLAST, with a consistent interface for all of them. On the other hand, it also provides brand new routines for file manipulation and other common bioinformatics tasks (such as dealing with lists), and for these it relies on the core utilities that should come with any UNIX-like installation.

For example, if you have a list of genes in a text file and want to know how many of them are unique and how many are over-represented in the list, one way is to combine core utilities such as sort and uniq to obtain that information; the other is to use BioBASH and forget about all the command-line options each program needs.
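For reference, the core-utilities route described above might look like the following minimal sketch. It assumes a plain-text file with one gene identifier per line and no embedded whitespace, reads "unique" as "distinct" and "over-represented" as "appearing more than once", and uses a hypothetical function name (`summarise_gene_list`) that is not part of BioBash.

```shell
#!/usr/bin/env bash
# Sketch only: summarise a gene list with core utilities (sort, uniq, awk, wc).
# Assumes one gene identifier per line with no embedded whitespace.
# summarise_gene_list is a hypothetical name, not a BioBash command.

summarise_gene_list() {
  local gene_file="$1"
  local distinct repeated

  # Distinct identifiers: sort -u collapses duplicates, wc -l counts the rest.
  distinct=$(sort -u "$gene_file" | wc -l)

  # Identifiers seen more than once: uniq -c prefixes each line with its count.
  repeated=$(sort "$gene_file" | uniq -c | awk '$1 > 1' | wc -l)

  # $(( )) strips the padding that some wc implementations (e.g. macOS) add.
  printf 'Distinct genes: %d\n' "$((distinct))"
  printf 'Repeated genes: %d\n' "$((repeated))"

  # Repeated genes with their counts, most frequent first, ties alphabetical.
  sort "$gene_file" | uniq -c \
    | awk '$1 > 1 { printf "%s\t%d\n", $2, $1 }' \
    | sort -t $'\t' -k2,2nr -k1,1
}
```

Usage would be something like `summarise_gene_list genes.txt` (the file name is illustrative). Every pipeline sorts before calling uniq because uniq only collapses adjacent duplicate lines.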

As another example, suppose you have two multi-FASTA files and want to BLAST one against the other to see how they match (and perhaps plot the results). You can use NCBI-BLAST's formatdb command (replaced by makeblastdb in BLAST+) to create the database, then run blastp or blastn (or any other variant) with all the necessary options to perform the alignment and obtain your results. Or you can use BioBash, go for a cup of coffee, and let BioBash deal with paths, temporary files, renaming, threading, plotting, etc.

So I believe BioBash can make you more efficient by offering a consistent interface to several computational biology tools. By "consistency" we mean that all BioBash commands behave, respond and produce output in a similar way, no matter what is happening behind the scenes. We also mean that BioBash:

Implements a consistent command line interface (CLI) for all its tools.
Has a super simple installation procedure.
Provides "ready to use" BioBASH programs targeted for useful and common Bioinformatics tasks.
Aims to provide detailed documentation.
Supports the most common UNIX-like operating systems, such as OSX and Linux.

BioBASH is not really a new kid on the block

Although BioBASH is perhaps the most complete Bash library for Bioinformatics, it is not really a new idea in the community: several other projects have been started (and abandoned) under the same name, as happened with our own first version of this library. To our knowledge, among all those biobash projects the only one worth mentioning is Simon Frost's Biobash, a set of really nice scripts, although poorly documented, structured more as a group of independent scripts than as a reusable library, and also abandoned around 2018.

Coding standards

A key component of any coding project is following a good coding standard and doing your best to apply good programming practices. This makes code more accessible to anyone willing to contribute, makes debugging easier (and bugs less common), and also helps speed up the generation of documentation.

BioBash (BB) follows the Shell Style Guide from the Google Style Guides, as well as the conventions suggested by Jeff Lindsay at Progrium. I think these are must-follow rules for everyone programming in BASH.
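As an illustration only, a function written along the lines of the Google Shell Style Guide might look like the sketch below: a header comment block documenting arguments, outputs and return values, a lowercase function name with underscores, `local` variables, `[[ ... ]]` tests, braced variable expansions, two-space indentation, and error messages sent to STDERR. The function `count_fasta_records` is hypothetical and is not taken from the BioBash code base.

```shell
#!/usr/bin/env bash
#
# Hypothetical helper illustrating Google Shell Style Guide conventions.

#######################################
# Count the records in a FASTA file.
# Arguments:
#   Path to a FASTA file.
# Outputs:
#   Writes the number of header lines (starting with '>') to STDOUT.
# Returns:
#   0 on success, 1 if the file is not readable.
#######################################
count_fasta_records() {
  local fasta_file="$1"

  if [[ ! -r "${fasta_file}" ]]; then
    echo "count_fasta_records: cannot read '${fasta_file}'" >&2
    return 1
  fi

  # awk always exits 0 and prints 0 for files without headers
  # (unlike grep -c, which exits 1 when nothing matches).
  awk '/^>/ { n++ } END { print n + 0 }' "${fasta_file}"
}
```

Using awk instead of `grep -c '^>'` keeps the documented return value honest: an empty or header-less file still counts as success.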

Side note

In case it interests anyone, this project has been developed on both OSX (using Parallels and Lubuntu 24) and Linux machines (depending on whether I am at home or at work), using VSCode as the code editor with the following plug-ins:

Bash IDE
Bats
indent-rainbow
shell-format
ShellCheck

Cool stuff that helped BioBash development

Several sources were used while developing BioBash (apart from the third-party libraries above). Some of these are: