7. Misc - filip-husnik/pseudofinder GitHub Wiki
We appreciate any critical comments or suggestions for improvements. Please raise issues or submit pull requests.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.
This code was inspired mostly by work on bacterial symbionts in the early stages of becoming intracellular and strictly host-associated. This ecological shift relaxes selection pressure ('use it or lose it') on many genes considered essential for free-living bacteria, so relatively recent symbionts can have over 50% of their genes pseudogenized.
Basic information about bacterial pseudogenes:
Recognizing the pseudogenes in bacterial genomes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1142405/
Taking the pseudo out of pseudogenes: https://www.ncbi.nlm.nih.gov/pubmed/25461580
Several examples from the Sodalis clade showing how important pseudogene annotation is for bacteria in a nascent stage of symbiosis:
Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies: https://www.ncbi.nlm.nih.gov/pubmed/20649993
A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect–bacterial symbioses: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3499248/
Genome degeneration and adaptation in a nascent stage of symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3914690/
Repeated replacement of an intrabacterial symbiont in the tripartite nested mealybug symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5027413/
Large scale and significant expression from pseudogenes in Sodalis glossinidius - a facultative bacterial endosymbiont: https://www.biorxiv.org/content/early/2017/07/23/124388
There are several additional features we'll try to include in the script in the near future.
- Include an optional FPKM cut-off when RNA-Seq data are available.
- Improve logic for ORFs on contig ends broken by assembly issues (e.g. metagenome-assembled genomes).
- Check whether ORFs called as pseudogenes instead represent individual protein domains that can exist and evolve independently of the rest of the original multi-domain protein chain (Pfam?).
- Fine-tune pseudogene finding for mobile elements such as transposases.
- Visualize results as a scatter plot of all genes/pseudogenes (dN/dS, GC content, expression, length ratio, ...).
- Sometimes ORFs are predicted by mistake on the opposite strand, or many additional spurious ORFs are predicted in GC-rich genomes (stop codons are AT-rich). Include an ORF filtering step and/or use BLASTX to check regions containing ORFs that have no BLASTP hits. Include a proteomics validation step for hypothetical proteins.
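The remark that stop codons are AT-rich can be made concrete with a small back-of-the-envelope calculation. The sketch below is not part of Pseudofinder; it assumes a simplified model in which bases are independent, A and T are equally frequent, and G and C are equally frequent. The function names are hypothetical and chosen only for illustration.

```python
def stop_codon_probability(gc_fraction: float) -> float:
    """Chance that a random codon is TAA, TAG or TGA.

    Assumes independent bases with freq(A) == freq(T) and freq(G) == freq(C).
    This is an illustrative model, not how Pseudofinder works.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    at = (1.0 - gc_fraction) / 2.0  # frequency of A (and of T)
    gc = gc_fraction / 2.0          # frequency of G (and of C)
    # TAA contributes at^3; TAG and TGA each contribute at^2 * gc.
    return at ** 3 + 2.0 * at ** 2 * gc


def expected_codons_between_stops(gc_fraction: float) -> float:
    """Mean spacing (in codons) between stop codons in one random reading frame."""
    p = stop_codon_probability(gc_fraction)
    return float("inf") if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    for gc_content in (0.3, 0.5, 0.7):
        p = stop_codon_probability(gc_content)
        spacing = expected_codons_between_stops(gc_content)
        print(f"GC={gc_content:.0%}: P(stop)={p:.4f}, "
              f"about {spacing:.0f} codons between stops")
```

Under this model, random stop codons become rarer as GC content rises, so long stop-free stretches arise more often by chance in non-coding frames. This is consistent with the extra spurious ORFs noted above for GC-rich genomes.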
Please suggest any additional features here: https://github.com/filip-husnik/pseudofinder/issues
Pseudofinder is developed by Mitch Syberg-Olsen¹, Arkadiy Garber², Patrick Keeling¹, John McCutcheon², and Filip Husnik³.
¹ University of British Columbia, Vancouver, Canada
² Arizona State University, Tempe, Arizona, USA
³ Okinawa Institute of Science and Technology, Okinawa, Japan
If Pseudofinder was useful for your work, please cite it as:
Syberg-Olsen MJ*, Garber AI*, Keeling PJ, McCutcheon JP, Husnik F. Pseudofinder: detection of pseudogenes in prokaryotic genomes, bioRxiv 2021, doi: https://doi.org/10.1101/2021.10.07.463580. GitHub repository: https://github.com/filip-husnik/pseudofinder/.
*Co-first authors.
Please also cite various dependencies used by Pseudofinder.