7. Misc - filip-husnik/pseudofinder GitHub Wiki

Contributing

We appreciate any critical comments or suggestions for improvements. Please raise issues or submit pull requests.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.

Acknowledgements

This code was inspired mostly by work on bacterial symbionts in the early stages of becoming intracellular and strictly host-associated. This ecological shift relaxes selection pressure ('use it or lose it') on many genes considered essential for free-living bacteria, so relatively recent symbionts can have over 50% of their genes pseudogenized.

References

Basic information about bacterial pseudogenes:

Recognizing the pseudogenes in bacterial genomes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1142405/

Taking the pseudo out of pseudogenes: https://www.ncbi.nlm.nih.gov/pubmed/25461580

Several examples from the Sodalis clade showing how important pseudogene annotation is for bacteria in a nascent stage of symbiosis:

Mobile genetic element proliferation and gene inactivation impact over the genome structure and metabolic capabilities of Sodalis glossinidius, the secondary endosymbiont of tsetse flies: https://www.ncbi.nlm.nih.gov/pubmed/20649993

A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect–bacterial symbioses: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3499248/

Genome degeneration and adaptation in a nascent stage of symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3914690/

Repeated replacement of an intrabacterial symbiont in the tripartite nested mealybug symbiosis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5027413/

Large scale and significant expression from pseudogenes in Sodalis glossinidius - a facultative bacterial endosymbiont: https://www.biorxiv.org/content/early/2017/07/23/124388

Wish list

There are several additional features we'll try to include in the script in the near future.

  1. Include an optional FPKM cut-off when RNA-Seq data are available.

  2. Improve logic for ORFs on contig ends broken by assembly issues (e.g. metagenome-assembled genomes).

  3. Check whether ORFs called as pseudogenes actually represent individual protein domains that can exist and evolve independently of the rest of the original multi-domain protein chain (PFAM?).

  4. Fine-tune pseudogene finding for mobile elements such as transposases.

  5. Visualize results by a scatter plot of all genes/pseudogenes (dN/dS, GC content, expression, length ratio, ...).

  6. Sometimes ORFs are predicted by mistake on the opposite strand, or many additional spurious ORFs are predicted in GC-rich genomes (stop codons are AT-rich). Include an ORF filtering step and/or use blastX to check regions containing ORFs with no blastP hits. Include a proteomics validation step for hypothetical proteins.
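
The reasoning behind item 6 (stop codons are AT-rich, so GC-rich genomes yield more spurious ORFs) can be illustrated with a minimal sketch. This is not part of Pseudofinder: the function names are hypothetical, and the model assumes independent bases with equal A/T and equal G/C frequencies, which real genomes only approximate.

```python
# Illustrative model only: how often does a stop codon arise by chance
# in random sequence with a given GC content?
STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_probability(gc_content: float) -> float:
    """Probability that a random codon is a stop codon.

    Assumes independent bases with P(G) = P(C) = gc/2 and
    P(A) = P(T) = (1 - gc)/2 (a simplifying assumption).
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    base_prob = {"A": at, "T": at, "G": gc, "C": gc}

    total = 0.0
    for codon in STOP_CODONS:
        p = 1.0
        for base in codon:
            p *= base_prob[base]
        total += p
    return total


def expected_codons_to_first_stop(gc_content: float) -> float:
    """Mean number of codons read until the first chance stop codon."""
    p = stop_codon_probability(gc_content)
    return float("inf") if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC={gc:.0%}: {expected_codons_to_first_stop(gc):.1f} codons")
```

Under this simplified model, a chance stop codon is expected roughly every 21 codons at 50% GC but only about every 52 codons at 70% GC, so random open reading frames run longer in GC-rich sequence and are more likely to pass a minimum-length filter.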

Please suggest any additional features here: https://github.com/filip-husnik/pseudofinder/issues

Citing Pseudofinder

Pseudofinder is developed by Mitch Syberg-Olsen¹, Arkadiy Garber², Patrick Keeling¹, John McCutcheon², and Filip Husnik³.

¹ University of British Columbia, Vancouver, Canada

² Arizona State University, Tempe, Arizona, USA

³ Okinawa Institute of Science and Technology, Okinawa, Japan

If it was useful for your work, please cite it as:

Syberg-Olsen MJ*, Garber AI*, Keeling PJ, McCutcheon JP, Husnik F. Pseudofinder: detection of pseudogenes in prokaryotic genomes, bioRxiv 2021, doi: https://doi.org/10.1101/2021.10.07.463580. GitHub repository: https://github.com/filip-husnik/pseudofinder/.

*Co-first authors.

Please also cite various dependencies used by Pseudofinder.
