1. Introduction - WangLabTHU/GPro GitHub Wiki

hcwang and qxdu edited on Aug 4, 2023, 1 version

synopsis

The gpro package focuses on de novo promoter design and sequence expression prediction. Beyond the basic generator and predictor models, the package also provides tools for optimization and evaluation.

gpro mainly consists of four parts: generator, predictor, optimizer and evaluator.

  • The generator performs de-novo generation of new promoter sequences based on generative models.
  • The predictor performs prediction and analysis of the expression level of the newly generated sequences.
  • The optimizer combines current mainstream strategies, using multiple algorithms to search for the sequences that best meet the given metrics and performance constraints.
  • The evaluator provides a series of functions to evaluate the performance of previously trained models.

Compared with existing work ([1], [2], [3]), this package has the following advantages:

  • Large quantity and high diversity: producing a large number of synthetic promoters at once.
  • Multi species: Supports promoter design for Escherichia coli, yeast, and mammals.
  • Multi models: GAN/cGAN/VAE/diffusion; CNN/CNN-LSTM/Transformer/mixed model.
  • Comprehensive promoter evaluation indicators: comprehensive evaluation of promoter components.

The main workflow has been outlined by Zrimec et al. ([4]). Compared with the directed evolution process for proteins ([5]) and existing adaptive machine learning pipelines ([6]), we provide a complete optimization scheme based on GPro in the promoter sequence space. You can continuously obtain new candidate sequences through GPro, conduct biological experiments, and feed the results back to the pipeline's database.
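The iterative loop described above can be illustrated with a minimal, self-contained sketch. It does not use the gpro API: `generate`, `predict`, `optimize`, and `run_cycle` are hypothetical toy stand-ins (random sampling, GC fraction as a fake expression score, and greedy hill climbing), chosen only to show how the generator, predictor, optimizer, and database feedback might fit together.

```python
import random

BASES = "ACGT"

def generate(n, length, rng):
    """Toy stand-in for a generator: sample random DNA sequences."""
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]

def predict(seq):
    """Toy stand-in for a predictor: GC fraction as a fake expression score."""
    return sum(base in "GC" for base in seq) / len(seq)

def mutate(seq, rng):
    """Replace one random position with a random base."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(BASES) + seq[i + 1:]

def optimize(seqs, rng, rounds=20):
    """Toy stand-in for an optimizer: greedy hill climbing on the predicted score."""
    best = list(seqs)
    for _ in range(rounds):
        for k, seq in enumerate(best):
            candidate = mutate(seq, rng)
            # Keep the mutant only if the predicted score does not drop.
            if predict(candidate) >= predict(seq):
                best[k] = candidate
    return best

def run_cycle(database, rng, n=8, length=20):
    """One design cycle: generate, optimize, score, and append to the database."""
    candidates = optimize(generate(n, length, rng), rng)
    # In a real pipeline the score would come from a wet-lab measurement;
    # here the predictor's output is recorded as a placeholder.
    database.extend((seq, predict(seq)) for seq in candidates)
    return database

rng = random.Random(0)
database = []
for _ in range(3):
    run_cycle(database, rng)
```

In practice each stand-in would be replaced by a trained model or an experimental readout, and the growing database would be used to retrain the models between cycles.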

Quick start: The Quick Start page shows the easiest way to begin using GPro.

Model comparison: New users may find it hard to decide which model to use for each part. Comparisons of the available models are provided in the wiki.

citations

[1] LaFleur T L, Hossain A, Salis H M. Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria[J]. Nature communications, 2022, 13(1): 5159.
[2] Van Brempt M, Clauwaert J, Mey F, et al. Predictive design of sigma factor-specific promoters[J]. Nature communications, 2020, 11(1): 5822.
[3] Pfotenhauer A C, Occhialini A, Nguyen M A, et al. Building the plant SynBio toolbox through combinatorial analysis of DNA regulatory elements[J]. ACS Synthetic Biology, 2022, 11(8): 2741-2755.
[4] Zrimec J, Fu X, Muhammad A S, et al. Controlling gene expression with deep generative design of regulatory DNA[J]. Nature communications, 2022, 13(1): 5099.
[5] Yu T, Boob A G, Singh N, et al. In vitro continuous protein evolution empowered by machine learning and automation[J]. Cell Systems, 2023.
[6] Hie B L, Yang K K. Adaptive machine learning for protein engineering[J]. Current opinion in structural biology, 2022, 72: 145-152.