4.1.3 cGAN and VAE
The conditional generation model aims to learn the joint distribution of data and labels in order to generate data conditioned on a given label. Here, we reproduce the recent article by Zhang et al. published in Nature Communications [1] and implement a demo for the design of constitutive and inducible promoters. A schematic diagram of the conditional generation process has been provided.
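As a conceptual illustration of conditioning (this is a hypothetical minimal sketch in PyTorch, not the Deepseed implementation; the class name `ToyConditionalGenerator` and the dimensions `noise_dim`/`cond_dim` are invented for demonstration):

```python
import torch
import torch.nn as nn

class ToyConditionalGenerator(nn.Module):
    """Hypothetical minimal conditional generator: the label/condition vector
    is concatenated with the noise vector, so the output distribution is
    conditioned on the label."""
    def __init__(self, noise_dim=64, cond_dim=8, seq_len=165, vocab_size=4):
        super().__init__()
        self.seq_len, self.vocab_size = seq_len, vocab_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256),
            nn.ReLU(),
            nn.Linear(256, seq_len * vocab_size),
        )

    def forward(self, noise, cond):
        logits = self.net(torch.cat([noise, cond], dim=-1))
        # one position-wise distribution over A/C/G/T per site
        return logits.view(-1, self.seq_len, self.vocab_size).softmax(dim=-1)

# usage: two samples conditioned on an 8-dimensional label vector
g = ToyConditionalGenerator()
fake = g(torch.randn(2, 64), torch.randn(2, 8))  # shape (2, 165, 4)
```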

The data format for this model differs from our previous two models; please refer to the format of ecoli_mpra_3_laco.csv. This section only provides a reproduction of the published work. If you wish to run the model on a new dataset, please refer to the paper [1] for details.
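To check that your own data matches the expected layout, you can quickly inspect the reference file (a minimal sketch assuming pandas is installed; the column names are whatever ecoli_mpra_3_laco.csv actually contains):

```python
import pandas as pd

# peek at the reference dataset to confirm your file follows the same layout
df = pd.read_csv("./datasets/ecoli_mpra_3_laco.csv")
print(df.columns.tolist())
print(df.head())
```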
When initializing, the input parameters of the model are as follows:
params | description | default value
---|---|---
data_name | the saving tag of the data | ecoli_mpra_3_laco
model_name | the saving tag of the model | deepseed_ecoli_mpra_3_laco
seqL | length of the input sequence | 165
dataset | path of the training dataset | None
savepath | path for saving results | None
n_iters | total number of training iterations | 10000
save_iters | save the model checkpoint every save_iters iterations | 1000
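For example, an initialization spelling out every parameter from the table above might look like this (the parameter names come from the table; the paths are illustrative):

```python
from gpro.generator.others.cgan.cgan import Deepseed

model = Deepseed(
    data_name="ecoli_mpra_3_laco",            # saving tag of the data
    model_name="deepseed_ecoli_mpra_3_laco",  # saving tag of the model
    seqL=165,                                  # length of the input sequence
    dataset="./datasets/ecoli_mpra_3_laco.csv",
    savepath="./checkpoints",
    n_iters=10000,                             # total training iterations
    save_iters=1000,                           # checkpoint frequency
)
```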
When generating, the model parameters are as follows:
params | description | default value
---|---|---
input_file | file like input_promoters.txt that provides the mask format | None
sample_model_path | path of the trained model | None
sample_output | whether to write the sampled sequences to a file | True
seed | random seed for sampling | 0
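Putting the sampling parameters together, a call might look like the sketch below (it assumes model is a trained Deepseed instance, as in the demo that follows; the checkpoint path depends on your model_name and n_iters):

```python
model.generate(
    input_file="./datasets/input_promoters.txt",  # provides the mask format
    sample_model_path="./checkpoints/check/deepseed_ecoli_mpra_3_laco/net_G_9999.pth",
    sample_output=True,  # write the sampled sequences to disk
    seed=0,              # random seed for reproducible sampling
)
```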
A demo for model training/sampling is shown below. You can run the following program under the demo/demo5 folder:
```python
from gpro.generator.others.cgan.cgan import Deepseed

# training
model = Deepseed(n_iters=10000, save_iters=10000,
                 dataset="./datasets/ecoli_mpra_3_laco.csv",
                 savepath="./checkpoints")
model.train()

# sampling
model.generate(input_file='./datasets/input_promoters.txt',
               sample_model_path='./checkpoints/check/deepseed_ecoli_mpra_3_laco/net_G_9999.pth')
```
After the training step, you will find a cache and a check folder under savepath; when you then perform sampling, you will also get a samples file containing the generated sequences.
```
/checkpoints/cache/model_name
├── figure
│   └── 4-mer frequency
├── gen_iter
│   └── sampling at every save_iter
├── inducible
│   └── csv format of samples
└── training_log
    └── training log file
```
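If you want to post-process the generated sequences, the CSVs under the inducible folder can be loaded directly (a sketch assuming pandas; the glob pattern and exact file names depend on your model_name and run):

```python
import glob
import pandas as pd

# the inducible folder holds the sampled promoters in CSV format
for path in glob.glob("./checkpoints/cache/deepseed_ecoli_mpra_3_laco/inducible/*.csv"):
    samples = pd.read_csv(path)
    print(path, samples.shape)
```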
The remaining model files and other settings are the same as before.
A VAE is an autoencoder whose encoding distribution is regularized during training to ensure that its latent space has good properties, allowing us to generate new data. A schematic diagram of the VAE architecture has been provided.
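To make the regularization concrete, here is a conceptual PyTorch sketch (not the SimpleVAE code; the function vae_loss is invented for illustration). The training loss adds a KL term that pulls the encoding distribution toward a standard normal prior, which is what keeps the latent space smooth enough to sample from:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Reconstruction term + KL(q(z|x) || N(0, I)); the KL term regularizes
    the encoding distribution so nearby latent points decode to similar data."""
    recon = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```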
It should be noted that our VAE structure here is completely based on [2] (see also [3]), and we have not yet conducted a comprehensive evaluation of the generated sequence quality, the optimal training schedule, etc. Therefore, we cannot provide an accurate parameter table here; however, the format of the parameters is consistent with the WGAN model.
You can import the SimpleVAE class from gpro.generator.others.vae.vae. A simple demo for VAE training is shown below:
```python
from gpro.generator.others.vae.vae import SimpleVAE

dataset_path = './datasets/sequence_data.txt'
checkpoint_path = './checkpoints'

# training
model = SimpleVAE(length=50)
model.train(dataset=dataset_path, savepath=checkpoint_path)

# sampling: same interface as WGAN and Diffusion
# (placeholder values; point sample_model_path at a checkpoint produced by train())
sample_model_path = './checkpoints/checkpoint.pth'  # placeholder path
sample_number = 1000                                # placeholder count
seed = 0
model.generate(sample_model_path, sample_number, seed)
```
[1] Zhang, P., Wang, H., Xu, H. et al. Deep flanking sequence engineering for efficient promoter design using DeepSEED. Nat Commun 14, 6309 (2023). https://doi.org/10.1038/s41467-023-41899-y
[2] Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. In International Conference on Machine Learning (ICML), PMLR, 773-782 (2019).
[3] Linder, J., Bogard, N., Rosenberg, A. B. et al. A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Syst 11, 49-62.e16 (2020).