4.1.3 cGAN and VAE - WangLabTHU/GPro GitHub Wiki

hcwang and qxdu edited on Aug 4, 2023, 1 version

cGAN Model

The conditional generation model (cGAN) learns the joint distribution of data and labels to enable conditional data generation. Here, we reproduce the model published by Haochen Wang et al. in Nature Communications and implement a demo for designing constitutive and inducible promoters. A schematic diagram of the generation process has been provided.

The architecture of this model differs from our previous two models; please refer to the format of ecoli_mpra_3_laco.csv. This module only provides a reproduction of the published work. If you wish to test the model on a new dataset, please refer to the paper for details [1].

When initializing, the input parameters of the model are as follows:

| params | description | default value |
| --- | --- | --- |
| data_name | the saving tag of the data | ecoli_mpra_3_laco |
| model_name | the saving tag of the model | deepseed_ecoli_mpra_3_laco |
| seqL | length of the input sequence | 165 |
| dataset | path of the training dataset | None |
| savepath | path for saving results | None |
| n_iters | total number of training iterations | 10000 |
| save_iters | save a model checkpoint every `save_iters` iterations | 1000 |
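The model consumes fixed-length DNA sequences (`seqL` defaults to 165). As a hedged sketch of the kind of preprocessing involved, one-hot encoding a sequence might look like the following; GPro's internal encoding may differ, and the alphabet order (A, C, G, T) is an assumption for illustration:

```python
# Hedged sketch: one-hot encode a fixed-length promoter sequence.
# The alphabet order (A, C, G, T) and padding behavior are assumptions.
ALPHABET = "ACGT"

def one_hot(seq, seq_len=165):
    """Encode a DNA string as a seq_len x 4 matrix of 0/1 rows."""
    assert len(seq) <= seq_len, "sequence longer than seqL"
    mat = [[0, 0, 0, 0] for _ in range(seq_len)]  # zero rows pad the tail
    for i, base in enumerate(seq.upper()):
        mat[i][ALPHABET.index(base)] = 1
    return mat

encoded = one_hot("ACGT", seq_len=4)
# Each row contains exactly one 1, at the index of the corresponding base.
```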

When generating, the model parameters are as follows:

| params | description | default value |
| --- | --- | --- |
| input_file | input file such as input_promoters.txt, providing the mask format | None |
| sample_model_path | path of the trained model | None |
| sample_output | whether to output the sample file | True |
| seed | random seed for sampling | 0 |
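The mask convention is defined by the demo's input_promoters.txt, which we do not reproduce here. Purely as a hypothetical illustration of the idea of a masked template (fixed motif bases given explicitly, designable positions left to the generator), one could build such a template as follows; the placeholder character `M` and the motif positions are assumptions, not the file's actual format:

```python
# Hypothetical illustration only: the real mask format is defined by
# demo/demo5/datasets/input_promoters.txt. We assume here that 'M' marks
# positions to be designed and fixed motif bases are written explicitly.
def make_masked_template(fixed_regions, length=165, placeholder="M"):
    """fixed_regions: dict mapping start position -> fixed subsequence."""
    template = [placeholder] * length
    for start, motif in fixed_regions.items():
        for i, base in enumerate(motif):
            template[start + i] = base
    return "".join(template)

# e.g. pin down hypothetical -35 and -10 boxes, leave the rest masked
line = make_masked_template({30: "TTGACA", 55: "TATAAT"}, length=80)
```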

A demo for model training/sampling is described below. You can run the following program under the demo/demo5 folder:

from gpro.generator.others.cgan.cgan import Deepseed

# training
model = Deepseed(n_iters=10000, save_iters=10000, dataset="./datasets/ecoli_mpra_3_laco.csv", savepath="./checkpoints")
model.train()

# sampling
model.generate(input_file='./datasets/input_promoters.txt', sample_model_path='./checkpoints/check/deepseed_ecoli_mpra_3_laco/net_G_9999.pth')

After the training step, a cache and a check folder will be created; when you then run sampling, a samples file containing your generated sequences will also be produced.

/checkpoints/cache/model_name
    ├── figure
    │   └── 4-mer frequency plot
    ├── gen_iter
    │   └── samples generated every save_iters iterations
    ├── inducible
    │   └── samples in csv format
    └── training_log
        └── training log file
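The figure folder contains a 4-mer frequency plot, a common sanity check comparing k-mer statistics of generated and training sequences. A minimal sketch of how such frequencies can be computed (the actual code in GPro may differ):

```python
# Count all overlapping 4-mers across a set of sequences and normalize
# to frequencies; comparing these between generated and real sequences
# is the statistic behind a 4-mer frequency plot.
from collections import Counter

def kmer_freq(seqs, k=4):
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

freqs = kmer_freq(["ACGTACGT", "ACGTTTTT"])
```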

The remaining model files and other settings are the same as before.

VAE Model

A VAE is an autoencoder whose encoding distribution is regularized during training so that its latent space has good properties, allowing us to generate new data. A schematic diagram of the generation process has been provided.
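The regularization mentioned above is usually achieved with two ingredients: the reparameterization trick for sampling latent codes, and a KL-divergence penalty pulling the encoder's distribution toward a standard normal prior. A minimal single-dimension sketch (not GPro's actual implementation):

```python
# Sketch of the two ingredients that regularize a VAE's latent space:
# the reparameterization trick and the closed-form KL divergence
# against a standard normal prior (single latent dimension).
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), per latent dimension."""
    return 0.5 * (mu ** 2 + math.exp(log_var) - log_var - 1.0)

# The KL term is zero exactly when the encoder matches the prior
print(kl_to_standard_normal(0.0, 0.0))  # 0.0
```

In a full model the total loss sums this KL term over all latent dimensions and adds a reconstruction term for the decoded sequence.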

It should be noted that our VAE structure here is based entirely on [2], and we have not yet conducted a comprehensive evaluation of generated-sequence quality, optimal training duration, etc. Therefore, we cannot provide an exact parameter table here; the format of the initialization parameters is, however, consistent with the WGAN model [3].

You can import the SimpleVAE class from gpro.generator.others.vae.vae.

A simple demo for VAE training is described below:

from gpro.generator.others.vae.vae import SimpleVAE

dataset_path = './datasets/sequence_data.txt'
checkpoint_path = './checkpoints'
model = SimpleVAE(length=50)
model.train(dataset=dataset_path, savepath=checkpoint_path)

model.generate(sample_model_path, sample_number, seed) # same interface as the WGAN and Diffusion models

Citations

[1] Zhang, P., Wang, H., Xu, H. et al. Deep flanking sequence engineering for efficient promoter design using DeepSEED. Nat Commun 14, 6309 (2023). https://doi.org/10.1038/s41467-023-41899-y

[2] Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. In International Conference on Machine Learning (ICML), 773-782 (PMLR, 2019).

[3] Linder, J., Bogard, N., Rosenberg, A. B. et al. A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Systems 11, 49-62.e16 (2020).