4.1.1 WGAN - WangLabTHU/GPro GitHub Wiki


Wasserstein GAN Model Architecture

Promoter design remains one of the most important considerations in metabolic engineering and synthetic biology. Theoretically, there are $4^{50}$ possible sequences for a 50-nt promoter, of which naturally occurring promoters make up only a small subset. To explore this vast sequence space, Wang et al. used a WGAN model [1] for de novo promoter design in Escherichia coli. The model, guided by sequence features learned from natural promoters, captures interactions between nucleotides at different positions and designs novel synthetic promoters in silico. A schematic diagram of the whole workflow is provided.
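To put that search space in perspective, here is a quick back-of-the-envelope calculation in plain Python (no GPro dependencies):

# each of the 50 positions is one of 4 nucleotides (A, C, G, T)
space_size = 4 ** 50
print(f"{space_size:.3e}")  # ~1.268e+30 possible sequences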

Here, to help users understand how the model operates on biological sequences, we provide a more detailed operational pipeline. Note that two networks are trained jointly: a generator that produces sequences and a discriminator (critic) that scores them. Note also that WGAN results are not perfectly stable across runs.

Caution: we highly recommend not training the WGAN model for more than 12 epochs!

Input Parameters

We suggest defining all parameters during the initialization phase. There are two types of parameters: those that can only be defined during initialization (Fixed), and those that can be defined either at initialization or at the training/sampling stage (Flexible). In either case, a parameter can only be defined once.

Fixed params

| params | description | default value |
| --- | --- | --- |
| batch_size | training batch size | 32 |
| netG_lr | learning rate of the Generator network | $10^{-4}$ |
| netD_lr | learning rate of the Discriminator network | $10^{-4}$ |
| num_epochs | training epochs | 12 |
| print_epoch | sample the model output every print_epoch epochs | 1 |
| save_epoch | save the model state every save_epoch epochs | 1 |
| Lambda | weight of the gradient penalty term | 10 |
| length | sequence length of the training dataset | 50 |
| model_name | controls the saving path under "./checkpoints" | wgan |
| seed | random seed, only defined in generate() | 0 |
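For intuition about what Lambda controls: WGAN training adds a gradient penalty that keeps the discriminator (critic) approximately 1-Lipschitz. Below is a minimal PyTorch sketch of that penalty term; netD, real, and fake are hypothetical placeholders, not GPro internals.

import torch

def gradient_penalty(netD, real, fake, Lambda=10.0):
    # real/fake: one-hot sequence batches, assumed shape (batch, length, 4)
    alpha = torch.rand(real.size(0), 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = netD(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    # penalize deviation of the per-sample gradient norm from 1
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)
    return Lambda * ((norms - 1) ** 2).mean()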

Flexible params

| params | description | default value | flexible stage |
| --- | --- | --- | --- |
| dataset | path of the training dataset | None | train() |
| savepath | path for saving results | None | train() |
| sample_model_path | path of the trained model | None | generate() |
| sample_number | number of sequences to sample | None | generate() |
| sample_output | path for saving samples | None | generate() |
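Since flexible parameters may also be supplied at initialization, an equivalent pattern to the demo below should be possible (a sketch, assuming WGAN_language accepts these as keyword arguments, as the table implies):

from gpro.generator.wgan.wgan import WGAN_language

# flexible params fixed once, at initialization instead of at train()
model = WGAN_language(length=50,
                      dataset="data/sequence_data.txt",
                      savepath="checkpoints/wgan/")
model.train()  # dataset/savepath already defined above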

Demo

A demo for model training/sampling is described below:

import os
from gpro.generator.wgan.wgan import WGAN_language

# model training
default_root = "your working directory"
dataset_path = os.path.join(str(default_root), 'data/sequence_data.txt')
checkpoint_path = os.path.join(str(default_root), 'checkpoints/wgan/')
model = WGAN_language(length=50)
model.train(dataset=dataset_path, savepath=checkpoint_path)

# model sampling
sample_model_path = os.path.join(str(default_root), 'checkpoints/wgan/checkpoints/net_G_12.pth')
sample_number = 1000
model.generate(sample_model_path, sample_number)
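If you want to smoke-test this pipeline before using real promoter data, a toy training file can be generated as follows. This is a sketch that assumes the dataset is plain text with one 50-nt sequence per line; check your own data against the format GPro expects.

import os, random

os.makedirs("data", exist_ok=True)
random.seed(0)
with open("data/sequence_data.txt", "w") as f:
    for _ in range(1000):
        # one random 50-nt toy sequence per line
        f.write("".join(random.choice("ACGT") for _ in range(50)) + "\n")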

After the training step, you will find a checkpoints directory and a training_log directory under "checkpoint_path/model_name"; when you further perform sampling, you will also get a samples file containing your sampled sequences.

/checkpoints/wgan/model_name
    ├── checkpoints
    │   ├── net_D_i.pth
    │   └── net_G_i.pth
    ├── samples
    └── training_log

The detailed contents of these files are as follows:

checkpoints: contains net_G_xxx.pth/net_D_xxx.pth, the saved parameters of the generator/discriminator.
training_log: contains gen_iter_xx.txt, a FASTA file with the model's sampled output at epoch xx.
samples: a FASTA file containing the final result of model sampling, which can be further used for biological experiments or sequence optimization.
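As a downstream example, the sampled sequences can be read back for filtering or analysis. A minimal sketch, assuming the samples file uses standard FASTA formatting (header lines starting with ">"); adjust the path to wherever your run actually wrote its samples:

def read_fasta(path):
    # collect sequences keyed by their FASTA headers
    records, header, seq = {}, None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records[header] = "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        records[header] = "".join(seq)
    return records

samples = read_fasta("checkpoints/wgan/samples")  # hypothetical path
print(len(samples), "sequences loaded")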

Citations

[1] Ye Wang et al., Synthetic promoter design in Escherichia coli based on a deep generative network, Nucleic Acids Research, Volume 48, Issue 12, July 2020, Pages 6403–6412, https://doi.org/10.1093/nar/gkaa325