4.1.1 WGAN - WangLabTHU/GPro GitHub Wiki
Promoter design remains one of the most important considerations in metabolic engineering and synthetic biology applications [1].
Here, to help users understand how the model operates on biological sequences, we provide a more detailed operational pipeline. Note that two networks are trained jointly: a generator that produces sequences and a discriminator that scores them. Also note that the results of WGAN are not always stable.

Caution: we highly recommend not training the WGAN model for more than 12 epochs!
We suggest that you define all parameters during the initialization phase. There are two types of parameters: fixed parameters, which can only be defined during the initialization phase, and flexible parameters, which can be defined either at initialization or at the training/sampling phase. In any case, a parameter should only be defined once. The fixed parameters are listed in the first table below and the flexible parameters in the second.
params | description | default value |
---|---|---|
batch_size | training batch size | 32 |
netG_lr | learning rate of the Generator network | |
netD_lr | learning rate of the Discriminator network | |
num_epochs | number of training epochs | 12 |
print_epoch | sample the model output every print_epoch epochs | 1 |
save_epoch | save the model every save_epoch epochs | 1 |
Lambda | weight of the gradient penalty (see the sketch after this table) | 10 |
length | sequence length of the training dataset | 50 |
model_name | controls the saving path under "./checkpoints" | wgan |
seed | random seed, only defined in generate() | 0 |
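The Lambda parameter is the coefficient of the WGAN-GP gradient penalty, which keeps the discriminator (critic) approximately 1-Lipschitz. As an illustration only (not GPro's exact code), a standard gradient-penalty term in PyTorch looks roughly like the sketch below; the gradient_penalty helper and the one-hot (batch, length, 4) input layout are assumptions:

```python
import torch

def gradient_penalty(netD, real, fake, Lambda=10.0):
    # Interpolate between real and generated one-hot sequence batches
    alpha = torch.rand(real.size(0), 1, 1, device=real.device)
    interpolates = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = netD(interpolates)
    grads = torch.autograd.grad(outputs=d_out, inputs=interpolates,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True, retain_graph=True)[0]
    grads = grads.reshape(grads.size(0), -1)
    # Penalize deviations of the gradient norm from 1 (1-Lipschitz constraint)
    return Lambda * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```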
params | description | default value | flexible stage |
---|---|---|---|
dataset | path of the training dataset | None | train() |
savepath | path for saving results | None | train() |
sample_model_path | path of the trained model | None | generate() |
sample_number | sampling number scale | None | generate() |
sample_output | path for saving samples | None | generate() |
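As a concrete illustration of the two parameter types, the hypothetical sketch below defines every parameter once at initialization, which the description above allows for flexible parameters as well, so that train() and generate() need no further arguments. Keyword names follow the tables above; the paths are placeholders and the exact call signatures should be checked against the GPro source:

```python
from gpro.generator.wgan.wgan import WGAN_language

# Hypothetical: fixed parameters (init only) and flexible parameters, all defined once here.
model = WGAN_language(length=50,
                      num_epochs=12,
                      model_name="wgan",
                      dataset="data/sequence_data.txt",    # flexible, belongs to train()
                      savepath="checkpoints/wgan/",        # flexible, belongs to train()
                      sample_model_path="checkpoints/wgan/checkpoints/net_G_12.pth",  # flexible, generate()
                      sample_number=1000)                  # flexible, generate()

model.train()     # parameters were already defined at initialization
model.generate()  # parameters were already defined at initialization
```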
A demo for model training/sampling is described below:
```python
import os
from gpro.generator.wgan.wgan import WGAN_language

# model training
default_root = "your working directory"
dataset_path = os.path.join(str(default_root), 'data/sequence_data.txt')
checkpoint_path = os.path.join(str(default_root), 'checkpoints/wgan/')

model = WGAN_language(length=50)
model.train(dataset=dataset_path, savepath=checkpoint_path)

# model sampling
sample_model_path = os.path.join(str(default_root), 'checkpoints/wgan/checkpoints/net_G_12.pth')
sample_number = 1000
model.generate(sample_model_path, sample_number)
```
After the training step, you will find a checkpoints folder and a training_log under "checkpoint_path/model_name"; when you further perform sampling, you will also get a samples file that contains your generated sequences.
```
/checkpoints/wgan/model_name
├── checkpoints
│   ├── net_D_i.pth
│   └── net_G_i.pth
├── samples
└── training_log
```
The detailed contents of these files are as follows:

- checkpoints: contains net_G_xxx.pth / net_D_xxx.pth, the saved parameters of the generator/discriminator.
- training_log: gen_iter_xx.txt, FASTA files containing samples of the model output at epoch xx.
- samples: a FASTA file containing the final result of model sampling, which might be further used for biological experiments or sequence optimization.
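Since both the training_log files and the final samples are in FASTA format, they can be inspected with any standard FASTA reader. Below is a minimal, hypothetical sketch for loading the generated sequences into a Python list; the read_fasta helper and the file path are placeholders, not part of GPro:

```python
def read_fasta(path):
    """Collect sequences from a FASTA file, ignoring '>' header lines."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current))
                    current = []
            elif line:
                current.append(line)
        if current:
            sequences.append("".join(current))
    return sequences

# Example: load the final sampling results (path is a placeholder)
seqs = read_fasta("checkpoints/wgan/samples")
print(len(seqs), "sequences, e.g.", seqs[0] if seqs else "none")
```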
[1] Ye Wang et al., "Synthetic promoter design in Escherichia coli based on a deep generative network," Nucleic Acids Research, 48(12):6403–6412, 2020. https://doi.org/10.1093/nar/gkaa325