4.3.5 Feedback - WangLabTHU/GPro GitHub Wiki
hcwang and qxdu edited on Aug 4, 2023, 1 version
The Feedback strategy was first proposed in Feedback GAN [1]. It repeatedly replaces sequences in the training set with newly generated sequences that the predictor scores as highly expressed, so that the generator is retrained on progressively better data. It remains an important algorithm in adaptive machine learning, and its effectiveness was demonstrated in the original study [1]. The figure below shows the workflow:
We provide a simplified algorithm for both WGAN and Diffusion.
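
To make the feedback idea concrete, here is a minimal, self-contained sketch in plain Python. It is a toy illustration under stated assumptions, not GPro's implementation: `toy_predictor`, `toy_generator` and `feedback_loop` are hypothetical names, the "predictor" simply scores a sequence by its G/C fraction, and the "generator" resamples per-position base frequencies from the current training set. The training set is refreshed by keeping the top-scoring sequences among old and new ones, which is a simplification of the replacement scheme described in [1].

```python
import random
from collections import Counter

BASES = "ACGT"

def toy_predictor(seq):
    """Stand-in for a trained expression predictor: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)

def toy_generator(training_set, n_samples, rng):
    """Stand-in for a trained generator: sample each position from the
    base frequencies observed at that position in the training set."""
    length = len(training_set[0])
    columns = [Counter(seq[i] for seq in training_set) for i in range(length)]
    samples = []
    for _ in range(n_samples):
        chars = [
            # +1 pseudocount keeps every base reachable at every position
            rng.choices(BASES, weights=[col[b] + 1 for b in BASES])[0]
            for col in columns
        ]
        samples.append("".join(chars))
    return samples

def feedback_loop(natural, n_samples=200, n_iter=5, seed=0):
    """Toy feedback loop (assumed scheme): sample, score, refresh, repeat."""
    rng = random.Random(seed)
    training_set = list(natural)
    history = []  # mean predicted score of the training set per iteration
    for _ in range(n_iter):
        candidates = toy_generator(training_set, n_samples, rng)
        # Keep the best-scoring sequences among old and new ones, so the
        # training set drifts towards high predicted expression.
        pool = sorted(training_set + candidates, key=toy_predictor, reverse=True)
        training_set = pool[:len(natural)]
        history.append(sum(map(toy_predictor, training_set)) / len(training_set))
    return training_set, history

rng = random.Random(1)
natural = ["".join(rng.choice(BASES) for _ in range(20)) for _ in range(100)]
final_set, history = feedback_loop(natural)
print([round(h, 3) for h in history])  # mean predicted score per iteration
```

Because each refreshed training set is the top slice of the old set plus the new candidates, the mean predicted score in this toy version can never decrease from one iteration to the next. A real generator would be retrained on the refreshed set instead of recomputing base frequencies.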
params | description | default value |
---|---|---|
generator | generator model class | None |
predictor | predictor model class | None |
predictor_modelpath | trained model path of predictor | None |
natural_datapath | natural sequences datapath | None |
sample_number | default sampling scale at each epoch | 1000 |
savepath | final results saving directory | None |
params | description | default value |
---|---|---|
MaxEpoch | number of times the sampling of sample_number sequences is repeated | 50 |
MaxPoolsize | number of sequences kept in the final selection | 1000 |
MaxIter | number of times the feedback step is repeated | 20 |
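
Read together, the two tables suggest one plausible interpretation, which is an assumption and has not been confirmed against the GPro source: each feedback step draws `sample_number` sequences `MaxEpoch` times and retains `MaxPoolsize` of them, and this step is repeated `MaxIter` times. Under that reading, the default values imply the sampling budget computed below. `sampling_budget` is a hypothetical helper written for this illustration.

```python
def sampling_budget(sample_number=1000, MaxEpoch=50, MaxPoolsize=1000, MaxIter=20):
    """Count sequences sampled and kept, assuming nested
    MaxIter x MaxEpoch sampling rounds (an interpretation of the tables)."""
    per_iter = sample_number * MaxEpoch      # candidates scored in one feedback step
    total = per_iter * MaxIter               # candidates scored over the whole run
    kept_fraction = MaxPoolsize / per_iter   # share of candidates retained per step
    return {"per_iter": per_iter, "total": total, "kept_fraction": kept_fraction}

budget = sampling_budget()
print(budget)
# {'per_iter': 50000, 'total': 1000000, 'kept_fraction': 0.02}
```

If this interpretation holds, lowering `MaxEpoch` or `MaxIter` is the most direct way to shorten a trial run, since both scale the number of predictor evaluations linearly.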
Before running the optimizer, you need a trained generator and a trained predictor.
A simple demo looks like this:
import os
from gpro.optimizer.model_driven.feedback import Feedback
# (1) define the generator
from gpro.generator.diffusion.diffusion import Diffusion_language
default_root = "your working directory"
generator = Diffusion_language(length=50)
# (2) define the predictor and point to its trained checkpoint
from gpro.predictor.cnn_k15.cnnk15 import CNN_K15_language
predictor = CNN_K15_language(length=50)
predictor_modelpath = os.path.join(default_root, 'checkpoints/cnn_k15/checkpoint.pth')
# (3) select the highly expressed sequences
natural_datapath = default_root + '/data/diffusion_prediction/seq.txt'
tmp = Feedback(generator=generator, predictor=predictor,
               predictor_modelpath=predictor_modelpath, sample_number=1000,
               natural_datapath=natural_datapath, savepath="./optimization/Feedback")
tmp.run()
This program returns the top sample_number sequences generated by the diffusion model, selected by CNN K15 according to their predicted expression values.
The resulting files consist of compared_with_natural.pdf, ExpIter.txt, ExpIter.csv, /checkpoints, /plot, traj, and linechart.png.
files | description |
---|---|
compared_with_natural.pdf | Box plot comparing model-generated results with natural results |
ExpIter.txt | FASTA file of the final selected sequences |
ExpIter.csv | Sequences and their predicted expression values for the final selection |
checkpoints | Folder containing the checkpoints of the retraining steps |
plot | Histograms comparing natural and model-driven results |
traj | Sampling results saved at each retraining step |
linechart.png | Line chart of the mean predicted expression |
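
After a run, it can be useful to confirm that the expected outputs exist and to load the final sequences. The sketch below relies only on the file names listed in the table. `missing_outputs` and `read_fasta` are hypothetical helpers, and the parser assumes that ExpIter.txt is ordinary FASTA (header lines starting with `>` followed by one or more sequence lines), which the table suggests but which has not been checked against real GPro output.

```python
from pathlib import Path

# Names taken from the table of resulting files
EXPECTED = ["compared_with_natural.pdf", "ExpIter.txt", "ExpIter.csv",
            "checkpoints", "plot", "traj", "linechart.png"]

def missing_outputs(savepath):
    """Return the expected files/folders that are absent from savepath."""
    root = Path(savepath)
    return [name for name in EXPECTED if not (root / name).exists()]

def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Lists every expected name if the directory does not exist yet
print(missing_outputs("./optimization/Feedback"))
```

Once a run has finished, `read_fasta(Path("./optimization/Feedback") / "ExpIter.txt")` would return the final sequences under the FASTA assumption above.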
A box plot is shown below:
The histogram changes over the course of feedback training as follows:

A linechart is shown below:

[1] Gupta, A., Zou, J. Feedback GAN for DNA optimizes protein functions. *Nat Mach Intell* **1**, 105–111 (2019).