hcwang and qxdu edited on Aug 4, 2023, 1 version

Introduction

The Feedback strategy was first proposed in Feedback GAN [1]. It repeatedly replaces the training set with newly generated sequences that the predictor scores as highly expressed, so the generator is progressively retrained toward high-expression sequences. It remains an important algorithm in adaptive machine learning, and its effectiveness was demonstrated in the original study [1]. The following figure shows its workflow:

We provide a simplified implementation that works with both WGAN and Diffusion generators.
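The feedback idea can be illustrated with a self-contained toy sketch. It is not the GPro implementation: `toy_generate` and `toy_predict` are hypothetical stand-ins for a trained generator and predictor (the "generator" only mutates pool members, and the "predictor" scores GC content). The pool-update rule, which keeps the `pool_size` highest-scoring sequences, is a simplification chosen for brevity and may differ from both Feedback GAN and GPro.

```python
import random

ALPHABET = "ACGT"


def toy_predict(seq: str) -> float:
    # Hypothetical predictor: GC fraction as a stand-in "expression" score.
    return (seq.count("G") + seq.count("C")) / len(seq)


def toy_generate(pool: list[str], n: int, rng: random.Random) -> list[str]:
    # Hypothetical generator: copies random pool members and applies one
    # point mutation to each copy (a real model would be retrained on the pool).
    out = []
    for _ in range(n):
        seq = list(rng.choice(pool))
        seq[rng.randrange(len(seq))] = rng.choice(ALPHABET)
        out.append("".join(seq))
    return out


def feedback(natural: list[str], n_iter: int, n_sample: int,
             pool_size: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    pool = list(natural)  # the training pool starts as the natural sequences
    for _ in range(n_iter):
        new = toy_generate(pool, n_sample, rng)
        # Feedback step: merge generated sequences into the pool and keep the
        # pool_size highest-scoring ones, so high scorers gradually replace
        # the original training data. Ties are broken by the sequence itself
        # so the result is deterministic for a given seed.
        pool = sorted(set(pool + new),
                      key=lambda s: (toy_predict(s), s), reverse=True)[:pool_size]
    return pool


if __name__ == "__main__":
    rng = random.Random(1)
    natural = ["".join(rng.choice(ALPHABET) for _ in range(50)) for _ in range(100)]
    best = feedback(natural, n_iter=20, n_sample=200, pool_size=100)
    nat_mean = sum(map(toy_predict, natural)) / len(natural)
    best_mean = sum(map(toy_predict, best)) / len(best)
    print(f"natural mean: {nat_mean:.3f}, feedback mean: {best_mean:.3f}")
```

Because each iteration keeps the best of the old pool plus the new samples, the mean predicted score of the pool can never decrease in this toy version.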

Input Parameters

Initialization params

params	description	default value
generator	generator model (e.g. a Diffusion or WGAN model)	None
predictor	predictor model	None
predictor_modelpath	path to the trained predictor checkpoint	None
natural_datapath	path to the natural sequence file	None
sample_number	number of sequences sampled at each epoch	1000
savepath	directory where the final results are saved	None

Running params

params	description	default value
MaxEpoch	number of epochs; sample_number sequences are sampled in each epoch	50
MaxPoolsize	number of sequences kept in the final selection	1000
MaxIter	number of feedback (retraining) iterations	20
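To estimate how much sampling a run involves, the parameters can be multiplied. The sketch below assumes, based only on the table descriptions, that each of the MaxIter feedback steps samples sample_number sequences in each of MaxEpoch epochs; that nesting is an assumption, and `sampling_budget` is a hypothetical helper, not part of GPro.

```python
def sampling_budget(sample_number: int = 1000, max_epoch: int = 50,
                    max_iter: int = 20) -> int:
    """Total sequences sampled, ASSUMING max_epoch rounds of sample_number
    sequences inside each of max_iter feedback steps (hypothetical helper)."""
    for name, value in (("sample_number", sample_number),
                        ("max_epoch", max_epoch), ("max_iter", max_iter)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return sample_number * max_epoch * max_iter


print(sampling_budget())  # 1000000 with the default values
```

Under this assumption, halving MaxIter halves the total sampling cost, which is a quick way to shorten a trial run.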

Demo

Before running the optimizer, you must have trained a generator and a predictor.
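Since the optimizer depends on files produced by earlier training steps, it can help to check them before starting a long run. This is a minimal sketch; `preflight` is a hypothetical helper (not part of GPro) that only checks the two input paths listed in the Initialization params table.

```python
import os


def preflight(predictor_modelpath: str, natural_datapath: str,
              savepath: str) -> list[str]:
    """Hypothetical helper (not part of GPro): report missing input files.

    Returns the required input paths that do not exist as files.
    The output directory is created only when every input is present.
    """
    missing = [p for p in (predictor_modelpath, natural_datapath)
               if not os.path.isfile(p)]
    if not missing:
        os.makedirs(savepath, exist_ok=True)
    return missing
```

For example, call it with the same three paths you pass to `Feedback` and raise a `FileNotFoundError` if the returned list is non-empty.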

A simple demo looks like this:

import os

from gpro.optimizer.model_driven.feedback import Feedback

# (1) define the generator
from gpro.generator.diffusion.diffusion import Diffusion_language
default_root = "your working directory"  # replace with your own path
generator = Diffusion_language(length=50)

# (2) define the predictor and the path to its trained checkpoint
from gpro.predictor.cnn_k15.cnnk15 import CNN_K15_language
predictor = CNN_K15_language(length=50)
predictor_modelpath = os.path.join(default_root, 'checkpoints/cnn_k15/checkpoint.pth')

# (3) point to the natural sequences, then run feedback optimization
#     to select highly expressed sequences
natural_datapath = os.path.join(default_root, 'data/diffusion_prediction/seq.txt')

tmp = Feedback(generator=generator, predictor=predictor,
               predictor_modelpath=predictor_modelpath, sample_number=1000,
               natural_datapath=natural_datapath, savepath="./optimization/Feedback")

tmp.run()

Running this program produces the top-ranked sequences (1000 by default) generated by the diffusion model, selected according to the expression values predicted by CNN_K15.
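The final selection amounts to ranking candidate sequences by predicted expression and keeping the best ones. A minimal sketch using `heapq.nlargest`, assuming the predictions are already available as floats aligned with the sequences; `select_top` is a hypothetical name, not a GPro function.

```python
import heapq


def select_top(seqs: list[str], preds: list[float],
               k: int) -> list[tuple[str, float]]:
    """Return the k (sequence, prediction) pairs with the highest predictions,
    in descending order of prediction (hypothetical helper)."""
    if len(seqs) != len(preds):
        raise ValueError("seqs and preds must have the same length")
    # nlargest runs in O(n log k), cheaper than a full sort when k << n.
    return heapq.nlargest(k, zip(seqs, preds), key=lambda pair: pair[1])


print(select_top(["AAA", "GCG", "ATG", "CCC"], [0.1, 0.9, 0.4, 0.7], k=2))
# [('GCG', 0.9), ('CCC', 0.7)]
```

If `k` exceeds the number of candidates, all candidates are returned, still sorted.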

Results

The result files consist of compared_with_natural.pdf, ExpIter.txt, ExpIter.csv, the checkpoints and plot folders, traj, and linechart.png:

file	description
compared_with_natural.pdf	box plot comparing model-generated results with natural results
ExpIter.txt	FASTA file of the final result sequences
ExpIter.csv	final result sequences and their predicted expression values
checkpoints	folder containing the checkpoints saved at each retraining step
plot	histograms comparing natural and model-driven results
traj	sampling results saved at each retraining step
linechart.png	line chart of the mean predicted expression
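Because ExpIter.txt is a FASTA file, it can be read back with a few lines of standard-library Python. This sketch assumes only the standard FASTA layout (a `>` header line followed by one or more sequence lines) and nothing about the header contents; `read_fasta` is a hypothetical helper.

```python
def read_fasta(path: str) -> list[tuple[str, str]]:
    """Parse a FASTA file into (header, sequence) pairs, in file order.

    Multi-line sequences are joined; blank lines and any sequence lines
    before the first header are ignored.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Returning a list rather than a dict keeps duplicate headers instead of silently overwriting them. With the demo settings, the file would presumably be found under the `savepath` directory, e.g. `./optimization/Feedback/ExpIter.txt`.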

A box plot is shown below:

The histogram evolves during feedback training as follows:

A line chart is shown below:

Citations

[1] Gupta, A., Zou, J. Feedback GAN for DNA optimizes protein functions. *Nat Mach Intell* **1**, 105–111 (2019).

4.3.5 Feedback - WangLabTHU/GPro GitHub Wiki