4.3.2 Genetic - WangLabTHU/GPro GitHub Wiki
hcwang and qxdu edited on Aug 4, 2023, 1 version
The genetic algorithm (GA), developed by John Holland and his collaborators in the 1960s and 1970s, is a model or abstraction of biological evolution based on Charles Darwin's theory of natural selection. Holland was probably the first to use crossover and recombination, mutation, and selection in the study of adaptive and artificial systems. These genetic operators form the essential part of the genetic algorithm as a problem-solving strategy. Since then, many variants of genetic algorithms have been developed and applied to a wide range of optimization problems, from graph coloring to pattern recognition, from discrete systems (such as the travelling salesman problem) to continuous systems (e.g., the efficient design of airfoils in aerospace engineering), and from financial markets to multi-objective engineering optimization.
A genetic algorithm can be used to optimize the hidden (latent) space of our WGAN model. A schematic diagram of the workflow is shown below [1].
Caution: The current algorithm defaults to using the WGAN generator and the CNNK15 predictor. Please provide the models you have already trained. This program will search the hidden space for the most effective latent vectors.
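To make the idea concrete, here is a minimal sketch of a genetic search over latent vectors. It is not GPro's implementation: `toy_generator` and `toy_predictor` are hypothetical stand-ins for the trained WGAN generator and the CNNK15 predictor, and the population size, elite fraction, and mutation scale are illustrative assumptions.

```python
import numpy as np

def toy_generator(z):
    # Hypothetical stand-in for a trained generator: maps latent vectors to outputs.
    return np.tanh(z)

def toy_predictor(x):
    # Hypothetical stand-in for a trained predictor: higher score is better.
    return -np.sum((x - 0.5) ** 2, axis=1)

def latent_ga(z_dim=8, pop_size=40, n_iter=50, elite_frac=0.25, mut_sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, z_dim))       # initial latent population
    n_elite = max(2, int(elite_frac * pop_size))
    history = []                                       # best score per iteration
    for _ in range(n_iter):
        scores = toy_predictor(toy_generator(pop))
        order = np.argsort(scores)[::-1]               # best first
        elite = pop[order[:n_elite]]                   # selection: keep the top fraction
        history.append(float(scores[order[0]]))
        n_child = pop_size - n_elite
        pa = elite[rng.integers(0, n_elite, n_child)]  # first parent of each child
        pb = elite[rng.integers(0, n_elite, n_child)]  # second parent of each child
        mask = rng.random((n_child, z_dim)) < 0.5
        children = np.where(mask, pa, pb)              # uniform crossover
        children = children + mut_sigma * rng.standard_normal(children.shape)  # mutation
        pop = np.vstack([elite, children])             # elites survive unchanged
    scores = toy_predictor(toy_generator(pop))
    best = int(np.argmax(scores))
    return pop[best], float(scores[best]), history
```

Because the elite vectors are carried over unchanged, the best score in this sketch never decreases from one iteration to the next.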
params | description | default value |
---|---|---|
generator_modelpath | trained model path of generator | None |
predictor_modelpath | trained model path of predictor | None |
natural_datapath | natural sequences datapath | None |
sample_number | default sampling scale at each epoch | None |
savepath | final results saving directory | None |
z_dim | dimension of hidden state for WGAN model | 128 |
seq_len | sequence length | 50 |
params | description | default value |
---|---|---|
P_rep | dropping rate of delRep | 0.3 |
P_new | scale of the new generation | 0.25 |
P_elite | elite probability in the evolutionary algorithm | 0.25 |
MaxIter | maximum number of iteration epochs | 1000 |
MaxPoolsize | number of sequences in the final selected results | 2000 |
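The table does not spell out how these rates interact. Under one plausible reading (an assumption, not confirmed by this page), P_elite is the fraction of the pool kept as elites, P_new is the fraction refilled with newly generated candidates, and the remainder comes from recombination. The hypothetical helper `split_population` below only illustrates that arithmetic; it is not part of GPro.

```python
def split_population(pool_size, p_elite=0.25, p_new=0.25):
    """Split a pool into elite, new, and offspring counts (illustrative assumption)."""
    if not (0 <= p_elite <= 1 and 0 <= p_new <= 1 and p_elite + p_new <= 1):
        raise ValueError("p_elite and p_new must lie in [0, 1] and sum to at most 1")
    n_elite = int(pool_size * p_elite)          # candidates kept unchanged
    n_new = int(pool_size * p_new)              # freshly generated candidates
    n_offspring = pool_size - n_elite - n_new   # remainder filled by recombination
    return {"elite": n_elite, "new": n_new, "offspring": n_offspring}
```

With the default values from the table, a pool of 2000 would split into 500 elites, 500 new candidates, and 1000 offspring under this reading.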
Before executing the optimizer, you should have trained a generator and a predictor.
A simple demo looks like this:
import os
from gpro.optimizer.heuristic.genetic import GeneticAlgorithm
# (1) define the generator
default_root = "your working directory"
generator_modelpath = os.path.join(str(default_root), 'checkpoints/wgan/checkpoints/net_G_12.pth')
# (2) define the predictor
predictor_modelpath = os.path.join(str(default_root), 'checkpoints/cnn_k15/checkpoint.pth')
# (3) select the highly expressed sequences
natural_datapath = os.path.join(str(default_root), 'data/diffusion_prediction/seq.txt')
# (4) run the optimizer; results are written to savepath
tmp = GeneticAlgorithm(generator_modelpath=generator_modelpath, predictor_modelpath=predictor_modelpath,
                       natural_datapath=natural_datapath, savepath="./optimization/Genetic")
tmp.run()
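Since the optimizer depends on models trained beforehand, it can help to confirm that every input path exists before constructing the optimizer. The helper `missing_paths` below is a hypothetical convenience function, not part of GPro; it relies only on the standard library.

```python
import os

def missing_paths(paths):
    """Return the names of all entries whose path does not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]
```

In the demo above, one might call `missing_paths({"generator_modelpath": generator_modelpath, "predictor_modelpath": predictor_modelpath, "natural_datapath": natural_datapath})` and stop if the returned list is not empty.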
The resulting files consist of compared_with_natural.pdf, each_iter_distribution.pdf, ExpIter.txt, and ExpIter.csv.
files | description |
---|---|
compared_with_natural.pdf | Box plot comparing model generated results with natural results |
each_iter_distribution.pdf | Box plots recording the improvement every 100 epochs |
ExpIter.txt | FASTA file containing the final result sequences |
ExpIter.csv | Sequences and predictions for the final results, stored every 100 epochs |
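As a sketch of how the sequence output could be inspected afterwards, the reader below parses a FASTA file into (header, sequence) pairs using only the standard library. It assumes ExpIter.txt follows the usual FASTA layout, as the table states; `read_fasta` is a hypothetical helper and not part of GPro.

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue                      # skip blank lines
            if line.startswith(">"):
                if header is not None:        # close the previous record
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)           # sequence may span several lines
    if header is not None:                    # close the last record
        records.append((header, "".join(chunks)))
    return records
```

With the savepath from the demo, the file would presumably be read as `read_fasta("./optimization/Genetic/ExpIter.txt")`; the exact location inside savepath is an assumption.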
A box plot for compared_with_natural.pdf is shown below.
A box plot for each_iter_distribution.pdf is shown below.
[1] Woodward, Robert & Kelleher, Edmund (2016). Towards 'smart lasers': Self-optimisation of an ultrafast pulse source using a genetic algorithm. Scientific Reports, 6. doi:10.1038/srep37616.