4.4.2 Mutagenesis - WangLabTHU/GPro GitHub Wiki
hcwang and qxdu edited on Aug 4, 2023, 1 version
mutagenesis.py performs saturation mutagenesis on every site of each sequence in a sequence set and scores the resulting mutants with a predictor. It then computes the standard deviation of the predictions at each site; this value serves as the weight of that position in the sequence under the current predictor.
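The procedure described above can be illustrated with a minimal, self-contained sketch. It is not GPro's implementation: `toy_predictor`, `point_mutants`, and `site_weights` are hypothetical names, the toy scoring rule merely stands in for a trained predictor, and averaging the per-sequence standard deviations across sequences is an assumption about how the per-site values are combined.

```python
import statistics

ALPHABET = "ACGT"
# Hypothetical per-base scores used only by the toy predictor below.
BASE_SCORE = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}


def toy_predictor(seq):
    """Stand-in for a trained predictor: only the first three bases affect the score."""
    return sum(BASE_SCORE[base] for base in seq[:3])


def point_mutants(seq, pos):
    """Return every single-base substitution of seq at position pos."""
    return [seq[:pos] + base + seq[pos + 1:] for base in ALPHABET if base != seq[pos]]


def site_weights(seqs, predict):
    """Average, over sequences, the standard deviation of predictions across the mutants at each site."""
    length = len(seqs[0])
    totals = [0.0] * length
    for seq in seqs:
        for pos in range(length):
            scores = [predict(mutant) for mutant in point_mutants(seq, pos)]
            totals[pos] += statistics.pstdev(scores)
    return [total / len(seqs) for total in totals]


seqs = ["GACTA", "TGGCA"]
weights = site_weights(seqs, toy_predictor)
```

In this toy setting the weights at positions 0 to 2 are positive and the weights at positions 3 and 4 are zero, because the stand-in predictor ignores everything after the third base. With a real predictor, sites whose mutation changes the prediction most receive the largest weights.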
Caution: The sequence and expression files must follow the same format as in the QuickStart section.
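A quick sanity check before running the analysis can catch mismatched inputs. The sketch below assumes, without confirmation from this page, that the sequence file holds one sequence per line, that the expression file holds one numeric value per line in the same order, and that all sequences share one length. `load_paired` is a hypothetical helper, not part of GPro, and the file contents are made up.

```python
import os
import tempfile


def load_paired(seq_path, exp_path):
    """Read sequences and expression values, assuming one record per line in each file."""
    with open(seq_path) as handle:
        seqs = [line.strip() for line in handle if line.strip()]
    with open(exp_path) as handle:
        exps = [float(line) for line in handle if line.strip()]
    if len(seqs) != len(exps):
        raise ValueError(f"{len(seqs)} sequences but {len(exps)} expression values")
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("sequences differ in length")
    return seqs, exps


# Demonstration on two throwaway files with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    seq_path = os.path.join(tmp, "seq.txt")
    exp_path = os.path.join(tmp, "exp.txt")
    with open(seq_path, "w") as handle:
        handle.write("ACGTA\nTTGCA\n")
    with open(exp_path, "w") as handle:
        handle.write("1.5\n0.25\n")
    seqs, exps = load_paired(seq_path, exp_path)
```

If the actual format in the QuickStart section differs, the parsing lines need to be adapted accordingly.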
params | description | default value |
---|---|---|
predictor | the trained predictor class | |
predictor_modelpath | path to the pretrained model checkpoint, in "x/xxx.pth" format | |
predictor_training_datapath | path to the natural sequences; the predictor's training set is the best choice | |
predictor_expression_datapath | path to the expression levels corresponding to the sequences in predictor_training_datapath | |
report_path | saving folder | |
file_tag | saving name | |
num_seqs_to_test | sampling scales for frequency comparison | 200 |
import os

from gpro.evaluator.mutagenesis import plot_mutagenesis
from gpro.predictor.cnn_k15.cnn_k15 import CNN_K15_language

project_path = "your project path"

# Sequences and their matching expression levels
predictor_training_datapath = project_path + '/data/diffusion_prediction/seq.txt'
predictor_expression_datapath = project_path + '/data/diffusion_prediction/exp.txt'

# Predictor and its trained checkpoint
predictor = CNN_K15_language(length=50)
predictor_modelpath = os.path.join(project_path, 'checkpoints/cnn_k15/checkpoint.pth')

plot_mutagenesis(predictor, predictor_modelpath, predictor_training_datapath, predictor_expression_datapath,
                 report_path="./results/", file_tag="CNNK15")
The final result will be saved in the ./results directory.