4.4.2 Mutagenesis - WangLabTHU/GPro GitHub Wiki

hcwang and qxdu edited on Aug 4, 2023, 1 version

Introduction

mutagenesis.py performs saturation mutagenesis on every site of each sequence in a sequence set, scores the resulting mutants with a predictor, and computes the standard deviation of the predictions at each site. This standard deviation serves as the weight of each position in the sequence under the current predictor.
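The idea can be illustrated with a minimal, standard-library sketch. It is not GPro's implementation: the helper names (`saturation_mutants`, `site_weights`, `toy_predict`) are hypothetical, the DNA alphabet is assumed, and so are the choices to include the wild-type base among the mutants, to use the population standard deviation, and to average per-site values over the sequences.

```python
from statistics import pstdev

ALPHABET = "ACGT"  # assumed DNA alphabet


def saturation_mutants(seq, pos):
    """Return every variant of seq with each alphabet letter placed at pos.

    The wild-type base is included (an assumption of this sketch).
    """
    return [seq[:pos] + base + seq[pos + 1:] for base in ALPHABET]


def site_weights(seqs, predict):
    """Per-position weight: standard deviation of the predictions over the
    mutants at that position, averaged over all sequences (assumed aggregation).
    """
    length = len(seqs[0])
    weights = [0.0] * length
    for seq in seqs:
        for pos in range(length):
            preds = [predict(m) for m in saturation_mutants(seq, pos)]
            weights[pos] += pstdev(preds)
    return [w / len(seqs) for w in weights]


def toy_predict(seq):
    # Hypothetical stand-in for a trained predictor: only position 0 matters.
    return 1.0 if seq[0] == "A" else 0.0


weights = site_weights(["ACGT", "TTGA"], toy_predict)
print(weights)
```

With this toy predictor, only the first position receives a non-zero weight, because mutating any other site never changes the prediction.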

Parameters

Caution: The sequence and expression files must follow the same format as in the QuickStart section.

| params | description | default value |
| --- | --- | --- |
| predictor | the trained predictor class | |
| predictor_modelpath | path to the pretrained model checkpoint, in "x/xxx.pth" format | |
| predictor_training_datapath | path to the natural sequences; the predictor's training set is the best choice | |
| predictor_expression_datapath | path to the expression levels corresponding to the sequences in predictor_training_datapath | |
| report_path | folder in which results are saved | |
| file_tag | name used for the saved files | |
| num_seqs_to_test | sample size (number of sequences to test) | 200 |

Demo

import os

from gpro.evaluator.mutagenesis import plot_mutagenesis
from gpro.predictor.cnn_k15.cnn_k15 import CNN_K15_language

# Sequences and their matching expression levels (same format as in QuickStart)
project_path = "your project path"
predictor_training_datapath = os.path.join(project_path, 'data/diffusion_prediction/seq.txt')
predictor_expression_datapath = os.path.join(project_path, 'data/diffusion_prediction/exp.txt')

# Predictor class and its pretrained checkpoint
predictor = CNN_K15_language(length=50)
predictor_modelpath = os.path.join(project_path, 'checkpoints/cnn_k15/checkpoint.pth')

# Run saturation mutagenesis and save the report
plot_mutagenesis(predictor, predictor_modelpath, predictor_training_datapath, predictor_expression_datapath,
                 report_path="./results/", file_tag="CNNK15")

The final result will be saved in the ./results directory.
