4.4.5 WebLogo - WangLabTHU/GPro GitHub Wiki

hcwang and qxdu edited on Aug 4, 2023, 1 version

Introduction

Seqlogo is an enhanced version of the saliency map: it gives a clearer view of the predictor's base preference at each position in the sequence. A function for drawing sequence logos directly from a set of sequences, plot_weblogos, is also provided in utils/base.py.
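To make "base preference at each position" concrete, here is a minimal sketch that scores every single-base substitution against the original sequence (in-silico mutagenesis). This is an illustrative stand-in, not GPro's implementation: GPro may compute saliency differently, and `toy_predictor` and `mutagenesis_scores` are hypothetical names introduced only for this example.

```python
BASES = "ACGT"

def toy_predictor(seq):
    """Hypothetical stand-in for a trained predictor.

    Scores a sequence by how many bases match the motif TATA at positions 2-5.
    """
    motif, start = "TATA", 2
    return float(sum(1 for i, b in enumerate(motif) if seq[start + i] == b))

def mutagenesis_scores(seq, predict):
    """Return, per position, the score change caused by substituting each base."""
    base_score = predict(seq)
    table = []
    for i in range(len(seq)):
        row = {}
        for b in BASES:
            mutant = seq[:i] + b + seq[i + 1:]
            row[b] = predict(mutant) - base_score
        table.append(row)
    return table

seq = "GGTATAGG"
scores = mutagenesis_scores(seq, toy_predictor)

# Positions outside the motif are insensitive to substitution,
# while positions inside it penalize every non-motif base.
for i, row in enumerate(scores):
    print(i, seq[i], row)
```

A sequence logo drawn from such a table would show tall letters only where the predictor is sensitive to the base identity.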

Parameters

Caution: The format of the sequence and expression files must be consistent with the QuickStart section.

| params | description | default value |
| --- | --- | --- |
| predictor | the predictor object to analyze, e.g. CNN_K15_language(length=50) | |
| predictor_modelpath | path to the pretrained model checkpoint, in "x/xxx.pth" format | |
| predictor_training_datapath | path to the natural sequences; the predictor's training set is the best choice | |
| predictor_expression_datapath | path to the expression levels corresponding to the sequences in predictor_training_datapath | |
| report_path | folder where the output is saved | |
| file_tag | name used for the saved output | |
| num_seqs_to_test | number of sequences sampled for frequency comparison | 200 |
| plot_mode | to_type mode of logomaker | "saliency" |
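Because the sequence and expression files must correspond entry by entry, it can help to check them before plotting. The sketch below assumes plain-text files with one sequence per line and one numeric expression value per line; that layout is an assumption here, and the QuickStart section remains the authoritative description. `load_paired_data` is a hypothetical helper, not part of GPro.

```python
import os
import tempfile

def load_paired_data(seq_path, exp_path):
    """Read sequences and expression values and check they line up one-to-one."""
    with open(seq_path) as f:
        # Skip blank lines and FASTA-style '>' header lines, if any
        seqs = [ln.strip() for ln in f if ln.strip() and not ln.startswith(">")]
    with open(exp_path) as f:
        exps = [float(ln) for ln in f if ln.strip()]
    if len(seqs) != len(exps):
        raise ValueError(
            f"{len(seqs)} sequences but {len(exps)} expression values"
        )
    return seqs, exps

# Toy files, created only for this illustration
with tempfile.TemporaryDirectory() as tmp:
    seq_path = os.path.join(tmp, "seq.txt")
    exp_path = os.path.join(tmp, "exp.txt")
    with open(seq_path, "w") as f:
        f.write("ACGTAC\nTTGACA\nTATAAT\n")
    with open(exp_path, "w") as f:
        f.write("0.5\n1.2\n3.4\n")
    seqs, exps = load_paired_data(seq_path, exp_path)

print(len(seqs), "sequences paired with", len(exps), "expression values")
```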

Demo

import os

from gpro.evaluator.seqlogo import plot_seqlogos
from gpro.predictor.cnn_k15.cnn_k15 import CNN_K15_language

project_path = "your project path"
predictor_training_datapath = os.path.join(project_path, 'data/diffusion_promoter/sequence_data.txt')

# Predictor architecture and the checkpoint holding its trained weights
predictor = CNN_K15_language(length=50)
predictor_modelpath = os.path.join(project_path, 'checkpoints/cnn_k15/checkpoint.pth')

# Keyword names follow the Parameters table, so the two paths cannot be swapped
plot_seqlogos(predictor=predictor,
              predictor_modelpath=predictor_modelpath,
              predictor_training_datapath=predictor_training_datapath,
              report_path="./results/", file_tag="CNNK15")

To draw a logo directly from a set of sequences, without a predictor, use plot_weblogos:

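import os

from gpro.utils.base import plot_weblogos
# Assumed location of open_fa; adjust the import if it is defined elsewhere in your GPro version
from gpro.utils.utils_predictor import open_fa

project_path = "your project path"
seqs_datapath = os.path.join(project_path, 'data/diffusion_prediction/seq.txt')
seqs = open_fa(seqs_datapath)  # load the sequences from file
plot_weblogos("./results/test.png", seqs)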

The output is a sequence logo image. Results are highly consistent with those produced by https://weblogo.berkeley.edu/logo.cgi

The final result is saved in the ./results directory.
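For readers who want to sanity-check a logo by hand, the following sketch computes what a standard sequence logo displays: per-position base frequencies and the information content in bits. It is a simplified illustration under stated assumptions (equal-length DNA sequences, no small-sample correction, which WebLogo applies), and `position_frequencies` and `information_content` are hypothetical helpers rather than GPro functions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_frequencies(seqs):
    """Return one {base: frequency} dict per column of equal-length sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        total = sum(counts[b] for b in ALPHABET)
        if total == 0:
            raise ValueError(f"column {i} contains no A/C/G/T")
        freqs.append({b: counts[b] / total for b in ALPHABET})
    return freqs

def information_content(freqs):
    """Per-position information in bits: log2(4) minus the Shannon entropy."""
    bits = []
    for column in freqs:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        bits.append(math.log2(len(ALPHABET)) - entropy)
    return bits

# Toy sequences, for illustration only
seqs = ["ACGT", "ACGA", "ACTT", "AGGT"]
freqs = position_frequencies(seqs)
bits = information_content(freqs)

for i, (column, b) in enumerate(zip(freqs, bits)):
    print(i, column, round(b, 3))
```

In a logo, the stack height at each position equals its information content, and each letter's height within the stack is its frequency times that value. A fully conserved column therefore reaches the 2-bit maximum for DNA.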
