Useful scripts - labgem/ASMC GitHub Wiki

Get identity percentage between targets and reference(s)

When modelling is carried out by MODELLER (ASMC default), the percentage of identity between the target sequences and the reference structure(s) is calculated. This information is used to determine the reference(s) that will be used to model and align the target. Active sites are extracted on the basis of the alignment between the targets and their respective reference.
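
As a rough illustration of the idea, the sketch below computes a percentage of identity between two aligned sequences and picks the closest reference. It is not the code of compute_perc_id.py: the function names, the choice to ignore gapped columns, and the "highest identity wins" rule are all assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two aligned sequences.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def best_reference(target: str, references: dict[str, str]) -> str:
    """Return the ID of the reference with the highest identity to the target.

    Assumption: the target and all references come from the same alignment.
    """
    return max(
        references,
        key=lambda ref_id: percent_identity(target, references[ref_id]),
    )
```

With several references, a target would then be modelled on whichever reference it resembles most under this simplified rule.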

To extract and cluster the active sites from an MSA with multiple reference sequences, first pass the file identity_targets_refs.tsv (generated by the script ASMC/asmc/compute_perc_id.py) to run_asmc.py.

usage: compute_perc_id.py [-h] (-s  | -m ) (-r  | -R )

options:
  -h, --help       show this help message and exit
  -s , --seqs      multi fasta file
  -m , --models    file containing all PDB paths
  -r , --ref-str   file containing the reference structure paths
  -R , --ref-seq   file containing the reference sequences id

The identity percentage can be calculated from either target sequences, if ASMC was run on a set of sequences, or target 3D structures, if ASMC was run on pre-built 3D models (MODELLER, AlphaFold, etc.).

From a set of sequence targets

The user must provide a set of homologous protein sequences and a reference sequence file, passed with the --ref-seq option.

python ASMC/asmc/compute_perc_id.py -s sequences.fasta --ref-seq ref_seq.txt

From a set of structure targets

The user must provide a set of homologous protein structures and a reference structure file, passed with the --ref-str option.

python ASMC/asmc/compute_perc_id.py -m models.txt --ref-str ref_str.txt
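
The -m/--models option expects a file containing all PDB paths. A minimal sketch for generating such a file is shown below. It assumes one path per line and a .pdb extension, neither of which is confirmed by the help text; write_path_list is a hypothetical helper, not part of ASMC.

```python
from pathlib import Path


def write_path_list(structure_dir: str, output_file: str, pattern: str = "*.pdb") -> int:
    """Write the absolute path of every matching file, one per line.

    Assumption: the list file holds one path per line.
    Returns the number of paths written.
    """
    paths = sorted(Path(structure_dir).glob(pattern))
    with open(output_file, "w") as handle:
        for path in paths:
            handle.write(f"{path.resolve()}\n")
    return len(paths)
```

For example, write_path_list("models", "models.txt") would list every .pdb file found in a directory named models.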

Extract amino acid at a queried position

The script ASMC/asmc/extract_aa.py extracts the lines of groups_x_min_y.tsv that contain a specific amino acid or residue type at a queried position.

usage: extract_aa.py [-h] -f  -p  -a  [-g]

options:
  -h, --help        show this help message and exit
  -f , --file       tsv file from run_asmc.py
  -p , --position   position where to find the specified amino acid type, e.g: 5
  -a , --aa-type    amino acid type to search, must be either 1-letter amino acid, 'aromatic', 'acidic', 'basic', 'polar' or 'hydrophobic'
  -g , --group      group id, if not used, search in all groups

The position numbering corresponds to the position in the active site sequences within groups_x_min_y.tsv. For example, if the user is looking for a tyrosine (Y) at position 5, the command line is as follows:

python ASMC/asmc/extract_aa.py -f groups_x_min_y.tsv -p 5 -a Y

The matching lines are written to stdout.
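
To make the position lookup concrete, here is a small sketch of the same kind of filter applied to in-memory active site sequences. It assumes positions are counted from 1, which the documentation does not state explicitly. The helper names and the example data are made up, and the "DE" set used for acidic residues is only illustrative; the residue classes used by extract_aa.py may differ.

```python
def residue_at(site: str, position: int) -> str:
    """Return the residue at a 1-based position of an active site sequence."""
    if not 1 <= position <= len(site):
        raise IndexError(f"position {position} is outside 1..{len(site)}")
    return site[position - 1]


def matches(site: str, position: int, allowed: str) -> bool:
    """True if the residue at `position` is one of the letters in `allowed`."""
    return residue_at(site, position) in allowed


# Made-up active site sequences keyed by sequence ID.
sites = {"seq1": "STGIYSLF", "seq2": "STGICSLF", "seq3": "STGIDSLF"}

# Single amino acid: tyrosine at position 5.
with_tyr = [seq_id for seq_id, site in sites.items() if matches(site, 5, "Y")]

# Residue class: any letter from an illustrative "acidic" set at position 5.
with_acidic = [seq_id for seq_id, site in sites.items() if matches(site, 5, "DE")]
```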

Compare active sites from different methods

The script ASMC/asmc/compare_active_site.py returns the comparison of active sites present within groups_x_min_y.tsv.

usage: compare_active_site.py [-h] -f1  -f2  -id

options:
  -h, --help  show this help message and exit
  -f1         Group file 1
  -f2         Group file 2
  -id         identity_targets_refs.tsv

The user must provide a TSV file for each clustering method (MSA, structure, pairwise) and the identity_targets_refs.tsv file, passed with the -id option.

python ASMC/asmc/compare_active_site.py -f1 groups_x_min_y.tsv -f2 groups_a_min_b.tsv -id identity_targets_refs.tsv

The output file is named active_site_checking.tsv.

Retrieve unique active sites and obtain some statistics

The script ASMC/asmc/stats.py returns the unique active sites per group and some statistics.

The user must provide the file groups_x_min_y.tsv.

python ASMC/asmc/stats.py groups_x_min_y.tsv

The output files are unique_sequences.tsv and groups_stats.tsv.
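
The notion of unique active sites per group can be sketched as follows. This is not the implementation of stats.py, and the statistics it reports are not reproduced here. The example only counts total and distinct active site sequences per group, and it assumes each row holds a sequence ID, an active site sequence and a group ID, in that order.

```python
from collections import Counter, defaultdict


def unique_sites_per_group(rows):
    """Count how often each active site sequence occurs in each group.

    Assumption: every row is (sequence_id, active_site, group_id).
    """
    per_group = defaultdict(Counter)
    for _seq_id, site, group in rows:
        per_group[group][site] += 1
    return per_group


# Made-up rows for illustration.
rows = [
    ("s1", "STGIY", "G1"),
    ("s2", "STGIY", "G1"),
    ("s3", "STGIC", "G1"),
    ("s4", "AAGIY", "G2"),
]

# For each group: (number of sequences, number of distinct active sites).
summary = {
    group: (sum(counts.values()), len(counts))
    for group, counts in unique_sites_per_group(rows).items()
}
```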

Visualisation with PyMOL

The script ASMC/asmc/zoom_active_site.py should be used in the PyMOL console. It runs PyMOL commands to show the superposed active site residues. To visualise the active sites:

  • Open PyMOL and set a directory containing all the ASMC outputs as the working directory
  • Use the command run <path>/ASMC/asmc/zoom_active_site.py to load the functions
  • Use the command target ID, where ID is the ID of a built model, to load the target model and its reference structure
  • Use the active_site_pocket.csv to zoom on the two active sites

The last command displays the list of corresponding positions in the PyMOL console, e.g.:

Ref - Target
189 SER - 94 VAL
190 THR - 95 SER
191 GLY - 96 SER
192 ILE - 97 ILE
193 CYS - 98 CYS
197 SER - 102 ALA
200 LEU - Gap
202 PHE - 103 ALA
235 THR - 131 ASP
268 PRO - 163 PRO
271 GLN - 166 GLN
272 TYR - 167 TYR
275 TYR - Gap
278 GLU - 170 SER
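
If this list needs to be reused outside PyMOL, it can be copied from the console and parsed. The sketch below is based only on the layout of the example above: a residue number, a three-letter residue name, or the word Gap on either side of the dash. parse_pairs is a hypothetical helper and is not part of ASMC.

```python
import re

# One side of a line: "<number> <three-letter residue>" or "Gap".
SIDE = r"(?:(\d+) ([A-Z]{3})|Gap)"
PAIR = re.compile(rf"^{SIDE} - {SIDE}$")


def parse_pairs(text: str):
    """Parse 'Ref - Target' lines into ((ref_pos, ref_res), (tgt_pos, tgt_res)).

    A side printed as 'Gap' is returned as None; lines that do not match
    the expected layout (such as the header) are skipped.
    """
    pairs = []
    for line in text.splitlines():
        match = PAIR.match(line.strip())
        if match is None:
            continue
        ref = (int(match.group(1)), match.group(2)) if match.group(1) else None
        target = (int(match.group(3)), match.group(4)) if match.group(3) else None
        pairs.append((ref, target))
    return pairs
```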

Format groups output to CSV

The script ASMC/asmc/groups_tsv_to_csv.py converts a TSV file into a CSV file. In the new file, each position in the sequence is in its own column.

usage: groups_tsv_to_csv.py [-h] -f  [-n]

options:
  -h, --help    show this help message and exit
  -f , --file   Group tsv file returned by run_asmc.py
  -n , --name   output name without extension

The user must provide a TSV file.

python ASMC/asmc/groups_tsv_to_csv.py -f groups_x_min_y.tsv

The output file is a CSV file.
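
The "one position per column" transformation can be illustrated with the short sketch below. It is not the code of groups_tsv_to_csv.py: the column layout (sequence ID, active site sequence, then any remaining columns) and the absence of a header line are assumptions, and split_sites is a hypothetical name.

```python
import csv
import io


def split_sites(tsv_text: str) -> str:
    """Convert TSV text to CSV text, one column per active site position.

    Assumption: column 1 is the sequence ID, column 2 the active site
    sequence, and any further columns are copied unchanged.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        seq_id, site, *rest = row
        # Unpacking the string yields one cell per residue.
        writer.writerow([seq_id, *site, *rest])
    return out.getvalue()
```

Having each position in its own column makes it straightforward to sort or filter on a given position in a spreadsheet.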
