2023.02.24.6GJ8 - tbalius/teb_docking_test_sets GitHub Wiki

DOCK 6 tutorial using KRAS in complex with BI-2852 (PDB ID 6GJ8)

By Y. Stanley Tan, Mayukh Chakrabarti, and Trent E. Balius

Based on 2022 Rizzo lab tutorial.

Getting started

Command line

This tutorial expects a basic understanding of Unix.

Here is a Unix tutorial.

Chimera

This tutorial will use UCSF Chimera to manually prepare files for docking and to visualize dock results.

To show Chimera's command line, select Tools -> General Controls -> Command Line in the menu bar.

Files can be opened in Chimera by selecting File -> Open... in the menu bar at the top. Structures can be saved by selecting File -> Save Mol2... in the menu bar to save as a Mol2 file, or by selecting File -> Save PDB... to save as a PDB file. Save a Chimera session by selecting File -> Save Session As... in the menu bar. This will create a Python script that restores open models and any modifications made at the time of saving. To close the current session, select File -> Close Session in the menu bar.

DOCK results should be visualized using Chimera's ViewDock. To open an output Mol2 file with ViewDock, select Tools -> Surface/Binding Analysis -> ViewDock in the menu bar and select the file in the Open Dock Results window that opens. In the Type Selection dialog box, select Dock 4, 5 or 6 for DOCK 6 results. The ViewDock tool allows the poses stored in the Mol2 file to be viewed one at a time instead of all at once. Select Column -> Show in the menu bar of the ViewDock window and select a value to display it in the table.

Here is a Chimera tutorial.

DOCK

Here is the UCSF DOCK website.

Organizing file directories

Make a directory to house the files for this tutorial:

>> mkdir 6GJ8_tutorial

Inside the 6GJ8_tutorial directory, make the following subdirectories:

001_structure
002_surface_spheres
003_gridbox
004_dock
005_enrichment

Dock Prep

Download the 6GJ8 PDB file from the RCSB Protein Data Bank by going to https://rcsb.org/structure/6GJ8 and selecting "PDB Format" in the Download Files drop-down menu in the top right. Save the file in the 001_structure directory. Alternatively, running the following command will download the file to the working directory:

>> wget https://files.rcsb.org/download/6GJ8.pdb --no-check-certificate

Prepare the receptor and cofactors

Open 6GJ8.pdb in Chimera.

Isolate the receptor and cofactors

Select the ligand by typing select #0: F0K in the command line. The selected atoms should be highlighted in green. Delete the selected atoms by selecting Actions -> Atoms/Bonds -> delete in the menu bar or by typing delete selected in the command line.

Change GCP to GTP

Select carbon C3B of the GCP cofactor. Change the carbon to an oxygen by first selecting Tools -> Structure Editing -> Build Structure in the menu bar. In the "Build Structure" dialog box that opens, change the section from "Start Structure" to "Modify Structure" using the selection menu near the top. After "Modify Structure" is selected, under "Change selected atoms to...", change the element to oxygen, change the number of bonds to 2, and set the atom name to O3B. Under "Residue Name", select "Change the modified residue's name to" and replace UNK with GTP. Apply the changes and close the dialog box.

Add hydrogens and charges

Chimera's AddH protonates the GTP cofactor incorrectly (this is the case for many RAS structures), the structure must be corrected before adding charges. Open Dock Prep by first selecting Surface/Binding Analysis -> Dock Prep in the menu bar. Deselect the "Add charges" and "Write Mol2 file" options so that Chimera stops after adding hydrogens, and does not add charges. Make sure that the "Delete solvent", "If alternate locations, keep only highest occupancy" (delete alternate side chains), and "Incomplete side chains: Replace using Dunbrack 2010 rotamer library" (mutate incomplete side chains) options are selected in addition to "Add hydrogens" so that all the necessary Dock Prep steps are run. In the "Add Hydrogens for Dock Prep" dialog box that opens, select "Unspecified (determined by method)" and click OK. Select the incorrectly protonated hydrogen (H3) and delete it. Select the nitrogen that should be protonated (N1) and open Build Structure. Switch to "Modify Structure", change the element to nitrogen, change the number of bonds to 3, and apply. The newly added hydrogen is named incorrectly, select H1 and reopen Build Structure. Switch to "Modify Structure", change the element to hydrogen, set the number of bonds to 1, set the atom name to HN1, and apply.

Now, finish running Dock Prep by adding charges and saving the structure. Select Surface/Binding Analysis -> Dock Prep in the menu bar and select the "Add charges" and "Write Mol2 file" steps before rerunning Dock Prep. Save the structure as kras_cof.mol2 and close the session.

Prepare the ligand

Reopen 6GJ8.pdb.

Isolate the ligand

Select the ligand by typing select #0: F0K in the command line. Invert the selection by typing select invert in the command line. Delete the selected receptor, cofactors, and water molecules, leaving only the ligand.

Run Dock Prep

Add hydrogens and partial charges by first selecting Tools -> Surface/Binding Analysis -> Dock Prep in the menu bar and clicking OK in the "Dock Prep" dialog box that opens. Select "Unspecified (determined by method)" in the "Add Hydrogens for Dock Prep" dialog box that opens and click OK to add hydrogens. Visually inspect the protonation of the ligand and make sure that it is correct. Leave the default selections and click OK in the "Assign Charges for Dock Prep" and the "Specify Net Charges" dialog boxes to assign partial charges. The ligand should have a net charge of +1.

Save as bi2852.mol2 and close the session.

Surface spheres

Generating DMS

Open kras_cof.mol2.

Run the surface calculation by selecting Actions -> Surface -> Show in the menu bar or by typing surf #0 in the command line.

Save the surface as a DMS file by selecting Tools -> Structure Editing -> Write DMS in the menu bar. Save as kras_cof.dms.

Creating surface spheres

Change directories to the 002_surface_spheres directory:

>> cd ../002_surface_spheres

Create a file called INSPH and format the file as follows:

../001_structure/kras_cof.dms
R
X
0.0
4.0
1.4
6GJ8.sph

This file will be the input file for the sphere generation program, sphgen. The first line of the file specifies the input DMS file. In the second line, R indicates that spheres should be generated outside the surface. In the third line, X indicates that all surface points should be used. The fourth line specifies the acceptable steric clash in angstroms, which, in this case, is set to 0.0 or no steric clash allowed. The fifth and sixth lines specify the maximum and minimum sphere radii in angstroms, respectively. The last line specifies the output sphere file name.

Run sphgen to generate spheres based on the parameters given in the INSPH file:

>> ${dock6_path}/bin/sphgen -i INSPH -o OUTSPH

  • ${dock6_path} is a variable (placeholder) for the DOCK 6 installation path (e.g. /home/baliuste/zzz.github/dock6)

The surface spheres generated by sphgen can be visualized in Chimera. In a Chimera session with kras_cof.mol2 open and the surface shown, open the sphere file (6GJ8.sph) created by sphgen to visualize the surface spheres.

Selecting relevant spheres

To select spheres based on the distance from the ligand, run the following command:

>> ${dock6_path}/bin/sphere_selector 6GJ8.sph ../001_structure/bi2852.mol2 3.0

The first two arguments passed to sphere_selector specify the surface sphere and ligand files, respectively. The last argument, 3.0, specifies that only spheres within 3.0 angstroms of the ligand should be selected.

The selected_spheres.sph file generated by sphere_selector can be visualized in Chimera.

Generating box/grid

Change directories to the 003_gridbox directory.

Box generation

Create the input file for box generation called showbox.in and format the file as follows:

Y
10.0
../002_surface_spheres/selected_spheres.sph
1
6GJ8.box.pdb

In the first line of the file, Y (Yes) indicates that showbox should create a box. The second line specifies the distance, in angstroms, between the spheres and the box borders. The third line specifies the input sphere file. The fourth line indicates the cluster number. The last line specifies the output box file name.

Generate the box based on the parameters given in showbox.in by running showbox as follows:

>> ${dock6_path}/bin/showbox < showbox.in

The 6GJ8.box.pdb file generated by showbox can be visualized in Chimera. The generated box is used to define the boundaries of the energetic grids.

Grid generation

Create the input file for grid generation called grid.in and format it as follows:

compute_grids                             yes
grid_spacing                              0.4
output_molecule                           no
contact_score                             no
energy_score                              yes
energy_cutoff_distance                    9999
atom_model                                a
attractive_exponent                       6
repulsive_exponent                        9
distance_dielectric                       yes
dielectric_factor                         4
allow_non_integral_charges                no
bump_filter                               yes
bump_overlap                              0.75
receptor_file                             ../001_structure/kras_cof.mol2
box_file                                  6GJ8.box.pdb
vdw_definition_file                       ${dock6_path}/parameters/vdw_AMBER_parm99.defn
score_grid_prefix                         grid

Generate grids by running:

>> ${dock6_path}/bin/grid -i grid.in -o grid.out

Docking and auxiliary calculations

Change directories to the 004_dock directory.

Single point calculation

This calculation performs an energy evaluation on the crystallographic ligand conformation without any movement.

Create the input file for the single point calculation called single_point.in and format it as follows:

conformer_search_type                                        rigid
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
ligand_atom_file                                             ../001_structure/bi2852.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       yes
rmsd_reference_filename                                      ../001_structure/bi2852.mol2
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../003_gridbox/grid
gist_score_secondary                                         no
multigrid_score_secondary                                    no
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
pharmacophore_score_secondary                                no
hbond_score_secondary                                        no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_score_secondary                                         no
amber_score_secondary                                        no
minimize_ligand                                              no
atom_model                                                   all
vdw_defn_file                                                ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               ${dock6_path}/parameters/flex.defn
flex_drive_file                                              ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix                                        single_point.out
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

To run the single point calculation:

>> ${dock6_path}/bin/dock6 -i single_point.in -o single_point.out

Energy minimization

This calculation performs a minimization run on the crystallographic ligand conformation, allowing the ligand to subtly adjust its conformation (internal degrees of freedom) and orientation (3 translational and 3 orientational degrees of freedom).

Create the input file for energy minimization called min.in and format it as follows:

conformer_search_type                                        rigid
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
ligand_atom_file                                             ../001_structure/bi2852.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       yes
rmsd_reference_filename                                      ../001_structure/bi2852.mol2
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../003_gridbox/grid
gist_score_secondary                                         no
multigrid_score_secondary                                    no
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
pharmacophore_score_secondary                                no
hbond_score_secondary                                        no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_score_secondary                                         no
amber_score_secondary                                        no
minimize_ligand                                              yes
simplex_max_iterations                                       1000
simplex_tors_premin_iterations                               0
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_random_seed                                          0
simplex_restraint_min                                        yes
simplex_coefficient_restraint                                10.0
atom_model                                                   all
vdw_defn_file                                                ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               ${dock6_path}/parameters/flex.defn
flex_drive_file                                              ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix                                        6GJ8.ligand.min
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

Run energy minimization:

>> ${dock6_path}/bin/dock6 -i min.in -o min.out

To view the RMSD of the minimized ligand conformation, head the generated Mol2 file:

>> head 6GJ8.ligand.min_scored.mol2

The minimized ligand conformation can be visualized in Chimera by opening the output Mol2 file with ViewDock.

Rigid docking

To test orienting without considering ligand flexibility, we perform rigid docking. We use the crystallographic ligand conformation for this test. This is primarily used for testing purposes only, not for screening.

Create the input file for rigid docking called rigid.in and format it as follows:

conformer_search_type                                        rigid
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100
ligand_atom_file                                             6GJ8.ligand.min_scored.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       yes
rmsd_reference_filename                                      6GJ8.ligand.min_scored.mol2
use_database_filter                                          no
orient_ligand                                                yes
automated_matching                                           yes
receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph
max_orientations                                             2000
critical_points                                              no
chemical_matching                                            no
use_ligand_spheres                                           no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../003_gridbox/grid
gist_score_secondary                                         no
multigrid_score_secondary                                    no
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
pharmacophore_score_secondary                                no
hbond_score_secondary                                        no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_score_secondary                                         no
amber_score_secondary                                        no
minimize_ligand                                              yes
simplex_max_iterations                                       1000
simplex_tors_premin_iterations                               0
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               ${dock6_path}/parameters/flex.defn
flex_drive_file                                              ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix                                        rigid.out
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

Run rigid docking:

>> ${dock6_path}/bin/dock6 -i rigid.in -o rigid.out

To view the RMSD of the docked ligand conformation, head the generated Mol2 file:

>> head rigid.out_scored.mol2

Fixed anchor docking

To test the ligand flexibility algorithm without considering orienting, we perform fixed anchor docking. We use the crystallographic ligand placement for this test. This is primarily used for testing purposes only, not for screening, as a ligand in a database will be centered elsewhere, not in the site.

Create the input file for fixed anchor docking called fixed.in and format it as follows:

conformer_search_type                                        flex
write_fragment_libraries                                     no
user_specified_anchor                                        no
limit_max_anchors                                            no
min_anchor_size                                              5
pruning_use_clustering                                       yes
pruning_max_orients                                          10000
pruning_clustering_cutoff                                    100
pruning_conformer_score_cutoff                               100.0
pruning_conformer_score_scaling_factor                       1.0
use_clash_overlap                                            no
write_growth_tree                                            no
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
ligand_atom_file                                             ../001_structure/bi2852.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       yes
rmsd_reference_filename                                      ../001_structure/bi2852.mol2
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../003_gridbox/grid
gist_score_secondary                                         no
multigrid_score_secondary                                    no
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
pharmacophore_score_secondary                                no
hbond_score_secondary                                        no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_score_secondary                                         no
amber_score_secondary                                        no
minimize_ligand                                              yes
minimize_anchor                                              yes
minimize_flexible_growth                                     yes
use_advanced_simplex_parameters                              no
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1
simplex_trans_step                                           1
simplex_rot_step                                             0.1
simplex_tors_step                                            10
simplex_anchor_max_iterations                                500
simplex_grow_max_iterations                                  500
simplex_grow_tors_premin_iterations                          0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               ${dock6_path}/parameters/flex.defn
flex_drive_file                                              ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix                                        6GJ8_fad
write_orientations                                           no
num_scored_conformers                                        100
write_conformations                                          no
cluster_conformations                                        yes
cluster_rmsd_threshold                                       2.0
rank_ligands                                                 no

Run fixed anchor docking:

>> ${dock6_path}/bin/dock6 -i fixed.in -o fixed.out

To view the RMSD of the top scoring docked ligand conformation, head the generated Mol2 file:

>> head 6GJ8_fad_scored.mol2

Flexible docking

This test combines orienting and ligand flexibility, and is the most applicable to production runs of dock.

Create the input file for flexible docking called flex.in and format it as follows:

conformer_search_type                                        flex
write_fragment_libraries                                     no
user_specified_anchor                                        no
limit_max_anchors                                            no
min_anchor_size                                              5
pruning_use_clustering                                       yes
pruning_max_orients                                          5000
pruning_clustering_cutoff                                    2500
pruning_conformer_score_cutoff                               100.0
pruning_conformer_score_scaling_factor                       1.0
use_clash_overlap                                            no
write_growth_tree                                            no
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
ligand_atom_file                                             6GJ8.ligand.min_scored.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       yes
rmsd_reference_filename                                      6GJ8.ligand.min_scored.mol2
use_database_filter                                          no
orient_ligand                                                yes
automated_matching                                           yes
receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph
max_orientations                                             1000
critical_points                                              no
chemical_matching                                            no
use_ligand_spheres                                           no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../003_gridbox/grid
gist_score_secondary                                         no
multigrid_score_secondary                                    no
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
pharmacophore_score_secondary                                no
hbond_score_secondary                                        no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_score_secondary                                         no
amber_score_secondary                                        no
minimize_ligand                                              yes
minimize_anchor                                              yes
minimize_flexible_growth                                     yes
use_advanced_simplex_parameters                              no
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_anchor_max_iterations                                500
simplex_grow_max_iterations                                  500
simplex_grow_tors_premin_iterations                          0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               ${dock6_path}/parameters/flex.defn
flex_drive_file                                              ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix                                        flex.out
write_orientations                                           no
num_scored_conformers                                        25
write_conformations                                          no
cluster_conformations                                        yes
cluster_rmsd_threshold                                       2
rank_ligands                                                 no

Run flexible docking:

>> ${dock6_path}/bin/dock6 -i flex.in -o flex.out

To view the RMSD of the top scoring docked ligand conformation, head the generated Mol2 file:

>> head flex.out_scored.mol2

Docking to dimer

See 2023.05.08.6GJ8_dimer.

Enrichment with HDB

In enrichment calculations, a set of known binders, actives, and a set of presumed non-binders, decoys, are docked to test the ability of the docking software to distinguish between actives and decoys. To quantify enrichment, this tutorial uses the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In this section, an example enrichment calculation is performed using DOCK 6 and a sample list of 5 ligands.

Change directories to the 005_enrichment directory. Make a subdirectory called databases.

>> mkdir databases

For this tutorial, we will use 5 ligands from the PDB that bind in the Switch I/II (SOS) Pocket, including the BI-2852 molecule.

In the databases directory, create a ligand SMILES file called actives.smi and format it as follows:

Cn1cc(nc1)Cn2ccc3c2cc(cc3)CNCc4c(c5ccccc5[nH]4)[C@@H]6c7cc(ccc7C(=O)N6)O F0K
C[NH+](C)Cc1ccc(cc1)Nc2ccc(c(c2OC)F)c3cccc(c3)OC F8Q
c1cc2c(cc1Cl)OC[C@H](O2)CN 9R5
c1ccc(c(c1)C(=S)N2CCCC2)O 0QW
CN(C)Cc1ccc(cc1)Nc2ccc(nc2OC)c3cccc4c3OCCO4 D1W

Each line gives a ligand SMILES string and identifier, separated by whitespace. The PDB can be searched with the three-character ligand identifier to view the ligand summary page and PDB entries in which the ligand appears. The BI-2852 molecule has the id F0K.

The tldr web server will be used to build 3D representations of the ligands and generate a background of property-matched decoys for enrichment. Go to https://tldr.docking.org and create an account by selecting Welcome and Sign up in the top right corner of the page. Once your account is created, log in by selecting Welcome and Login in the top right. Once logged in, select Start in the menu to view the modules.

Build ligand database using TLDR

The build3d38 module will prepare molecules for docking from SMILES in four file formats: SDF, PDB, Mol2, and DB2. The DB2 file format will be used for enrichment. Select the build3d38 module from the tldr start page.

Upload the ligand SMILES file (actives.smi) by selecting the Choose File button, or directly paste the ligand SMILES into the input text box. Optionally, enter a memo in the memo text box for reference. Hit Go! to submit the job.

When the job is finished, you will receive a notification email. Refresh the job page to view the results. If the job page was closed, view all completed jobs by selecting Results -> All in the menu.

Download the zipped result file, build3d38_results.zip, under Result Files. Move the file to 005_enrichment/databases and rename it to ligands.zip. Extract the contents of the ligands.zip file and rename the unzipped folder.

>> mv build3d38_results.zip ligands.zip

>> unzip ligands.zip

>> mv *_build3d38 ligands

Extract the 1.tar.gz archive file inside the ligands directory.

>> tar -xvf 1.tar.gz

This should generate another folder containing the ligand SDF, PDB, Mol2, and DB2 files.

Generate property-matched decoys using TLDR (DUDE-Z)

The DUDEZ module will assemble and build property-matched decoys in the DB2 file format.

The decoys are property-matched to the actives by the following molecular properties: formal charge, molecular weight, calculated water-octanol partition coefficient (cLogP), rotatable bonds, number of H-bond donors, and number of H-bond acceptors. Decoys are checked for similarity to the actives and similarity to each other by calculating the Tanimoto coefficient, molecules that are too similar to the actives are filtered and molecules that are similar to each other are clustered.

Select the DUDEZ module from the tldr start page.

Upload the ligand SMILES file (actives.smi) or paste the ligand SMILES as input. Optionally, enter a memo for reference. Hit Go! to submit the job.

Once the job finishes running, download the results.tar.gz file under Result Files and move it to 005_enrichment/databases. Extract the archive file contents.

>> tar -xvf results.tar.gz

This should create a folder called newdecoys. Inside the newdecoys folder should be a file called ligand.smi containing the ligand SMILES and identifiers, a file called decoys.smi containing the decoy SMILES and ZINC IDs, and a folder called decoys containing the decoy DB2 files from ZINC. There should be 50 property-matched decoys assigned to each ligand protomer. Since there are two protomers of the ligand F0K, there should be a total of 300 decoys. Make a copy of the decoys.smi file in the databases directory.

>> cp newdecoys/decoys.smi .

Generate property-matched decoys using the DUD-E website and build using TLDR

An alternate way to generate decoys is to use the DUD-E website.

Generate property-matched decoy SMILES using the DUD-E website

Go to https://dude.docking.org/generate and follow the instructions to generate DUD-E decoys.

When the job is finished, you will receive an email with a link to download the results. Click the link and move the downloaded dude-decoys.tar.gz archive file to 005_enrichment/databases. Extract the file contents.

>> tar -xvf dude-decoys.tar.gz

This should create a folder called dude-decoys containing a README file, a ligands.charge file, and a folder called decoys containing files with the picked decoys smiles for each ligand. There should be one file for each of the ligand protomers, and each file should contain the ligand SMILES as the first line and 50 property-matched decoy SMILES. Since there are two protomers of the ligand F0K, there should be a total of 6 decoy files. Move into dude-decoys/decoys and run the following command:

>> cat decoys.*.picked | grep -v ligand | awk '{print $1,$2}' > ../../decoys.smi

This concatenates all the decoy SMILES files, omitting the ligand SMILES lines, and prints the first and second columns (the decoy smiles and an identifier, respectively) separated by whitespace. The output is saved to a file called decoys.smi in the 005_enrichment directory. The decoys.smi file should contain 300 decoys SMILES.

Build decoy DB2 files from SMILES using TLDR

Now build the decoy DB2 files from SMILES using the tldr build3d38 module as was done for the ligands. Select the build3d38 module from the tldr start page. Upload the decoy SMILES file (decoys.smi) as input. Optionally, enter a memo for reference. Hit Go! to submit the job. When the job finishes, download the build3d38_results.zip file. Move the file to 005_enrichment/databases and rename it to decoys.zip. Extract the contents of the decoys.zip file and rename the created folder.

>> mv build3d38_results.zip decoys.zip

>> unzip decoys.zip

>> mv *_build3d38 decoys

Move inside the decoys directory and extract all the archive files inside. There should be a total of six files.

>> tar -xvf 1.tar.gz ... >> tar -xvf 6.tar.gz

Enrichment with Grid Score

Run DOCK

In the 005_enrichment directory, create a shell script called run_dock6_gridscore.csh and format it as follows:

# set directories
set pwd = `pwd`
set filedir = "${pwd}/databases"
set dockfilesdir = "${pwd}/.."
set workdir = "${pwd}/6GJ8_docking_grid"

if (-e $workdir ) then
  echo "$workdir exists."
  exit
endif

mkdir $workdir
cd $workdir

ls -l ${filedir}/actives.smi ${filedir}/decoys.smi

# generate sdi.txt
rm -f sdi.txt
touch sdi.txt

# for ligands
foreach id (`cat ${filedir}/actives.smi | awk '{print $2}'`)
  echo $id
  ls ${filedir}/ligands/*/${id}*.db2.gz >> sdi.txt # CHANGE THIS
end

## for DUD-E decoys
#foreach id (`cat ${filedir}/decoys.smi | awk '{print $2}'`)
#  echo $id
#  ls ${filedir}/decoys/*/${id}*.db2.gz >> sdi.txt # CHANGE THIS
#end

## for DUDE-Z (TLDR) decoys
#foreach id (`cat ${filedir}/decoys.smi | awk '{print $2}'`)
#  echo $id
#  ls ${filedir}/newdecoys/decoys/${id}*.db2.gz >> sdi.txt # CHANGE THIS
#end

# generate dock.in
cat << EOF > dock.in
conformer_search_type                                        HDB
num_per_search                                               1
skip_broken                                                  yes
hdb_db2_input_file                                           sdi.txt
hdb_db2_search_score_threshold                               100.0
use_internal_energy                                          no
use_database_filter                                          no
orient_ligand                                                yes
automated_matching                                           no
automated_matching_iteration                                 yes
distance_tolerance                                           0.2
distance_minimum                                             0.0
nodes_minimum                                                4
nodes_maximum                                                4
receptor_site_file                                           dockfiles/002_surface_spheres/selected_spheres.sph
max_orientations                                             1000
critical_points                                              no
chemical_matching                                            no
use_ligand_spheres                                           no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
grid_score_primary                                           yes
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       dockfiles/003_gridbox/grid
minimize_ligand                                              yes
minimize_anchor                                              no
minimize_flexible_growth                                     no
use_advanced_simplex_parameters                              no
simplex_max_iterations                                       500
simplex_tors_premin_iterations                               0
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               ${dock6_path}/parameters/flex.defn
flex_drive_file                                              ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix                                        output
write_mol_solvation                                          no
write_orientations                                           no
num_scored_conformers                                        1
score_threshold                                              100.0
rank_ligands                                                 no
EOF

ln -s ${dockfilesdir} dockfiles

${dock6_path}/bin/dock6 -i dock.in -o dock.out -v

This script saves a list of the ligand and decoy DB2 files in a file called sdi.txt, creates a DOCK input file called dock.in, and then runs DOCK using the DB2 files listed in sdi.txt and the parameters given in dock.in. Depending on which decoy generation method you used, uncomment the correct code block under the "generate sdi.txt" section in the script so that the decoy DB2 files are written to sdi.txt. Modify the script so that the DB2 files are correctly listed to sdi.txt.

Run the script:

>> csh run_dock6_gridscore.csh

Enrichment calculation

Change directories to the created 6GJ8_docking_grid directory. Create a python script called make_extract_all_file_for_all_dock_out.py and format it as follows:

import sys, os


def open_dock_out(dict_zinc,filename,score_txt):

    fh = open(filename,'r')

    name = ''
    score = 100000.0


    for line in fh:
        splitline = line.split()
        if len(splitline) < 2:
           continue
        if splitline[0] == "Molecule:" :
           name = splitline[1]
        if splitline[0] == score_txt:
           score = float(splitline[1])
           if name in dict_zinc:
              if (dict_zinc[name]>score):
                  dict_zinc[name] = score
           else:
              dict_zinc[name] = score
           print (name, score)
    fh.close()

def mySortFunc(e):
  return e[0]

def write_extract(dict_zinc,file1,file2):

    fho       = open(file1,'w')
    data = []
    for key in dict_zinc:
           name = key
           score = dict_zinc[name]
           ostring = "../AB/ABaaaaaa/t00003/ABaaaaaa0025  27991 %16s 1                    2702       2702    0.10  11         1      1612      1     1     0.00    0.00    0.00   0.00    0.00    0.00    0.00    0.00    0.00    %6.2f\n"%(name, score)
           fho.write(ostring)
           data.append([score,name])
    fho.close()
    data.sort(key=mySortFunc)
    fho       = open(file2,'w')
    for ele in data:
           name = ele[1]
           score = ele[0]
           ostring = "../AB/ABaaaaaa/t00003/ABaaaaaa0025  27991 %16s 1                    2702       2702    0.10  11         1      1612      1     1     0.00    0.00    0.00   0.00    0.00    0.00    0.00    0.00    0.00    %6.2f\n"%(name, score)
           fho.write(ostring)
    fho.close()

dict_zinc1 = {}
dirpath1   = sys.argv[1]
score_txt1 = sys.argv[2]

if (score_txt1 != "Chemgrid_Score:" or score_txt1 != "Grid_Score:" or score_txt1 != "Descriptor_Score:"):
    print ("score_txt1 must be Chemgrid_Score: or Grid_Score: or Descriptor_Score:")
    exit
filename2 = "extract_all.uniq.txt"
filename3 = "extract_all.sort.uniq.txt"

isdir = os.path.isdir(dirpath1)
if not (isdir):
    print("Error %s is not a dir"%dirpath1)
    exit()

openhandel = os.popen('ls %s/dock.out'%dirpath1)

for line in openhandel:
    filename1 = line.strip()
    open_dock_out(dict_zinc1,filename1,score_txt1)

write_extract(dict_zinc1,filename2,filename3)

This script reads the dock.out files, extracts the names and scores of the docked molecules, and writes the names and scores to two files: extract_all.uniq.txt and extract_all.sort.uniq.txt. In both files, each line contains the name and score of a unique molecule as determined by the name. The molecule names are written to the third column and the corresponding scores are written to the last column. The second file, extract_all.sort.uniq.txt, is sorted by score, with the best scoring (lowest energy) molecule at the top, and the worst scoring (highest energy) molecule at the bottom. The script takes two arguments, the directory path containing the dock.out files and the scoring function.

Run the script:

>> python make_extract_all_file_for_all_dock_out.py . Grid_Score:

Create two files containing lists of the names of the ligands and decoys by running the following commands:

>> awk '{print $2}' ../databases/actives.smi > ligands.name

>> awk '{print $2}' ../databases/decoys.smi > decoys.name

The ROC curve can be plotted with a log scale on the x-axis (% of decoys found or false positive rate) to weight early enrichment more. The area under the curve of this plot is called logAUC. Calculate the AUC and logAUC and plot the ROC curve by calling the following scripts:

>> ${DOCKBASE}/analysis/enrich.py -i . -l ligands.name -d decoys.name

>> ${DOCKBASE}/analysis/plots.py -i . -l ligands.name -d decoys.name

[ these scripts are distributed with DOCK 3.8 ]

Example results

The example enrichment results below were generated using the 5 ligands listed above and a background of DUDE-E decoys.

The AUC and logAUC values are given in the first line of the roc_own.txt file created by enrich.py. Head the roc_own.txt file to view the AUC and logAUC.

>> head -1 roc_own.txt

#AUC    77.60   LogAUC  19.45

The roc_own.png file created by plots.py is the plot of the ROC curve. The x-axis is plotted with a log scale of base 10 so as to emphasize early enrichment.

The output_scored.mol2 file can be visualized in Chimera using ViewDock.

Enrichment with ChemGrid Score

See 2023.05.08.6GJ8_ChemGrid_Score.

⚠️ **GitHub.com Fallback** ⚠️