2023.02.24.6GJ8 - tbalius/teb_docking_test_sets GitHub Wiki
By Y. Stanley Tan, Mayukh Chakrabarti, and Trent E. Balius
Based on 2022 Rizzo lab tutorial.
This tutorial expects a basic understanding of Unix.
Here is a Unix tutorial.
This tutorial will use UCSF Chimera to manually prepare files for docking and to visualize dock results.
To show Chimera's command line, select Tools -> General Controls -> Command Line in the menu bar.
Files can be opened in Chimera by selecting File -> Open... in the menu bar at the top. Structures can be saved by selecting File -> Save Mol2... in the menu bar to save as a Mol2 file, or by selecting File -> Save PDB... to save as a PDB file. Save a Chimera session by selecting File -> Save Session As... in the menu bar. This will create a Python script that restores open models and any modifications made at the time of saving. To close the current session, select File -> Close Session in the menu bar.
DOCK results should be visualized using Chimera's ViewDock. To open an output Mol2 file with ViewDock, select Tools -> Surface/Binding Analysis -> ViewDock in the menu bar and select the file in the Open Dock Results window that opens. In the Type Selection dialog box, select Dock 4, 5 or 6 for DOCK 6 results. The ViewDock tool allows the poses stored in the Mol2 file to be viewed one at a time instead of all at once. Select Column -> Show in the menu bar of the ViewDock window and select a value to display it in the table.
Here is a Chimera tutorial.
Here is the UCSF DOCK website.
Make a directory to house the files for this tutorial:
>> mkdir 6GJ8_tutorial
Inside the 6GJ8_tutorial directory, make the following subdirectories:
001_structure
002_surface_spheres
003_gridbox
004_dock
005_enrichment
Download the 6GJ8 PDB file from the RCSB Protein Data Bank by going to https://rcsb.org/structure/6GJ8 and selecting "PDB Format" in the Download Files drop-down menu in the top right. Save the file in the 001_structure directory. Alternatively, running the following command will download the file to the working directory:
>> wget https://files.rcsb.org/download/6GJ8.pdb --no-check-certificate
Open 6GJ8.pdb in Chimera.
Isolate the receptor and cofactors
Select the ligand by typing select #0: F0K in the command line. The selected atoms should be highlighted in green. Delete the selected atoms by selecting Actions -> Atoms/Bonds -> delete in the menu bar or by typing delete selected in the command line.
Change GCP to GTP
Select carbon C3B of the GCP cofactor. Change the carbon to an oxygen by first selecting Tools -> Structure Editing -> Build Structure in the menu bar. In the "Build Structure" dialog box that opens, change the section from "Start Structure" to "Modify Structure" using the selection menu near the top. After "Modify Structure" is selected, under "Change selected atoms to...", change the element to oxygen, change the number of bonds to 2, and set the atom name to O3B. Under "Residue Name", select "Change the modified residue's name to" and replace UNK with GTP. Apply the changes and close the dialog box.
Add hydrogens and charges
Chimera's AddH protonates the GTP cofactor incorrectly (this is the case for many RAS structures), the structure must be corrected before adding charges. Open Dock Prep by first selecting Surface/Binding Analysis -> Dock Prep in the menu bar. Deselect the "Add charges" and "Write Mol2 file" options so that Chimera stops after adding hydrogens, and does not add charges. Make sure that the "Delete solvent", "If alternate locations, keep only highest occupancy" (delete alternate side chains), and "Incomplete side chains: Replace using Dunbrack 2010 rotamer library" (mutate incomplete side chains) options are selected in addition to "Add hydrogens" so that all the necessary Dock Prep steps are run. In the "Add Hydrogens for Dock Prep" dialog box that opens, select "Unspecified (determined by method)" and click OK. Select the incorrectly protonated hydrogen (H3) and delete it. Select the nitrogen that should be protonated (N1) and open Build Structure. Switch to "Modify Structure", change the element to nitrogen, change the number of bonds to 3, and apply. The newly added hydrogen is named incorrectly, select H1 and reopen Build Structure. Switch to "Modify Structure", change the element to hydrogen, set the number of bonds to 1, set the atom name to HN1, and apply.
Now, finish running Dock Prep by adding charges and saving the structure. Select Surface/Binding Analysis -> Dock Prep in the menu bar and select the "Add charges" and "Write Mol2 file" steps before rerunning Dock Prep. Save the structure as kras_cof.mol2 and close the session.
Reopen 6GJ8.pdb.
Isolate the ligand
Select the ligand by typing select #0: F0K in the command line. Invert the selection by typing select invert in the command line. Delete the selected receptor, cofactors, and water molecules, leaving only the ligand.
Run Dock Prep
Add hydrogens and partial charges by first selecting Tools -> Surface/Binding Analysis -> Dock Prep in the menu bar and clicking OK in the "Dock Prep" dialog box that opens. Select "Unspecified (determined by method)" in the "Add Hydrogens for Dock Prep" dialog box that opens and click OK to add hydrogens. Visually inspect the protonation of the ligand and make sure that it is correct. Leave the default selections and click OK in the "Assign Charges for Dock Prep" and the "Specify Net Charges" dialog boxes to assign partial charges. The ligand should have a net charge of +1.
Save as bi2852.mol2 and close the session.
Generating DMS
Open kras_cof.mol2.
Run the surface calculation by selecting Actions -> Surface -> Show in the menu bar or by typing surf #0 in the command line.
Save the surface as a DMS file by selecting Tools -> Structure Editing -> Write DMS in the menu bar. Save as kras_cof.dms.
Creating surface spheres
Change directories to the 002_surface_spheres directory:
>> cd ../002_surface_spheres
Create a file called INSPH and format the file as follows:
../001_structure/kras_cof.dms
R
X
0.0
4.0
1.4
6GJ8.sph
This file will be the input file for the sphere generation program, sphgen. The first line of the file specifies the input DMS file. In the second line, R
indicates that spheres should be generated outside the surface. In the third line, X
indicates that all surface points should be used. The fourth line specifies the acceptable steric clash in angstroms, which, in this case, is set to 0.0
or no steric clash allowed. The fifth and sixth lines specify the maximum and minimum sphere radii in angstroms, respectively. The last line specifies the output sphere file name.
Run sphgen to generate spheres based on the parameters given in the INSPH file:
>> ${dock6_path}/bin/sphgen -i INSPH -o OUTSPH
- ${dock6_path} is a variable (placeholder) for the DOCK 6 installation path (e.g. /home/baliuste/zzz.github/dock6)
The surface spheres generated by sphgen can be visualized in Chimera. In a Chimera session with kras_cof.mol2 open and the surface shown, open the sphere file (6GJ8.sph) created by sphgen to visualize the surface spheres.
Selecting relevant spheres
To select spheres based on the distance from the ligand, run the following command:
>> ${dock6_path}/bin/sphere_selector 6GJ8.sph ../001_structure/bi2852.mol2 3.0
The first two arguments passed to sphere_selector specify the surface sphere and ligand files, respectively. The last argument, 3.0
, specifies that only spheres within 3.0 angstroms of the ligand should be selected.
The selected_spheres.sph file generated by sphere_selector can be visualized in Chimera.
Change directories to the 003_gridbox directory.
Box generation
Create the input file for box generation called showbox.in and format the file as follows:
Y
10.0
../002_surface_spheres/selected_spheres.sph
1
6GJ8.box.pdb
In the first line of the file, Y
(Yes) indicates that showbox should create a box. The second line specifies the distance, in angstroms, between the spheres and the box borders. The third line specifies the input sphere file. The fourth line indicates the cluster number. The last line specifies the output box file name.
Generate the box based on the parameters given in showbox.in by running showbox as follows:
>> ${dock6_path}/bin/showbox < showbox.in
The 6GJ8.box.pdb file generated by showbox can be visualized in Chimera. The generated box is used to define the boundaries of the energetic grids.
Grid generation
Create the input file for grid generation called grid.in and format it as follows:
compute_grids yes
grid_spacing 0.4
output_molecule no
contact_score no
energy_score yes
energy_cutoff_distance 9999
atom_model a
attractive_exponent 6
repulsive_exponent 9
distance_dielectric yes
dielectric_factor 4
allow_non_integral_charges no
bump_filter yes
bump_overlap 0.75
receptor_file ../001_structure/kras_cof.mol2
box_file 6GJ8.box.pdb
vdw_definition_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
score_grid_prefix grid
Generate grids by running:
>> ${dock6_path}/bin/grid -i grid.in -o grid.out
Change directories to the 004_dock directory.
This calculation performs an energy evaluation on the crystallographic ligand conformation without any movement.
Create the input file for the single point calculation called single_point.in and format it as follows:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001_structure/bi2852.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001_structure/bi2852.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix single_point.out
write_orientations no
num_scored_conformers 1
rank_ligands no
To run the single point calculation:
>> ${dock6_path}/bin/dock6 -i single_point.in -o single_point.out
This calculation performs a minimization run on the crystallographic ligand conformation, allowing the ligand to subtly adjust its conformation (internal degrees of freedom) and orientation (3 translational and 3 orientational degrees of freedom).
Create the input file for energy minimization called min.in and format it as follows:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001_structure/bi2852.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001_structure/bi2852.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min yes
simplex_coefficient_restraint 10.0
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix 6GJ8.ligand.min
write_orientations no
num_scored_conformers 1
rank_ligands no
Run energy minimization:
>> ${dock6_path}/bin/dock6 -i min.in -o min.out
To view the RMSD of the minimized ligand conformation, head the generated Mol2 file:
>> head 6GJ8.ligand.min_scored.mol2
The minimized ligand conformation can be visualized in Chimera by opening the output Mol2 file with ViewDock.
To test orienting without considering ligand flexibility, we perform rigid docking. We use the crystallographic ligand conformation for this test. This is primarily used for testing purposes only, not for screening.
Create the input file for rigid docking called rigid.in and format it as follows:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100
ligand_atom_file 6GJ8.ligand.min_scored.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename 6GJ8.ligand.min_scored.mol2
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file ../002_surface_spheres/selected_spheres.sph
max_orientations 2000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix rigid.out
write_orientations no
num_scored_conformers 1
rank_ligands no
Run rigid docking:
>> ${dock6_path}/bin/dock6 -i rigid.in -o rigid.out
To view the RMSD of the docked ligand conformation, head the generated Mol2 file:
>> head rigid.out_scored.mol2
To test the ligand flexibility algorithm without considering orienting, we perform fixed anchor docking. We use the crystallographic ligand placement for this test. This is primarily used for testing purposes only, not for screening, as a ligand in a database will be centered elsewhere, not in the site.
Create the input file for fixed anchor docking called fixed.in and format it as follows:
conformer_search_type flex
write_fragment_libraries no
user_specified_anchor no
limit_max_anchors no
min_anchor_size 5
pruning_use_clustering yes
pruning_max_orients 10000
pruning_clustering_cutoff 100
pruning_conformer_score_cutoff 100.0
pruning_conformer_score_scaling_factor 1.0
use_clash_overlap no
write_growth_tree no
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001_structure/bi2852.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001_structure/bi2852.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1
simplex_trans_step 1
simplex_rot_step 0.1
simplex_tors_step 10
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix 6GJ8_fad
write_orientations no
num_scored_conformers 100
write_conformations no
cluster_conformations yes
cluster_rmsd_threshold 2.0
rank_ligands no
Run fixed anchor docking:
>> ${dock6_path}/bin/dock6 -i fixed.in -o fixed.out
To view the RMSD of the top scoring docked ligand conformation, head the generated Mol2 file:
>> head 6GJ8_fad_scored.mol2
This test combines orienting and ligand flexibility, and is the most applicable to production runs of dock.
Create the input file for flexible docking called flex.in and format it as follows:
conformer_search_type flex
write_fragment_libraries no
user_specified_anchor no
limit_max_anchors no
min_anchor_size 5
pruning_use_clustering yes
pruning_max_orients 5000
pruning_clustering_cutoff 2500
pruning_conformer_score_cutoff 100.0
pruning_conformer_score_scaling_factor 1.0
use_clash_overlap no
write_growth_tree no
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file 6GJ8.ligand.min_scored.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename 6GJ8.ligand.min_scored.mol2
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file ../002_surface_spheres/selected_spheres.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix flex.out
write_orientations no
num_scored_conformers 25
write_conformations no
cluster_conformations yes
cluster_rmsd_threshold 2
rank_ligands no
Run flexible docking:
>> ${dock6_path}/bin/dock6 -i flex.in -o flex.out
To view the RMSD of the top scoring docked ligand conformation, head the generated Mol2 file:
>> head flex.out_scored.mol2
In enrichment calculations, a set of known binders, actives, and a set of presumed non-binders, decoys, are docked to test the ability of the docking software to distinguish between actives and decoys. To quantify enrichment, this tutorial uses the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In this section, an example enrichment calculation is performed using DOCK 6 and a sample list of 5 ligands.
Change directories to the 005_enrichment directory. Make a subdirectory called databases.
>> mkdir databases
For this tutorial, we will use 5 ligands from the PDB that bind in the Switch I/II (SOS) Pocket, including the BI-2852 molecule.
In the databases directory, create a ligand SMILES file called actives.smi and format it as follows:
Cn1cc(nc1)Cn2ccc3c2cc(cc3)CNCc4c(c5ccccc5[nH]4)[C@@H]6c7cc(ccc7C(=O)N6)O F0K
C[NH+](C)Cc1ccc(cc1)Nc2ccc(c(c2OC)F)c3cccc(c3)OC F8Q
c1cc2c(cc1Cl)OC[C@H](O2)CN 9R5
c1ccc(c(c1)C(=S)N2CCCC2)O 0QW
CN(C)Cc1ccc(cc1)Nc2ccc(nc2OC)c3cccc4c3OCCO4 D1W
Each line gives a ligand SMILES string and identifier, separated by whitespace. The PDB can be searched with the three-character ligand identifier to view the ligand summary page and PDB entries in which the ligand appears. The BI-2852 molecule has the id F0K.
The tldr web server will be used to build 3D representations of the ligands and generate a background of property-matched decoys for enrichment. Go to https://tldr.docking.org and create an account by selecting Welcome and Sign up in the top right corner of the page. Once your account is created, log in by selecting Welcome and Login in the top right. Once logged in, select Start in the menu to view the modules.
The build3d38 module will prepare molecules for docking from SMILES in four file formats: SDF, PDB, Mol2, and DB2. The DB2 file format will be used for enrichment. Select the build3d38 module from the tldr start page.
Upload the ligand SMILES file (actives.smi) by selecting the Choose File button, or directly paste the ligand SMILES into the input text box. Optionally, enter a memo in the memo text box for reference. Hit Go! to submit the job.
When the job is finished, you will receive a notification email. Refresh the job page to view the results. If the job page was closed, view all completed jobs by selecting Results -> All in the menu.
Download the zipped result file, build3d38_results.zip, under Result Files. Move the file to 005_enrichment/databases and rename it to ligands.zip. Extract the contents of the ligands.zip file and rename the unzipped folder.
>> mv build3d38_results.zip ligands.zip
>> unzip ligands.zip
>> mv *_build3d38 ligands
Extract the 1.tar.gz archive file inside the ligands directory.
>> tar -xvf 1.tar.gz
This should generate another folder containing the ligand SDF, PDB, Mol2, and DB2 files.
The DUDEZ module will assemble and build property-matched decoys in the DB2 file format.
The decoys are property-matched to the actives by the following molecular properties: formal charge, molecular weight, calculated water-octanol partition coefficient (cLogP), rotatable bonds, number of H-bond donors, and number of H-bond acceptors. Decoys are checked for similarity to the actives and similarity to each other by calculating the Tanimoto coefficient, molecules that are too similar to the actives are filtered and molecules that are similar to each other are clustered.
Select the DUDEZ module from the tldr start page.
Upload the ligand SMILES file (actives.smi) or paste the ligand SMILES as input. Optionally, enter a memo for reference. Hit Go! to submit the job.
Once the job finishes running, download the results.tar.gz file under Result Files and move it to 005_enrichment/databases. Extract the archive file contents.
>> tar -xvf results.tar.gz
This should create a folder called newdecoys. Inside the newdecoys folder should be a file called ligand.smi containing the ligand SMILES and identifiers, a file called decoys.smi containing the decoy SMILES and ZINC IDs, and a folder called decoys containing the decoy DB2 files from ZINC. There should be 50 property-matched decoys assigned to each ligand protomer. Since there are two protomers of the ligand F0K, there should be a total of 300 decoys. Make a copy of the decoys.smi file in the databases directory.
>> cp newdecoys/decoys.smi .
An alternate way to generate decoys is to use the DUD-E website.
Generate property-matched decoy SMILES using the DUD-E website
Go to https://dude.docking.org/generate and follow the instructions to generate DUD-E decoys.
When the job is finished, you will receive an email with a link to download the results. Click the link and move the downloaded dude-decoys.tar.gz archive file to 005_enrichment/databases. Extract the file contents.
>> tar -xvf dude-decoys.tar.gz
This should create a folder called dude-decoys containing a README file, a ligands.charge file, and a folder called decoys containing files with the picked decoys smiles for each ligand. There should be one file for each of the ligand protomers, and each file should contain the ligand SMILES as the first line and 50 property-matched decoy SMILES. Since there are two protomers of the ligand F0K, there should be a total of 6 decoy files. Move into dude-decoys/decoys and run the following command:
>> cat decoys.*.picked | grep -v ligand | awk '{print $1,$2}' > ../../decoys.smi
This concatenates all the decoy SMILES files, omitting the ligand SMILES lines, and prints the first and second columns (the decoy smiles and an identifier, respectively) separated by whitespace. The output is saved to a file called decoys.smi in the 005_enrichment directory. The decoys.smi file should contain 300 decoys SMILES.
Build decoy DB2 files from SMILES using TLDR
Now build the decoy DB2 files from SMILES using the tldr build3d38 module as was done for the ligands. Select the build3d38 module from the tldr start page. Upload the decoy SMILES file (decoys.smi) as input. Optionally, enter a memo for reference. Hit Go! to submit the job. When the job finishes, download the build3d38_results.zip file. Move the file to 005_enrichment/databases and rename it to decoys.zip. Extract the contents of the decoys.zip file and rename the created folder.
>> mv build3d38_results.zip decoys.zip
>> unzip decoys.zip
>> mv *_build3d38 decoys
Move inside the decoys directory and extract all the archive files inside. There should be a total of six files.
>> tar -xvf 1.tar.gz
... >> tar -xvf 6.tar.gz
Run DOCK
In the 005_enrichment directory, create a shell script called run_dock6_gridscore.csh and format it as follows:
# set directories
set pwd = `pwd`
set filedir = "${pwd}/databases"
set dockfilesdir = "${pwd}/.."
set workdir = "${pwd}/6GJ8_docking_grid"
if (-e $workdir ) then
echo "$workdir exists."
exit
endif
mkdir $workdir
cd $workdir
ls -l ${filedir}/actives.smi ${filedir}/decoys.smi
# generate sdi.txt
rm -f sdi.txt
touch sdi.txt
# for ligands
foreach id (`cat ${filedir}/actives.smi | awk '{print $2}'`)
echo $id
ls ${filedir}/ligands/*/${id}*.db2.gz >> sdi.txt # CHANGE THIS
end
## for DUD-E decoys
#foreach id (`cat ${filedir}/decoys.smi | awk '{print $2}'`)
# echo $id
# ls ${filedir}/decoys/*/${id}*.db2.gz >> sdi.txt # CHANGE THIS
#end
## for DUDE-Z (TLDR) decoys
#foreach id (`cat ${filedir}/decoys.smi | awk '{print $2}'`)
# echo $id
# ls ${filedir}/newdecoys/decoys/${id}*.db2.gz >> sdi.txt # CHANGE THIS
#end
# generate dock.in
cat << EOF > dock.in
conformer_search_type HDB
num_per_search 1
skip_broken yes
hdb_db2_input_file sdi.txt
hdb_db2_search_score_threshold 100.0
use_internal_energy no
use_database_filter no
orient_ligand yes
automated_matching no
automated_matching_iteration yes
distance_tolerance 0.2
distance_minimum 0.0
nodes_minimum 4
nodes_maximum 4
receptor_site_file dockfiles/002_surface_spheres/selected_spheres.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
grid_score_primary yes
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix dockfiles/003_gridbox/grid
minimize_ligand yes
minimize_anchor no
minimize_flexible_growth no
use_advanced_simplex_parameters no
simplex_max_iterations 500
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix output
write_mol_solvation no
write_orientations no
num_scored_conformers 1
score_threshold 100.0
rank_ligands no
EOF
ln -s ${dockfilesdir} dockfiles
${dock6_path}/bin/dock6 -i dock.in -o dock.out -v
This script saves a list of the ligand and decoy DB2 files in a file called sdi.txt, creates a DOCK input file called dock.in, and then runs DOCK using the DB2 files listed in sdi.txt and the parameters given in dock.in. Depending on which decoy generation method you used, uncomment the correct code block under the "generate sdi.txt" section in the script so that the decoy DB2 files are written to sdi.txt. Modify the script so that the DB2 files are correctly listed to sdi.txt.
Run the script:
>> csh run_dock6_gridscore.csh
Enrichment calculation
Change directories to the created 6GJ8_docking_grid directory. Create a python script called make_extract_all_file_for_all_dock_out.py and format it as follows:
import sys, os
def open_dock_out(dict_zinc,filename,score_txt):
fh = open(filename,'r')
name = ''
score = 100000.0
for line in fh:
splitline = line.split()
if len(splitline) < 2:
continue
if splitline[0] == "Molecule:" :
name = splitline[1]
if splitline[0] == score_txt:
score = float(splitline[1])
if name in dict_zinc:
if (dict_zinc[name]>score):
dict_zinc[name] = score
else:
dict_zinc[name] = score
print (name, score)
fh.close()
def mySortFunc(e):
return e[0]
def write_extract(dict_zinc,file1,file2):
fho = open(file1,'w')
data = []
for key in dict_zinc:
name = key
score = dict_zinc[name]
ostring = "../AB/ABaaaaaa/t00003/ABaaaaaa0025 27991 %16s 1 2702 2702 0.10 11 1 1612 1 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 %6.2f\n"%(name, score)
fho.write(ostring)
data.append([score,name])
fho.close()
data.sort(key=mySortFunc)
fho = open(file2,'w')
for ele in data:
name = ele[1]
score = ele[0]
ostring = "../AB/ABaaaaaa/t00003/ABaaaaaa0025 27991 %16s 1 2702 2702 0.10 11 1 1612 1 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 %6.2f\n"%(name, score)
fho.write(ostring)
fho.close()
dict_zinc1 = {}
dirpath1 = sys.argv[1]
score_txt1 = sys.argv[2]
if (score_txt1 != "Chemgrid_Score:" or score_txt1 != "Grid_Score:" or score_txt1 != "Descriptor_Score:"):
print ("score_txt1 must be Chemgrid_Score: or Grid_Score: or Descriptor_Score:")
exit
filename2 = "extract_all.uniq.txt"
filename3 = "extract_all.sort.uniq.txt"
isdir = os.path.isdir(dirpath1)
if not (isdir):
print("Error %s is not a dir"%dirpath1)
exit()
openhandel = os.popen('ls %s/dock.out'%dirpath1)
for line in openhandel:
filename1 = line.strip()
open_dock_out(dict_zinc1,filename1,score_txt1)
write_extract(dict_zinc1,filename2,filename3)
This script reads the dock.out files, extracts the names and scores of the docked molecules, and writes the names and scores to two files: extract_all.uniq.txt and extract_all.sort.uniq.txt. In both files, each line contains the name and score of a unique molecule as determined by the name. The molecule names are written to the third column and the corresponding scores are written to the last column. The second file, extract_all.sort.uniq.txt, is sorted by score, with the best scoring (lowest energy) molecule at the top, and the worst scoring (highest energy) molecule at the bottom. The script takes two arguments, the directory path containing the dock.out files and the scoring function.
Run the script:
>> python make_extract_all_file_for_all_dock_out.py . Grid_Score:
Create two files containing lists of the names of the ligands and decoys by running the following commands:
>> awk '{print $2}' ../databases/actives.smi > ligands.name
>> awk '{print $2}' ../databases/decoys.smi > decoys.name
The ROC curve can be plotted with a log scale on the x-axis (% of decoys found or false positive rate) to weight early enrichment more. The area under the curve of this plot is called logAUC. Calculate the AUC and logAUC and plot the ROC curve by calling the following scripts:
>> ${DOCKBASE}/analysis/enrich.py -i . -l ligands.name -d decoys.name
>> ${DOCKBASE}/analysis/plots.py -i . -l ligands.name -d decoys.name
[ these scripts are distributed with DOCK 3.8 ]
The example enrichment results below were generated using the 5 ligands listed above and a background of DUDE-E decoys.
The AUC and logAUC values are given in the first line of the roc_own.txt file created by enrich.py. Head the roc_own.txt file to view the AUC and logAUC.
>> head -1 roc_own.txt
#AUC 77.60 LogAUC 19.45
The roc_own.png file created by plots.py is the plot of the ROC curve. The x-axis is plotted with a log scale of base 10 so as to emphasize early enrichment.
The output_scored.mol2 file can be visualized in Chimera using ViewDock.