2023.02.24.7RPZ - tbalius/teb_docking_test_sets GitHub Wiki
By Y. Stanley Tan, Mayukh Chakrabarti, and Trent E. Balius
Based on 2022 Rizzo lab tutorial.
This tutorial expects a basic understanding of Unix.
Here is a Unix tutorial.
This tutorial will use UCSF Chimera to manually prepare files for docking and to visualize dock results.
To show Chimera's command line, select Tools -> General Controls -> Command Line in the menu bar.
Files can be opened in Chimera by selecting File -> Open... in the menu bar at the top. Structures can be saved by selecting File -> Save Mol2... in the menu bar to save as a Mol2 file, or by selecting File -> Save PDB... to save as a PDB file. Save a Chimera session by selecting File -> Save Session As... in the menu bar. This will create a Python script that restores open models and any modifications made at the time of saving. To close the current session, select File -> Close Session in the menu bar.
DOCK results should be visualized using Chimera's ViewDock. To open an output Mol2 file with ViewDock, select Tools -> Surface/Binding Analysis -> ViewDock in the menu bar and select the file in the Open Dock Results window that opens. In the Type Selection dialog box, select Dock 4, 5 or 6 for DOCK 6 results. The ViewDock tool allows the poses stored in the Mol2 file to be viewed one at a time instead of all at once. Select Column -> Show in the menu bar of the ViewDock window and select a value to display it in the table.
Here is a Chimera tutorial.
Here is the UCSF DOCK website.
Make a directory to house the files for this tutorial:
>> mkdir 7RPZ_tutorial
Inside the 7RPZ_tutorial directory, make the following subdirectories:
001_structure
002_surface_spheres
003_gridbox
004_dock
005_enrichment
Download the 7RPZ PDB file from the RCSB Protein Data Bank by going to https://rcsb.org/structure/7RPZ and selecting "PDB Format" in the Download Files drop-down menu in the top right. Save the file in the 001_structure directory. Alternatively, running the following command will download the file to the working directory:
>> wget https://files.rcsb.org/download/7RPZ.pdb --no-check-certificate
Open 7RPZ.pdb in Chimera.
Isolate the receptor and cofactors
Select the ligand by typing select #0: 6IC in the command line. The selected atoms should be highlighted in green. Delete the selected atoms by selecting Actions -> Atoms/Bonds -> delete in the menu bar or by typing delete selected in the command line.
Run Dock Prep
Run Dock Prep by selecting Tools -> Surface/Binding Analysis -> Dock Prep in the menu bar, make sure that the "Delete solvent", "If alternate locations, keep only highest occupancy" (delete alternate side chains), and "Incomplete side chains: Replace using Dunbrack 2010 rotamer library" (mutate incomplete side chains), "Add hydrogens", and "Add charges" options are selected so that all the necessary Dock Prep steps are run before clicking OK in the "Dock Prep" dialog box that opens. Select "Unspecified (determined by method)" and click OK in the "Add Hydrogens for Dock Prep" dialog box that opens to add hydrogens. Leave the default selections and click OK in the "Assign Charges for Dock Prep" and "Specify Net Charges" dialog boxes to assign partial charges.
Save as kras_cof.mol2 and close the session.
Reopen 7RPZ.pdb.
Isolate the ligand
Select the ligand by typing select #0: 6IC in the command line. Invert the selection by typing select invert in the command line. Delete the selected receptor, cofactors, and water molecules, leaving only the ligand.
Run Dock Prep
Run Dock Prep on the ligand. Select "Unspecified (determined by method)" in the "Add Hydrogens for Dock Prep" dialog box and leave all other settings as the defaults. The ligand should have a net charge of +2. Visually inspect the protonation of the ligand and make sure that it is correct.
Save as mrtx1133.mol2 and close the session.
Generating DMS
Open kras_cof.mol2.
Run the surface calculation by selecting Actions -> Surface -> Show in the menu bar or by typing surf #0 in the command line.
Save the surface as a DMS file by selecting Tools -> Structure Editing -> Write DMS in the menu bar. Save as kras_cof.dms.
Creating surface spheres
Change directories to the 002_surface_spheres directory:
>> cd ../002_surface_spheres
Create a file called INSPH by running the command and format the file as follows:
../001_structure/kras_cof.dms
R
X
0.0
4.0
1.4
7RPZ.sph
This file will be the input file for the sphere generation program, sphgen. The first line of the file specifies the input DMS file. In the second line, R
indicates that spheres should be generated outside the surface. In the third line, X
indicates that all surface points should be used. The fourth line specifies the acceptable steric clash in angstroms, which, in this case, is set to 0.0
or no steric clash allowed. The fifth and sixth lines specify the maximum and minimum sphere radii in angstroms, respectively. The last line specifies the output sphere file name.
Run sphgen to generate spheres based on the parameters given in the INSPH file:
>> ${dock6_path}/bin/sphgen -i INSPH -o OUTSPH
- ${dock6_path} is a variable (placeholder) for the DOCK 6 installation path (e.g. /home/baliuste/zzz.github/dock6)
The surface spheres generated by sphgen can be visualized in Chimera. In a Chimera session with kras_cof.mol2 open and the surface shown, open the sphere file (7RPZ.sph) created by sphgen to visualize the surface spheres.
Selecting relevant spheres
To select spheres based on the distance from the ligand, run the following command:
>> ${dock6_path}/bin/sphere_selector 7RPZ.sph ../001_structure/mrtx1133.mol2 3.0
The first two arguments passed to sphere_selector specify the surface sphere and ligand files, respectively. The last argument, 3.0
, specifies that only spheres within 3.0 angstroms of the ligand should be selected.
The selected_spheres.sph file generated by sphere_selector can be visualized in Chimera.
Change directories to the 003_gridbox directory.
Box generation
Create the input file for box generation called showbox.in and format the file as follows:
Y
8.0
../002_surface_spheres/selected_spheres.sph
1
7RPZ.box.pdb
In the first line of the file, Y
(Yes) indicates that showbox should create a box. The second line specifies the distance, in angstroms, between the spheres and the box borders. The third line specifies the input sphere file. The fourth line indicates the cluster number. The last line specifies the output box file name.
Generate the box based on the parameters given in showbox.in by running showbox as follows:
>> ${dock6_path}/bin/showbox < showbox.in
The 7RPZ.box.pdb file generated by showbox can be visualized in Chimera. The generated box is used to define the boundaries of the energetic grids.
Grid generation
Create the input file for grid generation called grid.in and format it as follows:
compute_grids yes
grid_spacing 0.4
output_molecule no
contact_score no
energy_score yes
energy_cutoff_distance 9999
atom_model a
attractive_exponent 6
repulsive_exponent 9
distance_dielectric yes
dielectric_factor 4
allow_non_integral_charges no
bump_filter yes
bump_overlap 0.75
receptor_file ../001_structure/kras_cof.mol2
box_file 7RPZ.box.pdb
vdw_definition_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
score_grid_prefix grid
Generate grids by running:
>> ${dock6_path}/bin/grid -i grid.in -o grid.out
Change directories to the 004_dock directory.
This calculation performs an energy evaluation on the crystallographic ligand conformation without any movement.
Create the input file for the single point calculation called single_point.in and format it as follows:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001_structure/mrtx1133.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001_structure/mrtx1133.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix single_point.out
write_orientations no
num_scored_conformers 1
rank_ligands no
To run the single point calculation:
>> ${dock6_path}/bin/dock6 -i single_point.in -o single_point.out
This calculation performs a minimization run on the crystallographic ligand conformation, allowing the ligand to subtly adjust its conformation (internal degrees of freedom) and orientation (3 translational and 3 orientational degrees of freedom).
Create the input file for energy minimization called min.in:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001_structure/mrtx1133.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001_structure/mrtx1133.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min yes
simplex_coefficient_restraint 10.0
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix 7RPZ.ligand.min
write_orientations no
num_scored_conformers 1
rank_ligands no
Run energy minimization:
>> ${dock6_path}/bin/dock6 -i min.in -o min.out
To view the RMSD of the minimized ligand conformation, head the generated Mol2 file:
>> head 7RPZ.ligand.min_scored.mol2
The minimized ligand conformation can be visualized in Chimera.
To test orienting without considering ligand flexibility, we perform rigid docking. We use the crystallographic ligand conformation for this test. This is primarily used for testing purposes only, not for screening.
Create the input file for rigid docking called rigid.in:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100
ligand_atom_file 7RPZ.ligand.min_scored.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename 7RPZ.ligand.min_scored.mol2
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file ../002_surface_spheres/selected_spheres.sph
max_orientations 2000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix rigid.out
write_orientations no
num_scored_conformers 1
rank_ligands no
Run rigid docking:
>> ${dock6_path}/bin/dock6 -i rigid.in -o rigid.out
To view the RMSD of the docked ligand conformation, head the generated Mol2 file:
>> head rigid.out_scored.mol2
To test the ligand flexibility algorithm without considering orienting, we perform fixed anchor docking. We use the crystallographic ligand placement for this test. This is primarily used for testing purposes only, not for screening, as a ligand in a database will be centered elsewhere, not in the site.
Create the input file for fixed anchor docking called fixed.in:
conformer_search_type flex
write_fragment_libraries no
user_specified_anchor no
limit_max_anchors no
min_anchor_size 5
pruning_use_clustering yes
pruning_max_orients 10000
pruning_clustering_cutoff 100
pruning_conformer_score_cutoff 100.0
pruning_conformer_score_scaling_factor 1.0
use_clash_overlap no
write_growth_tree no
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001_structure/mrtx1133.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001_structure/mrtx1133.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1
simplex_trans_step 1
simplex_rot_step 0.1
simplex_tors_step 10
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix 7RPZ_fad
write_orientations no
num_scored_conformers 100
write_conformations no
cluster_conformations yes
cluster_rmsd_threshold 2.0
rank_ligands no
Run fixed anchor docking:
>> ${dock6_path}/bin/dock6 -i fixed.in -o fixed.out
To view the RMSD of the top scoring docked ligand conformation, head the generated Mol2 file:
>> head 7RPZ_fad_scored.mol2
This test combines orienting and ligand flexibility, and is the most applicable to production runs of dock.
Create the input file for flexible docking called flex.in:
conformer_search_type flex
write_fragment_libraries no
user_specified_anchor no
limit_max_anchors no
min_anchor_size 5
pruning_use_clustering yes
pruning_max_orients 5000
pruning_clustering_cutoff 2500
pruning_conformer_score_cutoff 100.0
pruning_conformer_score_scaling_factor 1.0
use_clash_overlap no
write_growth_tree no
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file 7RPZ.ligand.min_scored.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename 7RPZ.ligand.min_scored.mol2
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file ../002_surface_spheres/selected_spheres.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix ../003_gridbox/grid
gist_score_secondary no
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
hbond_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix flex.out
write_orientations no
num_scored_conformers 25
write_conformations no
cluster_conformations yes
cluster_rmsd_threshold 2
rank_ligands no
Run flexible docking:
>> ${dock6_path}/bin/dock6 -i flex.in -o flex.out
To view the RMSD of the top scoring docked ligand conformation, head the generated Mol2 file:
>> head flex.out_scored.mol2
In enrichment calculations, a set of known binders, actives, and a set of presumed non-binders, decoys, are docked to test the ability of the docking software to distinguish between actives and decoys. To quantify enrichment, this tutorial uses the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In this section, an example enrichment calculation is performed using DOCK 6 and a sample list of 5 ligands.
Change directories to the 005_enrichment directory. Make a subdirectory called databases.
>> mkdir databases
For this tutorial, we will use 5 ligands from the PDB that bind noncovalently in the Switch II Pocket, including the MRTX-1133 molecule.
Create a ligand SMILES file called actives.smi:
C#Cc1c(ccc2c1c(cc(c2)O)c3c(c4c(cn3)c(nc(n4)OC[C@@]56CCCN5C[C@@H](C6)F)N7C[C@H]8CC[C@@H](C7)N8)F)F 6IC
C[C@H]1CN(CCN1c2cccc(n2)c3nc(on3)[C@]4(CCCc5c4c(c(s5)N)C#N)C)C LR4
C[C@@]1(CCCc2c1c(c(s2)N)C#N)c3nc(no3)c4ccc(cc4)N LXU
Cc1cccc2c1c(ccc2)N3CCc4c(nc(nc4N5[C@@H]6CC[C@H]5CNC6)OC[C@@H]7CCCN7C)C3 05F
c1ccc2c(c1)cc(cc2c3c(c4c(cn3)c(nc(n4)OC[C@@]56CCCN5C[C@@H](C6)F)N7C[C@H]8CC[C@@H](C7)N8)F)O 7NL
Each line gives a ligand SMILES string and identifier, separated by whitespace. The PDB can be searched with the three-character ligand identifier to view the ligand summary page and PDB entries in which the ligand appears. The MRTX-1133 molecule as the id 6IC.
The tldr web server will be used to build 3D representations of the ligands and generate a background of property-matched decoys for enrichment. Go to https://tldr.docking.org and create an account by selecting Welcome and Sign up in the top right corner of the page. Once your account is created, log in by selecting Welcome and Login in the top right. Once logged in, select Start in the menu to view the modules.
The build3d38 module will prepare molecules for docking from SMILES in four file formats: SDF, PDB, Mol2, and DB2. The DB2 file format will be used for enrichment. Select the build3d38 module from the tldr start page.
Upload the ligand SMILES file by selecting the Choose File button, or directly paste the ligand SMILES into the input text box. Optionally, enter a memo in the memo text box for reference. Hit Go! to submit the job.
When the job is finished, you will receive a notification email. Refresh the job page to view the results. If the job page was closed, view all completed jobs by selecting Results -> All in the menu.
Download the zipped result file, build3d38_results.zip, under Result Files. Move the file to 005_enrichment/databases and rename it to ligands.zip. Extract the contents of the ligands.zip file and rename the unzipped folder.
>> mv build3d38_results.zip ligands.zip
>> unzip ligands.zip
>> mv *_build3d38 ligands
Extract the 1.tar.gz archive file inside the ligands folder.
>> tar -xvf 1.tar.gz
This should create another folder containing the ligand SDF, PDB, Mol2, and DB2 files.
The DUDEZ module will assemble and build property-matched decoys in the DB2 file format.
The decoys are property-matched to the actives by the following molecular properties: formal charge, molecular weight, calculated water-octanol partition coefficient (cLogP), rotatable bonds, number of H-bond donors, and number of H-bond acceptors. Decoys are checked for similarity to the actives and similarity to each other by calculating the Tanimoto coefficient, molecules that are too similar to the actives are filtered and molecules that are similar to each other are clustered.
Select the DUDEZ module from the tldr start page.
Upload the ligand SMILES file or paste the ligand SMILES as input. Optionally, enter a memo for reference. Hit Go! to submit the job.
Once the job finishes running, download the results.tar.gz file under Result Files and move it to 005_enrichment/databases. Extract the archive file contents.
>> tar -xvf results.tar.gz
This should create a folder called newdecoys. Inside the newdecoys folder should be a file called ligand.smi containing the ligand SMILES and identifiers, a file called decoys.smi containing the decoy SMILES and ZINC IDs, and a folder called decoys containing the decoy DB2 files from ZINC. There should be 50 property-matched decoys assigned to each ligand protomer. Make a copy of the decoys.smi file in the databases directory.
>> cp newdecoys/decoys.smi .
An alternate way to generate decoys is to use the DUD-E website.
Generate property-matched decoy SMILES using the DUD-E website
Go to https://dude.docking.org/generate and follow the instructions to generate DUD-E decoys.
When the job is finished, you will receive an email with a link to download the results. Click the link and move the downloaded dude-decoys.tar.gz archive file to 005_enrichment/databases. Extract the file contents.
>> tar -xvf dude-decoys.tar.gz
This should create a folder called dude-decoys containing a README file, a ligands.charge file, and a folder called decoys containing files with the picked decoys smiles for each ligand. Move into dude-decoys/decoys and run the following command:
>> cat decoys.*.picked | grep -v ligand | awk '{print $1,$2}' > ../../decoys.smi
This concatenates all the decoy SMILES files, omitting the ligand SMILES lines, and prints the first and second columns (the decoy smiles and an identifier, respectively) separated by whitespace. The output is saved to a file called decoys.smi in the 005_enrichment directory.
Build decoy DB2 files from SMILES using TLDR
Now build the decoy DB2 files from SMILES using the tldr build3d38 module as was done for the ligands. Select the build3d38 module from the tldr start page. Upload the decoy SMILES file (decoys.smi) as input. Optionally, enter a memo for reference. Hit Go! to submit the job. When the job finishes, download the build3d38_results.zip file. Move the file to 005_enrichment/databases and rename it to decoys.zip. Extract the contents of the decoys.zip file and rename the created folder.
>> mv build3d38_results.zip decoys.zip
>> unzip decoys.zip
>> mv *_build3d38 decoys
Move inside the decoys folder and extract all the archive files inside.
>> tar -xvf 1.tar.gz
... >> tar -xvf 5.tar.gz
Run DOCK
In the 005_enrichment directory, create a shell script called run_dock6_gridscore.csh.
# set directories
set pwd = `pwd`
set filedir = "${pwd}/databases"
set dockfilesdir = "${pwd}/.."
set workdir = "${pwd}/7RPZ_docking_grid"
if (-e $workdir ) then
echo "$workdir exists."
exit
endif
mkdir $workdir
cd $workdir
ls -l ${filedir}/actives.smi ${filedir}/decoys.smi
# generate sdi.txt
rm -f sdi.txt
touch sdi.txt
# for ligands
foreach id (`cat ${filedir}/actives.smi | awk '{print $2}'`)
echo $id
ls ${filedir}/ligands/1/${id}*.db2.gz >> sdi.txt
end
## for DUD-E decoys
#foreach id (`cat ${filedir}/decoys.smi | awk '{print $2}'`)
# echo $id
# ls ${filedir}/decoys/*/${id}*.db2.gz >> sdi.txt
#end
## for DUDE-Z (TLDR) decoys
#foreach id (`cat ${filedir}/decoys.smi | awk '{print $2}'`)
# echo $id
# ls ${filedir}/newdecoys/decoys/${id}*.db2.gz >> sdi.txt
#end
# generate dock.in
cat << EOF > dock.in
conformer_search_type HDB
num_per_search 1
skip_broken yes
hdb_db2_input_file sdi.txt
hdb_db2_search_score_threshold 100.0
use_internal_energy no
use_database_filter no
orient_ligand yes
automated_matching no
automated_matching_iteration yes
distance_tolerance 0.2
distance_minimum 0.0
nodes_minimum 4
nodes_maximum 4
receptor_site_file dockfiles/002_surface_spheres/selected_spheres.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
grid_score_primary yes
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix dockfiles/003_gridbox/grid
minimize_ligand yes
minimize_anchor no
minimize_flexible_growth no
use_advanced_simplex_parameters no
simplex_max_iterations 500
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file ${dock6_path}/parameters/vdw_AMBER_parm99.defn
flex_defn_file ${dock6_path}/parameters/flex.defn
flex_drive_file ${dock6_path}/parameters/flex_drive.tbl
ligand_outfile_prefix output
write_mol_solvation no
write_orientations no
num_scored_conformers 1
score_threshold 100.0
rank_ligands no
EOF
ln -s ${dockfilesdir} dockfiles
${dock6}/bin/dock6 -i dock.in -o dock.out -v
This script saves a list of the ligand and decoy DB2 files in a file called sdi.txt, creates a DOCK input file called dock.in, and then runs DOCK using the DB2 files listed in sdi.txt and the parameters given in dock.in. Depending on which decoy generation method you used, uncomment the correct code block under the "generate sdi.txt" section in the script so that the decoy DB2 files are written to sdi.txt.
Run the script:
>> csh run_dock6_gridscore.csh
Enrichment calculation
Change directories to the created 7RPZ_docking_grid directory and create a python script called make_extract_all_file_for_all_dock_out.py.
import sys, os
def open_dock_out(dict_zinc,filename,score_txt):
fh = open(filename,'r')
name = ''
score = 100000.0
for line in fh:
splitline = line.split()
if len(splitline) < 2:
continue
if splitline[0] == "Molecule:" :
name = splitline[1]
if splitline[0] == score_txt:
score = float(splitline[1])
if name in dict_zinc:
if (dict_zinc[name]>score):
dict_zinc[name] = score
else:
dict_zinc[name] = score
print (name, score)
fh.close()
def mySortFunc(e):
return e[0]
def write_extract(dict_zinc,file1,file2):
fho = open(file1,'w')
data = []
for key in dict_zinc:
name = key
score = dict_zinc[name]
ostring = "../AB/ABaaaaaa/t00003/ABaaaaaa0025 27991 %16s 1 2702 2702 0.10 11 1 1612 1 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 %6.2f\n"%(name, score)
fho.write(ostring)
data.append([score,name])
fho.close()
data.sort(key=mySortFunc)
fho = open(file2,'w')
for ele in data:
name = ele[1]
score = ele[0]
ostring = "../AB/ABaaaaaa/t00003/ABaaaaaa0025 27991 %16s 1 2702 2702 0.10 11 1 1612 1 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 %6.2f\n"%(name, score)
fho.write(ostring)
fho.close()
dict_zinc1 = {}
dirpath1 = sys.argv[1]
score_txt1 = sys.argv[2]
if (score_txt1 != "Chemgrid_Score:" or score_txt1 != "Grid_Score:" or score_txt1 != "Descriptor_Score:"):
print ("score_txt1 must be Chemgrid_Score: or Grid_Score: or Descriptor_Score:")
exit
filename2 = "extract_all.uniq.txt"
filename3 = "extract_all.sort.uniq.txt"
isdir = os.path.isdir(dirpath1)
if not (isdir):
print("Error %s is not a dir"%dirpath1)
exit()
openhandel = os.popen('ls %s/dock.out'%dirpath1)
for line in openhandel:
filename1 = line.strip()
open_dock_out(dict_zinc1,filename1,score_txt1)
write_extract(dict_zinc1,filename2,filename3)
This script reads the dock.out files, extracts the names and scores of the docked molecules, and writes the names and scores to two files: extract_all.uniq.txt and extract_all.sort.uniq.txt. In both files, each line contains the name and score of a unique molecule as determined by the name. The molecule names are written to the third column and the corresponding scores are written to the last column. The second file, extract_all.sort.uniq.txt, is sorted by score, with the best scoring (lowest energy) molecule at the top, and the worst scoring (highest energy) molecule at the bottom. The script takes two arguments, the directory path containing the dock.out files and the scoring function.
Run the script:
>> python make_extract_all_file_for_all_dock_out.py . Grid_Score:
Create two files containing lists of the names of the ligands and decoys by running the following commands:
>> awk '{print $2}' ../databases/actives.smi > ligands.name
>> awk '{print $2}' ../databases/decoys.smi > decoys.name
The ROC curve can be plotted with a log scale on the x-axis (% of decoys found or false positive rate) to weight early enrichment more. The area under the curve of this plot is called logAUC. Calculate the AUC and logAUC and plot the ROC curve by calling the following scripts:
>> ${DOCKBASE}/analysis/enrich.py -i . -l ligands.name -d decoys.name
>> ${DOCKBASE}/analysis/plots.py -i . -l ligands.name -d decoys.name
[ these scripts are distributed with DOCK 3.8 ]
The example enrichment results below were generated using the 5 ligands listed above and a background of DUDE-E decoys.
The AUC and logAUC values are given in the first line of the roc_own.txt file created by enrich.py. Head the roc_own.txt file to view the AUC and logAUC.
>> head -1 roc_own.txt
#AUC 29.40 LogAUC -9.03
The roc_own.png file created by plots.py is the plot of the ROC curve. The x-axis is plotted with a log scale of base 10 so as to emphasize early enrichment.
The output_scored.mol2 file can be visualized in Chimera using ViewDock.