Introduction to AARON: Screening Ligands - QChASM/Aaron GitHub Wiki
The simplest and most common way to use AARON is to predict stereoselectivities for different catalysts/ligands for a given reaction.
This could be substituted analogs of a single ligand or different ligand scaffolds, or both. We'll cover each of these uses below. For each example, we'll use an existing TS library (see the current structures available in the TS Library). See Building a TS Library for instructions on building your own TS library.
In particular, we'll use a simplified version of a bipyridine-N,N'-dioxide-catalyzed allylation of benzaldehyde. This is a simple reaction with a single, well-defined stereocontrolling elementary step. For this reaction, one should consider a total of 10 TS structures: 5 leading to the (R)-alcohol and 5 leading to the (S)-alcohol, corresponding to different arrangements of the substrate and catalyst around a hexacoordinate silicon. However, to speed up these examples we will compute only the four lowest-lying TS structures (R/ts2, S/ts1, S/ts2, and S/ts3). For real applications, it is advisable to compute all of the TS structures in the TS library!
The lowest-lying TS structure (R/ts2) is shown below:
The other three structures in this library can be viewed here.
First, let's consider the case of screening a substituted version of the ligand/catalyst found in the TS library.
We'll make predictions for a substituted bipyridine-N,N'-dioxide in which we've replaced hydrogen atoms 18 and 21 of the ligand with fluorine (named F in the substituent library).
Note: When specifying the substitution of ligand atoms, the numbers are the atom numbers for the ligand only! (e.g. 'L18' and 'L21' in the image above).
reaction_type=Allylation
template=NN-dioxide_example
charge=1
method=b97d
denfit=true
basis=6-31G
&Ligands
Cat1: 18=F 21=F
&

The reaction_type and template keywords are required, as is the 'Ligands' section. All other keywords are optional and default to the values found in $HOME/.aaronrc (if it exists) or $QCHASM/Aaron/.aaronrc, or to values set in AARON's code.
This input file requests that we use the TS structures from
$QCHASM/Aaron/TS_geoms/Allylation/NN-dioxide_example
Aside from the required reaction_type and template keywords, the only keywords specified are charge, denfit, method, and basis. We need to specify the charge (+1), since the default charge in $QCHASM/Aaron/.aaronrc is neutral (charge=0). We will use B97-D (method=b97d) with density fitting (denfit=true) and the 6-31G basis set (basis=6-31G) so that this example runs quickly.
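To make the input format concrete, here is a minimal Python sketch of how a file like example1.in could be read, with unset keywords filled from rc files in the order described above. It assumes the simple layout shown here (key=value lines, then a '&Ligands' section closed by '&'); the names parse_input, check_required, with_defaults, and RC_PATHS are hypothetical, and AARON's own parser may handle details differently.

```python
import os
from pathlib import Path

REQUIRED = ("reaction_type", "template")


def parse_input(text):
    """Split AARON-style input text into (keywords, ligands).

    Hypothetical sketch: assumes key=value lines plus one '&Ligands' ... '&'
    section with entries such as 'Cat1: 18=F 21=F'.
    """
    keywords, ligands = {}, {}
    in_ligands = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower() == "&ligands":
            in_ligands = True
        elif line == "&":
            in_ligands = False
        elif in_ligands:
            name, _, spec = line.partition(":")
            # 'Cat1: 18=F 21=F' -> ligands['Cat1'] == {'18': 'F', '21': 'F'}
            ligands[name.strip()] = dict(tok.split("=", 1) for tok in spec.split())
        elif "=" in line:
            key, value = line.split("=", 1)
            keywords[key.strip()] = value.strip()
    return keywords, ligands


def check_required(keywords, ligands):
    """Raise ValueError if a required keyword or the Ligands section is missing."""
    problems = [key for key in REQUIRED if key not in keywords]
    if not ligands:
        problems.append("&Ligands section")
    if problems:
        raise ValueError("missing required input: " + ", ".join(problems))


def with_defaults(keywords, rc_paths, builtin):
    """Fill unset keywords from rc files (earlier files win), then from builtin."""
    merged = dict(keywords)
    for rc in map(Path, rc_paths):
        if rc.is_file():
            for key, value in parse_input(rc.read_text())[0].items():
                merged.setdefault(key, value)
    for key, value in builtin.items():
        merged.setdefault(key, value)
    return merged


# Assumed search order, following the text: ~/.aaronrc, then $QCHASM/Aaron/.aaronrc
RC_PATHS = [Path.home() / ".aaronrc"]
if "QCHASM" in os.environ:
    RC_PATHS.append(Path(os.environ["QCHASM"]) / "Aaron" / ".aaronrc")
```

A typical call would be `keywords, ligands = parse_input(Path("example1.in").read_text())`, followed by `check_required(keywords, ligands)` and `with_defaults(keywords, RC_PATHS, {})`.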
The input file above requests that Aaron compute all 4 TS structures found in the TS library for the catalyst Cat1 constructed by replacing atoms 18 and 21 of the ligand present in the TS library with fluorine atoms.
Note that we can use any name we'd like for the ligands/catalysts (e.g., Lig19, cat19a, Dog19).
Save the above input file to example1.in and then run
Aaron example1.in
AARON will first make the directory Cat1, then construct a directory tree mimicking that found in the TS library for this reaction:
R/
    ts2/
S/
    ts1/
    ts2/
    ts3/

AARON will then proceed through a prescribed series of five steps to locate and characterize each TS structure, printing an updated status for each TS structure as it proceeds.
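The directory layout above can be reproduced with a short Python sketch. The TS_LIBRARY mapping is an assumption that lists only the four TS structures used in this example; AARON builds the tree from the TS library itself, and the functions ts_dirs and make_tree are hypothetical.

```python
from pathlib import Path

# Hypothetical stand-in for the TS library contents used in this example
# ($QCHASM/Aaron/TS_geoms/Allylation/NN-dioxide_example): stereoisomer -> TS names
TS_LIBRARY = {"R": ["ts2"], "S": ["ts1", "ts2", "ts3"]}


def ts_dirs(catalyst, library):
    """Relative paths <catalyst>/<stereoisomer>/<ts> mirroring the library."""
    return [Path(catalyst, stereo, ts)
            for stereo, ts_names in library.items()
            for ts in ts_names]


def make_tree(root, catalyst, library):
    """Create the per-TS directories under root and return their paths."""
    paths = [Path(root) / rel for rel in ts_dirs(catalyst, library)]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths
```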
For instance, here is the status update shortly after starting AARON:
Status for all jobs...(Fri Aug 10 07:36:27 2018)
--------------------------------------------------------------------------------
Status for Cat1 jobs......
Pending jobs:
Cat1/R/ts2 step 1 attempt 1: No msg recorded
Cat1/S/ts1 step 1 attempt 1: No msg recorded
Cat1/S/ts2 step 1 attempt 1: No msg recorded
Cat1/S/ts3 step 1 attempt 1: No msg recorded
Aaron will check status of all jobs after 300 seconds
Sleeping...
AARON lists running, pending, and finished jobs for each catalyst/ligand. (Currently, all four jobs are pending for 'step 1'.)
Also reported is the number of 'attempts' that have been made for the current step. Attempts refers to the number of times AARON has had to restart this particular step. For later steps, AARON will also report the number of 'cycles'. A new cycle is started any time AARON needs to return to a previous step in order to correct an erroneous geometry.
Here is the status screen at a later time:
Status for all jobs...(Sun Aug 26 11:39:52 2018)
--------------------------------------------------------------------------------
Status for Cat1 jobs......
Completed jobs:
Cat1/S/ts2 step 3 is done
Cat1/S/ts3 step 3 is done
Completed optimizations:
Cat1/R/ts2 finished normally
Running jobs:
Cat1/S/ts1 step 2 attempt 1 cycle 1. Progress: 0.065594/NO, 0.004914/NO, 0.308802/NO, 0.001190/NO
Aaron will check status of all jobs after 300 seconds
Sleeping...
Now that there are running jobs, the 'Progress' section reports the Maximum Force, RMS Force, Maximum Displacement, and RMS Displacement along with whether or not these are converged for the current geometry optimization step:
Cat1/S/ts1 step 2 attempt 1 cycle 1. Progress: 0.065594/NO, 0.004914/NO, 0.308802/NO, 0.001190/NO
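If you want to track these values in your own scripts, a status line like the one above can be split into its fields with a regular expression. This is a Python sketch that assumes the line layout shown in these examples and assumes any flag other than NO means the criterion is converged; parse_status and the criterion names are hypothetical, not part of AARON.

```python
import re

# Assumed layout of a status line, e.g.
# "Cat1/S/ts1 step 2 attempt 1 cycle 1. Progress: 0.065594/NO, ..."
STATUS_RE = re.compile(
    r"(?P<job>\S+) step (?P<step>\d+) attempt (?P<attempt>\d+)"
    r"(?: cycle (?P<cycle>\d+))?"
)
# Order of the four Progress values, as described above
CRITERIA = ("max_force", "rms_force", "max_displacement", "rms_displacement")


def parse_status(line):
    """Return job, step, attempt, cycle and per-criterion (value, converged)."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    info = {
        "job": match["job"],
        "step": int(match["step"]),
        "attempt": int(match["attempt"]),
        "cycle": int(match["cycle"]) if match["cycle"] else None,
        "progress": {},
    }
    if "Progress:" in line:
        items = line.split("Progress:", 1)[1].split(",")
        for name, item in zip(CRITERIA, items):
            value, _, flag = item.strip().partition("/")
            # Assumption: anything other than NO means the criterion is met
            info["progress"][name] = (float(value), flag.strip().upper() != "NO")
    return info
```

Lines without a cycle or Progress section, such as the pending-job lines shown earlier, yield `cycle = None` and an empty `progress` dictionary.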
If you wish to examine any particular input (.com) or output (.log) file, these can be readily located within the directory structure built by AARON. These files are named according to the ligand/catalyst name, stereoisomer, TS number, and the step number.
For instance,
Cat1.R.ts2.1.com
Cat1.R.ts2.1.log
are the input and output files for Step 1 (a quick semi-empirical optimization of the newly added substituents) for TS2(R) for Cat1.
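The naming convention can be captured in a small Python helper. It assumes the `<catalyst>.<stereoisomer>.<ts>.<step>` pattern described here and assumes the files sit directly in the matching `<catalyst>/<stereoisomer>/<ts>/` directory (conformer sub-directories, used later for Cat2, are not handled); job_files is a hypothetical name.

```python
from pathlib import PurePosixPath


def job_files(catalyst, stereo, ts, step):
    """Input/output paths for one step of one TS structure.

    Assumes the <catalyst>.<stereoisomer>.<ts>.<step> naming described above
    and that the files sit in <catalyst>/<stereoisomer>/<ts>/.
    """
    directory = PurePosixPath(catalyst, stereo, ts)
    stem = f"{catalyst}.{stereo}.{ts}.{step}"
    return directory / f"{stem}.com", directory / f"{stem}.log"


com, log = job_files("Cat1", "R", "ts2", 1)
print(com, log)
```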
AARON will automatically check the status of all jobs every 5 minutes (by default). If a job has completed, it will automatically be advanced to the next step. If any problems are detected with a job (an error in the output file, a structure in which connectivity has changed, etc.), AARON will attempt to fix the problem and will automatically submit a new job.
In this way, AARON minimizes wasted CPU time, since problems with optimizations are detected and corrected within 5 minutes.
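The check-advance-resubmit cycle can be pictured as a simple polling loop. The Python sketch below only illustrates that control flow: the check, advance, and resubmit callbacks and their state names are hypothetical stand-ins, not AARON's internals.

```python
import time


def monitor(jobs, check, advance, resubmit, interval=300, sleep=time.sleep):
    """Poll all jobs every `interval` seconds until every one is finished.

    Hypothetical callbacks:
      check(job)    -> "running", "step_done", "failed" or "finished"
      advance(job)  submits the next step for a job whose step completed
      resubmit(job) fixes and resubmits a job with a detected problem
    Returns the number of polling rounds.
    """
    pending = set(jobs)
    rounds = 0
    while pending:
        rounds += 1
        for job in sorted(pending):  # sorted() copies, so discarding is safe
            state = check(job)
            if state == "step_done":
                advance(job)
            elif state == "failed":
                resubmit(job)
            elif state == "finished":
                pending.discard(job)
        if pending:
            sleep(interval)
    return rounds
```

The `sleep` argument is injectable so the loop can be exercised without actually waiting 300 seconds between rounds.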
After each 5-minute update, AARON will print relative and absolute energies, enthalpies, etc. for any completed optimizations to a file named after the input file, <input-file>_thermo.dat (e.g., example1_thermo.dat).
Once complete, AARON will automatically terminate:
Status for all jobs...(Fri Aug 10 10:03:15 2018)
--------------------------------------------------------------------------------
Status for Cat1 jobs......
Completed optimizations:
Cat1/R/ts2 finished normally
Cat1/S/ts1 finished normally
Cat1/S/ts2 finished normally
Cat1/S/ts3 finished normally
Aaron finished, terminated normally
At this point, relative and absolute energies, free energies, etc. of all computed structures should be listed in
example1_thermo.dat
After running this example, the contents of example1_thermo.dat should be as follows:
Available thermochemical data for Cat1 (T = 298 K):
Cat1:
Relative thermochemistry (kcal/mol)
          E      H   G(RRHO)   G(quasi-RRHO)
ee    64.7%  64.8%     58.2%           61.5%
R
--------------------------------------------------
ts2     0.0    0.0       0.0             0.0
--------------------------------------------------
S
--------------------------------------------------
ts1     4.0    3.7       3.1             3.4
--------------------------------------------------
ts2     1.2    1.2       1.0             1.1
--------------------------------------------------
ts3     1.5    1.5       1.5             1.5
--------------------------------------------------
Absolute thermochemistry (hartees)
                E              H        G(RRHO)   G(quasi-RRHO)
R
------------------------------------------------------------
ts2  -2515.735168   -2515.371022   -2515.454833    -2515.451130
S
------------------------------------------------------------
ts1  -2515.728864   -2515.365100   -2515.449836    -2515.445739
ts2  -2515.733247   -2515.369098   -2515.453211    -2515.449337
ts3  -2515.732815   -2515.368654   -2515.452449    -2515.448810
The first table contains energies (E), 0 K enthalpies (H), and free energies (computed with both the RRHO approximation and Grimme's quasi-RRHO approximation with ω0 = 100 cm⁻¹) relative to the lowest-lying TS structure (in this case R/ts2), all in kcal/mol.
The predicted ee's based on a Boltzmann weighting of the energies, enthalpies, and free energies are listed at the top of this table.
The second table contains the corresponding absolute energies, enthalpies, and free energies in hartree.
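The relative energies and ee in the first table can be recovered from the absolute energies in the second. The Python sketch below assumes ee = (Σ_R w - Σ_S w)/(Σ_R w + Σ_S w), with Boltzmann weights w = exp(-ΔE/k_BT) over all TS structures at 298 K; this is a standard Boltzmann treatment rather than code taken from AARON, and the function names are hypothetical.

```python
import math

HARTREE_TO_KCAL = 627.5095     # kcal/mol per hartree
K_B_HARTREE = 3.166811563e-6   # Boltzmann constant in hartree/K


def relative_kcal(energies):
    """Map name -> energy (hartree) to energies relative to the lowest, in kcal/mol."""
    e_min = min(energies.values())
    return {name: (e - e_min) * HARTREE_TO_KCAL for name, e in energies.items()}


def boltzmann_ee(r_energies, s_energies, temperature=298.0):
    """Boltzmann-weighted ee from absolute energies (hartree); positive favors R."""
    kt = K_B_HARTREE * temperature
    e_min = min(list(r_energies) + list(s_energies))
    w_r = sum(math.exp(-(e - e_min) / kt) for e in r_energies)
    w_s = sum(math.exp(-(e - e_min) / kt) for e in s_energies)
    return (w_r - w_s) / (w_r + w_s)


# E column for Cat1 from example1_thermo.dat
E = {"R/ts2": -2515.735168, "S/ts1": -2515.728864,
     "S/ts2": -2515.733247, "S/ts3": -2515.732815}
rel = relative_kcal(E)
ee = boltzmann_ee([E["R/ts2"]], [E["S/ts1"], E["S/ts2"], E["S/ts3"]])
print({name: round(v, 1) for name, v in rel.items()})
print(f"ee = {100 * ee:.1f}% (R)")
```

With the E column above, this reproduces the 0.0/4.0/1.2/1.5 kcal/mol relative energies and the 64.7% ee listed in the table. The same functions can be applied to the other columns.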
The optimized geometries can be found in
Cat1/Cat1_XYZ
AARON can be killed and restarted without issue, as long as it is sleeping. AARON writes its current state to a hidden file called .status after each 5-minute update, so avoid killing AARON while it is writing this file. AARON reads .status when restarted. Because of this, if you decide to delete previous AARON files and 'start over' in a given directory, make sure you also delete .status!
AARON is designed to run continuously, so it is best to run AARON using screen or another, similar utility. See Using Screen.
Do not run multiple AARON jobs from a single directory.
Furthermore, do not run any AARON job in a sub-directory of any other AARON job.
Aaron can automatically compute higher-level single point energies on geometries optimized at a lower level of theory. Note that currently AARON can only handle DFT methods, not any ab initio methods. The ability to compute ab initio single points using Gaussian as well as single points using Orca will be implemented soon.
reaction_type=Allylation
template=NN-dioxide_example
method=b97d
basis=6-31G
high_method=wb97xd
high_basis=def2tzvp
charge=1
denfit=true
&Ligands
Cat1: 18=F 21=F
&
This input file is identical to the example1.in input above, except that we now request ωB97X-D/def2-TZVP single point energies (high_method=wb97xd, high_basis=def2tzvp) on the B97-D/6-31G geometries.
If you have already completed all optimizations for example1.in, you can easily add this request for higher-level single points: modify the input file as above, delete .status, then restart AARON. Deleting .status is necessary because AARON stores the status of each optimization in this file. Since the originally requested computations have already finished, every geometry will be marked as 'completed' and AARON will ignore the request for additional single point energies. Once .status is deleted, AARON re-checks all files and finds that it still needs to run these single point energies.
The resulting example1_thermo.dat file will contain the following:
Available thermochemical data for Cat1 (T = 298 K):
Cat1:
Relative thermochemistry (kcal/mol)
                                            High level
                                            ------------------------------------------
          E      H  G(RRHO)  G(quasi-RRHO)      E'     H'  G(RRHO)'  G(quasi-RRHO)'
ee    64.1%  64.3%    57.7%          61.0%   93.7%  93.2%     89.8%           91.6%
R
--------------------------------------------------------------------------------------------
ts2     0.0    0.0      0.0            0.0     0.0    0.0       0.0             0.0
--------------------------------------------------------------------------------------------
S
--------------------------------------------------------------------------------------------
ts1     4.0    3.7      3.1            3.4     3.1    2.9       2.3             2.6
--------------------------------------------------------------------------------------------
ts2     1.2    1.2      1.0            1.1     2.4    2.4       2.2             2.3
--------------------------------------------------------------------------------------------
ts3     1.4    1.5      1.5            1.4     2.7    2.7       2.7             2.7
--------------------------------------------------------------------------------------------
Absolute thermochemistry (hartees)
                                                                 High level
                                                                 --------------------------------------------------
                E             H       G(RRHO)  G(quasi-RRHO)             E'            H'      G(RRHO)'  G(quasi-RRHO)'
R
--------------------------------------------------------------------------------------------------------------
ts2  -2515.735168  -2515.371022  -2515.454833  -2515.451130   -2517.010993  -2516.646847  -2516.730658   -2516.726955
S
--------------------------------------------------------------------------------------------------------------
ts1  -2515.728864  -2515.365100  -2515.449836  -2515.445739   -2517.005989  -2516.642225  -2516.726961   -2516.722863
ts2  -2515.733247  -2515.369098  -2515.453211  -2515.449337   -2517.007188  -2516.643039  -2516.727152   -2516.723278
ts3  -2515.732860  -2515.368699  -2515.452494  -2515.448855   -2517.006653  -2516.642492  -2516.726287   -2516.722647
Again, there is a table of relative energies, enthalpies, etc. and a table of absolute energies, enthalpies, etc., each at both the level of theory used for the optimizations and the level used for the higher-level single points. The corresponding ee's are also printed, again based on a Boltzmann weighting of all TS structures.
The first four columns display the energies, etc. for the level of theory used for the optimizations. The last four columns display energies, etc. for the higher level single points. In the case of the enthalpies and free energies, these are computed as the electronic energy from the higher-level single point plus the enthalpy or free energy correction from the level of theory used for the optimizations.
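The combination described here is simple arithmetic, X' = E(high) + [X(low) - E(low)], which can be written as a one-line Python function. Applied to R/ts2 of Cat1, it reproduces the H', G(RRHO)', and G(quasi-RRHO)' values in the table above. The function name and dictionary layout are hypothetical.

```python
def high_level_thermo(low, e_high):
    """Combine a high-level single point with low-level thermal corrections.

    low:    {'E': ..., 'H': ..., 'G(RRHO)': ..., 'G(quasi-RRHO)': ...} in hartree,
            all from the level of theory used for the optimization
    e_high: electronic energy from the higher-level single point (hartree)
    Returns X' = e_high + (X_low - E_low) for each quantity X (so E' = e_high).
    """
    return {name: e_high + (value - low["E"]) for name, value in low.items()}


# R/ts2 for Cat1, taken from the absolute thermochemistry table
r_ts2 = {"E": -2515.735168, "H": -2515.371022,
         "G(RRHO)": -2515.454833, "G(quasi-RRHO)": -2515.451130}
print({k: round(v, 6) for k, v in high_level_thermo(r_ts2, -2517.010993).items()})
```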
Next, let's add an additional catalyst to the input file from Example 1 above (we'll skip the higher-level single point energies).
For Cat2, we will replace ligand atoms 18 and 21 with phenyl groups (named Ph in the substituent library).
reaction_type=Allylation
template=NN-dioxide_example
charge=1
method=b97d
denfit=true
basis=6-31G
&Ligands
Cat1: 18=F 21=F
Cat2: 18=Ph 21=Ph
&
This change can be made directly to the input file from example1. If you then restart AARON, it will re-parse the input file and start calculations for Cat2. If computations are still needed to finish Cat1, AARON will also run these as usual.
The status screen will show a list of finished, pending, etc. jobs for each catalyst.
Unlike F, the new Ph substituents can adopt multiple conformations. AARON will perform a hierarchical search of the two rotamers for each of the two Ph groups. As such, under each directory (e.g. Cat2/R/ts2) it will create sub-directories for four conformations (Cf1-4). Some of these conformers will not start until other conformations are completed, so the status screen will show a list of jobs that are awaiting other jobs to finish.
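The count of four conformers follows from taking every combination of rotamers: 2 rotamers for each of 2 Ph groups gives 2 × 2 = 4. The Python sketch below only enumerates and labels those combinations; it does not reproduce AARON's hierarchical ordering (which conformers wait for others), and the mapping of Cf labels to rotamer combinations is an assumption. conformer_labels is a hypothetical name.

```python
from itertools import product


def conformer_labels(rotamers_per_group):
    """Label every rotamer combination Cf1, Cf2, ...

    rotamers_per_group: number of rotamers for each flexible substituent,
    e.g. [2, 2] for two Ph groups with two rotamers each.
    """
    combos = product(*(range(n) for n in rotamers_per_group))
    return {f"Cf{i}": combo for i, combo in enumerate(combos, start=1)}


print(conformer_labels([2, 2]))
# {'Cf1': (0, 0), 'Cf2': (0, 1), 'Cf3': (1, 0), 'Cf4': (1, 1)}
```

Because the number of combinations is the product of the rotamer counts, it grows quickly with each added flexible substituent, which is one reason a hierarchical search is useful.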
If two conformers are the same (based on the RMSD), then AARON will kill any running job for one of the duplicate conformations and provide a list of 'Repeated conformers' on the status screen.
Repeated conformers:
Cat2/S/ts2/Cf4
Cat2/S/ts3/Cf4
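A simplified Python sketch of duplicate detection by RMSD follows. It assumes both conformers list atoms in the same order and are already superimposed (a full comparison would first align them, for example with the Kabsch algorithm), and the 0.15 Å threshold is a hypothetical value rather than AARON's setting. The function names are also hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two conformers given as lists of (x, y, z) in the same atom order.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    total = sum((a - b) ** 2
                for pa, pb in zip(coords_a, coords_b)
                for a, b in zip(pa, pb))
    return math.sqrt(total / len(coords_a))


def find_repeats(conformers, threshold=0.15):
    """Names of conformers within `threshold` (hypothetical, in Å) of an earlier one."""
    repeats, kept = [], []
    for name, coords in conformers.items():
        if any(rmsd(coords, conformers[k]) < threshold for k in kept):
            repeats.append(name)
        else:
            kept.append(name)
    return repeats
```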
Once completed, the relative and absolute thermochemistry for each catalyst (Cat1 and Cat2) will be listed in example1_thermo.dat.
The data for Cat1 will be the same as above. For Cat2, the only difference is that the relative energy table now also includes the 'effective' energy, enthalpy, and free energy for each transition state which has been Boltzmann-weighted over all unique conformations:
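One common way to define such an effective energy is E_eff = -RT ln Σ_i exp(-E_i/RT) over the conformers of a TS. We assume this form because it matches the table: it always lies slightly below the lowest conformer, which is why the effective row for R/ts2 reads -0.0. The Python sketch below is illustrative, not AARON's code, and effective_energy is a hypothetical name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def effective_energy(conformer_energies, temperature=298.0):
    """-RT ln(sum_i exp(-E_i/RT)) for conformer energies E_i in kcal/mol.

    The lowest energy is factored out first so the exponentials cannot overflow.
    """
    rt = R_KCAL * temperature
    e_min = min(conformer_energies)
    total = sum(math.exp(-(e - e_min) / rt) for e in conformer_energies)
    return e_min - rt * math.log(total)


# Relative E values for the conformers of Cat2 R/ts2 and S/ts1 (table above)
print(f"{effective_energy([0.0, 3.6, 6.5, 10.6]):.1f}")   # -0.0
print(f"{effective_energy([2.0, 6.7, 5.3, 10.7]):.1f}")   # 2.0
```

Because higher-lying conformers enter with exponentially small weights, the effective energy is dominated by the lowest conformer, as seen for every TS in the Cat2 table.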
Available thermochemical data for Cat2 (T = 298 K):
Cat2:
Relative thermochemistry (kcal/mol)
            E      H   G(RRHO)   G(quasi-RRHO)
ee      76.3%  78.1%     90.8%           85.7%
R
--------------------------------------------------
ts2      -0.0   -0.0      -0.0            -0.0
  Cf1     0.0    0.0       0.0             0.0
  Cf2     3.6    3.6       4.2             3.8
  Cf3     6.5    6.5       6.1             6.3
  Cf4    10.6   10.7      10.4            10.5
--------------------------------------------------
S
--------------------------------------------------
ts1       2.0    1.8       2.2             2.0
  Cf1     2.0    1.8       2.2             2.0
  Cf2     6.7    6.7       6.4             6.5
  Cf3     5.3    5.3       5.7             5.4
  Cf4    10.7   10.7      10.8            10.6
--------------------------------------------------
ts2       1.8    1.8       2.2             2.0
  Cf1     1.8    1.8       2.2             2.0
  Cf2     4.9    5.1       5.8             5.4
  Cf3     6.2    6.2       6.3             6.2
--------------------------------------------------
ts3       1.8    2.1       4.0             3.0
  Cf1     1.8    2.1       4.0             3.0
  Cf2     5.7    5.9       6.8             6.3
  Cf3     8.4    8.9      10.2             9.6
--------------------------------------------------
Absolute thermochemistry (hartees)
                 E              H        G(RRHO)   G(quasi-RRHO)
R
------------------------------------------------------------
ts2
  Cf1  -2779.132070   -2778.583281   -2778.685765    -2778.678813
  Cf2  -2779.126395   -2778.577518   -2778.679135    -2778.672708
  Cf3  -2779.121714   -2778.572880   -2778.676016    -2778.668834
  Cf4  -2779.115190   -2778.566308   -2778.669155    -2778.662155
S
------------------------------------------------------------
ts1
  Cf1  -2779.128907   -2778.580345   -2778.682233    -2778.675646
  Cf2  -2779.121382   -2778.572537   -2778.675584    -2778.668483
  Cf3  -2779.123552   -2778.574888   -2778.676666    -2778.670181
  Cf4  -2779.115088   -2778.566268   -2778.668519    -2778.661880
ts2
  Cf1  -2779.129260   -2778.580460   -2778.682208    -2778.675683
  Cf2  -2779.124228   -2778.575194   -2778.676572    -2778.670195
  Cf3  -2779.122228   -2778.573360   -2778.675686    -2778.668951
ts3
  Cf1  -2779.129211   -2778.579906   -2778.679339    -2778.673958
  Cf2  -2779.122910   -2778.573861   -2778.674980    -2778.668802
  Cf3  -2779.118660   -2778.569162   -2778.669458    -2778.663562
Finally, we can also use AARON to screen a different catalyst scaffold as well as a substituted analog of it.
We'll continue with the same reaction, but we'll replace the bipyridine-N,N'-dioxide found in the TS library with a biisoquinoline-N,N'-dioxide.
We'll run both the unsubstituted ligand and an analog in which ligand atoms 14 and 17 are replaced with F atoms.
reaction_type=Allylation
template=NN-dioxide_example
method=b97d
basis=6-31G
charge=1
denfit=true
&Ligands
Cat1: 18=F 21=F
Cat2: 18=Ph 21=Ph
Cat3: ligand=bi-isoquinoline-NN-dioxide
Cat4: ligand=bi-isoquinoline-NN-dioxide 14=F 17=F
&
This will create two directories (Cat3 and Cat4) and then replicate the directory structure of the TS library, just as it did for Cat1 and Cat2.
The status screen and thermochemical output will be essentially the same as in the previous examples, but will now list the status/data for all four catalysts.
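When screening many substituted analogs, it can help to generate the input file rather than edit it by hand. The Python sketch below writes the keyword and &Ligands layout used in these examples; write_input and the Cat_* names are hypothetical, and only substituents and ligands that exist in your AARON libraries will work (F, Ph, and bi-isoquinoline-NN-dioxide are the ones used above).

```python
def write_input(keywords, catalysts):
    """Render an AARON-style input file (hypothetical helper).

    keywords:  {'reaction_type': 'Allylation', ...}
    catalysts: catalyst name -> list of tokens such as '18=F' or 'ligand=...'
    """
    lines = [f"{key}={value}" for key, value in keywords.items()]
    lines.append("&Ligands")
    lines += [f"{name}: {' '.join(tokens)}" for name, tokens in catalysts.items()]
    lines.append("&")
    return "\n".join(lines) + "\n"


keywords = {"reaction_type": "Allylation", "template": "NN-dioxide_example",
            "method": "b97d", "basis": "6-31G", "charge": 1, "denfit": "true"}
catalysts = {f"Cat_{sub}": [f"18={sub}", f"21={sub}"] for sub in ("F", "Ph")}
catalysts["Cat_biiq"] = ["ligand=bi-isoquinoline-NN-dioxide"]
print(write_input(keywords, catalysts))
```

Writing the result to a new file and running AARON on it works as in the examples above; adding catalysts to an existing input file and restarting AARON will start only the new computations.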