mlp_quality - Trebonius91/utils4VASP GitHub Wiki
This program evaluates the quality of a machine-learned potential (MLP) by how well it reproduces training set or validation set data calculated with VASP. It is to be used in combination with the vasp2trainset program, which generates the MLP training set data for the initial parametrization from VASP calculation output.
Currently, Behler-type artificial neural network (ANN) and message-passing atomic cluster expansion (MACE) MLPs can be evaluated.
A number of keywords can be used to fine-tune the output of the program. The first two keywords select the type of MLP to be evaluated; all other keywords are optional, and default values are used if they are not given:
-ann
: An ANN MLP will be evaluated; the respective files are needed (see below).
-mace
: A MACE MLP will be evaluated; the respective files are needed (see below).
-natoms_max=[number]
: Maximum number of atoms per structure in the training set, needed for the static allocation of arrays. The default value of 1000 should be sufficient in most cases.
-nhisto_en=[number]
: Number of histogram bins for the plot of errors in the total energy per atom (default: 100).
-nhisto_grad=[number]
: Number of histogram bins for the plot of errors in the force per atom (default: 400).
-nhisto_2d_abs=[number]
: Number of histogram bins in the 2D gradient direction error plot, along the total gradient per atom axis (default: 100).
-nhisto_2d_angle=[number]
: Number of histogram bins in the 2D gradient direction error plot, along the angle deviation axis (default: 100).
-histo_en_range=[value]
: Range (maximum value) of the energy deviation in the energy deviation per atom histogram, in meV per atom (default: 10.0).
-histo_grad_range=[value]
: Range (maximum value) of the gradient deviation in the gradient norm deviation per atom histogram, in meV/Angstrom (default: 100.0).
-histo_2d_abs_range=[value]
: Range (maximum value) of the gradient norm per atom for the 2D gradient histogram, in eV/Angstrom (default: 3.0).
-histo_2d_grad_range=[value]
: Range (maximum value) of the direction error for the 2D gradient histogram, in degrees (default: 60.0).
All keywords apart from the first two have no direct effect on the error evaluation itself, only on the visualization (and, for -natoms_max, on array allocation), and can be adjusted to give the best-looking plots.
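The histogram keywords only control how the computed errors are binned. As a rough illustration, the following Python sketch shows one way such binning can work. It assumes equal-width bins from zero up to the given range, with values outside the range discarded, and takes the direction error of an atom as the angle between its reference and MLP gradient vectors, binned against the norm of the reference gradient. These conventions are assumptions for illustration and may differ from what mlp_quality does internally; the function names are hypothetical.

```python
import math


def histogram_1d(values, nbins, vmax):
    """Count values into nbins equal-width bins on [0, vmax); others are dropped."""
    counts = [0] * nbins
    width = vmax / nbins
    for v in values:
        if 0.0 <= v < vmax:
            # min() guards against floating-point round-up into bin nbins
            counts[min(int(v / width), nbins - 1)] += 1
    return counts


def norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def angle_deg(g_ref, g_mlp):
    """Angle in degrees between reference and MLP gradient vectors of one atom."""
    n = norm(g_ref) * norm(g_mlp)
    if n == 0.0:
        return 0.0  # direction undefined for a zero vector; treated as no error here
    cos = sum(a * b for a, b in zip(g_ref, g_mlp)) / n
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def histogram_2d(ref_grads, mlp_grads, nbins_abs=100, abs_max=3.0,
                 nbins_angle=100, angle_max=60.0):
    """2D counts: reference gradient norm (rows) vs. direction error (columns).

    The defaults mirror those of -nhisto_2d_abs, -histo_2d_abs_range,
    -nhisto_2d_angle and -histo_2d_grad_range; points outside either
    range are dropped (an assumption of this sketch).
    """
    counts = [[0] * nbins_angle for _ in range(nbins_abs)]
    for g_ref, g_mlp in zip(ref_grads, mlp_grads):
        g_norm = norm(g_ref)
        angle = angle_deg(g_ref, g_mlp)
        if g_norm < abs_max and angle < angle_max:
            i = min(int(g_norm / abs_max * nbins_abs), nbins_abs - 1)
            j = min(int(angle / angle_max * nbins_angle), nbins_angle - 1)
            counts[i][j] += 1
    return counts
```

For example, histogram_1d(abs_errors, nbins=100, vmax=10.0) applied to absolute energy errors in meV per atom corresponds to the default -nhisto_en and -histo_en_range settings. Increasing the number of bins without also narrowing the range gives finer but noisier histograms.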
ANN quality assessment
If the quality of an optimized ANN is to be determined, the VASP training set data generated by vasp2trainset must be placed in the current folder. This is the xsf_files folder, which contains all training set items distributed over a number of files.
Further, the predict.x program, part of the aenet program package, must be executed first with the predict.in file as input. This file must include the contents of the xsf_list.dat file, together with a header defining the atom types and the ANN files (see the aenet manual).
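Assembling predict.in amounts to placing the aenet header in front of the contents of xsf_list.dat. The following Python sketch does only that concatenation. The header text itself (atom types, ANN files) must still be written by hand according to the aenet manual and is not generated here; the header file name passed to write_predict_input is a hypothetical placeholder, and the sketch assumes that xsf_list.dat can be appended verbatim, as the text above describes.

```python
from pathlib import Path


def build_predict_input(header_text, xsf_list_text):
    """Join a user-written aenet header with the contents of xsf_list.dat.

    header_text must follow the aenet manual (atom types and ANN files);
    this helper does not check or generate it.
    """
    return header_text.rstrip("\n") + "\n" + xsf_list_text


def write_predict_input(header_path, list_path="xsf_list.dat",
                        out_path="predict.in"):
    """Write predict.in from a (hypothetical) header file and xsf_list.dat."""
    text = build_predict_input(Path(header_path).read_text(),
                               Path(list_path).read_text())
    Path(out_path).write_text(text)
```

For example, write_predict_input("predict_header.in") would write predict.in in the current folder, assuming the header was saved under that (hypothetical) name.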
The predict program needs to be executed with the command
predict.x predict.in > predict.out
Afterwards, only the predict.out file and the xsf_files folder need to be present in the current folder. The program can then be executed:
mlp_quality -ann
MACE quality assessment
For the evaluation of a MACE foundation model or fine-tuned MLP, data from the -md_traj mode of vasp2trainset must be used.
First, an MD trajectory must be calculated with the MACE MLP, using the ASE interface to write an XDATCAR file and files with the current energies and gradients. A suitable ASE input file, called ase_dynamic.py, is given in the misc folder of utils4VASP.
After the MD run, the file mace.log, containing the energy of each written MD step, and the file gradients.dat, containing the gradients of each written MD step, are written next to the XDATCAR file.
Copy gradients.dat into the folder where mlp_quality will be run and rename it to gradients_mace.dat. Also copy mace.log and rename it to energies_mace.dat. In the energy file, remove the header and the information about step 0. Further, remove the first frame from the gradients file.
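The trimming of the two MACE output files can be scripted. The Python sketch below is a hypothetical helper that assumes mace.log starts with a fixed number of header lines followed by one line per written MD step (step 0 first), and that every frame in gradients.dat occupies the same number of lines (natoms gradient lines plus an optional per-frame header). These layout assumptions are not taken from ase_dynamic.py and must be checked against the actual files before use.

```python
from pathlib import Path


def trim_energy_lines(lines, n_header=1, n_initial_steps=1):
    """Drop the header and the first MD step(s) (step 0) from the energy log.

    Assumes one line per written MD step after n_header header lines.
    """
    return lines[n_header + n_initial_steps:]


def drop_first_frame(lines, lines_per_frame):
    """Remove the first frame from a file whose frames all have equal length."""
    return lines[lines_per_frame:]


def prepare_mace_files(natoms, n_energy_header=1, frame_header_lines=0,
                       log_in="mace.log", grad_in="gradients.dat",
                       energies_out="energies_mace.dat",
                       gradients_out="gradients_mace.dat"):
    """Write energies_mace.dat and gradients_mace.dat from the MD output.

    natoms, n_energy_header and frame_header_lines describe the assumed
    file layout and must be set to match the actual files. The original
    mace.log and gradients.dat are left unchanged.
    """
    energy_lines = Path(log_in).read_text().splitlines(keepends=True)
    grad_lines = Path(grad_in).read_text().splitlines(keepends=True)
    Path(energies_out).write_text(
        "".join(trim_energy_lines(energy_lines, n_energy_header)))
    Path(gradients_out).write_text(
        "".join(drop_first_frame(grad_lines, frame_header_lines + natoms)))
```

For example, prepare_mace_files(natoms=64) would assume a single header line in mace.log and exactly 64 gradient lines per frame in gradients.dat.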
Now take the XDATCAR file and run and evaluate VASP single point calculations for each MD step, as explained on the vasp2trainset page. This procedure generates an xyz file containing the structures together with the VASP energies and gradients. Copy this file into the current folder and rename it to vasp_results.xyz.
Now, when energies_mace.dat, gradients_mace.dat and vasp_results.xyz are located in the same folder, execute the program in that folder:
mlp_quality -mace
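Since mlp_quality expects its inputs under fixed names in the current folder, a quick check before starting it can save a failed run. The following Python sketch is a hypothetical helper (not part of utils4VASP) that lists which of the inputs named on this page are missing for the ANN or MACE mode.

```python
from pathlib import Path

# Input files/folders that this page lists for each evaluation mode.
REQUIRED_INPUTS = {
    "ann": ["predict.out", "xsf_files"],
    "mace": ["energies_mace.dat", "gradients_mace.dat", "vasp_results.xyz"],
}


def missing_inputs(mode, folder="."):
    """Return the required inputs for mode ('ann' or 'mace') missing in folder."""
    base = Path(folder)
    return [name for name in REQUIRED_INPUTS[mode] if not (base / name).exists()]
```

Calling missing_inputs("mace") in the evaluation folder returns an empty list when all three MACE input files are present; it only checks for existence, not for correct content.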
Output files
For both ANN and MACE, the same set of output/evaluation files is written; the program also lists them on the command line during its execution. The files are:
energies_compare.dat
: For each frame, the reference and MLP energies are printed in two columns, which can be plotted as a scatter plot.
gradients_compare.dat
: The gradient norm of each atom is printed for reference and MLP in two columns, which can be plotted as a scatter plot.
energies_histo.dat
: The histogram with the number of frames in each energy error bin.
gradients_histo.dat
: The histogram with the number of per-atom gradient norms in each gradient norm error bin.
gradnorm_vs_angle_histo.dat
: The 2D histogram in which the gradient norm per atom is plotted against the deviation of the gradient direction, in degrees.
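Beyond plotting, the two-column comparison files can be condensed into summary error measures. The Python sketch below reads a file such as energies_compare.dat or gradients_compare.dat and computes the mean absolute error (MAE) and root-mean-square error (RMSE). It assumes whitespace-separated columns with the reference value first and the MLP value second, and skips lines that do not start with two numbers; the results carry whatever units the file uses.

```python
import math


def read_two_columns(path):
    """Read reference (column 1) and MLP (column 2) values, skipping other lines."""
    ref, mlp = [], []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                r, m = float(fields[0]), float(fields[1])
            except ValueError:
                continue  # e.g. a comment or header line
            ref.append(r)
            mlp.append(m)
    return ref, mlp


def error_stats(ref, mlp):
    """Mean absolute error and root-mean-square error of MLP vs. reference."""
    diffs = [m - r for r, m in zip(ref, mlp)]
    if not diffs:
        raise ValueError("no data points")
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```

For example, mae, rmse = error_stats(*read_two_columns("energies_compare.dat")) gives both measures for the energies. Because the RMSE weights large deviations more strongly, an RMSE much larger than the MAE points to a few badly reproduced frames.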
Further, three plots are automatically generated by gnuplot. These are:
energies_histo.svg
: The energy deviation histogram (per structure, per atom).
gradients_histo.svg
: The gradient norm deviation histogram (per atom).
2d_gradient_histo.png
: The 2D histogram of the gradient norm value vs. the deviation of the direction (in degrees).