mlp_quality - Trebonius91/utils4VASP GitHub Wiki

This program evaluates how well a machine-learned potential (MLP) reproduces training set or validation set data calculated by VASP. It is to be used in combination with the vasp2trainset program, which generates the MLP training set data for the initial parametrization from VASP calculation output.

Currently, artificial Behler neural network (ANN) and message-passing atomic cluster expansion (MACE) MLPs can be evaluated.

A number of keywords can be used to fine-tune the output of the program. All keywords apart from the first two are optional and default values are used if they are not given:

  • -ann: An ANN MLP is evaluated; the respective files are needed (see below).
  • -mace: A MACE MLP is evaluated; the respective files are needed (see below).
  • -natoms_max=[number]: Maximum number of atoms per structure in the training set, needed for static allocation of arrays. The default value of 1000 should be enough in most cases.
  • -nhisto_en=[number]: Number of histogram bins for plot of errors in the total energy per atom (default: 100).
  • -nhisto_grad=[number]: Number of histogram bins for plot of errors in the force per atom (default: 400).
  • -nhisto_2d_abs=[number]: Number of histogram bins in the 2D gradient direction error plot, in the total gradient per atom axis (default: 100).
  • -nhisto_2d_angle=[number]: Number of histogram bins in the 2D gradient direction error plot, in the angle deviation axis (default: 100).
  • -histo_en_range=[value]: Range (maximum value) of energy deviation in the energy deviation per atom histogram, in meV per atom (default: 10.0).
  • -histo_grad_range=[value]: Range (maximum value) of gradient deviation in the gradient norm deviation per atom histogram, in meV/Angstrom (default: 100.0).
  • -histo_2d_abs_range=[value]: Range (maximum value) of gradient norm per atom values for the 2D gradient histogram, in eV/Angstrom (default: 3.0).
  • -histo_2d_grad_range=[value]: Range (maximum value) of direction error for the 2D gradient histogram, in degrees (default: 60.0).

All keywords apart from the first two have no direct effect on the error evaluation itself, only on its visualization. They can be tuned to give the best-looking plots.
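As an illustration of how the keywords combine, the following sketch assembles a command line from a set of options. The helper name `build_command` is hypothetical and not part of utils4VASP; only the keyword names are taken from the list above, and the chosen values are arbitrary examples.

```python
def build_command(mode, **options):
    """Assemble an mlp_quality command line (hypothetical helper)."""
    if mode not in ("ann", "mace"):
        raise ValueError("mode must be 'ann' or 'mace'")
    # The mode keyword comes first, followed by the optional keywords
    args = ["mlp_quality", f"-{mode}"]
    for key, value in options.items():
        args.append(f"-{key}={value}")
    return args


# Example: finer energy histogram with a narrower range (values are arbitrary)
cmd = build_command("mace", nhisto_en=200, histo_en_range=5.0)
print(" ".join(cmd))  # prints: mlp_quality -mace -nhisto_en=200 -histo_en_range=5.0
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the folder that contains the input files.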

ANN quality assessment

To determine the quality of an optimized ANN, the VASP training set data generated by vasp2trainset must be placed in the current folder. This is the xsf_files folder, which contains all training set items in a number of files. Further, the predict.x program, part of the aenet program package, must be executed first with the predict.in file as input. This file must include the contents of xsf_list.dat, together with a header defining the atom types and the ann files (see the aenet manual). The predict program needs to be executed with the command:

predict.x  predict.in > predict.out

Now, only the predict.out file and the xsf_files folder need to be present in the current folder. Then, the program can be executed:

mlp_quality -ann

MACE quality assessment

For the evaluation of a MACE foundation model or fine-tuned MLP, data from the -md_traj mode of vasp2trainset must be used. First, an MD trajectory must be calculated with the MACE MLP, using the ASE interface to write an XDATCAR file and files with the current energies and gradients. A suitable ASE input file, ase_dynamic.py, is given in the misc folder of utils4VASP.

After the MD run, the files mace.log, containing the energy of each written MD step, and gradients.dat, containing the gradients of each written MD step, are written alongside the XDATCAR file. Copy gradients.dat into the mlp_quality folder and rename it to gradients_mace.dat. Also copy mace.log and rename it to energies_mace.dat. In the energy file, remove the header and the information about step 0. Further, remove the first frame from the gradients file.
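The copy, rename and trim steps above can be sketched as follows. This is a minimal sketch under assumptions that are not confirmed by this page: mace.log is assumed to start with `n_header` header lines followed by one line per written step (step 0 first), and each frame in gradients.dat is assumed to occupy a fixed number of lines, `lines_per_frame`. The function names are hypothetical. Inspect both files before relying on these assumptions.

```python
from pathlib import Path


def strip_first_step(log_lines, grad_lines, lines_per_frame, n_header=1):
    """Drop header and step 0 from the energy log, and the first frame
    from the gradient file (both given as lists of lines)."""
    energies = log_lines[n_header + 1:]       # header line(s) plus the step-0 line
    gradients = grad_lines[lines_per_frame:]  # assumed fixed-size first frame
    return energies, gradients


def convert_files(src_dir, dst_dir, lines_per_frame, n_header=1):
    """Read mace.log and gradients.dat, write the renamed, trimmed copies."""
    src, dst = Path(src_dir), Path(dst_dir)
    log_lines = (src / "mace.log").read_text().splitlines(keepends=True)
    grad_lines = (src / "gradients.dat").read_text().splitlines(keepends=True)
    energies, gradients = strip_first_step(
        log_lines, grad_lines, lines_per_frame, n_header
    )
    (dst / "energies_mace.dat").write_text("".join(energies))
    (dst / "gradients_mace.dat").write_text("".join(gradients))


# Toy demonstration with made-up placeholder lines (two lines per frame)
toy_log = ["header\n", "step 0\n", "step 1\n", "step 2\n"]
toy_grad = ["f0 a1\n", "f0 a2\n", "f1 a1\n", "f1 a2\n"]
energies, gradients = strip_first_step(toy_log, toy_grad, lines_per_frame=2)
print(energies)
print(gradients)
```

The value of `lines_per_frame` depends on how the gradients file was written (for example, whether each frame carries its own header line), so it is left as an explicit parameter rather than guessed.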

Now take the XDATCAR file and run and evaluate VASP single-point calculations for each MD step, as explained on the vasp2trainset page. This procedure generates an xyz file containing the structures as well as the VASP energies and gradients. Copy this file into the current folder and rename it to vasp_results.xyz.

Now, when energies_mace.dat, gradients_mace.dat and vasp_results.xyz are located in the same folder, execute the program in that folder:

mlp_quality -mace
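Before starting the program, it can be useful to check that the expected inputs are in place. The following sketch lists the files and folders named on this page for each mode; the helper `missing_inputs` is hypothetical and only tests for existence, not for correct content.

```python
from pathlib import Path

# Inputs named in the ANN and MACE sections of this page
REQUIRED = {
    "ann": ["predict.out", "xsf_files"],
    "mace": ["energies_mace.dat", "gradients_mace.dat", "vasp_results.xyz"],
}


def missing_inputs(mode, folder="."):
    """Return the required files or folders that are absent from folder."""
    base = Path(folder)
    return [name for name in REQUIRED[mode] if not (base / name).exists()]


# Report the state of the current folder for both modes
for mode in REQUIRED:
    missing = missing_inputs(mode)
    status = "ready" if not missing else "missing: " + ", ".join(missing)
    print(f"-{mode}: {status}")
```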

Output files

For both ANN and MACE, the same set of output/evaluation files is written, as the program also reports on the command line during execution. The files are:

  • energies_compare.dat: For each frame, the reference and MLP energies are printed in two columns and can be plotted as a scatter plot.
  • gradients_compare.dat: The gradient norm of each atom is printed for reference and MLP in two columns and can be plotted as a scatter plot.
  • energies_histo.dat: The histogram with the number of frames for each energy error bin.
  • gradients_histo.dat: The histogram with the number of gradient norms per atom for each gradient norm error bin.
  • gradnorm_vs_angle_histo.dat: The 2D histogram where the gradient norm per atom is plotted against the deviation of the gradient direction, in degrees.
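The two-column comparison files also lend themselves to simple summary statistics. The sketch below computes the mean absolute error and root mean square error from such a file. It assumes whitespace-separated columns with the reference value first and the MLP value second, and it skips any line that does not start with two numbers; the exact file layout and units are not specified on this page, so both are assumptions. The function names are hypothetical.

```python
import math


def read_two_columns(path):
    """Read reference and MLP values from a two-column text file."""
    ref, mlp = [], []
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            try:
                r, m = float(parts[0]), float(parts[1])
            except (IndexError, ValueError):
                continue  # skip blank, comment or header lines
            ref.append(r)
            mlp.append(m)
    return ref, mlp


def mae_rmse(ref, mlp):
    """Return mean absolute error and root mean square error."""
    diffs = [m - r for r, m in zip(ref, mlp)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mae, rmse


# Toy values for illustration only, not real program output
mae, rmse = mae_rmse([1.0, 2.0], [1.5, 1.5])
print(mae, rmse)
```

For real data, `ref, mlp = read_two_columns("energies_compare.dat")` followed by `mae_rmse(ref, mlp)` would give the statistics in whatever units the file uses.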

Further, three plots are automatically generated by gnuplot. These are:

  • energies_histo.svg: The energy deviation histogram (per structure, per atom).
  • gradients_histo.svg: The gradient norm deviation histogram (per atom).
  • 2d_gradient_histo.png: The 2D histogram, gradient norm value vs deviation of the direction (in degrees).
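To make the "deviation of the direction" axis of the 2D histogram more concrete, the following sketch computes the angle between a reference and an MLP gradient vector of one atom. This is the usual definition via the normalized dot product; it is an assumption that mlp_quality uses the same definition, and the function name is hypothetical.

```python
import math


def direction_error_deg(g_ref, g_mlp):
    """Angle in degrees between two gradient vectors of one atom."""
    dot = sum(a * b for a, b in zip(g_ref, g_mlp))
    norm_ref = math.sqrt(sum(a * a for a in g_ref))
    norm_mlp = math.sqrt(sum(b * b for b in g_mlp))
    if norm_ref == 0.0 or norm_mlp == 0.0:
        raise ValueError("direction is undefined for a zero gradient")
    # Clamp to [-1, 1] to guard against rounding errors before acos
    cosine = max(-1.0, min(1.0, dot / (norm_ref * norm_mlp)))
    return math.degrees(math.acos(cosine))


# Toy vectors for illustration only
print(direction_error_deg((1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
```

With this definition, identical directions give 0 degrees and opposite directions give 180 degrees, independent of the gradient norms.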