vasp2trainset - Trebonius91/utils4VASP GitHub Wiki

This program generates reference data for the training of different machine-learned potentials (MLPs) from VASP calculations. Currently, second-generation Behler artificial neural networks (ANN), trained with the aenet program, and message-passing atomic cluster expansion (MACE) MLPs are supported; further MLP flavors will be added in the future.

Three types of VASP calculations can be translated:

Nudged elastic band (NEB) calculations
VASP on-the-fly ML-FF training set data, condensed in the ML_AB file
(AI)MD simulations, with subsequent VASP recalculation of the frames (currently mainly used for MACE fine-tuning)

NEB calculations

To translate a NEB calculation into training set data, go into the main folder of the calculation (containing the folders 00 to NN, where NN is the number of NEB frames + 1). Then execute the command:

vasp2trainset -neb -name=[specifier] -aenet/-mace

where the last keyword selects whether ANN reference data (using the aenet program for MLP optimization) or MACE reference data (using the run_train.py script that is part of the MACE suite) is produced. The -name keyword assigns a specifier to the current NEB, for example -name=NEB_min1_to_min2. This is particularly important for aenet, where data of several runs are collected in one folder.

After execution, the training set data is generated.

For ANN, a folder xsf_files is written, which contains the data of each NEB frame in a separate file. In addition, a file xsf_list.dat is written, listing all files in the xsf_files folder. The content of xsf_list.dat needs to be appended to the train.in file for aenet, and the files in the xsf_files folder need to be copied to the corresponding folder where the training takes place. Thanks to the -name keyword, data of several NEB (or ML-FF) runs can be collected in one large xsf_files folder for the overall optimization.
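The collection step for aenet described above can be sketched in Python as follows. This is only a minimal sketch under assumptions: the function name merge_aenet_run and the layout of the training folder (a train.in file next to an xsf_files folder) are hypothetical and not part of vasp2trainset. The content of xsf_list.dat is appended verbatim, so nothing is assumed about its internal format.

```python
import shutil
from pathlib import Path


def merge_aenet_run(run_dir, train_dir):
    """Copy one run's xsf files into train_dir/xsf_files and append its
    xsf_list.dat to train_dir/train.in. Returns the number of copied files.

    Hypothetical helper: the training folder layout is an assumption.
    """
    run_dir, train_dir = Path(run_dir), Path(train_dir)
    target = train_dir / "xsf_files"
    target.mkdir(parents=True, exist_ok=True)

    copied = 0
    for src in sorted((run_dir / "xsf_files").iterdir()):
        if not src.is_file():
            continue
        dst = target / src.name
        if dst.exists():
            # Same file name from two runs: most likely a reused -name specifier.
            raise FileExistsError(f"{dst} already exists; use a unique -name")
        shutil.copy2(src, dst)
        copied += 1

    # Append the list verbatim, so no assumption is made about its layout.
    listing = (run_dir / "xsf_list.dat").read_text()
    with open(train_dir / "train.in", "a") as handle:
        handle.write(listing if listing.endswith("\n") else listing + "\n")
    return copied
```

A call such as merge_aenet_run("NEB_min1_to_min2", "training") would be repeated once per NEB or ML-FF run. The sketch stops with an error instead of overwriting when two runs produce the same file name, which is the situation the -name keyword is meant to avoid.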

For MACE, a single xyz file specifier.xyz (with specifier being the name chosen by the -name keyword) is written. It is directly usable as a training set for a MACE fine-tuning calculation (without stress tensor!). Data of several NEB or ML-FF runs can be combined by appending the respective xyz files to the global training set file. Further, it is advisable to give the atomic energies of all involved elements at the beginning of the global file!
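Appending several xyz files to one global training set file can be sketched as below. The function names and the optional header_file argument are hypothetical: header_file stands for a file, prepared by hand, that holds the atomic energy entries to be placed at the beginning. Its format is not specified here. The frame counter relies only on the general xyz layout (atom count line, comment line, then one line per atom).

```python
from pathlib import Path


def count_xyz_frames(path):
    """Count frames in an xyz file: atom count line, comment line, atom lines."""
    lines = Path(path).read_text().splitlines()
    frames = 0
    index = 0
    while index < len(lines):
        if not lines[index].strip():
            index += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[index].split()[0])
        index += n_atoms + 2
        if index > len(lines):
            raise ValueError(f"{path}: last frame is truncated")
        frames += 1
    return frames


def combine_xyz(global_file, part_files, header_file=None):
    """Write header_file (optional, hypothetical) and all part_files, in order,
    into a new global_file. Returns the number of frames in the result."""
    sources = ([header_file] if header_file else []) + list(part_files)
    with open(global_file, "w") as out:
        for source in sources:
            text = Path(source).read_text()
            if text:
                # Make sure the next file starts on a fresh line.
                out.write(text if text.endswith("\n") else text + "\n")
    return count_xyz_frames(global_file)
```

For example, combine_xyz("train_all.xyz", ["NEB_min1_to_min2.xyz", "mlff_run1.xyz"]) would build the global file from two runs (file names chosen for illustration). Since global_file is opened for writing, it should not also appear in part_files.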

ML-FF on-the-fly learning

The handling of VASP ML-FF training sets (ML_AB files) is similar to that of NEB calculations. In the folder where the ML_AB file is placed, execute the command:

vasp2trainset -ml_ff -name=[specifier] -aenet/-mace

where the last keyword again specifies whether training set data for an ANN optimization or a MACE fine-tuning is generated. The resulting files are the same as for NEB and are explained above.

MD trajectories

MD trajectories in the XDATCAR format can be given, and the energy and forces of each frame can be calculated as training set data. This mode is currently mainly used for fine-tuning MACE foundation models. The idea is to first sample an MD trajectory with MACE and ASE, then generate VASP input files for single-point calculations of each MD frame, and finally use the VASP results to generate the training set file for the MACE fine-tuning.

This requires two steps:

Part 1: Read in the XDATCAR file and generate the VASP input for each frame. For this, execute the command:

vasp2trainset -md_traj=setup

The command generates the folders frame1 to frameN, where N is the number of MD frames in XDATCAR. Each folder contains one POSCAR file. In addition, a file copy_input.sh is generated. It copies the files KPOINTS, POTCAR and POSCAR from the main folder into each frame folder and can be modified, e.g., to also copy a slurm_script and start the calculations. The VASP input files need to be prepared separately, with settings appropriate for simple single-point energy calculations.
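As an illustration of what happens after the setup step, the following Python sketch checks the frame folders and distributes input files. It is not the generated copy_input.sh, whose content is not reproduced here, and all function names are hypothetical. It assumes that each frame in XDATCAR starts with a line containing "configuration=" and that the frame folders are named frame1 to frameN as described above. Files already present in a frame folder are never overwritten, so the per-frame POSCAR is kept.

```python
import re
import shutil
from pathlib import Path

FRAME_PATTERN = re.compile(r"frame(\d+)")


def count_xdatcar_frames(xdatcar):
    """Count MD frames, assuming each one starts with a 'configuration=' line."""
    with open(xdatcar) as handle:
        return sum(1 for line in handle if "configuration=" in line)


def frame_folders(main_dir):
    """Return the frameN folders in main_dir, sorted by their number N."""
    found = []
    for entry in Path(main_dir).iterdir():
        match = FRAME_PATTERN.fullmatch(entry.name)
        if entry.is_dir() and match:
            found.append((int(match.group(1)), entry))
    return [path for _, path in sorted(found)]


def copy_inputs(main_dir, file_names):
    """Copy the named files from main_dir into every frame folder.

    Existing files are skipped, so a POSCAR written by the setup step
    is not replaced. Returns the number of files actually copied.
    """
    copied = 0
    for folder in frame_folders(main_dir):
        for name in file_names:
            target = folder / name
            if not target.exists():
                shutil.copy2(Path(main_dir) / name, target)
                copied += 1
    return copied
```

A quick consistency check would be to compare count_xdatcar_frames("XDATCAR") with len(frame_folders(".")) before calling copy_inputs(".", [...]) with the list of input files prepared in the main folder.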

Part 2: Collect the training set data. After all VASP single-point calculations in the frame folders have finished, run the vasp2trainset program again in the main folder:

vasp2trainset -md_traj=eval -name=[specifier] -aenet/-mace

The specifier gives a name to the resulting training set file. Either ANN or MACE can be chosen; the details of the resulting files are given in the first section on NEB calculations.
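Since the evaluation step requires all single-point calculations to be finished, it can help to check the frame folders before running it. The following Python sketch is one possible check and is not part of vasp2trainset. It assumes that every finished calculation leaves an OUTCAR file containing the timing summary that VASP prints at the end of a normally terminated run. The function name and the marker string are assumptions and should be adapted if the local VASP output differs.

```python
import re
from pathlib import Path

# Assumed marker: start of the timing summary at the end of a finished OUTCAR.
DONE_MARKER = "General timing and accounting"


def unfinished_frames(main_dir):
    """Return the names of frameN folders without a completed OUTCAR."""
    missing = []
    for entry in sorted(Path(main_dir).iterdir()):
        if not (entry.is_dir() and re.fullmatch(r"frame\d+", entry.name)):
            continue
        outcar = entry / "OUTCAR"
        # A frame counts as unfinished if OUTCAR is absent or lacks the marker.
        if not outcar.is_file() or DONE_MARKER not in outcar.read_text(errors="replace"):
            missing.append(entry.name)
    return missing
```

If unfinished_frames(".") returns an empty list, all frame folders passed this check. Otherwise the listed calculations should be inspected or restarted first.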