vasp2trainset - Trebonius91/utils4VASP GitHub Wiki
This program generates reference data for training different machine-learned potentials (MLPs) from VASP calculations. Currently, second-generation Behler artificial neural networks (ANNs), trained with the aenet program, and message-passing atomic cluster expansion (MACE) MLPs are supported; further MLP flavors will be added in the future.
Three types of VASP calculations can be translated:
- Nudged elastic band (NEB) calculations
- VASP on-the-fly ML-FF training set data, condensed in the ML_AB file
- (AI)MD simulations, with subsequent recalculation (currently mainly used for MACE fine-tuning)
NEB calculations
To translate a NEB calculation into training set data, go to the main folder of the calculation (containing the folders 00 to NN, where NN is the number of NEB frames + 1). Then execute the command:
vasp2trainset -neb -name=[specifier] -aenet/-mace
where the last option selects whether ANN reference data (for MLP optimization with the aenet program) or MACE reference data (for the run_train.py script, which is part of the MACE suite) is produced.
The -name option assigns a specifier to the current NEB, for example -name=NEB_min1_to_min2, which is important for aenet.
After execution, the training set data is generated.
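The NEB call described above can be prepared from a script. The following is a minimal Python sketch, not part of utils4VASP: the helper names find_neb_folders and build_neb_command are hypothetical, and only the flags quoted above (-neb, -name, -aenet/-mace) are used. It checks that the image folders are numbered without gaps and assembles the argument list.

```python
from pathlib import Path


def find_neb_folders(root):
    """Return the two-digit NEB image folders (00, 01, ...) in root, sorted."""
    folders = sorted(
        p.name for p in Path(root).iterdir()
        if p.is_dir() and len(p.name) == 2 and p.name.isdigit()
    )
    # The images are expected to be numbered without gaps, starting at 00.
    expected = [f"{i:02d}" for i in range(len(folders))]
    if not folders or folders != expected:
        raise ValueError(f"NEB folders are not contiguous from 00: {folders}")
    return folders


def build_neb_command(name, backend):
    """Assemble the vasp2trainset call for a NEB run (backend: 'aenet' or 'mace')."""
    if backend not in ("aenet", "mace"):
        raise ValueError("backend must be 'aenet' or 'mace'")
    return ["vasp2trainset", "-neb", f"-name={name}", f"-{backend}"]
```

The returned list can be handed to subprocess.run(cmd, check=True, cwd=root) once the folder check has passed.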
For ANN, a folder xsf_files is written, which contains the data of each NEB frame in a separate file. In addition, a file xsf_list.dat is written, listing all files in the xsf_files folder. The content of xsf_list.dat needs to be appended to the train.in file for aenet, and the files in the xsf_files folder need to be copied to the corresponding folder where the training shall take place. Thanks to the -name tag, the data of several NEB (or ML-FF) runs can be collected in one large xsf_files folder for the overall optimization.
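The copy-and-append step for aenet can be scripted. The sketch below is an illustration under assumptions, not part of utils4VASP: merge_xsf_data is a hypothetical helper, and it assumes that appending the lines of xsf_list.dat unchanged to train.in is all that is needed. Whether train.in requires further edits is not covered here.

```python
import shutil
from pathlib import Path


def merge_xsf_data(run_dir, train_dir, train_input="train.in"):
    """Copy the xsf files of one run into the training folder and append its file list."""
    run_dir, train_dir = Path(run_dir), Path(train_dir)
    target = train_dir / "xsf_files"
    target.mkdir(parents=True, exist_ok=True)

    copied = 0
    for xsf in sorted((run_dir / "xsf_files").iterdir()):
        # Distinct -name specifiers keep file names unique; refuse to overwrite.
        if (target / xsf.name).exists():
            raise FileExistsError(f"{xsf.name} already exists in {target}")
        shutil.copy2(xsf, target / xsf.name)
        copied += 1

    # Assumption: the list entries can be appended to train.in as they are.
    entries = (run_dir / "xsf_list.dat").read_text()
    with open(train_dir / train_input, "a") as handle:
        handle.write(entries if entries.endswith("\n") else entries + "\n")
    return copied
```

The overwrite check shows why unique specifiers matter: two runs written with the same -name would produce colliding file names in the shared xsf_files folder.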
For MACE, a single xyz file specifier.xyz (with specifier being the name chosen with the -name keyword) is written. It is directly usable as a training set for a MACE fine-tuning calculation (without stress tensor!). Data from several NEB or ML-FF runs can be combined by appending the respective xyz files to the global training set file. Further, it is advised to give the atomic energies of all involved elements at the beginning of the global file!
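Combining several xyz files into one global training set amounts to plain concatenation. The following Python sketch shows one way to do it. The function combine_xyz and its atomic_energies argument are hypothetical: the argument is assumed to point to a file that already holds the atomic-energy entries in the form MACE expects, and it is simply written first.

```python
from pathlib import Path


def combine_xyz(parts, output, atomic_energies=None):
    """Concatenate xyz training files into one file; return the number of files written."""
    sources = list(parts)
    if atomic_energies is not None:
        # The atomic energies go to the beginning of the global file.
        sources.insert(0, atomic_energies)

    written = 0
    with open(output, "w") as out:
        for source in sources:
            text = Path(source).read_text()
            if not text.strip():
                continue  # skip empty files, they would corrupt the frame sequence
            # A missing final newline would fuse the last line with the next frame.
            out.write(text if text.endswith("\n") else text + "\n")
            written += 1
    return written
```

A call such as combine_xyz(["NEB_min1_to_min2.xyz", "mlff_run.xyz"], "trainset.xyz") would produce the global file; the file names here are placeholders.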
ML-FF on-the-fly learning
The handling of VASP ML-FF training sets (ML_AB files) is similar to that of NEB calculations.
In the folder where the ML_AB file is located, execute the command:
vasp2trainset -ml_ff -name=[specifier] -aenet/-mace
where the last keyword again specifies whether training set data for an ANN optimization or a MACE fine-tuning shall be generated. The resulting files are the same as for NEB calculations and are explained above.
MD trajectories
MD trajectories in the XDATCAR format can be given, and the energy and forces of each frame can then be calculated to serve as training set data.
This mode is currently mainly used for MACE foundation model fine-tuning calculations.
The idea is to first sample an MD trajectory with MACE and ASE, then generate VASP input files for single-point calculations of each MD frame, and finally use the VASP results to generate the training set file for the MACE fine-tuning.
This requires two steps:
Part 1: Read in the XDATCAR file and generate the VASP input for each frame. For this, execute the command:
vasp2trainset -md_traj=setup
The command generates the folders frame1 to frameN, where N is the number of MD frames in XDATCAR. Each folder contains one POSCAR file. In addition, a file copy_input.sh is generated. It copies the files KPOINTS, POTCAR and POSCAR from the main folder into each frame folder and can be modified, e.g., to also copy a slurm_script and start the calculations. The VASP input files need to be prepared separately, as for simple single-point energy calculations.
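The distribution of input files over the frame folders can also be done from Python. The sketch below is an illustration only and does not reproduce the generated copy_input.sh: frame_folders and copy_inputs are hypothetical helpers, and the list of files to copy is passed in explicitly. The folders are sorted numerically so that frame10 follows frame9.

```python
import re
import shutil
from pathlib import Path


def frame_folders(root):
    """Return the folders frame1 ... frameN in numerical order."""
    found = []
    for p in Path(root).iterdir():
        match = re.fullmatch(r"frame(\d+)", p.name)
        if p.is_dir() and match:
            found.append((int(match.group(1)), p))
    return [p for _, p in sorted(found)]


def copy_inputs(root, files):
    """Copy the given input files from root into every frame folder."""
    root = Path(root)
    frames = frame_folders(root)
    for frame in frames:
        for name in files:
            shutil.copy2(root / name, frame / name)
    return len(frames)
```

For example, copy_inputs(".", ["KPOINTS", "POTCAR"]) places both files in every frame folder and returns the number of folders processed.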
Part 2: Collect the training set data. After all VASP single-point calculations in the frame folders have finished, run the vasp2trainset program again in the main folder:
vasp2trainset -md_traj=eval -name=[specifier] -aenet/-mace
The specifier gives a name to the resulting training set file. Either ANN or MACE can be chosen; the resulting files are described in the first section about NEB calculations.
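Before running the evaluation step, it can help to confirm that every frame calculation has actually finished. The following Python sketch is an assumption-based check, not a feature of vasp2trainset: unfinished_frames is a hypothetical helper, and it relies on the timing summary that VASP normally writes at the end of a completed OUTCAR.

```python
import re
from pathlib import Path

# Assumption: a completed VASP run ends its OUTCAR with a timing summary
# that contains this phrase.
FINISHED_MARKER = "General timing and accounting"


def unfinished_frames(root):
    """Return the names of frame folders whose OUTCAR is missing or incomplete."""
    pending = []
    for p in Path(root).iterdir():
        if not (p.is_dir() and re.fullmatch(r"frame\d+", p.name)):
            continue
        outcar = p / "OUTCAR"
        # Reads the whole file; acceptable for single-point runs in a sketch.
        if not outcar.is_file() or FINISHED_MARKER not in outcar.read_text(errors="replace"):
            pending.append(p.name)
    return sorted(pending)
```

An empty return value suggests that all single-point calculations ended normally. The check says nothing about electronic convergence, which has to be verified separately.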